SAP has reported a problem with ClassLoaders not getting
collected properly, leading to OOM exceptions due to
perm space being flooded by class definitions.
This is for SAP's NetWeaver 04 product, which is at the heart
of SAP's software strategy.
SAP uses complex class loading schemes with their own custom
ClassLoader implementations. This allows hot redeployment of
applications into their J2EE application server, and this is exactly
where the problem occurs: unloading and reloading of an application
establishes a new class loader instance, and the old one should be
GC'able. This does not happen under all circumstances, leading to the problem
described.
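
For illustration only, the redeployment pattern described above can be sketched
as follows. This is not SAP's implementation: the RedeploySketch class, its
method names and the use of an empty URLClassLoader are assumptions made for
the example. It only shows that each redeploy installs a fresh loader and that
the replaced loader is collectable once nothing else refers to it.

```java
import java.lang.ref.WeakReference;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical container that keeps one class loader per deployed application.
class RedeploySketch {
    private final Map<String, ClassLoader> loaders = new HashMap<>();

    // Deploys (or redeploys) an application: a fresh loader replaces the old one.
    // Returns a weak reference to the replaced loader, or null on first deployment.
    // The weak reference does not keep the old loader alive, so it can be used
    // to observe whether the loader is eventually collected.
    WeakReference<ClassLoader> deploy(String appName) {
        ClassLoader fresh =
            new URLClassLoader(new URL[0], RedeploySketch.class.getClassLoader());
        ClassLoader old = loaders.put(appName, fresh);
        return old == null ? null : new WeakReference<>(old);
    }

    // Returns the loader currently serving the application, or null if none.
    ClassLoader current(String appName) {
        return loaders.get(appName);
    }
}
```

If the returned weak reference never clears after a full GC, something other
than the container still holds the old loader (or one of its classes or
instances) strongly.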
SAP has come up with a reproducible test scenario for which we have collected
dumps on Windows (see below). A stand-alone test case is not available since
this only occurs with the full NetWeaver stack in place. A test landscape can be
provided, however, should the need arise. Hopefully, the dumps already contain
useful information. Analysis of the VM at the point of the OOM error using both OptimizeIt
and SAP's own JVMDI/JVMPI tool Sherlok indicated that the class loader is GC'able,
yet it is still not collected.
Two dump files have been collected and are available
from my server at:
/net/tachyon.germany.sun.com/data/Tmp/CL_Problem_SAP
(or via ftp as guest/guest)
A reproducible test scenario has been identified where three applications
are unloaded and redeployed repeatedly. This is the sequence of events used to
create the dumps for this scenario:
1. fire up NetWeaver system
2. unload applications first time (there are three apps in question here)
3. get dump_01
4. start applications again (causes the CLs to be recreated)
5. unload applications second time
6. get dump_02
7. start applications again --> OOM
So technically dump_02 should be the interesting one.
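
As a lighter-weight complement to the dumps, non-heap memory usage could be
sampled after each step of the sequence above. The sketch below is an
assumption, not part of the original test setup: it relies on the
java.lang.management API (Java 5 or later), which may not be available on the
VM in question, and the NonHeapUsageSketch name is made up. Pool names are
VM-specific (for example "Perm Gen" on older HotSpot releases).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: reports used bytes for every non-heap memory pool.
class NonHeapUsageSketch {
    static Map<String, Long> nonHeapUsedBytes() {
        Map<String, Long> used = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.NON_HEAP) {
                continue; // skip the ordinary object heap
            }
            MemoryUsage usage = pool.getUsage();
            if (usage != null) { // null means the pool is no longer valid
                used.put(pool.getName(), usage.getUsed());
            }
        }
        return used;
    }
}
```

A class-metadata pool whose usage grows with every unload/start cycle and
never drops back would match the behaviour described in this report.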
To use SA on these:
In the Heap Profile, locate the
com.sap.engine.services.deploy.server.ApplicationLoader
instances (size 3360, count 35). This is the suspect.
In the instance list for this class, scroll all the way down
to
1. 0x16966378 sap.com/pcui_gp~xssfpm
2. 0x16882ea8 sap.com/ess~ben
3. 0x1696ae18 sap.com/pcui_gp~xssutils
(the sap.com strings are the names that you can see in the "name"
field of the inspect window).
These are the CLs for the three apps (FPM, Utils, Benefits), and at least
1. and 2. should be GC'able at this stage. When doing a liveness analysis,
I get problems with SA which - according to Ken Russell - should be fairly
straightforward to fix in SA (when you know what you're doing :-)
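
For readers unfamiliar with the term, a liveness analysis asks whether an
object is still reachable from a GC root, and if so through which chain of
references. The following is a generic sketch of that idea over a toy
reference graph, not SA's actual code; the LivenessSketch class and the string
node names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reachability check over a map of "object -> objects it references".
class LivenessSketch {
    // Breadth-first search from the roots. Returns one root-to-target path,
    // or an empty list if the target is unreachable (i.e. collectable).
    static List<String> pathFromRoots(Map<String, List<String>> refs,
                                      List<String> roots, String target) {
        Map<String, String> parent = new HashMap<>(); // visited node -> predecessor
        Deque<String> queue = new ArrayDeque<>();
        for (String root : roots) {
            if (!parent.containsKey(root)) {
                parent.put(root, null); // roots have no predecessor
                queue.add(root);
            }
        }
        while (!queue.isEmpty()) {
            String node = queue.remove();
            if (node.equals(target)) {
                List<String> path = new ArrayList<>();
                for (String n = node; n != null; n = parent.get(n)) {
                    path.add(n);
                }
                Collections.reverse(path); // walked target-to-root; flip it
                return path;
            }
            for (String next : refs.getOrDefault(node, Collections.emptyList())) {
                if (!parent.containsKey(next)) {
                    parent.put(next, node);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList();
    }
}
```

A non-empty path for a class loader that should have been unloaded names the
references that keep it alive.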
###@###.### 2004-04-19