JDK-5033614 : ClassLoaders do not get released by GC, causing OutOfMemory in Perm Space
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 1.4.2_04
  • Priority: P2
  • Status: Resolved
  • Resolution: Fixed
  • OS: windows_2000
  • CPU: x86
  • Submitted: 2004-04-19
  • Updated: 2004-10-25
  • Resolved: 2004-10-25
Fixed Versions
  • Other: 1.4.2_07 (Fixed)
  • JDK 6: 6 b10 (Fixed)
Related Reports
Relates :  
Description
SAP has reported a problem with ClassLoaders not getting 
collected properly, leading to OOM exceptions due to 
perm space being flooded by class definitions. 

This is for SAP's NetWeaver 04 product, which is at the heart
of SAP's software strategy. 

SAP uses complex class loading schemes with their own custom
ClassLoader implementations. This allows hot redeployment of
applications into their J2EE application server, and this is exactly
where the problem occurs: unloading and reloading an application
establishes a new class loader instance, and the old one should be
GC'able. This does not always happen, leading to the problem
described.
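
The redeployment pattern described above can be sketched roughly as follows. This is only an illustration, not SAP's code: RedeployTracker and its methods are hypothetical names, and a plain URLClassLoader with no URLs stands in for SAP's custom ApplicationLoader. The sketch keeps only weak references to retired loaders, so it can report how many of them the collector has not reclaimed yet.

```java
import java.io.IOException;
import java.lang.ref.WeakReference;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of NetWeaver or the JDK.
class RedeployTracker {
    // Strong references to the loaders of currently deployed applications.
    private final Map<String, URLClassLoader> active = new HashMap<>();
    // Weak references to loaders of unloaded applications.
    private final List<WeakReference<ClassLoader>> retired = new ArrayList<>();

    /** Starts (or restarts) an application under a brand-new loader instance. */
    ClassLoader start(String app) throws IOException {
        unload(app); // retire a previous loader, if any
        URLClassLoader loader =
                new URLClassLoader(new URL[0], RedeployTracker.class.getClassLoader());
        active.put(app, loader);
        return loader;
    }

    /** Unloads an application: drops the strong reference, keeps a weak one. */
    void unload(String app) throws IOException {
        URLClassLoader old = active.remove(app);
        if (old != null) {
            old.close();
            retired.add(new WeakReference<ClassLoader>(old));
        }
    }

    /** Total number of loaders retired so far. */
    int retiredCount() {
        return retired.size();
    }

    /** Retired loaders that are still reachable after a GC hint. */
    int retiredStillReachable() {
        System.gc(); // a hint only; the VM may ignore it
        int n = 0;
        for (WeakReference<ClassLoader> ref : retired) {
            if (ref.get() != null) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        String[] apps = {"sap.com/pcui_gp~xssfpm", "sap.com/ess~ben", "sap.com/pcui_gp~xssutils"};
        RedeployTracker tracker = new RedeployTracker();
        for (int cycle = 1; cycle <= 2; cycle++) {
            for (String app : apps) tracker.start(app);
            for (String app : apps) tracker.unload(app);
            System.out.println("cycle " + cycle + ": " + tracker.retiredStillReachable()
                    + " of " + tracker.retiredCount() + " retired loaders still reachable");
        }
    }
}
```

In a healthy VM the "still reachable" count should eventually drop to zero. A count that keeps growing across cycles would match the symptom reported here, although a System.gc() hint alone is not proof either way.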

SAP has come up with a reproducible test scenario for which we have collected
dumps on Windows (see below). A stand-alone test case is not available since
this only occurs with the full NetWeaver stack in place. A test landscape can be
provided, however, should the need arise. Hopefully, the dumps already contain
useful information. Analysis of the VM at the point of the OOM error using both OptimizeIt
and SAP's own JVMDI/JVMPI tool Sherlok indicated that the class loader is GC'able,
yet it is not collected.

Two dump files have been collected and are available
from my server at:

/net/tachyon.germany.sun.com/data/Tmp/CL_Problem_SAP

(or via ftp as guest/guest)

A reproducible test scenario has been identified where three applications
are unloaded and redeployed repeatedly. This is the sequence of events used to
create the dumps for this scenario:

1. fire up NetWeaver system
2. unload applications first time (there are three apps in question here)
3. get dump_01
4. start applications again (causes the CLs to be recreated)
5. unload applications second time
6. get dump_02
7. start applications again --> OOM

So technically dump_02 should be the interesting one.

To use SA on these: 

In the Heap Profile, locate the
com.sap.engine.services.deploy.server.ApplicationLoader
instances (size 3360, count 35). This is the suspect.
In the instance list for this class, scroll all the way down
to

1. 0x16966378 sap.com/pcui_gp~xssfpm
2. 0x16882ea8 sap.com/ess~ben
3. 0x1696ae18 sap.com/pcui_gp~xssutils

(the sap.com strings are the names that you can see in the "name"
field of the inspect window).

These are the CLs for the three apps (FPM, Benefits, Utils), and at least
1. and 2. should be GC'able at this stage. When doing a liveness analysis,
I run into problems with SA which, according to Ken Russell, should be fairly
straightforward to fix in SA (when you know what you're doing :-)

###@###.### 2004-04-19

Comments
SUGGESTED FIX See: /net/prt-archiver.sfbay/data/archived_workspaces/main/c2_baseline/2004/20041008165448.rasbold.c2_baseline/workspace/webrevs/webrev-2004.10.08/index.html for the PRT webrev of the _exception_oop fix. ###@###.### 10/11/04 15:37 GMT
11-10-2004

WORK AROUND None found so far. SAP tried different GC algorithms in 1.4.2, but to no avail. Increasing perm space only delays the problem; it does not solve it reliably.

Most of the behavior is similar to 4957990. If it is the same as 4957990, then the workaround is to use a larger perm gen. Verification of this is still ongoing. An alternative to a larger heap is to use C1 (the client compiler).
26-08-2004

PUBLIC COMMENTS no comment
26-08-2004

EVALUATION This looks like the same problem as 4957990. ###@###.### 2004-08-25

Using the flag -XX:-StackTraceInThrowable prevents class loaders from being kept alive longer than necessary. Still investigating why this is the case.

-------------------------

The C2 runtime uses a field in a thread object, "_exception_oop", to pass an oop between setup_exception_blob() and handle_exception_C(). The oop, which is always a Throwable, usually has a backtrace. In the customer's case, the backtrace referenced a methodOop that was loaded by a classLoader that was soon to be dead. The _exception_oop field, never overwritten, was the only root with a path to the classLoader, keeping it from being unloaded. ###@###.### 10/7/04 23:20 GMT
07-10-2004
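
The retention chain in the evaluation can be mimicked at the Java level. The sketch below is an analogy under stated assumptions, not the VM mechanism: StaleRootDemo, staleSlot and AppFailure are hypothetical names; the static field plays the role of the thread's _exception_oop, and the origin field plays the role of the backtrace that leads to a class of the dying loader. A dynamic proxy is used only because its class is defined by the loader passed in, which avoids generating bytecode by hand.

```java
import java.lang.ref.WeakReference;
import java.lang.reflect.Proxy;
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical demo class; a Java-level analogy only.
class StaleRootDemo {
    /** Stand-in for _exception_oop: written once, never overwritten. */
    static Throwable staleSlot;

    /** Stand-in for a Throwable whose backtrace references application code. */
    static final class AppFailure extends RuntimeException {
        final Object origin; // plays the role of backtrace -> method -> class

        AppFailure(Object origin) {
            super("application failure");
            this.origin = origin;
        }
    }

    /** Creates a throwaway loader, parks an exception tied to it in the stale
     *  slot, and returns only a weak reference to the loader. */
    static WeakReference<ClassLoader> deployAndFail() {
        ClassLoader appLoader =
                new URLClassLoader(new URL[0], StaleRootDemo.class.getClassLoader());
        // The proxy class is defined by appLoader, so the instance's class
        // holds a strong reference to that loader.
        Object appObject = Proxy.newProxyInstance(
                appLoader, new Class<?>[] {Runnable.class}, (proxy, method, args) -> null);
        staleSlot = new AppFailure(appObject);
        return new WeakReference<ClassLoader>(appLoader);
    }

    /** Hints the collector a few times and reports whether the referent is gone. */
    static boolean collected(WeakReference<?> ref) throws InterruptedException {
        for (int i = 0; i < 20 && ref.get() != null; i++) {
            System.gc(); // a hint only
            Thread.sleep(50);
        }
        return ref.get() == null;
    }

    public static void main(String[] args) throws InterruptedException {
        WeakReference<ClassLoader> ref = deployAndFail();
        System.out.println("collected while slot is set: " + collected(ref));
        staleSlot = null; // the analogue of overwriting _exception_oop
        System.out.println("collected after clearing slot: " + collected(ref));
    }
}
```

While staleSlot is set, the loader stays strongly reachable, so the first line always reports false. After the slot is cleared the loader becomes eligible for collection, but whether the second line reports true depends on the collector and the JDK version.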