JDK-5033614 : ClassLoaders do not get released by GC, causing OutOfMemory in Perm Space

Details
Type:
Bug
Submit Date:
2004-04-19
Status:
Resolved
Updated Date:
2004-10-25
Project Name:
JDK
Resolved Date:
2004-10-25
Component:
hotspot
OS:
windows_2000
Sub-Component:
compiler
CPU:
x86
Priority:
P2
Resolution:
Fixed
Affected Versions:
1.4.2_04
Fixed Versions:

Related Reports
Backport:
Backport:
Relates:

Sub Tasks

Description
SAP has reported a problem with ClassLoaders not getting 
collected properly, leading to OOM exceptions due to 
perm space being flooded by class definitions. 

This is for SAP's NetWeaver 04 product, which is at the heart
of SAP's software strategy. 

SAP uses complex class loading schemes with their own custom
ClassLoader implementations. This allows hot redeployment of
applications into their J2EE application server, and this is exactly
where the problem occurs: unloading and reloading an application
establishes a new class loader instance, and the old one should then be
GC'able. This does not happen under all circumstances, leading to the problem
described.
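
The redeployment pattern described above can be sketched roughly as follows.
This is a minimal illustration only, not SAP's implementation: the class
RedeploySketch and its methods are hypothetical, and an empty URLClassLoader
stands in for the real ApplicationLoader. Each redeploy installs a fresh
loader and hands back a weak handle to the previous one, which is the object
that is expected to become collectable.

```java
import java.lang.ref.WeakReference;
import java.net.URL;
import java.net.URLClassLoader;

/** Hypothetical stand-in for one application slot in an application server. */
final class RedeploySketch {
    // Loader of the currently deployed application (null before the first deploy).
    private ClassLoader current;

    /**
     * Installs a fresh loader for the application and returns a weak handle
     * to the loader it replaced. The weak handle does not keep the old loader alive.
     */
    WeakReference<ClassLoader> redeploy() {
        ClassLoader old = current;
        // Assumption: an empty URL list is enough for this sketch; a real server
        // would point the loader at the application's archives.
        current = new URLClassLoader(new URL[0], RedeploySketch.class.getClassLoader());
        return new WeakReference<>(old);
    }

    ClassLoader current() {
        return current;
    }
}
```

Once nothing else refers to the old loader, its classes, or their instances,
the weak handle may be cleared by a later GC cycle. In the situation reported
here, a remaining strong reference prevents that.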

SAP has come up with a reproducible test scenario for which we have collected
dumps on Windows (see below). A stand-alone test case is not available since
the problem only occurs with the full NetWeaver stack in place. A test landscape can be
provided, however, should the need arise. Hopefully, the dumps already contain
useful information. Analysis of the VM at the point of the OOM error using both OptimizeIt
and SAP's own JVMDI/JVMPI tool Sherlok indicated that the class loader is GC'able,
yet it is still not collected.

Two dump files have been collected and are available
from my server at:

/net/tachyon.germany.sun.com/data/Tmp/CL_Problem_SAP

(or via ftp as guest/guest)

A reproducible test scenario has been identified where three applications
are unloaded and redeployed repeatedly. This is the sequence of events used to 
create the dumps for this scenario:

1. fire up NetWeaver system
2. unload applications first time (there are three apps in question here)
3. get dump_01
4. start applications again (causes the CLs to be recreated)
5. unload applications second time
6. get dump_02
7. start applications again --> OOM

So technically dump_02 should be the interesting one.

To use SA on these: 

In the Heap Profile, locate the
com.sap.engine.services.deploy.server.ApplicationLoader
instances (size 3360, count 35). This is the suspect.
In the instance list for this class, scroll all the way down
to

1. 0x16966378 sap.com/pcui_gp~xssfpm
2. 0x16882ea8 sap.com/ess~ben
3. 0x1696ae18 sap.com/pcui_gp~xssutils

(the sap.com strings are the names that you can see in the "name"
field of the inspect window).

These are the CLs for the three apps (FPM, Utils, Benefits), and at least
1. and 2. should be GCable at this stage. When doing a liveness analysis,
I get problems with SA which - according to Ken Russell - should be fairly
straightforward to fix in SA (when you know what you're doing :-)

###@###.### 2004-04-19

                                    

Comments
SUGGESTED FIX

See:
/net/prt-archiver.sfbay/data/archived_workspaces/main/c2_baseline/2004/20041008165448.rasbold.c2_baseline/workspace/webrevs/webrev-2004.10.08/index.html
for the PRT webrev of the _exception_oop fix.
###@###.### 10/11/04 15:37 GMT
                                     
2004-10-11
PUBLIC COMMENTS

no comment
                                     
2004-08-26
WORK AROUND

None found so far. SAP tried different GC algorithms in 1.4.2, but to no avail.
Increasing perm space only increases the time for the problem to occur, it does 
not solve it reliably. 

Most of the behavior is similar to 4957990.  If it is the same as 4957990,
then the workaround is to use a larger perm gen.  Verification of this
is still ongoing.

An alternative to a larger heap is to use C1 (the client compiler).
                                     
2004-08-26
EVALUATION

This looks like the same problem as 4957990.

###@###.### 2004-08-25

Using the flag -XX:-StackTraceInThrowable prevents class loaders from
being kept alive longer than necessary.  Still investigating
why this is the case.
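
A small probe like the following can be used to observe what the flag changes.
It is only a sketch: the class name StackTraceProbe is hypothetical. It counts
the frames recorded in a freshly created Throwable. With default settings the
count is non-zero; when the VM is started with -XX:-StackTraceInThrowable the
probe would be expected to report zero frames, since no backtrace is recorded.

```java
/** Hypothetical probe: reports how many stack frames a new Throwable records. */
final class StackTraceProbe {

    /** Returns the number of frames captured when a Throwable is constructed here. */
    static int recordedFrames() {
        return new Throwable().getStackTrace().length;
    }

    public static void main(String[] args) {
        // Run once normally and once with -XX:-StackTraceInThrowable to compare.
        System.out.println("frames recorded: " + recordedFrames());
    }
}
```

A Throwable without a backtrace holds no references to the methods that were
on the stack, which is consistent with the observation that the flag stops
class loaders from being kept alive.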

-------------------------

The C2 runtime uses a field in a thread object, "_exception_oop", to pass an oop between setup_exception_blob() and handle_exception_C().  The oop, which is always a Throwable, usually has a backtrace.  In the customer's case, the backtrace referenced a methodOop that was loaded by a classLoader that was soon to be dead. The _exception_oop field, never overwritten, was the only root
with a path to the classLoader, keeping it from being unloaded.
###@###.### 10/7/04 23:20 GMT
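
The retention pattern described in this evaluation can be illustrated with a
Java-level analogy. This is not the HotSpot code: HandoffSlot plays the role of
the thread's _exception_oop field, and LoaderBoundException models a Throwable
whose backtrace reaches a class loader. Both names are hypothetical. Clearing
the slot after use is shown as one plausible remedy; whether the actual fix
works this way is an assumption, as the webrev is not reproduced here.

```java
/** Java-level analogy of a hand-off field that is never overwritten. */
final class StaleRootSketch {

    /** Models a Throwable whose backtrace indirectly references a class loader. */
    static final class LoaderBoundException extends RuntimeException {
        final ClassLoader origin;

        LoaderBoundException(ClassLoader origin) {
            super("raised by application code");
            this.origin = origin;
        }
    }

    /** Models a per-thread slot used only to pass an exception to its handler. */
    static final class HandoffSlot {
        private Throwable pending;

        void set(Throwable t) {
            pending = t;
        }

        /**
         * Hands the exception to the caller. If clearAfterUse is false, the slot
         * keeps its reference and remains a root for everything the exception reaches.
         */
        Throwable take(boolean clearAfterUse) {
            Throwable t = pending;
            if (clearAfterUse) {
                pending = null;
            }
            return t;
        }

        boolean holdsReference() {
            return pending != null;
        }
    }
}
```

As long as holdsReference() is true, the loader reachable from the stored
exception cannot be collected, even if the application it belonged to has
been unloaded.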
                                     
2004-10-07


