JDK-6942989 : Memory leak of java.lang.ref.WeakReference objects

Details
Type: Bug
Submit Date: 2010-04-12
Status: Closed
Updated Date: 2011-09-08
Project Name: JDK
Resolved Date: 2011-04-19
Component: core-libs
OS: linux,generic,solaris_10,windows_xp
Sub-Component: java.util.logging
CPU: x86,generic
Priority: P2
Resolution: Fixed
Affected Versions: 1.4.2_27,5.0u25,6,6u18,6u20,6u21,6u23,7
Fixed Versions:

Related Reports
Backport: 5 entries
Duplicate: 3 entries
Relates: 5 entries

Sub Tasks

Description
The following simple code reproduces the growth of java.lang.ref.WeakReference objects in the heap:

public class Main {   // wrapper class added so the snippet compiles standalone
    public static void main(String[] args) throws Exception {
        while (true) {
            // Each anonymous Logger is registered with the root Logger via a
            // WeakReference that is never removed from the root's kids list.
            java.util.logging.Logger.getAnonymousLogger();
            Thread.sleep(1);
        }
    }
}

Here is the output of the jmap command, sampled a few seconds apart:

user@t1007:~> jmap -d64 -histo:live 29201|grep WeakReference
8:      22493   1079664   java.lang.ref.WeakReference
31:     1       32144     [Ljava.lang.ref.WeakReference;
106:    17      952       com.sun.jmx.mbeanserver.WeakIdentityHashMap$IdentityWeakReference
user@t1007:~> jmap -d64 -histo:live 29201|grep WeakReference
8:      23191   1113168   java.lang.ref.WeakReference
31:     1       32144     [Ljava.lang.ref.WeakReference;
103:    17      952       com.sun.jmx.mbeanserver.WeakIdentityHashMap$IdentityWeakReference
user@t1007:~> jmap -d64 -histo:live 29201|grep WeakReference
8:      23804   1142592   java.lang.ref.WeakReference
31:     1       32144     [Ljava.lang.ref.WeakReference;
103:    17      952       com.sun.jmx.mbeanserver.WeakIdentityHashMap$IdentityWeakReference

Note that the jmap command (with -histo:live) forces a full GC.

More on http://forums.java.net/jive/thread.jspa?threadID=75651&tstart=0
The forum thread has an additional posting that clarifies the APIs
used by the real application:


Just some more information about the issue.
The code above is an artificial snippet that easily reproduces the problem.
In the live system, the allocation call tree looks like:
...
com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(ObjectName name, String attribute)
...
com.sun.jmx.trace.TraceManager.getLogger(int type)
java.util.logging.Logger.getLogger(String name)

In 6 days the number of WeakReference instances grows to about 240,000 (11 MB), i.e. an increase of roughly 2 MB per day.

The issue is reproduced on Linux and Solaris.
Linux version:
Red Hat Enterprise Linux Server release 5.3 (Tikanga)
Linux rhone 2.6.18-128.1.14.el5 #1 SMP Mon Jun 1 15:52:58 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

Solaris version:
Solaris 10 11/06 s10s_u3wos_10 SPARC
Copyright 2006 Sun Microsystems, Inc. All Rights Reserved.
Use is subject to license terms.
Assembled 14 November 2006

                                    

Comments
EVALUATION

A Logger object contains an ArrayList of WeakReferences to any
Logger objects that have the first Logger object as a parent,
i.e., a Logger object keeps track of its kids (if any). The root
Logger object has the "honor" of keeping track of all anonymous
Logger objects.

The private method Logger.doSetParent() does the work of associating
a child Logger object with its parent Logger. I don't see any code
to clean up stale WeakReferences from the list of kids. The Logger
objects referenced by the WeakReferences will get GC'ed as needed,
but it looks like the ArrayList of WeakReferences will just keep
growing.
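
As a minimal sketch of the pattern described above (class and field
names here are illustrative, not the actual java.util.logging source):
the parent holds weak references to its kids, but nothing ever removes
entries whose referents have been collected, so the list itself grows
without bound.

import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: mimics the structure described in this evaluation,
// not the actual Logger implementation.
class LeakyParent {
    private final List<WeakReference<Object>> kids =
        new ArrayList<WeakReference<Object>>();

    void addKid(Object kid) {
        // A WeakReference lets the kid itself be garbage collected,
        // but the (now stale) WeakReference entry stays in the list.
        kids.add(new WeakReference<Object>(kid));
    }

    int kidCount() {
        return kids.size();   // grows by one per addKid() call, never shrinks
    }
}

In these terms, every Logger.getAnonymousLogger() call is effectively
an addKid() call on the root Logger.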
                                     
2010-04-14
EVALUATION

I think there also might be a leak in the WeakReferences held
by the LogManager.LogNode class, but I'll have to mull on that
one a bit more.

Update: I think the LogManager.LogNode class is fine.

Update: Based on my analysis of the new LoggerWeakRefLeak test
results, I have to change my mind again and say there is indeed
a WeakReferences leak in the LogNode class. Sigh.
                                     
2010-04-15
SUGGESTED FIX

Modify private method Logger.doSetParent():

- During the "remove self" search of the soon-to-be old
  parent hierarchy, remove any stale weak refs that are found.

  This portion of the fix only removes stale weak refs that
  are found before we find ourselves in the soon-to-be old parent
  hierarchy. This is an optimization that does some clean-up,
  since we have to walk that part of the soon-to-be old parent
  hierarchy anyway.

- After updating the effective level values, check the entire
  new parent hierarchy for stale weak refs and remove any that
  are found.

  This portion of the fix is the complete solution for the new
  parent and is executed anytime a new Logger object is created
  or anytime an existing Logger object is reparented.

Modify private method Logger.updateEffectiveLevel() to:

- During any level updates, remove any stale weak refs that
  are found.

  This portion of the fix will only remove stale weak refs that
  are found in Logger hierarchies that need to have the level
  value updated. This portion of the fix will come into play
  when a Logger hierarchy is reparented or when a Logger
  hierarchy has its level explicitly changed.

So the proposed fix boils down to two parts:

- An explicit clean up pass of stale weak refs in the new parent 
  hierarchy during new Logger creation or reparenting of an
  existing Logger.
- Opportunistic clean ups of stale weak refs in other code paths
  where we have to walk the Logger hierarchy anyway.
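
A rough sketch of what such a clean-up pass might look like
(illustrative code assuming a List of WeakReferences like the kids
list described above; this is not the actual webrev):

import java.lang.ref.WeakReference;
import java.util.Iterator;
import java.util.List;

class StaleRefCleanup {
    // Drop every WeakReference whose referent has already been
    // garbage collected; live entries are left untouched.
    static <T> void removeStaleRefs(List<WeakReference<T>> kids) {
        for (Iterator<WeakReference<T>> it = kids.iterator(); it.hasNext(); ) {
            if (it.next().get() == null) {
                it.remove();
            }
        }
    }
}

Hooking a pass like this into the hierarchy walks that doSetParent()
and updateEffectiveLevel() already perform is what keeps the extra
cost of the clean-up low.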
                                     
2010-06-09
EVALUATION

The Logging API currently allows loops to be created in the Logger
hierarchy via the Logger.setParent() API. In particular, a new
Logger can be made the "parent" of the Root Logger. When the new
Logger is created, it is already in the Root Logger's hierarchy
somewhere. By becoming the parent of the Root Logger, there is
now a loop back to the Root Logger from the new Logger via its
kids list. This loop complicates the cleanup algorithm for stale
weak references.

Another issue: when the new Logger is set as the parent of the current
Root Logger, the new Logger does not become the new logical Root Logger.
The Logging subsystem still returns the original Root Logger via a
Logger.getLogger("") call.

And another issue: Logger.getParent() is defined to return NULL when
called on the Root Logger. After a parent is set on the Root Logger,
then RootLogger.getParent() will no longer return NULL. Seems like a
spec violation to me.

The looping issue and the semantics of setting a parent on the Root
Logger will have to be addressed in separate bugs. There may already
be open bugs for these issues; I'll have to check.
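
A hedged illustration of the scenario described above, as a standalone
program with no security manager installed (behavior as described in
this report for the JDK versions under discussion):

import java.util.logging.Logger;

public class RootLoggerLoop {
    public static void main(String[] args) {
        Logger root = Logger.getLogger("");          // the root Logger
        Logger other = Logger.getLogger("example");  // already lives under the root Logger

        root.setParent(other);   // make the new Logger the parent of the root Logger

        // The hierarchy now contains a loop:
        //   root.getParent()  returns the "example" Logger,
        //   other.getParent() still returns the root Logger.
        System.out.println(root.getParent().getName());    // "example" -- no longer null
        System.out.println(Logger.getLogger("") == root);  // true -- "" still maps to the original root Logger
    }
}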
                                     
2010-06-09
SUGGESTED FIX

This portion of the new algorithm needs to be done carefully:

> - After updating the effective level values, check the entire
>   new parent hierarchy for stale weak refs and remove any that
>   are found.

because the Logging API currently allows loops to be defined in
the Logger hierarchy.
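
One way to make such a walk safe against those loops (a sketch under
the assumption that the clean-up visits Loggers by following parent
links; the names are illustrative) is to remember which Loggers have
already been visited and stop when one is seen twice:

import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.logging.Logger;

class LoopSafeWalk {
    // Walk up the parent chain, stopping if the same Logger is seen
    // twice, which can happen because setParent() currently allows loops.
    static void visitParents(Logger start) {
        Set<Logger> visited =
            Collections.newSetFromMap(new IdentityHashMap<Logger, Boolean>());
        for (Logger l = start; l != null; l = l.getParent()) {
            if (!visited.add(l)) {
                break;   // already visited: we hit a loop, stop here
            }
            // ... remove stale weak refs from l's kids list here ...
        }
    }
}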
                                     
2010-06-09
SUGGESTED FIX

The LogManager.LogNode class contains a WeakReference to a
Logger object and needs similar changes.
                                     
2010-06-10
SUGGESTED FIX

The proposed fix for JDK7 is attached as 6942989-webrev-cr0.tgz.
                                     
2010-06-10
SUGGESTED FIX

A revised version of the proposed fix for JDK7 is attached as
6942989-webrev-cr1.tgz.
                                     
2010-06-18
EVALUATION

http://hg.openjdk.java.net/jdk7/tl/jdk/rev/4e76be6e9fe1
                                     
2010-06-23
EVALUATION

http://hg.openjdk.java.net/jdk6/jdk6/jdk/rev/2b1a7d4b9ac6
                                     
2010-06-26
PUBLIC COMMENTS

I ran the AnonLoggerWeakRefLeak test with the following env variables:

TESTJAVA=c:/work/shared/mirrors/jdks-win32/jdk/1.7.0/latest/binaries/windows-i586
TESTCLASSES=C:/cygwin/home/dcubed/Projects/6942989/XXX
TESTVMOPTS=-showversion -client -Xmx32m
TESTSRC=C:/cygwin/home/dcubed/Projects/6942989/XXX

along with a 30 minute duration (1800 seconds) using JDK7-B99 bits
(which does NOT have the fix).

Here are the last 15 lines of the log files:

$ tail -15 AnonLogger.jdk7-B99_30min.log
INFO: instance_cnt = 1032505
INFO: instance_cnt = 1038305
INFO: instance_cnt = 1044005
INFO: instance_cnt = 1049705
INFO: jmap exited with exit code = 1
INFO: The likely reason is that AnonLoggerWeakRefLeak has finished running.
INFO: increasing_cnt = 181
INFO: decreasing_cnt = 0
INFO: The instance count of java.lang.ref.WeakReference objects
INFO: is always increasing.
FAIL: This indicates that there is a memory leak.

real    18m10.296s
user    0m41.570s
sys     0m46.442s

$ tail -15 AnonLoggerWeakRefLeak.jdk7-B99_30min.log
INFO: call count = 1044000
INFO: call count = 1045000
INFO: call count = 1046000
INFO: call count = 1047000
INFO: call count = 1048000
INFO: call count = 1049000
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2239)
        at java.util.Arrays.copyOf(Arrays.java:2213)
        at java.util.ArrayList.grow(ArrayList.java:208)
        at java.util.ArrayList.ensureCapacity(ArrayList.java:182)
        at java.util.ArrayList.add(ArrayList.java:406)
        at java.util.logging.Logger.doSetParent(Logger.java:1401)
        at java.util.logging.Logger.getAnonymousLogger(Logger.java:374)
        at AnonLoggerWeakRefLeak.main(AnonLoggerWeakRefLeak.java:60)


So it took a little over 18 minutes for a Client VM with a 32MB
max heap size to throw an OOME with AnonLoggerWeakRefLeak.


I ran the LoggerWeakRefLeak test with the same environment variables
along with a 30 minute duration (1800 seconds) using JDK7-B99 bits
(which does NOT have the fix).

Here are the last 15 lines of the log files:

$ tail -15 Logger.jdk7-B99_30min.log
INFO: instance_cnt = 935705
INFO: instance_cnt = 938605
INFO: instance_cnt = 941205
INFO: instance_cnt = 943205
INFO: jmap exited with exit code = 1
INFO: The likely reason is that LoggerWeakRefLeak has finished running.
INFO: increasing_cnt = 160
INFO: decreasing_cnt = 0
INFO: The instance count of java.lang.ref.WeakReference objects
INFO: is always increasing.
FAIL: This indicates that there is a memory leak.

real    16m47.690s
user    0m37.910s
sys     0m40.002s

$ tail -15 LoggerWeakRefLeak.jdk7-B99_30min.log
INFO: call count = 881000
INFO: call count = 882000
INFO: call count = 883000
INFO: call count = 884000
INFO: call count = 885000
INFO: call count = 886000
INFO: call count = 887000
INFO: call count = 888000
INFO: call count = 889000
INFO: call count = 890000
INFO: call count = 891000
INFO: call count = 892000
INFO: call count = 893000
INFO: call count = 894000
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space


So it took a little less than 17 minutes for a Client VM with a 32MB
max heap size to throw an OOME with LoggerWeakRefLeak.
                                     
2010-06-28
PUBLIC COMMENTS

I ran the AnonLoggerWeakRefLeak test with the same environment
variables for a 60 minute duration using the fixed bits from
JPRT. Here are the last 15 lines of each log (much less
interesting info):

$ tail -15 AnonLogger.jdk7-B100+_60min.log
INFO: instance_cnt = 5801
INFO: instance_cnt = 5901
INFO: instance_cnt = 5701
INFO: instance_cnt = 5601
INFO: jmap exited with exit code = 1
INFO: The likely reason is that AnonLoggerWeakRefLeak has finished running.
INFO: increasing_cnt = 227
INFO: decreasing_cnt = 383
INFO: The instance count of java.lang.ref.WeakReference objects
INFO: is both increasing and decreasing.
PASS: This indicates that there is not a memory leak.

real    60m5.099s
user    2m18.882s
sys     2m31.127s

$ tail -15 AnonLoggerWeakRefLeak.jdk7-B100+_60min.log
INFO: call count = 3484000
INFO: call count = 3485000
INFO: call count = 3486000
INFO: call count = 3487000
INFO: call count = 3488000
INFO: call count = 3489000
INFO: call count = 3490000
INFO: call count = 3491000
INFO: call count = 3492000
INFO: call count = 3493000
INFO: call count = 3494000
INFO: call count = 3495000
INFO: call count = 3496000
INFO: call count = 3497000
INFO: final loop count = 3497200


I ran the LoggerWeakRefLeak test with the same environment
variables for a 60 minute duration using the fixed bits from
JPRT. Here are the last 15 lines of each log (much less
interesting info):

$ tail -15 Logger.jdk7-B100+_60min.log
INFO: instance_cnt = 1813
INFO: instance_cnt = 1638
INFO: instance_cnt = 1514
INFO: instance_cnt = 1511
INFO: jmap exited with exit code = 1
INFO: The likely reason is that LoggerWeakRefLeak has finished running.
INFO: increasing_cnt = 293
INFO: decreasing_cnt = 318
INFO: The instance count of java.lang.ref.WeakReference objects
INFO: is both increasing and decreasing.
PASS: This indicates that there is not a memory leak.

real    60m6.783s
user    2m20.592s
sys     2m31.997s

$ tail -15 LoggerWeakRefLeak.jdk7-B100+_60min.log
INFO: call count = 3502000
INFO: call count = 3503000
INFO: call count = 3504000
INFO: call count = 3505000
INFO: call count = 3506000
INFO: call count = 3507000
INFO: call count = 3508000
INFO: call count = 3509000
INFO: call count = 3510000
INFO: call count = 3511000
INFO: call count = 3512000
INFO: call count = 3513000
INFO: call count = 3514000
INFO: call count = 3515000
INFO: final loop count = 3515800
                                     
2010-06-29
PUBLIC COMMENTS

I ran the AnonLoggerWeakRefLeak test with the same environment
variables for a 12 hour duration using the fixed bits from JPRT.
For this run I saved the "Total" line from the jmap output for
every 10th sample:

$ diff ../AnonLoggerWeakRefLeak.sh AnonLoggerWeakRefLeak.sh
161a162
> > "$TEST_NAME.totals"
225a227,233
> 
> set +e
> mod=`expr "$loop_cnt" % 10`
> set -e
> if [ "$mod" = 0 ]; then
> tail -1 "$TEST_NAME.jmap" >> "$TEST_NAME.totals"
> fi

Here is an analysis of the .totals data:

$ sh analyze_totals < AnonLoggerWeakRefLeak.totals
          #objs      #bytes
first:    30537     2243648
lo:       30537     2243648
hi:       57072     3197904
last:     35676     2882488
avg:      36853     2929982
# samples: 647

The first sample is also the lowest set of values, which isn't
a surprise given that the first sample is taken shortly after
the target program has started running. The hi value occurred
in sample #269 of 647, and the last sample was below the average.
This data shows that the values both rise and fall over time,
which indicates that no memory leak remains.



I also ran the LoggerWeakRefLeak test with the same environment
variables for a 12 hour duration using the fixed bits from JPRT.
For this run I saved the "Total" line from the jmap output for
every 10th sample:

$ diff ../LoggerWeakRefLeak.sh LoggerWeakRefLeak.sh
161a162
> > "$TEST_NAME.totals"
225a227,233
> 
> set +e
> mod=`expr "$loop_cnt" % 10`
> set -e
> if [ "$mod" = 0 ]; then
> tail -1 "$TEST_NAME.jmap" >> "$TEST_NAME.totals"
> fi

Here is an analysis of the .totals data:

$ sh analyze_totals < LoggerWeakRefLeak.totals
          #objs      #bytes
first:    48957     2832648
lo:       48957     2832648
hi:      148299     6299016
last:    133173     5451032
avg:     137578     5608654
# samples: 647

The first sample is also the lowest set of values, which isn't
a surprise given that the first sample is taken shortly after
the target program has started running. The hi value occurred
in sample #76 of 647, and the last sample was below the average.
This data shows that the values both rise and fall over time,
which indicates that no memory leak remains.
                                     
2010-06-29


