JDK-6545719 : Regression : Infinite GC occurs after fix in CR 6370163
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 5.0u11
  • Priority: P2
  • Status: Resolved
  • Resolution: Fixed
  • OS: windows_xp
  • CPU: x86
  • Submitted: 2007-04-13
  • Updated: 2010-12-03
  • Resolved: 2007-05-24
Other               JDK 6       JDK 7     Other
5.0u14,hs10 Fixed   6u4 Fixed   7 Fixed   hs10 Fixed
When the VM has used up the perm heap completely, the GC repeats Full GC infinitely.
This started to occur in 5.0u10; it does not occur in 5.0u8.

 WinXP(sp2, japanese)
 MEM : 512MB
 JRE/JDK : 5.0u10/5.0u11

1) Compile the attached java program
2) Invoke the following command
   N:\gc-regression>java -Xmx512m -verbose:gc PermLeak
   You will see the following message repeated indefinitely.
[Full GC 69765K->69765K(195512K), 0.0245129 secs]
[Full GC 69765K->69765K(195512K), 0.0244146 secs]
[Full GC 69765K->69765K(195512K), 0.0242774 secs]
[Full GC 69765K->69765K(195512K), 0.0272797 secs]
[Full GC 69765K->69765K(195512K), 0.0240545 secs]
[Full GC 69765K->69765K(195512K), 0.0281977 secs]
[Full GC 69765K->69765K(195512K), 0.0243053 secs]
[Full GC 69765K->69765K(195512K), 0.0240855 secs]
[Full GC 69765K->69765K(195512K), 0.0240751 secs]
[Full GC 69765K->69765K(195512K), 0.0243808 secs]
[Full GC 69765K->69765K(195512K), 0.0439710 secs]
[Full GC 69765K->69765K(195512K), 0.0432144 secs]
[Full GC 69765K->69765K(195512K), 0.0248107 secs]
[Full GC 69765K->69765K(195512K), 0.0301530 secs]
[Full GC 69765K->69765K(195512K), 0.0241363 secs]
[Full GC 69765K->69765K(195512K), 0.0245216 secs]
[Full GC 69765K->69765K(195512K), 0.0240100 secs]
[Full GC 69765K->69765K(195512K), 0.0247816 secs]
[Full GC 69765K->69765K(195512K), 0.0243092 secs]
[Full GC 69765K->69765K(195512K), 0.0244741 secs]
[Full GC 69765K->69765K(195512K), 0.0261402 secs]
[Full GC 69765K->69765K(195512K), 0.0251121 secs]
[Full GC 69765K->69765K(195512K), 0.0241321 secs]
[Full GC 69765K->69765K(195512K), 0.0242020 secs]
[Full GC 69765K->69765K(195512K), 0.0280566 secs]
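
The symptom above can be checked mechanically. The following is a minimal sketch, not part of the attached test case, that scans -verbose:gc output for consecutive Full GC lines whose occupancy did not drop. The class name FullGcStallDetector and the method longestFruitlessRun are hypothetical, and the pattern assumes the exact line format shown in this report.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: flags back-to-back Full GCs that reclaim nothing.
final class FullGcStallDetector {
    // Assumes lines shaped like "[Full GC 69765K->69765K(195512K), 0.0245129 secs]".
    private static final Pattern FULL_GC = Pattern.compile(
        "\\[Full GC (\\d+)K->(\\d+)K\\((\\d+)K\\), ([0-9.]+) secs\\]");

    /** Returns the length of the longest run of consecutive Full GC lines with no reduction in occupancy. */
    static int longestFruitlessRun(List<String> lines) {
        int longest = 0;
        int current = 0;
        for (String line : lines) {
            Matcher m = FULL_GC.matcher(line.trim());
            boolean fruitless = m.matches()
                && Long.parseLong(m.group(1)) <= Long.parseLong(m.group(2));
            if (fruitless) {
                current++;
                longest = Math.max(longest, current);
            } else {
                current = 0; // any other line, or a Full GC that freed memory, ends the run
            }
        }
        return longest;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "[Full GC 69765K->69765K(195512K), 0.0245129 secs]",
            "[Full GC 69765K->69765K(195512K), 0.0244146 secs]",
            "[Full GC 69765K->69765K(195512K), 0.0242774 secs]");
        System.out.println("Longest fruitless run: " + longestFruitlessRun(sample));
    }
}
```

A long run reported by such a check is only a hint; how many repetitions count as a livelock is a judgment call left to the reader.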

SUGGESTED FIX Event: putback-to
Parent workspace: /net/jano.sfbay/export/disk05/hotspot/ws/main/gc_baseline (jano.sfbay:/export/disk05/hotspot/ws/main/gc_baseline)
Child workspace: /net/prt-web.sfbay/prt-workspaces/20070420220432.ysr.oom/workspace (prt-web:/net/prt-web.sfbay/prt-workspaces/20070420220432.ysr.oom/workspace)
User: ysr
Comment:
---------------------------------------------------------
Job ID: 20070420220432.ysr.oom
Original workspace: spot:/scratch/ysr/oom
Submitter: ysr
Archived data: /net/prt-archiver.sfbay/data/archived_workspaces/main/gc_baseline/2007/20070420220432.ysr.oom/
Webrev: http://prt-web.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/main/gc_baseline/2007/20070420220432.ysr.oom/workspace/webrevs/webrev-2007.04.21/index.html

Fixed 6545719: Regression : Infinite GC occurs after fix in CR 6370163
webrev: http://analemma.sfbay/net/spot/scratch/ysr/oom/webrev

The genesis of this bug actually predates 6370163, although the particular manifestation was indeed triggered by a change specifically in 6370163. More specifically, the logic in the permgen allocation retry loops was unaware of the fact that collections might end up resizing (shrinking) the heap, so that a subsequent allocation attempt might fail even if sufficient space existed to accommodate the request. Moreover, for certain sufficiently large allocation requests, the logic might lead to livelock behaviour through a cycle of fruitless collections and heap resizing, which is how Hitachi stumbled upon this bug. In the presence of a multi-threaded application, we may be able to eventually get out of the livelock, but the performance toll of the back-to-back collections would be huge. Note that this problem was specific to the so-called "framework" collectors only, not to the "interface" style collector(s).

To fix the problem, the allocation retry code has been simplified, with some unnecessary tests and an unnecessary loop removed. The crucial fix, though, hinges on the new code being aware that collections can resize and, in particular, shrink the heap, that expand_and_allocate() does expand the heap to the extent necessary to accommodate the allocation request, and that such allocation, collection and expansion are all protected by the Heap_lock (so that, for instance, another thread _cannot_ interfere with these attempts).

Of course, that assumption will change with the changes slated in CR 6539517, and the loop body will become a tad more complicated on that account. Moreover, the loop that has disappeared in this delta will, like the Cheshire Cat, reappear, albeit in perhaps a slightly modified form, with CR 6539517 in the very near future. We are, however, separating the two putbacks in the interests of easing possible backport(s) as well as to reduce possible confusion of what are really, in a certain sense, orthogonal bugs.

Reviewed by: Jon Masamitsu, Kevin Walls
Fix verified: yes
Verification Testing: PermLeak test from Hitachi (Kevin has also backported this to a 5uXX workspace and tested the fix)
Other testing: runThese -quick w/serial and CMS. PRT refworkload
Files: update: src/share/vm/memory/permGen.cpp
Examined files: 3955
Contents Summary: 1 update 3954 no action (unchanged)
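
The shape of a bounded retry, as described in the comment above, might be illustrated with a toy model. This is a sketch under stated assumptions and is not the HotSpot code in permGen.cpp: ToyPermGen and all of its members are hypothetical, the "collection" reclaims nothing and shrinks capacity to the live size, and the model is single-threaded, standing in for the real code running under the Heap_lock.

```java
// Hypothetical toy model of a perm gen; sizes are in arbitrary units.
final class ToyPermGen {
    private final long maxCapacity;
    private long capacity;
    private long used;
    int collections; // exposed so a caller can confirm the retry is bounded

    ToyPermGen(long capacity, long maxCapacity, long used) {
        this.capacity = capacity;
        this.maxCapacity = maxCapacity;
        this.used = used;
    }

    /** Plain allocation within the current capacity: no collection, no expansion. */
    boolean allocate(long size) {
        if (used + size > capacity) {
            return false;
        }
        used += size;
        return true;
    }

    /** Toy collection: everything is live, so nothing is freed, and the generation shrinks to fit. */
    void collectAndShrink() {
        collections++;
        capacity = used;
    }

    /** Expands only as far as the request needs, never past maxCapacity, then allocates. */
    boolean expandAndAllocate(long size) {
        if (used + size > maxCapacity) {
            return false;
        }
        capacity = Math.max(capacity, used + size);
        used += size;
        return true;
    }

    /** Bounded retry: at most one collection and one expansion attempt per request. */
    static boolean memAllocate(ToyPermGen gen, long size) {
        if (gen.allocate(size)) {
            return true;
        }
        gen.collectAndShrink();
        if (gen.allocate(size)) {
            return true;
        }
        // A shrink during collection cannot strand the request: expansion is sized to it.
        return gen.expandAndAllocate(size); // false means the caller reports out-of-memory
    }
}
```

In this model a request that can never fit fails after a single collection instead of cycling between collection, shrinking and expansion.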

EVALUATION The fix has been integrated into gc_baseline en route to 7.0b14 (we think). It's available for backport to older releases, viz. 6ux and 5uxx. (As far as we can tell, this regression did not make it back into 1.4.2_XX; this should be verified.)

SUGGESTED FIX Here's a set of changes that address all the issues raised in this bug report: http://analemma.sfbay/net/spot/scratch/ysr/oom/webrev However, the changes related to CR 6539517 would make the loops a bit more complicated. So a further rethinking of design may be in order; that may be done as part of CR 6539517.

WORK AROUND Fixing the perm gen size with -XX:PermSize=<n> -XX:MaxPermSize=<n> (the same value <n> for both) prevents the perm gen from being resized upon a collection and works around the bug in the allocation retry loop described in the Evaluation section.
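
Whether the workaround took effect can be checked from inside the running VM. The following is a rough sketch using the standard java.lang.management API. The class name PermSizeCheck and the helper isFixedSize are hypothetical, and matching pools by the substring "Perm" is an assumption: pool names vary by collector and JVM version, and VMs without a perm gen will report no such pool.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical check: reports whether perm gen pools have equal initial and maximum sizes.
final class PermSizeCheck {
    /** A pool cannot be resized when its initial and maximum sizes are both defined and equal. */
    static boolean isFixedSize(long initBytes, long maxBytes) {
        return initBytes >= 0 && initBytes == maxBytes; // -1 means "undefined" in MemoryUsage
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // Assumption: perm gen pools have "Perm" somewhere in their name.
            if (!pool.getName().contains("Perm")) {
                continue;
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            System.out.println(pool.getName()
                + " init=" + usage.getInit()
                + " max=" + usage.getMax()
                + " fixed=" + isFixedSize(usage.getInit(), usage.getMax()));
        }
    }
}
```

Running it with the same -XX:PermSize and -XX:MaxPermSize values as the application should print fixed=true for the perm gen pool, if the VM exposes one.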

EVALUATION In the fix for 6370163 we did not consider a possible malinteraction with generation resizing code which, under the right circumstances, causes the current logic in the allocation retry loop of PermGen::mem_allocate() to not do the purported "last ditch collection" and bail out, but instead oscillate between collection+resize/shrink and a fruitless expansion. See comments section for details. A fix is in progress and will apply to all of 5uXX, 6uXX and 7.0.

WORK AROUND The tenured generation/compacting perm gen has the problem. PSOldGen/PSPermGen avoids the problem (it exits with OOM PermGen space), so running with -XX:+UseParallelGC is a workaround. Alternatively, -XX:PermSize=64M is a workaround, although -XX:PermSize=63M and -XX:PermSize=65M are NOT. (!!!)