JDK-5037027 : CMS: precleaning causes crash if perm gen collection enabled
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 1.4.2_05,5.0
  • Priority: P1
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2004-04-23
  • Updated: 2004-10-13
  • Resolved: 2004-07-28
Fix Versions
  • Other: 1.4.2_07 (b01), Fixed
  • JDK 6: 6, Fixed
Description
The test case with bug 4957990 has revealed an interesting
crash that we are trying to understand. The crash happens
with -Xint, with the CMS collector, if we run with perm
gen collection and old gen precleaning enabled. The
crash goes away if either of these is disabled. Further
investigation is in progress. This may or may not be related
to long-standing bug 4756801. The comments section will
be updated as investigation proceeds. Priority may get
readjusted as we learn more about the problem.

Comments
CONVERTED DATA
BugTraq+ Release Management Values
COMMIT TO FIX: 1.4.2_07, mustang, tiger-rc
FIXED IN: 1.4.2_07, mustang, tiger-rc
INTEGRATED IN: 1.4.2_07, mustang, tiger-b60, tiger-rc
02-10-2004

EVALUATION Fixed; see comments section for details.
02-10-2004

WORK AROUND
The bug manifests only when all of the following conditions hold:
(1) CMSPermGenPrecleaningEnabled is false (this is the default in 1.5 and 1.4.2_05)
(2) CMSPrecleaningEnabled is true (this is the default)
(3) CMSPermGenSweepingEnabled and CMSClassUnloadingEnabled are both true (these are _not_ the defaults in Tiger or earlier)
(4) There is a scavenge during the CMS concurrent marking phase (this will usually be the case for all but the smallest old gens)
(5) There is no concurrent mode failure before the end of the CMS remark phase

The only known workarounds are:
(1) to switch off all precleaning:
      -XX:-CMSPrecleaningEnabled
    But that workaround is not practically viable, because it would make CMS remark pauses very long and thus usually almost completely defeat CMS' primary purpose;
OR (* see NOTE in (2) below)
(2) to switch off perm gen collection:
      -XX:-CMSPermGenSweepingEnabled -XX:-CMSClassUnloadingEnabled
    [* NOTE: this would very greatly reduce, but not completely eliminate, the risk of a crash.]
    But that again is not practically viable, at least for applications that have no bound on perm space allocation (i.e. apps that always load new classes), since it would make the occasional full collection inevitable, which would blow the GC pause times just as above.
02-10-2004
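
The flag-related conditions (1) through (3) in the WORK AROUND entry above can be expressed as a small predicate. This is only a sketch: the class and method names are hypothetical and not part of the JDK, the defaults are the ones stated in that entry, and conditions (4) and (5) are runtime events that cannot be derived from flags, so they are not modelled.

```java
import java.util.Map;

/** Hypothetical helper for illustration; not part of the JDK or HotSpot. */
final class CmsCrashPreconditions {
    /**
     * Returns true if the given boolean -XX flag settings satisfy
     * conditions (1)-(3) of the WORK AROUND entry. Flags absent from
     * the map take the defaults stated there for 1.5 / 1.4.2_05.
     */
    static boolean flagsExposeBug(Map<String, Boolean> flags) {
        // (1) perm gen precleaning off (default: off)
        boolean permGenPrecleaning = flags.getOrDefault("CMSPermGenPrecleaningEnabled", false);
        // (2) old gen precleaning on (default: on)
        boolean precleaning = flags.getOrDefault("CMSPrecleaningEnabled", true);
        // (3) perm gen collection on (default: off for both flags)
        boolean permGenSweeping = flags.getOrDefault("CMSPermGenSweepingEnabled", false);
        boolean classUnloading = flags.getOrDefault("CMSClassUnloadingEnabled", false);
        return !permGenPrecleaning && precleaning && permGenSweeping && classUnloading;
    }
}
```

With an empty map (all defaults) the predicate is false, because perm gen collection is off by default; turning on both CMSPermGenSweepingEnabled and CMSClassUnloadingEnabled makes it true, and additionally passing CMSPrecleaningEnabled as false (workaround (1)) makes it false again.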

SUGGESTED FIX
http://analemma.sfbay/net/spot/scratch/ysr/mut/webrev
--------------------------
Fix put back to Mustang workspace:

Event: putback-to
Parent workspace: /net/jano.sfbay/export/disk05/hotspot/ws/main/gc_baseline
  (jano.sfbay:/export/disk05/hotspot/ws/main/gc_baseline)
Child workspace: /prt-workspaces/20040722160055.ysr.dragon/workspace
  (prt-web:/prt-workspaces/20040722160055.ysr.dragon/workspace)
User: ysr

Comment:
---------------------------------------------------------
Original workspace: neeraja:/net/spot/scratch/ysr/dragon
Submitter: ysr
Archived data: /net/prt-archiver.sfbay/export2/archived_workspaces/main/gc_baseline/2004/20040722160055.ysr.dragon/
Webrev: http://analemma.sfbay.sun.com/net/prt-archiver.sfbay/export2/archived_workspaces/main/gc_baseline/2004/20040722160055.ysr.dragon/workspace/webrevs/webrev-2004.07.22/index.html

Fixed 5037027: CMS: precleaning causes crash if perm gen collection enabled

http://analemma.sfbay/net/spot/scratch/ysr/mut/webrev
or
http://analemma.sfbay/net/spot/scratch/ysr/dragon
(for some reason, the sccs comments show up mangled in this latter webrev; refer to the former webrev for clean sccs comments.)

There was a coding bug in the precleaning loop in the method preclean_mod_union_table(), which, with CMSPermGenPrecleaningEnabled off, would clear mod-union-table entries for the perm gen without actually precleaning the corresponding objects. This can cause intra-generational oop-updates in the perm gen to be ignored by the concurrent collector and cause perm gen (and in rare cases other gen) objects to be recycled prematurely. The bug was masked until CMSPermGenPrecleaningEnabled was switched off recently to work around bug 5040363.
Reviewed by: jmasa, pbk (some cleanups suggested by pbk deferred)
Fix Verified: yes
Verification testing:
In Tiger:
---------
runThese with
  -server -XX:+ShowMessageBoxOnError -XX:+PrintGCDetails -XX:+UseConcMarkSweepGC
  -XX:-CMSPermGenPrecleaningEnabled -XX:+CMSPrecleaningEnabled
  -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled
  -XX:CMSInitiatingOccupancyFraction=2 -XX:+DisableExplicitGC
In Mustang:
-----------
runThese with
  -server -XX:+ShowMessageBoxOnError -XX:+PrintGCDetails -XX:+UseConcMarkSweepGC
  -XX:-CMSPermGenPrecleaningEnabled -XX:+CMSPrecleaningEnabled
  -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled
  -XX:ExplicitGCInvokesConcurrent
Other Testing: CMS (with & without above "stress" option list)
  spec, PRT, refWorkload, runThese, cloudscape, HP's class unloading test

Files:
update: src/share/vm/memory/concurrentMarkSweepGeneration.cpp
update: src/share/vm/memory/concurrentMarkSweepGeneration.hpp

Examined files: 3222
Contents Summary:
  2 update
  3220 no action (unchanged)
-----------------------------------------
Fix also put back to Tiger:

Job submitted at: 10:01:38 AM
Total job time: 1h 48m 49s
Job state: success
Job fail/kill comment: NoComment
Job flags: PUTBACK ARCHIVE SYNC-WORKSPACE
Original workspace: neeraja:/net/spot/scratch/ysr/mut
Parent workspace: /net/jano.sfbay/export/disk05/hotspot/ws/1.5/tiger_baseline
Submitter: ysr
PRT data: /net/prt-web.sfbay/prt-workspaces/20040726100038.ysr.mut
Archived data: ERROR, no archive file generated
Webrev: No webrev was generated

Fixed 5037027: CMS: precleaning causes crash if perm gen collection enabled

http://analemma.sfbay/net/spot/scratch/ysr/mut/webrev

There was a coding bug in the precleaning loop in the method preclean_mod_union_table(), which, with CMSPermGenPrecleaningEnabled off, would clear mod-union-table entries for the perm gen without actually precleaning the corresponding objects.
This can cause intra-generational oop-updates in the perm gen to be ignored by the concurrent collector and cause perm gen (and in rare cases other gen) objects to be recycled prematurely. The bug was masked until CMSPermGenPrecleaningEnabled was switched off recently to work around bug 5040363.

Thanks to June for demonstrating, using ATG and IMM/S1AS, that CMS/GC during start-up (when perm gen mutation rates are extremely high, thus increasing exposure to this bug) would expose the customers to this bug -- evidence that convinced the core team to approve this bug for Tiger after two initial rejections. Thanks to Francis Hsu for turning around the requisite PIT tests at short notice, and to Alan Bateman for interpreting some results. Thanks also to the Portal Server team (Russ Petruzzelli and Young Kwon) for making available test machines for running some load tests (which, however, did not exhibit this bug).

Reviewed by: jmasa, pbk (some cleanups suggested by pbk deferred)
Approved by: Server Core Team
Fix Verified: yes
Verification testing:
In Tiger:
---------
. runThese with
    -server -XX:+ShowMessageBoxOnError -XX:+PrintGCDetails -XX:+UseConcMarkSweepGC
    -XX:-CMSPermGenPrecleaningEnabled -XX:+CMSPrecleaningEnabled
    -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled
    -XX:CMSInitiatingOccupancyFraction=2 -XX:+DisableExplicitGC
. Big apps testing by June (with CMSClassUnloadingEnabled CMSPermGenSweepingEnabled):
    IMM/S1AS
    ATG
In Mustang:
-----------
runThese with
  -server -XX:+ShowMessageBoxOnError -XX:+PrintGCDetails -XX:+UseConcMarkSweepGC
  -XX:-CMSPermGenPrecleaningEnabled -XX:+CMSPrecleaningEnabled
  -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled
  -XX:ExplicitGCInvokesConcurrent
Other Testing: CMS (with & without above "stress" option list)
  spec, PRT, refWorkload, runThese, cloudscape, HP's class unloading test
Big apps testing by June: IMM/S1AS, ATG
PIT testing by Francis Hsu
02-10-2004
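
The coding bug described in the SUGGESTED FIX entry above (mod-union-table entries for the perm gen cleared without the corresponding objects being precleaned) can be illustrated with a toy model. This is a sketch under loud assumptions: the class, fields, and methods below are invented for illustration and do not mirror the HotSpot source, and precleanSafe() shows only one plausible shape of a correct loop, not the actual change that was put back.

```java
import java.util.BitSet;

/** Toy model for illustration only; names do not correspond to HotSpot code. */
final class ModUnionToy {
    /** Dirty bits for perm gen cards (stand-in for mod-union-table entries). */
    final BitSet permDirty = new BitSet();
    /** Cards whose objects were actually precleaned (rescanned). */
    final BitSet permScanned = new BitSet();

    /** Buggy shape: every dirty bit is cleared, but cards are scanned only if enabled. */
    void precleanBuggy(boolean permGenPrecleaningEnabled) {
        for (int card = permDirty.nextSetBit(0); card >= 0; card = permDirty.nextSetBit(card + 1)) {
            permDirty.clear(card);
            if (permGenPrecleaningEnabled) {
                permScanned.set(card);
            }
        }
    }

    /** Assumed safe shape: when disabled, leave the bits set so a later phase still sees them. */
    void precleanSafe(boolean permGenPrecleaningEnabled) {
        if (!permGenPrecleaningEnabled) {
            return;
        }
        for (int card = permDirty.nextSetBit(0); card >= 0; card = permDirty.nextSetBit(card + 1)) {
            permDirty.clear(card);
            permScanned.set(card);
        }
    }

    /** Counts cards that were dirty before precleaning and are now neither dirty nor scanned. */
    int lostCards(BitSet dirtyBefore) {
        BitSet lost = (BitSet) dirtyBefore.clone();
        lost.andNot(permDirty);
        lost.andNot(permScanned);
        return lost.cardinality();
    }
}
```

In this model, a card counted by lostCards() stands for an oop update that the collector would never revisit, which corresponds to the premature recycling of objects that the entry describes. With perm gen precleaning enabled both loops behave the same, which matches the observation that the bug was masked until CMSPermGenPrecleaningEnabled was switched off.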