Bug ID: JDK-5037027 CMS: precleaning causes crash if perm gen collection enabled

Details
Type:
Bug
Submit Date:
2004-04-23
Status:
Resolved
Updated Date:
2004-10-13
Project Name:
JDK
Resolved Date:
2004-07-28
Component:
hotspot
OS:
generic
Sub-Component:
gc
CPU:
generic
Priority:
P1
Resolution:
Fixed
Affected Versions:
1.4.2_05,5.0
Fixed Versions:
1.4.2_07 (b01)

Description
The test case with bug 4957990 has revealed an interesting
crash that we are trying to understand. The crash happens
with -Xint, with the CMS collector, if we run with perm
gen collection and old gen precleaning enabled. The
crash goes away if either of these is disabled. Further
investigation is in progress. This may or may not be related
to long-standing bug 4756801. The comments section will
be updated as investigation proceeds. Priority may get
readjusted as we learn more about the problem.

                                    

Comments
SUGGESTED FIX

http://analemma.sfbay/net/spot/scratch/ysr/mut/webrev

--------------------------
Fix put back to Mustang workspace:

Event:            putback-to
Parent workspace: /net/jano.sfbay/export/disk05/hotspot/ws/main/gc_baseline
                  (jano.sfbay:/export/disk05/hotspot/ws/main/gc_baseline)
Child workspace:  /prt-workspaces/20040722160055.ysr.dragon/workspace
                  (prt-web:/prt-workspaces/20040722160055.ysr.dragon/workspace)
User:             ysr

Comment:

---------------------------------------------------------

Original workspace:     neeraja:/net/spot/scratch/ysr/dragon
Submitter:              ysr
Archived data:          /net/prt-archiver.sfbay/export2/archived_workspaces/main/gc_baseline/2004/20040722160055.ysr.dragon/
Webrev:                 http://analemma.sfbay.sun.com/net/prt-archiver.sfbay/export2/archived_workspaces/main/gc_baseline/2004/20040722160055.ysr.dragon/workspace/webrevs/webrev-2004.07.22/index.html

Fixed 5037027: CMS: precleaning causes crash if perm gen collection enabled

http://analemma.sfbay/net/spot/scratch/ysr/mut/webrev

or 

http://analemma.sfbay/net/spot/scratch/ysr/dragon
(for some reason, the sccs comments show up mangled in
this latter webrev; refer to the former webrev for
clean sccs comments.)

There was a coding bug in the precleaning loop in the method
preclean_mod_union_table(), which, with CMSPermGenPrecleaningEnabled
off, would clear mod-union-table entries for the perm gen
without actually precleaning the corresponding objects.
This can cause intra-generational oop-updates in the
perm gen to be ignored by the concurrent collector
and cause perm gen (and in rare cases other gen)
objects to be recycled prematurely.
The bug was masked until CMSPermGenPrecleaningEnabled
was switched off recently to work around bug 5040363.
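
The failure mode described above can be illustrated with a toy model. This is a sketch only, not the HotSpot code: ToyGen, preclean_buggy, preclean_fixed and lost_cards are hypothetical names, and the mod union table is reduced to a set of card indices per generation. The "fixed" variant shows one plausible shape of a correction (leave the entries alone when the generation is not precleaned); the actual change is in the webrev and may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy stand-in for a generation; NOT the HotSpot data structure.
struct ToyGen {
    bool is_perm;                  // true for the perm gen
    std::set<std::size_t> dirty;   // cards recorded in the toy "mod union table"
    std::set<std::size_t> scanned; // cards whose objects were actually precleaned
};

// Buggy shape: entries are cleared for every generation, but the objects
// are precleaned only when precleaning is enabled for that generation.
void preclean_buggy(std::vector<ToyGen>& gens, bool perm_preclean_enabled) {
    for (ToyGen& g : gens) {
        std::set<std::size_t> cards;
        cards.swap(g.dirty);  // clears the recorded entries unconditionally
        if (g.is_perm && !perm_preclean_enabled) {
            continue;         // entries are gone, objects never rescanned
        }
        g.scanned.insert(cards.begin(), cards.end());
    }
}

// Assumed corrected shape: a generation that is not precleaned keeps its
// entries, so a later phase (remark) can still see them.
void preclean_fixed(std::vector<ToyGen>& gens, bool perm_preclean_enabled) {
    for (ToyGen& g : gens) {
        if (g.is_perm && !perm_preclean_enabled) {
            continue;         // leave g.dirty untouched
        }
        g.scanned.insert(g.dirty.begin(), g.dirty.end());
        g.dirty.clear();
    }
}

// Counts cards that are neither still recorded nor scanned, i.e. updates
// that the collector would never look at again.
std::size_t lost_cards(const ToyGen& g, const std::set<std::size_t>& original) {
    std::size_t lost = 0;
    for (std::size_t c : original) {
        if (g.dirty.count(c) == 0 && g.scanned.count(c) == 0) {
            ++lost;
        }
    }
    return lost;
}
```

With perm gen precleaning switched off, running preclean_buggy over a perm gen that has recorded cards leaves every one of those cards lost, while preclean_fixed leaves them recorded for a later phase.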

Reviewed by: jmasa, pbk (some cleanups suggested by pbk deferred)

Fix Verified: yes

Verification testing:
  In Tiger:
  ---------
  runThese with -server -XX:+ShowMessageBoxOnError -XX:+PrintGCDetails -XX:+UseConcMarkSweepGC -XX:-CMSPermGenPrecleaningEnabled -XX:+CMSPrecleaningEnabled -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled -XX:CMSInitiatingOccupancyFraction=2 -XX:+DisableExplicitGC
  In Mustang:
  -----------
  runThese with -server -XX:+ShowMessageBoxOnError -XX:+PrintGCDetails -XX:+UseConcMarkSweepGC -XX:-CMSPermGenPrecleaningEnabled -XX:+CMSPrecleaningEnabled -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled -XX:+ExplicitGCInvokesConcurrent

Other Testing: CMS (with & without above "stress" option list)
  spec, PRT, refWorkload, runThese, cloudscape,
  HP's class unloading test

Files:
update: src/share/vm/memory/concurrentMarkSweepGeneration.cpp
update: src/share/vm/memory/concurrentMarkSweepGeneration.hpp

Examined files: 3222

Contents Summary:
       2   update
    3220   no action (unchanged)


-----------------------------------------
Fix also put back to Tiger:

Job submitted at:       10:01:38 AM
Total job time:         1h 48m 49s
Job state:              success
Job fail/kill comment:  NoComment
Job flags:              PUTBACK ARCHIVE SYNC-WORKSPACE
Original workspace:     neeraja:/net/spot/scratch/ysr/mut
Parent workspace:       /net/jano.sfbay/export/disk05/hotspot/ws/1.5/tiger_baseline
Submitter:              ysr
PRT data:               /net/prt-web.sfbay/prt-workspaces/20040726100038.ysr.mut
Archived data:          ERROR, no archive file generated

Webrev:                 No webrev was generated

Fixed 5037027: CMS: precleaning causes crash if perm gen collection enabled

http://analemma.sfbay/net/spot/scratch/ysr/mut/webrev

There was a coding bug in the precleaning loop in the method
preclean_mod_union_table(), which, with CMSPermGenPrecleaningEnabled
off, would clear mod-union-table entries for the perm gen
without actually precleaning the corresponding objects.
This can cause intra-generational oop-updates in the
perm gen to be ignored by the concurrent collector
and cause perm gen (and in rare cases other gen)
objects to be recycled prematurely.
The bug was masked until CMSPermGenPrecleaningEnabled
was switched off recently to work around bug 5040363.

Thanks to June for demonstrating, using ATG and IMM/S1AS, that
CMS/GC during start-up (when perm gen mutation rates are extremely
high, thus increasing exposure to this bug) would expose the customers
to this bug -- evidence that convinced the core team to approve
this bug for Tiger after two initial rejections.

Thanks to Francis Hsu for turning around the requisite PIT
tests at short notice; and to Alan Bateman for interpreting
some results.

Thanks also to the Portal Server team (Russ Petruzzelli and
Young Kwon) for making available test machines for running some
load tests (which however did not exhibit this bug).

Reviewed by: jmasa, pbk (some cleanups suggested by pbk deferred)
Approved by: Server Core Team

Fix Verified: yes

Verification testing:
  In Tiger:
  ---------
  .  runThese with -server -XX:+ShowMessageBoxOnError -XX:+PrintGCDetails -XX:+UseConcMarkSweepGC -XX:-CMSPermGenPrecleaningEnabled -XX:+CMSPrecleaningEnabled -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled -XX:CMSInitiatingOccupancyFraction=2 -XX:+DisableExplicitGC
  . Big apps testing by June: (with CMSClassUnloadingEnabled CMSPermGenSweepingEnabled)
    IMM/S1AS
    ATG
  In Mustang:
  -----------
  runThese with -server -XX:+ShowMessageBoxOnError -XX:+PrintGCDetails -XX:+UseConcMarkSweepGC -XX:-CMSPermGenPrecleaningEnabled -XX:+CMSPrecleaningEnabled -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled -XX:+ExplicitGCInvokesConcurrent

Other Testing: CMS (with & without above "stress" option list)
  spec, PRT, refWorkload, runThese, cloudscape,
  HP's class unloading test
  Big apps testing by June: IMM/S1AS, ATG
  PIT testing by Francis Hsu


                                     
2004-10-02
WORK AROUND

The bug needs the following set of conditions to manifest:
(1) CMSPermGenPrecleaningEnabled is false (this is the default in
    1.5 and 1.4.2_05)
(2) CMSPrecleaningEnabled is true (this is the default)
(3) CMSPermGenSweepingEnabled and CMSClassUnloadingEnabled are true
    (these are _not_ the defaults in Tiger or earlier)
(4) There is a scavenge during the CMS concurrent marking phase
    (this will usually be the case for all but the smallest
     old gens)
(5) There is no concurrent mode failure before the end of the
    CMS remark phase
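
The five conditions above can be read as a single conjunction. The following is a minimal sketch that only encodes the list as written; ToyRunConditions and bug_can_manifest are hypothetical names, and the two run-time events (scavenge during marking, concurrent mode failure) are modelled as plain booleans rather than anything observed from a VM.

```cpp
#include <cassert>

// Hypothetical record of the flag settings and run-time events listed above.
struct ToyRunConditions {
    bool cms_perm_gen_precleaning_enabled;        // condition (1) needs false
    bool cms_precleaning_enabled;                 // condition (2) needs true
    bool cms_perm_gen_sweeping_enabled;           // condition (3) needs true
    bool cms_class_unloading_enabled;             // condition (3) needs true
    bool scavenge_during_concurrent_mark;         // condition (4) needs true
    bool concurrent_mode_failure_before_remark_end; // condition (5) needs false
};

// True only when all five listed conditions hold at once.
bool bug_can_manifest(const ToyRunConditions& c) {
    return !c.cms_perm_gen_precleaning_enabled
        && c.cms_precleaning_enabled
        && c.cms_perm_gen_sweeping_enabled
        && c.cms_class_unloading_enabled
        && c.scavenge_during_concurrent_mark
        && !c.concurrent_mode_failure_before_remark_end;
}
```

Flipping any one field away from the value its condition needs makes the predicate false, which is how the workarounds listed next operate on conditions (2) and (3).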

The only known workarounds are:

(1) to switch off all precleaning: -XX:-CMSPrecleaningEnabled
    But that workaround is not practically viable because it
    would make CMS remark pauses very long and thus usually almost
    completely defeat CMS's primary purpose;

OR (* see Note in (2) below)

(2) to switch off perm gen collection: -XX:-CMSPermGenSweepingEnabled
    -XX:-CMSClassUnloadingEnabled

    [* NOTE: this would very greatly reduce, but not completely
     eliminate, the risk of a crash.]

    But that again is not practically viable, at least for applications
    that have no bound on perm space allocation (i.e. apps that
    always load new classes), since that would make the occasional
    full collection inevitable, which would blow up the GC pause times
    just like above.
                                     
2004-10-02
EVALUATION

Fixed; see comments section for details.
                                     
2004-10-02
CONVERTED DATA

BugTraq+ Release Management Values

COMMIT TO FIX:
1.4.2_07
mustang
tiger-rc

FIXED IN:
1.4.2_07
mustang
tiger-rc

INTEGRATED IN:
1.4.2_07
mustang
tiger-b60
tiger-rc


                                     
2004-10-02


