Bug ID: JDK-6483690 CMS: assert(cur_val < top,"All recorded addresses should be less")

Type: Bug
Component: hotspot
Sub-Component: gc
Affected Version: 7

Priority: P3
Status: Resolved
Resolution: Fixed
OS: generic
CPU: generic

Submitted: 2006-10-18
Updated: 2010-05-09
Resolved: 2007-06-20

JDK 6       JDK 7      Other
6u4 Fixed   7 Fixed    hs10 Fixed

Assertion failure during nightly testing on linux-amd64 with test
gc/memory/Churn/Churn2

[2006-10-18T04:43:50.36] # To suppress the following error report, specify this argument
[2006-10-18T04:43:50.36] # after -XX: or in .hotspotrc:  SuppressErrorAt=/concurrentMarkSweepGeneration.cpp:5124
[2006-10-18T04:43:50.36] #
[2006-10-18T04:43:50.36] # An unexpected error has been detected by Java Runtime Environment:
[2006-10-18T04:44:53.94] #
[2006-10-18T04:44:53.94] #  Internal Error (/PrtBuildDir/workspace/src/share/vm/memory/concurrentMarkSweepGeneration.cpp, 5124), pid=1312, tid=5126
[2006-10-18T04:44:53.94] #
[2006-10-18T04:44:53.94] # Java VM: Java HotSpot(TM) 64-Bit Server VM (20061016062331.jmasa.gc_baseline_merge-debug mixed mode)
[2006-10-18T04:44:53.94] #
[2006-10-18T04:44:53.94] # Error: assert(cur_val < top,"All recorded addresses should be less")
[2006-10-18T04:44:53.94] # An error report file with more information is saved as hs_err_pid1312.log
[2006-10-18T04:44:53.94] #
[2006-10-18T04:44:53.94] # If you would like to submit a bug report, please visit:
[2006-10-18T04:44:53.94] #   http://java.sun.com/webapps/bugreport/crash.jsp
[2006-10-18T04:44:53.94] #
[2006-10-18T04:44:53.94] VM option '-PrintVMOptions'
[2006-10-18T04:44:53.94] VM option '+UseConcMarkSweepGC'
[2006-10-18T04:44:53.94] VM option '+CMSPermGenSweepingEnabled'
[2006-10-18T04:44:53.94] VM option '+CMSClassUnloadingEnabled'
[2006-10-18T04:44:53.94] VM option '+ExplicitGCInvokesConcurrent'
Adding -XX:-UseCMSCompactAtFullCollection -XX:+PromotionFailureALot
to the mix exposes this problem reliably.

SUGGESTED FIX

Event:            putback-to
Parent workspace: /net/jano.sfbay/export/disk05/hotspot/ws/main/gc_baseline
                  (jano.sfbay:/export/disk05/hotspot/ws/main/gc_baseline)
Child workspace:  /net/prt-web.sfbay/prt-workspaces/20070525142420.ysr.mustang/workspace
                  (prt-web:/net/prt-web.sfbay/prt-workspaces/20070525142420.ysr.mustang/workspace)
User:             ysr

Comment:
---------------------------------------------------------
Job ID:             20070525142420.ysr.mustang
Original workspace: karachi:/net/jano.sfbay/export/hotspot/users1/ysr/mustang
Submitter:          ysr
Archived data:      /net/prt-archiver.sfbay/data/archived_workspaces/main/gc_baseline/2007/20070525142420.ysr.mustang/
Webrev:             http://prt-web.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/main/gc_baseline/2007/20070525142420.ysr.mustang/workspace/webrevs/webrev-2007.05.26/index.html

Fixed 6483690: CMS: assert(cur_val < top,"All recorded addresses should be less")

Webrev: http://analemma.sfbay/net/jano/export/disk05/hotspot/users/ysr/mustang/webrev.6483690

The problem was that, after a promotion failure, there may be objects in both survivor spaces, and the parallelization of survivor space remark did not handle this situation properly.

To recapitulate: at the end of a normal scavenge, the survivor space known as "from"-space holds survivors and the one known as "to"-space typically does not. The from survivor space, which normally holds survivors, is chunked at PLAB boundaries, and the chunking information is saved to a well-known "survivor space chunking table" that the CMS collector maintains and that is updated during a scavenge as PLABs are acquired. During a remark pause, the CMS collector assumes that the values in this table represent block boundaries in the "from" space. This assumption is violated when, following a promotion failure, the survivor space names are not swapped as they are in the normal case.
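The invariant described above, and the way a promotion failure breaks it, can be illustrated with a toy model. This is a minimal sketch, not HotSpot code: addresses are plain integers, every PLAB is assumed to be 100 words, and `SurvivorSpace`, `YoungGen`, `scavenge_pre_fix` and `chunks_describe_from` are hypothetical names. The chunk table records the start of each PLAB handed out in to-space during a scavenge, and the check mirrors the failing assert: every recorded address must lie below `top` of the space currently named "from".

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical toy model of a young generation with two survivor spaces.
// Addresses are plain integers and every PLAB is 100 words; none of these
// names come from HotSpot.
struct SurvivorSpace {
  std::size_t bottom;
  std::size_t top;  // [bottom, top) holds objects
};

struct YoungGen {
  SurvivorSpace a{1000, 1000};
  SurvivorSpace b{2000, 2000};
  SurvivorSpace* from = &a;
  SurvivorSpace* to = &b;
  std::vector<std::size_t> chunk_table;  // start address of each PLAB

  // Copying survivors into to-space through a new PLAB records its start.
  void allocate_plab(std::size_t words) {
    chunk_table.push_back(to->top);
    to->top += words;
  }
  void swap_spaces() { std::swap(from, to); }
};

// Scavenge as it behaved before the fix: names are swapped only on success.
void scavenge_pre_fix(YoungGen& g, std::size_t plabs, bool promotion_failed) {
  g.chunk_table.clear();
  for (std::size_t i = 0; i < plabs; ++i) g.allocate_plab(100);
  if (!promotion_failed) {
    g.from->top = g.from->bottom;  // old from-space fully evacuated
    g.swap_spaces();
  }
  // On promotion failure both spaces may hold objects, and the chunk table
  // still describes the space named "to".
}

// The remark-side assumption, in the spirit of assert(cur_val < top, ...):
// every recorded address lies inside the space currently named "from".
bool chunks_describe_from(const YoungGen& g) {
  for (std::size_t addr : g.chunk_table) {
    if (addr < g.from->bottom || addr >= g.from->top) return false;
  }
  return true;
}
```

With from-space holding objects before the scavenge, `scavenge_pre_fix(g, 3, false)` leaves `chunks_describe_from(g)` true, while `scavenge_pre_fix(g, 3, true)` leaves it false, which is the state the assert caught.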
The bug is extremely rare because the assumed invariant is broken only for the period between the promotion failure and the mark-compact collection that immediately follows and restores the invariant. CMS is affected only when a CMS parallel remark phase runs during that window, before the baton is passed to the foreground collector that does the compaction. In product builds, where the assert is not checked, we would end up using the information in the chunking table to scan the "wrong" survivor space, which usually results in a crash.

There were several possible fixes for this bug, including (but not limited to) maintaining the identity of the survivor space in the chunking table, but the smallest appeared to be having the scavenge always flip the names of the spaces, so as to leave the CMS-assumed invariant intact. Note that although the chunking array does not come into play when using DefNew, the serial scavenger, we made an identical change in DefNewGeneration::collect() for the sake of uniformity. We should really factor out the shared code here rather than duplicating it as is currently done.

Fix Verified: y
Verification Testing: runThese -quick -testbase with CMS
Other Testing: PRT, refworkload, runThese -quick -testbase
Reviewed by: Andrey Petrusnko

Files:
update: src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
update: src/share/vm/gc_implementation/parNew/parNewGeneration.cpp
update: src/share/vm/memory/defNewGeneration.cpp

Examined files: 3964

Contents Summary:
      3   update
   3961   no action (unchanged)
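The first alternative mentioned in the putback comment, keeping the identity of the survivor space in the chunking table, could be sketched roughly as follows. This is a hypothetical toy model, not HotSpot code and not the fix that was adopted: addresses are plain integers, PLABs are assumed to be 100 words, and `TaggedChunkTable`, `YoungGen` and `remark_tasks` are invented names. The scavenge still leaves the names unswapped on promotion failure; instead, remark uses the chunk boundaries only for the space they were recorded in.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch of the alternative that was NOT adopted: the chunk
// table records which survivor space its entries belong to.
struct SurvivorSpace {
  std::size_t bottom;
  std::size_t top;
};

struct TaggedChunkTable {
  const SurvivorSpace* owner = nullptr;  // space the entries were recorded in
  std::vector<std::size_t> starts;       // PLAB start addresses in owner
};

struct YoungGen {
  SurvivorSpace a{1000, 1000};
  SurvivorSpace b{2000, 2000};
  SurvivorSpace* from = &a;
  SurvivorSpace* to = &b;
  TaggedChunkTable chunks;

  // Names are still swapped only on success, as before the fix.
  void scavenge(std::size_t plabs, bool promotion_failed) {
    chunks.owner = to;
    chunks.starts.clear();
    for (std::size_t i = 0; i < plabs; ++i) {
      chunks.starts.push_back(to->top);
      to->top += 100;  // assumed fixed PLAB size
    }
    if (!promotion_failed) {
      from->top = from->bottom;  // old from-space fully evacuated
      std::swap(from, to);
    }
  }
};

// Parallel remark tasks for survivor space s: one per recorded chunk if the
// table belongs to s, otherwise a single task covering the whole space.
std::size_t remark_tasks(const YoungGen& g, const SurvivorSpace* s) {
  if (s->top == s->bottom) return 0;  // empty space, nothing to scan
  if (g.chunks.owner != s) return 1;  // boundaries describe another space
  return g.chunks.starts.size();
}
```

The cost of this design is that every consumer of the table has to check the tag, whereas always swapping the names preserves the single invariant CMS already assumed, which fits the comment's judgment that the swap appeared to be the smallest fix.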

30-05-2007

SUGGESTED FIX

The following appears to be the smallest fix to deal with this issue:

------- parNewGeneration.cpp -------
*** /tmp/sccs.FLayvw   Mon May 21 16:17:00 2007
--- parNewGeneration.cpp   Mon May 21 16:13:31 2007
***************
*** 785,790 ****
--- 785,791 ----
          gclog_or_tty->print(" (promotion failed)");
        }
        // All the spaces are in play for mark-sweep.
+       swap_spaces();  // Make things simpler for CMS; see 6483690.
        from()->set_next_compaction_space(to());
        gch->set_incremental_collection_will_fail();
      }

Basically, the change restores the CMS-assumed invariant that the data in the survivor PLAB chunking array always corresponds to the semi-space named "from". [That invariant would otherwise be broken for the brief window between a scavenge that resulted in a promotion failure and the subsequent mark-compact, which restores it. Under some rare circumstances, CMS can run during that window, before the collection "baton" is passed to the foreground mark-compact collection that follows the failed scavenge.]
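In terms of a toy model, the one-line change makes the scavenge flip the space names whether or not promotion failed. The sketch below is hypothetical (integer addresses, assumed 100-word PLABs, invented names such as `YoungGen` and `chunks_describe_from`) and only mirrors the effect of `swap_spaces()` in the diff; it leaves out `set_next_compaction_space` and everything else ParNew does.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical toy model of the change above, not the ParNew source.
struct SurvivorSpace {
  std::size_t bottom;
  std::size_t top;
};

struct YoungGen {
  SurvivorSpace a{1000, 1000};
  SurvivorSpace b{2000, 2000};
  SurvivorSpace* from = &a;
  SurvivorSpace* to = &b;
  std::vector<std::size_t> chunk_table;  // PLAB starts recorded in to-space

  void swap_spaces() { std::swap(from, to); }

  void scavenge(std::size_t plabs, bool promotion_failed) {
    chunk_table.clear();
    for (std::size_t i = 0; i < plabs; ++i) {
      chunk_table.push_back(to->top);
      to->top += 100;
    }
    if (!promotion_failed) {
      from->top = from->bottom;  // old from-space fully evacuated
    }
    // After the fix the names are flipped on both paths. On failure the
    // old from-space keeps its objects and is now named "to".
    swap_spaces();
  }
};

// The invariant CMS remark relies on: chunk entries lie inside "from".
bool chunks_describe_from(const YoungGen& g) {
  for (std::size_t addr : g.chunk_table) {
    if (addr < g.from->bottom || addr >= g.from->top) return false;
  }
  return true;
}
```

After a failed scavenge in this model, the chunk entries again lie inside the space named "from", while the old from-space, still holding objects, is named "to" until the mark-compact runs.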

21-05-2007

EVALUATION

Failure to clear the survivor chunking array in the case of a promotion failure leaves it with obsolete data, which the remark phase then tries to use.
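The evaluation frames the bug as stale chunk data rather than as a naming problem. A remedy in that spirit would discard the table after a failed scavenge, so that remark falls back to scanning from-space as a single unit. This is a hypothetical sketch (toy model with integer addresses and assumed 100-word PLABs, invented names such as `from_space_remark_tasks`, and an assumed single-task fallback); it is not what was checked in, since the putback for this bug swaps the space names instead.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch of the remedy the evaluation points at: drop the chunk
// table when a scavenge fails, so remark never sees stale boundaries.
struct SurvivorSpace {
  std::size_t bottom;
  std::size_t top;
};

struct YoungGen {
  SurvivorSpace a{1000, 1000};
  SurvivorSpace b{2000, 2000};
  SurvivorSpace* from = &a;
  SurvivorSpace* to = &b;
  std::vector<std::size_t> chunk_table;

  void scavenge(std::size_t plabs, bool promotion_failed) {
    chunk_table.clear();
    for (std::size_t i = 0; i < plabs; ++i) {
      chunk_table.push_back(to->top);
      to->top += 100;
    }
    if (promotion_failed) {
      chunk_table.clear();  // entries describe "to", not "from": discard them
    } else {
      from->top = from->bottom;  // old from-space fully evacuated
      std::swap(from, to);
    }
  }
};

// Remark tasks for from-space: one per recorded chunk or, when no chunk
// information exists, one task covering the whole space (assumed fallback).
std::size_t from_space_remark_tasks(const YoungGen& g) {
  if (g.from->top == g.from->bottom) return 0;
  return g.chunk_table.empty() ? 1 : g.chunk_table.size();
}
```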

21-05-2007