Bug ID: JDK-7001033 CMS: assert(gch->gc_cause() == GCCause::_scavenge_alot || !gch->incremental_collection

Type: Bug
Component: hotspot
Sub-Component: gc
Affected Version: hs20

Priority: P3
Status: Closed
Resolution: Fixed
OS: generic
CPU: generic

Submitted: 2010-11-18
Updated: 2011-03-07
Resolved: 2011-03-07

JDK 6	JDK 7	Other
6u25: Fixed	7: Fixed	hs20: Fixed

The assert mentioned in the synopsis was recently weakened as part of:-

changeset:   1810:8d81b4a1d3e1
user:        ysr
date:        Thu Nov 11 10:42:43 2010 -0800
summary:     6998802: ScavengeALot: assert(!gch->incremental_collection_failed()) failed: Twice in a row


But it appears as though there may still be an issue with it. There were several nightly failures, for example:-

http://sqeweb.sfbay/nfs/results/vm/gtee/JDK7/NIGHTLY/VM/2010-11-16/GC_Baseline-Xinc/vm/linux-i586/server/mixed/linux-i586_vm_server_mixed_vm.gc.testlist/ResultDir/Churn3a/

http://sqeweb.sfbay/nfs/results/vm/gtee/JDK7/NIGHTLY/VM/2010-11-16/GC_Baseline-Xconc/vm/solaris-sparc/client/mixed/solaris-sparc_vm_client_mixed_nsk.quick-monitoring.testlist/ResultDir/CollectionCounters001/hs_err_pid21782.log

http://sqeweb.sfbay/nfs/results/vm/gtee/JDK7/NIGHTLY/VM/2010-11-16/GC_Baseline-Xconc/vm/solaris-sparc/client/mixed/solaris-sparc_vm_client_mixed_nsk.quick-monitoring.testlist/ResultDir/CollectionCounters003/hs_err_pid21914.log

List of failures:
nsk/monitoring/GarbageCollectorMXBean/CollectionCounters/CollectionCounters001
nsk/monitoring/GarbageCollectorMXBean/CollectionCounters/CollectionCounters003
nsk/monitoring/GarbageCollectorMXBean/CollectionCounters/CollectionCounters004
nsk/monitoring/GarbageCollectorMXBean/CollectionCounters/CollectionCounters005

For a few more failures, see:-

http://sqeweb.sfbay/nfs/results/vm/gtee/JDK7/NIGHTLY/VM/2010-11-16/GC_Baseline-Xconc/index.html

In all of these cases, the assertion failure appears to have been triggered by a
"concurrent full gc" (requested via +ExplicitGCInvokesConcurrent).
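For orientation, here is a minimal sketch of the kind of workload that exercises this path: allocation churn interleaved with explicit System.gc() requests, with GarbageCollectorMXBean collection counts read before and after. It is not one of the failing tests. The class name ExplicitGcChurn, its method names, and the chunk size and round counts are illustrative assumptions. It would be run on a CMS-capable JVM with something like "java -XX:+UseConcMarkSweepGC -XX:+ExplicitGCInvokesConcurrent ExplicitGcChurn". It is not guaranteed to reproduce the failure, and the assert is only compiled into debug builds.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical driver (not from the bug report): churns the heap and
// requests explicit collections so the collection counters can be observed.
class ExplicitGcChurn {

    // Sum the collection counts of all registered collectors.
    // getCollectionCount() returns -1 when the count is undefined; skip those.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Allocate 64 KB chunks, retain a fraction of them for a while so that
    // some objects survive young collections, and request an explicit GC
    // every 256 rounds. Returns the number of bytes allocated.
    static long churn(int rounds) {
        List<byte[]> retained = new ArrayList<byte[]>();
        long allocated = 0;
        for (int i = 0; i < rounds; i++) {
            byte[] chunk = new byte[64 * 1024];
            allocated += chunk.length;
            if (i % 8 == 0) {
                retained.add(chunk);
            }
            if (retained.size() > 64) {
                retained.clear();
            }
            if (i % 256 == 0) {
                // With +ExplicitGCInvokesConcurrent this requests a concurrent cycle
                // rather than a stop-the-world full collection.
                System.gc();
            }
        }
        return allocated;
    }

    public static void main(String[] args) {
        long before = totalCollections();
        long bytes = churn(2048);
        long after = totalCollections();
        System.out.println("allocated bytes: " + bytes);
        System.out.println("collections observed: " + (after - before));
    }
}
```

The counts printed depend on the collector, heap size, and JVM version, so no particular output should be expected.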

Here's a typical stack trace:-

-----------------  lwp# 6 / thread# 6  --------------------
 ff0cd71c _lwp_kill (6, ffffffef, ffffffec, ff167550, 5, 6) + 8
 ff0528a0 abort    (f0f7f498, ff164000, 6, 1, ff165780, 0) + 108
 fe7397a0 void os::abort(bool) (fe74c974, ffe88176, 177c00, feece724, f0f7f498, feece724) + 14c
 fe9d33b0 void VMError::report_and_die() (1, feeadadc, 1c8c00, feee3f70, 0, feee3fda) + cf8
 fe0e89a0 void report_vm_error(const char*,int,const char*,const char*) (feb6dfc0, 34a, feb6e00f, feb6e072, 2e800, fee7f140) + 6c
 fe0f9010 void DefNewGeneration::gc_epilogue(bool) (65d98, 628f8, feeaac00, 0, feea3800, feeaac00) + 100
 fe1a7b74 void GenCollectedHeap::gc_epilogue(bool) (628f8, feeac014, f0f7f708, 0, f0f7f708, 62958) + 58
 fe1a46d0 void GenCollectedHeap::do_collection(bool,bool,unsigned,bool,int) (628f8, ffffffff, fe010640, 62954, 0, 0) + 1258
 fe1a64c0 void GenCollectedHeap::do_full_collection(bool,int) (628f8, fee9934c, 0, 0, fe1757a8, feeabe54) + dc
 fe9d00a0 void VM_GenCollectFullConcurrent::doit() (fd97f508, feeadc00, 1, feddb18f, 628f8, f0f7fa04) + 100
 fea09fa0 void VM_Operation::evaluate() (fd97f508, 520, fd992240, 110a40, fee1081d, feeadafa) + 170
 fea082e8 void VMThread::loop() (0, fe9d0a80, feee3cb4, fee4abb1, fee0f8d0, feecd208) + 660
 fea07504 void VMThread::run() (1c8c00, 5b2a0, 10, 6e0f0, feeed2f0, 6e000) + 104
 fe736f1c java_start (1c8c00, 3, 1c9f98, fed0689b, fee7f140, 0) + 2e8
 ff0c8ca8 _lwp_start (0, 0, 0, 0, 0, 0)
gc/memory/Churn/Churn3a

EVALUATION http://hg.openjdk.java.net/jdk7/build/hotspot/rev/6cd6d394f280
25-12-2010
EVALUATION http://hg.openjdk.java.net/jdk7/hotspot/hotspot/rev/6cd6d394f280
10-12-2010
EVALUATION http://hg.openjdk.java.net/jdk7/hotspot-gc/hotspot/rev/6cd6d394f280
08-12-2010
EVALUATION A too-full heap can leave us in this state and, furthermore, can slow down mutators because of the current allocation policy. This needs to be fixed, as the recent report of a large regression with specjbb2005 indicates. The problem, however, is limited to situations where the heap is so full that promotion failure continues to happen on multiple back-to-back collections. I am raising this to P3 for that reason, since some badly tuned heaps will see a large performance regression, as noted recently in the performance testing of jdk7-b118.
23-11-2010
WORK AROUND Use -XX:SuppressErrorAt=...., because the assert is benign with respect to correctness; it is there to protect against possible performance problems under circumstances that we may not have considered.
18-11-2010

Relates :	JDK-7005270 - CMS: cleanups related to concurrent mode failure, compaction, and prediction of promotion failure
Relates :	JDK-6998802 - ScavengeALot: assert(!gch->incremental_collection_failed()) failed: Twice in a row