Bug ID: JDK-8067500 ConcurrentMark::abort() is excessively slow

Type: Bug
Component: hotspot
Sub-Component: gc

Priority: P4
Status: Closed
Resolution: Won't Fix

Submitted: 2014-12-15
Updated: 2017-01-18
Resolved: 2016-03-08

For  a System.gc() the call to  ConcurrentMark::abort() can be excessively long.
Cut-and-pasted from a log (some parts left off).  I added timers to get these
numbers.

285.569: #0: [Full GC (System.gc()), 6.3754329 secs]
285.569: #0: [Abort concurrent mark (System.gc()), 5.4316610 secs]
285.569: #0: [_nextMarkBitMap->clearAll()  (System.gc()) , 2.6187331 secs]
288.188: #0: [clear_all_count_data()  (System.gc()) , 2.8120570 secs]

JDK-8151215 is about changing the storage mechanism to be more efficient with the clearing of otherwise unused data.
08-03-2016
JDK-8151069 will fix this by parallelizing the majority of ConcurrentMark::abort().
08-03-2016
Additional work can be considered if the abort time seems excessive when the majority of the heap is actually being used.
08-03-2016
There is an enhancement coming which will only clear the bitmaps if they have been used. That will fix the problem as I observed it (where the bitmap was in fact mostly unused).
08-03-2016
The collect/analyze data does not show any more detail than that clear_all_count_data() is taking the additional time.
17-12-2014
With only 10 GC threads times are better [Times: user=2.14 sys=0.16 real=1.46 secs] [Times: user=2.38 sys=0.10 real=1.39 secs] Corresponding CMS times are (default GC threads) [Times: user=0.53 sys=0.00 real=0.54 secs] [Times: user=0.50 sys=0.01 real=0.51 secs] For G1 the times to call invoke_at_safepoint() to do the actual full GC are close to the CMS times. 291.004: #0: [invoke_at_safepoint (System.gc()) , 0.3401121 secs] 294.961: #1: [invoke_at_safepoint (System.gc()) , 0.3092868 secs]
16-12-2014
This is running specjbb2005 with a ~120G heap.
16-12-2014
The patch to make AlwayPreTouch work for G1 was used for these measurements. [Times: user=11.02 sys=3.23 real=6.38 secs] [Times: user=12.73 sys=5.30 real=3.91 secs]
15-12-2014
The second System.gc() is a little different. System.gc() is shorter and only clear_all_count_data() is long (not _nextMarkBitMap->clearAll()) 291.949: #1: [Full GC (System.gc()), 3.9105153 secs] 291.950: #1: [Abort concurrent mark (System.gc()), 3.0081289 secs] 291.950: #1: [_nextMarkBitMap->clearAll() (System.gc()) , 0.1934346 secs] 292.143: #1: [clear_all_count_data() (System.gc()) , 2.8139040 secs]
15-12-2014
The code associated with the above traces are (ignore missing lines, it's where the GCTraceTime code was added). Abort concurrent mark 1282 // Abandon current iterations of concurrent marking and concurrent 1283 // refinement, if any are in progress. We have to do this before 1284 // wait_until_scan_finished() below. 1285 concurrent_mark()->abort(); _nextMarkBitMap->clearAll() 3288 // Clear all marks in the next bitmap for the next marking cycle. This will allow us to skip the next 3289 // concurrent bitmap clearing. 3292 _nextMarkBitMap->clearAll(); clear_all_count_data() 3299 // Clear the liveness counting data 3302 clear_all_count_data();
15-12-2014

Duplicate :	JDK-8151069 - Parallelize clearing the per-thread concurrent mark data structures
Relates :	JDK-8151215 - Modify layout of (large) Concurrent Mark data structures