Symptom: sporadic long marking phases (up to n times longer, where n is the number of HW threads) with parallel old GC, if the object graph has many linear subgraphs. Analysis: ParallelTaskTerminator::offer_termination() does not discover objects to scan on the marking stacks of co-worker threads, because it looks at the wrong TaskQueueSet. So instead of participating in the work, a gc thread that reaches offer_termination() will do SpinPause until the marking has been finished - potentially by just one co-worker thread. Fix: let ParallelTaskTerminator check ParCompactionManager::stack_array() instead of ParCompactionManager::region_array(). A specialized microbenchmark, TestStealMarkingTask (source attached and results below), shows about 3x shorter parallel old full GCs with the patch. Benchmark run without patch --------------------------- $ jdk9/bin/java -Xms2g -Xmx2g -XX:+UseParallelGC -cp /usr/work/d038402/jtests/QuickJavaTests/bin test.gc.TestStealMarkingTask 10 10000 10000 2 Building list of length 10000 holding sub-lists of length 10000 Heap stats: free:1921 MB (97.9 %) max:1963 MB done Heap stats: free:808 MB (41.2 %) max:1963 MB Calling System.gc() Heap stats: free:817 MB (41.6 %) max:1963 MB Full GC duration: 3359 ms Calling System.gc() Heap stats: free:817 MB (41.6 %) max:1963 MB Full GC duration: 3087 ms Calling System.gc() Heap stats: free:817 MB (41.6 %) max:1963 MB Full GC duration: 3024 ms Calling System.gc() Heap stats: free:817 MB (41.6 %) max:1963 MB Full GC duration: 3535 ms Calling System.gc() Heap stats: free:817 MB (41.6 %) max:1963 MB Full GC duration: 2831 ms Calling System.gc() Heap stats: free:817 MB (41.6 %) max:1963 MB Full GC duration: 3313 ms Calling System.gc() Heap stats: free:817 MB (41.6 %) max:1963 MB Full GC duration: 4919 ms Calling System.gc() Heap stats: free:817 MB (41.6 %) max:1963 MB Full GC duration: 3624 ms Calling System.gc() Heap stats: free:817 MB (41.6 %) max:1963 MB Full GC duration: 2689 ms Calling System.gc() Heap stats: free:817 MB (41.6 %) max:1963 MB Full GC duration: 3418 ms Benchmark run without patch: about 3x shorter full GC pauses ------------------------------------------------------------ $ jdk9_patched/bin/java -Xms2g -Xmx2g -XX:+UseParallelGC -cp /usr/work/d038402/jtests/QuickJavaTests/bin test.gc.TestStealMarkingTask 10 10000 10000 2 Building list of length 10000 holding sub-lists of length 10000 Heap stats: free:1921 MB (97.9 %) max:1963 MB done Heap stats: free:808 MB (41.2 %) max:1963 MB Calling System.gc() Heap stats: free:817 MB (41.6 %) max:1963 MB Full GC duration: 1536 ms Calling System.gc() Heap stats: free:817 MB (41.6 %) max:1963 MB Full GC duration: 1255 ms Calling System.gc() Heap stats: free:817 MB (41.6 %) max:1963 MB Full GC duration: 1351 ms Calling System.gc() Heap stats: free:817 MB (41.6 %) max:1963 MB Full GC duration: 1302 ms Calling System.gc() Heap stats: free:817 MB (41.6 %) max:1963 MB Full GC duration: 1281 ms Calling System.gc() Heap stats: free:817 MB (41.6 %) max:1963 MB Full GC duration: 1259 ms Calling System.gc() Heap stats: free:817 MB (41.6 %) max:1963 MB Full GC duration: 1257 ms Calling System.gc() Heap stats: free:817 MB (41.6 %) max:1963 MB Full GC duration: 1292 ms Calling System.gc() Heap stats: free:817 MB (41.6 %) max:1963 MB Full GC duration: 1305 ms Calling System.gc() Heap stats: free:817 MB (41.6 %) max:1963 MB Full GC duration: 1282 ms
|