JDK-8079128 : ParNewGC times doubled from Java SE 6 to Java SE 9
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 8u45,9
  • Priority: P2
  • Status: Closed
  • Resolution: Not an Issue
  • OS: solaris
  • CPU: x86
  • Submitted: 2015-04-30
  • Updated: 2019-12-14
  • Resolved: 2019-12-14
Description
In a proprietary application, minor collections (ParNewGC+CMS) are considerably slower 
in Java SE 8/Java SE 9 than in Java SE 6.

In an identical environment, a ParNewGC collection takes less than 3 msec on Java SE 6,
but about 6-7 msec on Java SE 8 or Java SE 9.

Environment used: SUN FIRE X4170 M2, 24 CPUs, Solaris 10 x64 144489-05

Experimental results using a demo application:

 6u91 ParNewGC ParallelCMSThreads=1: 0.00263 sec (baseline)
 7u76 ParNewGC ParallelCMSThreads=1: 0.00652 sec (+ 147 %)
 7u76 ParNewGC ParallelCMSThreads=1 +BlockOffsetArrayUseUnallocatedBlock : 0.00462 sec (+ 76 %)
 7u76 ParNewGC ParallelCMSThreads=1 +BlockOffsetArrayUseUnallocatedBlock +JavaObjectsInPerm: 0.00389 sec (+ 48 %)
 7u76 ParNewGC ParallelCMSThreads=1 +BlockOffsetArrayUseUnallocatedBlock +JavaObjectsInPerm ParGCCardsPerStrideChunk=4096: 0.00338 sec (+ 29 %)
 7u76 ParNewGC ParallelCMSThreads=16 +BlockOffsetArrayUseUnallocatedBlock +JavaObjectsInPerm ParGCCardsPerStrideChunk=4096: 0.00354 sec (+ 35 %)
 8u45 ParNewGC ConcGCThreads=1: 0.00600 sec (+ 128 %)
 8u45 ParNewGC ConcGCThreads=16: 0.00578 sec (+ 120 %)
 8u45 ParNewGC ConcGCThreads=1 +BlockOffsetArrayUseUnallocatedBlock: 0.00725 sec (+ 176 %)
 8u45 ParNewGC ConcGCThreads=16 +BlockOffsetArrayUseUnallocatedBlock: 0.00409 sec (+ 56 %)
 8u45 ParNewGC ConcGCThreads=1 +BlockOffsetArrayUseUnallocatedBlock ParGCCardsPerStrideChunk=4096: 0.00391 sec (+ 49 %)
 9b61 ParNewGC ConcGCThreads=1: 0.00608 sec (+ 131 %)
 9b61 ParNewGC ConcGCThreads=1 +BlockOffsetArrayUseUnallocatedBlock: 0.00474 sec (+ 80 %)
 9b61 ParNewGC ConcGCThreads=1 +BlockOffsetArrayUseUnallocatedBlock ParGCCardsPerStrideChunk=4096: 0.00547 sec (+ 108 %)
 9b61 ParNewGC ConcGCThreads=16 +BlockOffsetArrayUseUnallocatedBlock: 0.00524 sec (+ 99 %)
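
The percentages in parentheses appear to be the relative slowdown against the 6u91 baseline of 0.00263 sec. A minimal sketch of that arithmetic follows; the helper name slowdown_pct is hypothetical and not part of the original report, and it prints one decimal place where the list above shows whole percentages.

```shell
# slowdown_pct BASELINE MEASURED
# Prints how much slower MEASURED is than BASELINE, in percent.
# (Hypothetical helper, not taken from the report.)
slowdown_pct() {
    awk -v base="$1" -v t="$2" 'BEGIN { printf "%.1f\n", (t / base - 1) * 100 }'
}

slowdown_pct 0.00263 0.00600   # 8u45, ConcGCThreads=1
slowdown_pct 0.00263 0.00409   # 8u45, ConcGCThreads=16 +BlockOffsetArrayUseUnallocatedBlock
```

The two calls print 128.1 and 55.5, in line with the + 128 % and + 56 % entries above.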

The following Java SE software was used:

Java HotSpot(TM) 64-Bit Server VM (20.101-b01) for solaris-amd64 JRE (1.6.0_91-b14)
Java HotSpot(TM) 64-Bit Server VM (24.76-b04) for solaris-amd64 JRE (1.7.0_76-b13)
Java HotSpot(TM) 64-Bit Server VM (25.45-b02) for solaris-amd64 JRE (1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (1.9.0-ea-b61) for solaris-amd64 JRE (1.9.0-ea-b61) 

Comments
CMS has been removed - JDK-8229049
14-12-2019

The data (GC log) was evaluated the following way:

% grep ParNew SLOW_GC_DEMO_screener_0_gc.j6.log | tail -100 | awk '{print $11}' | wc
86 86 860
% grep ParNew SLOW_GC_DEMO_screener_0_gc.j6.log | tail -50 | awk '{print $11}' | awk '{sum += $1} END {printf( "SUM: %f\n", sum)}'
SUM: 0.131453

0.131453 / 50 = 0.00263
30-04-2015
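
The evaluation above can be folded into a single helper that sums and averages in one step. This is only a sketch: the function name mean_parnew_pause is hypothetical, and it assumes, as the original commands do, that the pause time in seconds is the 11th whitespace-separated field of each ParNew line. It divides by the number of lines actually read rather than by a fixed 50.

```shell
# mean_parnew_pause LOGFILE [SAMPLES] [FIELD]
# Prints the mean of FIELD (default 11) over the last SAMPLES (default 50)
# lines of LOGFILE that contain "ParNew".
# (Hypothetical helper; the field position is an assumption about the log layout.)
mean_parnew_pause() (
    log=$1
    samples=${2:-50}
    field=${3:-11}
    grep ParNew "$log" | tail -n "$samples" |
        awk -v f="$field" '{ sum += $f; n++ }
                           END { if (n > 0) printf "%.5f\n", sum / n }'
)

# Usage (file name as in the comment above):
#   mean_parnew_pause SLOW_GC_DEMO_screener_0_gc.j6.log 50
```

With the sum reported above (0.131453 over 50 samples) this yields the same 0.00263 value.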

The behaviour is reproducible, and a demo application is available. ParNewGC is used in combination with the 'Concurrent Low Pause Collector' (CMS). Varying -XX:ConcGCThreads=<n> did improve performance in some configurations (notably 8u45).

The following flags helped improve performance in Java SE 7: +BlockOffsetArrayUseUnallocatedBlock +JavaObjectsInPerm ParGCCardsPerStrideChunk
The following flags helped improve performance in Java SE 8: +BlockOffsetArrayUseUnallocatedBlock ParGCCardsPerStrideChunk
The following flags helped improve performance in Java SE 9: +BlockOffsetArrayUseUnallocatedBlock
30-04-2015
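
The report does not give the exact command lines used, so the following is only a sketch of how the flags named above might be assembled for a Java SE 8 run. The helper name parnew_tuning_flags and the jar name demo.jar are hypothetical. -XX:+UnlockDiagnosticVMOptions is included on the assumption that BlockOffsetArrayUseUnallocatedBlock is a diagnostic option in the build at hand; drop it if the VM accepts the flag without it. JavaObjectsInPerm is left out because the report lists it only for Java SE 7.

```shell
# parnew_tuning_flags CONC_GC_THREADS CARDS_PER_STRIDE_CHUNK
# Prints a ParNew+CMS flag set using the tuning options named in the report.
# (Hypothetical helper; the reporter's actual command line is not known.)
parnew_tuning_flags() {
    printf '%s\n' "-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+UnlockDiagnosticVMOptions -XX:+BlockOffsetArrayUseUnallocatedBlock -XX:ConcGCThreads=$1 -XX:ParGCCardsPerStrideChunk=$2"
}

# Usage (demo.jar is a placeholder for the demo application):
#   java $(parnew_tuning_flags 16 4096) -jar demo.jar
```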