JDK-8350130 : Performance Degradation with Default ParallelGC in Hotspot JDK on Ubuntu
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 8
  • Priority: P3
  • Status: Closed
  • Resolution: Not an Issue
  • OS: linux
  • CPU: x86_64
  • Submitted: 2025-02-12
  • Updated: 2025-02-21
  • Resolved: 2025-02-21
Description
A DESCRIPTION OF THE PROBLEM :
I have encountered a significant performance issue when running a Java test case on Ubuntu 20.04.4 LTS with Hotspot JDK 1.8.0_431. With the default ParallelGC, approximately 97% of the user time is spent in garbage collection (GC), resulting in a real execution time of around 2 minutes and 51 seconds. With OpenJ9 JDK 8u432-b06, or with Hotspot JDK 1.8.0_431 and the G1GC garbage collector, the execution time drops to just a few seconds. Running the same test on Windows also yields short execution times, which further suggests that the issue is specific to the Linux environment with ParallelGC.
I suspect that the default ParallelGC in this version of the Hotspot JVM has performance issues on Linux systems. I can provide GC logs for further analysis.

REGRESSION : Last worked in version 8

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Using Hotspot JDK with default ParallelGC:
    time /root/hotspot/jdk1.8.0_431/bin/java -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/root/example/gc.log Test
Using OpenJ9 JDK:
    time /root/openj9/jdk8u432-b06/bin/java Test
Using Hotspot JDK with G1GC:
    time /root/hotspot/jdk1.8.0_431/bin/java -XX:+UseG1GC Test
Using Hotspot JDK on Windows with default ParallelGC:
    Measure-Command { D:\development_tools\JDK8_HOT\bin\java.exe -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:D:\code\java\test\gc.log test }


ACTUAL -
Hotspot JDK 1.8.0_431 with default ParallelGC: 

real    3m29.283s
user    283m14.400s
sys     0m52.860s


OpenJ9 JDK 8u432-b06:

real    0m15.510s
user    1m59.276s
sys     0m17.718s


Hotspot JDK 1.8.0_431 with G1GC:

real    0m7.403s
user    2m23.107s
sys     0m16.017s


On Windows using Hotspot JDK with default ParallelGC:

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 22
Milliseconds      : 396
Ticks             : 223960214
TotalDays         : 0.000259213210648148
TotalHours        : 0.00622111705555555
TotalMinutes      : 0.373267023333333
TotalSeconds      : 22.3960214
TotalMilliseconds : 22396.0214

---------- BEGIN SOURCE ----------
import java.util.Arrays;

public class test {
    static final int LOOP = 30000000;

    static class V16qi {
        byte[] values = new byte[16];

        V16qi() {}

        V16qi(byte[] values) {
            this.values = values;
        }
    }
    public static void main(String[] args) {


        V16qi[] p0 = new V16qi[LOOP];
        V16qi[] p1 = new V16qi[LOOP];

        for (int i = 0; i < LOOP; i++) {
            byte[] array0 = new byte[16];
            byte[] array1 = new byte[16];
            for (int j = 0; j < 16; j++) {
                array0[j] = (byte) (1 + i + j);
                array1[j] = (byte) (1 + i * i + j * j);
            }
            p0[i] = new V16qi(array0);
            p1[i] = new V16qi(array1);
        }

    }
}
---------- END SOURCE ----------
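Every object stored in p0 and p1 stays reachable until main returns, so the retained heap can be estimated up front. The sketch below is a back-of-the-envelope calculation, not a measurement: the class name LiveSetEstimate and the layout constants (12-byte object header, 16-byte array header, 4-byte references, 8-byte alignment) are assumptions for a 64-bit HotSpot JVM with compressed oops. It counts only retained objects; the byte[16] created by the field initializer in V16qi is replaced immediately by the constructor argument and is not included.

```java
public class LiveSetEstimate {
    // Iteration count taken from the reproducer.
    static final long LOOP = 30_000_000L;

    // Assumed object layout (64-bit HotSpot, compressed oops); not measured.
    static final long OBJECT_HEADER = 12;
    static final long ARRAY_HEADER = 16;
    static final long REFERENCE = 4;
    static final long ALIGNMENT = 8;

    // Rounds a size up to the next multiple of the assumed object alignment.
    static long align(long bytes) {
        return (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // Bytes retained per loop iteration: two V16qi wrappers plus their two byte[16] payloads.
    public static long perIterationBytes() {
        long wrapper = align(OBJECT_HEADER + REFERENCE); // V16qi has a single reference field
        long payload = align(ARRAY_HEADER + 16);         // byte[16]
        return 2 * (wrapper + payload);
    }

    // Total retained bytes, including the two V16qi[LOOP] reference arrays.
    public static long totalBytes() {
        long referenceArrays = 2 * align(ARRAY_HEADER + REFERENCE * LOOP);
        return LOOP * perIterationBytes() + referenceArrays;
    }

    public static void main(String[] args) {
        System.out.printf("per iteration: %d bytes%n", perIterationBytes());
        System.out.printf("estimated live set: %d bytes (%.2f GiB)%n",
                totalBytes(), totalBytes() / (1024.0 * 1024 * 1024));
    }
}
```

Under these assumptions the live set comes to roughly 3.1 GB, which gives a sense of how much data a collector has to keep copying or marking while the heap is still growing.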
Comments
Closing as "Not an Issue" as most of these observations can be explained by different heap sizing.
21-02-2025

See also https://tschatzl.github.io/2024/02/06/jdk22-g1-parallel-gc-changes.html for some details about these changes mentioned above and why they help in this case.
17-02-2025

The reason for the performance difference between Parallel GC with and without -Xms is simply ergonomics: without -Xms, Parallel GC starts off with a fairly small initial heap size, which causes additional garbage collections. With a larger total heap size (via -Xms), the young generation is much larger, so depending on the -Xms/-Xmx values the number of garbage collections is much smaller. The sample application is also basically the worst case for a generational collector: not a single application object dies during garbage collection (all stay live), and since garbage collection time depends on the amount of live objects, garbage collection time may be very long. That said, I think Parallel GC is also severely handicapped in this case by what has been fixed in JDK-8310031 and JDK-8321013; at least JDK 22 with Parallel GC, which contains both fixes, is much faster. As for OpenJ9, its GC is completely different, so there is no point in comparing. On Windows, default (initial) heap sizing may be different as well.
17-02-2025
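One way to observe the ergonomics described in the comment above is to have the JVM report its own heap sizing and GC activity. The following is a minimal sketch using the standard java.lang.management API; the class name HeapReport and its output format are made up for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapReport {
    private static final long MB = 1024L * 1024L;

    // Sums collection counts over all collectors; undefined counts (-1) are skipped.
    public static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Sums approximate accumulated collection time in milliseconds over all collectors.
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // One-line summary of initial, committed and maximum heap plus GC totals so far.
    public static String describe() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return String.format("init=%dMB committed=%dMB max=%dMB gcCount=%d gcTime=%dms",
                heap.getInit() / MB, heap.getCommitted() / MB, heap.getMax() / MB,
                totalGcCount(), totalGcMillis());
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Printing HeapReport.describe() as the last statement of the reproducer's main, once without heap flags and once with -Xms and -Xmx set to the same value, should make the difference in initial heap size and collection count visible without parsing GC logs.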

The issue is not specific to Ubuntu. A similar result was observed on Oracle Linux. On Oracle Linux 6.8, the default MaxHeapSize is 7.8 GB for 30 GB of RAM. MaxHeapSize depends on factors such as:
- free RAM
- swap space
- others

./java -XX:+PrintFlagsFinal -version | grep MaxHeap
    uintx MaxHeapFreeRatio = 100 {manageable}
    uintx MaxHeapSize := 7849639936 {product}
java version "1.8.0_431"
Java(TM) SE Runtime Environment (build 1.8.0_431-b26)
Java HotSpot(TM) 64-Bit Server VM (build 25.431-b26, mixed mode)

Compare the results when running with the default MaxHeapSize (7.8G) and when setting the same value explicitly:

1. ParallelGC, default heap size (7.8G)
real 0m26.787s
user 0m38.756s
sys 0m11.105s

2. ParallelGC -Xms7800M -Xmx7800M
real 0m8.711s
user 0m9.320s
sys 0m4.060s

Here is the core of that test:

echo "1. ParallelGC, default heap size (7.8G)"
time $JAVA_HOME/bin/java -cp . Test
echo 2. ParallelGC -Xms7800M -Xmx7800M
time $JAVA_HOME/bin/java -cp . -Xms7800M -Xmx7800M Test

It is odd that setting the heap size explicitly to the default value takes about a third of the "real" time of a run without -Xms and -Xmx, which in theory would use the same default MaxHeapSize. Moving to JDK for further investigation.
14-02-2025
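The comparison above fixes both -Xms and -Xmx, whereas the default run only shares the same maximum. The initial heap size can be read next to the maximum from inside the JVM. This is a hedged sketch that assumes a HotSpot-based JDK, since it relies on the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean; the class name HeapFlags is illustrative.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

public class HeapFlags {
    // Looks up a VM flag by name; throws IllegalArgumentException if the flag does not exist.
    public static VMOption flag(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name);
    }

    // Returns the flag value parsed as a byte count.
    public static long flagBytes(String name) {
        return Long.parseLong(flag(name).getValue());
    }

    public static void main(String[] args) {
        // The origin tells whether the value came from a default, the command line, or ergonomics.
        for (String name : new String[] {"InitialHeapSize", "MaxHeapSize"}) {
            VMOption option = flag(name);
            System.out.printf("%s = %s bytes (origin: %s)%n",
                    name, option.getValue(), option.getOrigin());
        }
    }
}
```

Running it with no heap flags and again with -Xms7800M -Xmx7800M shows whether the two runs really start from the same heap, which is the assumption the timing comparison rests on.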