JDK-8322479 : Regression in SPECjvm2008-MonteCarlo-ParGC on Linux-x64
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 21.0.2,22,23
  • Priority: P3
  • Status: Closed
  • Resolution: Duplicate
  • OS: linux
  • CPU: x86_64
  • Submitted: 2023-12-19
  • Updated: 2024-04-02
  • Resolved: 2024-01-08
JDK 23: 23 (Resolved)
Related Reports
Duplicate :  
Relates :  
Description
Integration of JDK-8318562 into 22-b25 caused a ~2% regression in SPECjvm2008-MonteCarlo-ParGC on Linux-x64. The regression also shows up in 21.0.2-b9.

The regression is also present, and larger (5-7%), with G1 and ZGC, but the data set is noisier.

SPECjvm2008 options:
      scimark.monte_carlo -ikv

Java options:
      -server -XX:+UseParallelGC -XX:-PrintWarnings -XX:+UseLargePages  

The regression was isolated by running CI builds for 22-b25 (graph attached). The performance drop occurs between jdk-22+25-1953 and jdk-22+25-1954, which places the regression in jdk-22+25-1954. That CI build contains only the change-set for JDK-8318562.
Comments
Thanks a lot @ecaspole for running these benchmarks.
02-04-2024

I can generally reproduce the thread-count results that Sandhya reported, in our own benchmark lab. For example, on our x64 platforms:

SPECjvm2008-MonteCarlo-ParGC with 4 threads:
      platform                  baseline     patch
      --------------------------------------------
      linux_x64_oci_server        262.24    371.55
      macosx_x64                  353.25    417.59
      windows_x64_oci_server      315.23    447.44

SPECjvm2008-MonteCarlo-ParGC with 12 threads:
      linux_x64_oci_server        788.06   1103.68
      macosx_x64                  506.63    582.39
      windows_x64_oci_server      944.31   1336.46

The regression is still there on the fully loaded system:
      linux_x64_oci_server       1717.20   1585.96

But since every other configuration shows an improvement, this change looks good.
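
As a reading aid, the relative changes behind these figures can be computed with a few lines of Java. This is only a sketch: the class and method names are hypothetical, and it assumes the scores are ops/m with higher being better, as in the other comments on this issue. For linux_x64_oci_server it gives roughly +42% at 4 threads, +40% at 12 threads, and -7.6% on the fully loaded system.

```java
// Hypothetical helper, not part of SPECjvm2008 or the JDK.
final class RelativeChange {
    /** Percentage change from baseline to patch; positive means the patch scored higher. */
    public static double percentChange(double baseline, double patch) {
        return (patch - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // linux_x64_oci_server scores copied from the comment (assumed ops/m).
        String[] labels = {"4 threads", "12 threads", "fully loaded"};
        double[][] scores = {
            {262.24, 371.55},
            {788.06, 1103.68},
            {1717.20, 1585.96},
        };
        for (int i = 0; i < labels.length; i++) {
            double change = percentChange(scores[i][0], scores[i][1]);
            System.out.printf("%-12s %+.1f%%%n", labels[i], change);
        }
    }
}
```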
02-04-2024

Using the fix submitted in REDO JDK-8323116, the performance regression reported for SPECjvm2008 MonteCarlo is gone, as seen from the data below using a TigerLake Linux machine.

      Threads   Base (ops/m)   CVT (ops/m)
      1                96.46        123.83
      4               335.16        402.44
      8               562.79        726.56
      12              685.67        748.66
      16              793.26        795.55
28-03-2024

Thanks for the analysis, [~sviswanathan]. Since the problematic fix was backed out by JDK-8322985 due to other issues, let's address this performance issue with the REDO JDK-8323116.
08-01-2024

I did a set of runs on my Rocketlake desktop (8 cores, 16 threads). My findings are below.

Java command:
      java -server -XX:+UseParallelGC -XX:-PrintWarnings -XX:+UseLargePages -jar SPECjvm2008.jar -ikv -ict -wt 30 -it 60 -bt num_threads scimark.monte_carlo
with num_threads = 1, 4, 8, 12, 16.

Performance (ops/m, higher is better):
      num_threads:               1        4        8       12       16
      Base ops/m:           102.69   381.50   580.42   742.31   850.82
      With 8318562 ops/m:   154.64   546.95   849.54   844.50   844.14

Here Base is the JVM built with sources just prior to the 8318562 integration.

The 8318562 optimization is a simple code gen change:

Base:
      vcvtsi2sd xmm2, xmm2, r10d
With optimization:
      vxorpd xmm2, xmm2, xmm2
      vcvtsi2sd xmm2, xmm2, r10d

As the performance table above shows, with the benchmark thread count (num_threads) set to 1 we see a significant improvement with the 8318562 optimization enabled. The improvement over base continues up to num_threads = 8 and then saturates. This indicates that the codegen change is correct and beneficial.

At 16 threads the performance is sometimes lower than, sometimes similar to, and sometimes better than base. I think what this bug report points out is that the statistical trend shows performance with the optimization to be slightly lower than without it, and that the gap varies with GC type. It is hard to root-cause the drop. The codegen change is so small and local that it should not affect GC behavior or be affected by it.
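
For readers unfamiliar with the instruction pair, the sketch below shows the kind of Java source that leads to an int-to-double conversion in a hot loop. It is an illustration under stated assumptions, not the SPECjvm2008 kernel: the class name, method name, and scaling constant are hypothetical, and the instructions the JIT actually emits depend on the JVM build and the CPU. The background is that vcvtsi2sd writes only the low 64 bits of its destination and takes the remaining bits from its first source register, so it has to wait for whatever last wrote that register; zeroing the register with vxorpd first is a common way to break such a dependency.

```java
import java.util.Random;

// Hypothetical illustration, not taken from SPECjvm2008.
final class IntToDoubleSketch {
    // 2^-31, maps a non-negative int onto [0, 1).
    private static final double SCALE = 1.0 / 2147483648.0;

    /** Monte Carlo estimate of pi from points sampled in the unit square. */
    public static double estimatePi(int samples, long seed) {
        Random rng = new Random(seed);
        int inside = 0;
        for (int i = 0; i < samples; i++) {
            // Each (double) cast of an int is a candidate for a cvtsi2sd-style
            // instruction on x86-64 (an assumption about the generated code).
            double x = (double) (rng.nextInt() >>> 1) * SCALE;
            double y = (double) (rng.nextInt() >>> 1) * SCALE;
            if (x * x + y * y <= 1.0) {
                inside++;
            }
        }
        return 4.0 * inside / samples;
    }

    public static void main(String[] args) {
        System.out.println(estimatePi(1_000_000, 42L));
    }
}
```

On a debug or hsdis-enabled JVM, the generated code for such a loop can be inspected with -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly to check whether the conversion is preceded by a zeroing instruction.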
04-01-2024

Sandhya, could you please have a look? Thanks!
03-01-2024

ILW = Minor performance regression, on Linux x64 with single benchmark, no known workaround = MMH = P3
03-01-2024