JDK-8281565 : Regression 5-7% in SPECjvm2008-Compress-G1 on x64 Linux and Mac in 19-b8
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 19
  • Priority: P3
  • Status: Closed
  • Resolution: Won't Fix
  • CPU: x86
  • Submitted: 2022-02-10
  • Updated: 2022-05-06
  • Resolved: 2022-05-06
Description
Triage experiments show this is related to JDK-8278518.
Comments
OK, we see that the performance regression reported here was coincidentally fixed by JDK-8279888, but Roland has identified the real problem as a day-one register allocation (RA) issue in C2. So we agree to close this as WNF.
06-05-2022

I looked at the code generated with JDK-8279888 and register allocation of a common loop is better. I also have an experimental patch that tries to improve register allocation in loops and when I run it without JDK-8279888, the performance regression goes away as well. This all seems to indicate a register allocation issue for which there's no simple fix. With JDK-8279888 the regression is gone (which I believe is pure luck). So I'd like to close this as WNF. [~ecaspole] would you be ok with that?
04-05-2022

The trend report shows that performance recovered in JDK 19 b20 (when JDK-8279888 was integrated).
04-05-2022

Roland's branch https://github.com/rwestrel/jdk/tree/JDK-8279888 seems to recover the performance back to the 19-b7 level.
09-03-2022

The regression has stayed consistent from 19-b8 to b12, and also in the nightly. So it seems like more than bad luck in code layout. We discussed this JBS issue in the compiler staff meeting today and agreed to reopen it. Let me know if there are any specific runs that would help find the problem. FYI, these benchmarks are run on OL 8 on the OCI BM O3.36 bare metal platform with Ice Lake cores.
08-03-2022

I agree with Roland, let's close this as Won't Fix.
01-03-2022

Compressor::compress has several loops: first a counted loop, which is not where time is spent, then a loop nest. According to profiling, the inner loop of that nest is executed the largest number of times; the outer loop has 3 different backedges and is the one affected by JDK-8278518. I went over the high-level optimizations applied to the method with and without JDK-8278518 and I don't see anything wrong (there are few opportunities for optimization). The code generated for the inner loop differs between the two cases, even though the IR after optimizations is the same: with JDK-8278518 I see extra spills and an extra branch. It's hard to tell whether that explains the regression, but with JDK-8278518 we seem to be unlucky with the generated code. My recommendation is to close this as WNF.
17-02-2022
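The loop structure described in the comment above can be pictured with a small sketch. This is a hypothetical stand-in and not the SPECjvm2008 source: the class name, method name, and hash-probing logic are all assumptions chosen only to reproduce the shape (a cheap counted loop, then an outer loop with three paths back to its head wrapped around a hot inner loop). How C2 actually sees and compiles such a loop is not shown here.

```java
final class LoopShapeSketch {
    /**
     * Hypothetical example: counts distinct values using an open-addressing table.
     * Assumes every input value is non-negative and that tableSize is larger than
     * the number of distinct values (otherwise the probe loop cannot terminate).
     */
    static int countDistinct(int[] input, int tableSize) {
        int[] table = new int[tableSize];

        // Counted loop: trip count known up front, not where the time goes.
        for (int i = 0; i < tableSize; i++) {
            table[i] = -1; // -1 marks an empty slot
        }

        int distinct = 0;
        int pos = 0;
        outer:
        while (pos < input.length) {
            int value = input[pos++];
            int slot = value % tableSize;

            if (table[slot] == value) {
                continue;           // path 1 back to the loop head: hit on first probe
            }

            // Inner loop: linear probing, executed most often under collisions.
            while (table[slot] != -1) {
                slot = (slot + 1) % tableSize;
                if (table[slot] == value) {
                    continue outer; // path 2 back to the loop head: hit while probing
                }
            }

            table[slot] = value;
            distinct++;
            // path 3 back to the loop head: normal end of the loop body
        }
        return distinct;
    }

    public static void main(String[] args) {
        int[] sample = {1, 2, 1, 9, 2, 17, 9};
        System.out.println(countDistinct(sample, 8));
    }
}
```

With several routes back to the same loop head, more values are live across the loop at once, which is the kind of situation where register allocation choices (spills, extra branches) can change between otherwise equivalent compilations.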

The method that's affected by JDK-8278518 is Compressor::compress
16-02-2022

Here is what I measure:
current head:
  Compress.test thrpt 15 1110.578 ± 5.302 ops/min
with JDK-8278518 backed out:
  Compress.test thrpt 15 1135.486 ± 9.209 ops/min
I also tried with the patch for 8279888 (https://github.com/openjdk/jdk/pull/7352) on top of current head and get:
  Compress.test thrpt 15 1134.008 ± 8.911 ops/min
[~ecaspole] Could you give the patch of 8279888 a try?
16-02-2022
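To put the three measurements above side by side, here is a minimal sketch that computes the relative throughput loss and checks whether the score ± error ranges overlap. The class and method names are hypothetical, and treating the ± value as a simple symmetric margin is an assumption made for illustration, not a statistical test.

```java
final class CompressDelta {
    /** Relative throughput loss of candidate versus baseline (positive means slower). */
    static double relativeLoss(double baselineScore, double candidateScore) {
        return (baselineScore - candidateScore) / baselineScore;
    }

    /** True when [scoreA - errA, scoreA + errA] and [scoreB - errB, scoreB + errB] intersect. */
    static boolean intervalsOverlap(double scoreA, double errA, double scoreB, double errB) {
        return (scoreA - errA) <= (scoreB + errB) && (scoreB - errB) <= (scoreA + errA);
    }

    public static void main(String[] args) {
        // Scores in ops/min, copied from the comment above.
        double head = 1110.578, headErr = 5.302;           // current head
        double backedOut = 1135.486, backedOutErr = 9.209; // JDK-8278518 backed out
        double patched = 1134.008, patchedErr = 8.911;     // head + patch for 8279888

        System.out.printf("head vs backed out: %.2f%% loss, overlap=%b%n",
                100.0 * relativeLoss(backedOut, head),
                intervalsOverlap(head, headErr, backedOut, backedOutErr));
        System.out.printf("patched vs backed out: %.2f%% loss, overlap=%b%n",
                100.0 * relativeLoss(backedOut, patched),
                intervalsOverlap(patched, patchedErr, backedOut, backedOutErr));
    }
}
```

On these numbers, current head is roughly 2.2% slower than the build with JDK-8278518 backed out and the two ranges do not overlap, while the run with the 8279888 patch falls within the range of the backed-out build.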

In these runs we use only these options:
  -server -XX:+UseG1GC -XX:-PrintWarnings -XX:+UseLargePages compress -ikv
It ran on an Ice Lake O3.36 OCI server.
10-02-2022

ILW = smallish perf regression, one benchmark, 2 platforms = MMH = P3
10-02-2022