JDK-8312011 : C2: Runtime increase for a double while loop code
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 19,20,21,22
  • Priority: P4
  • Status: In Progress
  • Resolution: Unresolved
  • OS: generic
  • CPU: generic
  • Submitted: 2023-07-07
  • Updated: 2025-05-27
Other: tbd (Unresolved)
Description
ADDITIONAL SYSTEM INFORMATION :
Java 17.0.7 and Java 21.ea.29 on Linux, tested on a Lenovo T14s and Google Cloud c2-computer-8

A DESCRIPTION OF THE PROBLEM :
A simple parser implementation shows a significant performance regression. The implementation already challenges JDK 17 and leads to strange runtime behavior there, but on JDK 21 the runtimes appear to be even higher.

The affected benchmarks are B03a (70 --> 100 ns) and B05f (1,000 --> 1,837 ns).

REGRESSION : Last worked in version 17.0.7

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Clone the public repository linked below and run the JMH benchmarks org.xceptance.B03* and org.xceptance.B05f_QuotedWarmupAndUnquotedTest.

The repository contains other test cases; they belong to a separate defect that will be filed soon.

https://github.com/rschwietzke/jmh-C2-compile

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
JDK 17.0.7 provides these runtimes: 

B03a: 70 ns
B03b: 447 ns
B03c: 669 ns
B05f: 1,000 ns
ACTUAL -
JDK 21.ea.20 provides these runtimes: 

B03a: 100 ns
B03b: 430 ns
B03c: 690 ns
B05f: 1,837 ns
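
To make the size of the change easier to compare across benchmarks, the reported runtimes can be turned into relative changes. The following is a minimal sketch; the class and method names are illustrative and not part of the benchmark repository, and the inputs are the rounded figures quoted above. It gives roughly +42.9% for B03a and +83.7% for B05f, while B03b (about -3.8%) and B03c (about +3.1%) stay close to their JDK 17 values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative helper, not part of the benchmark repository.
final class RuntimeChangeSketch {

    /** Relative change from the baseline runtime to the current runtime, in percent. */
    static double percentChange(double baselineNs, double currentNs) {
        return (currentNs - baselineNs) / baselineNs * 100.0;
    }

    public static void main(String[] args) {
        // Each value holds {JDK 17.0.7 runtime, JDK 21 EA runtime} in ns, as quoted in the report.
        Map<String, double[]> runtimes = new LinkedHashMap<>();
        runtimes.put("B03a", new double[] {70, 100});
        runtimes.put("B03b", new double[] {447, 430});
        runtimes.put("B03c", new double[] {669, 690});
        runtimes.put("B05f", new double[] {1000, 1837});

        for (Map.Entry<String, double[]> entry : runtimes.entrySet()) {
            double[] ns = entry.getValue();
            System.out.printf("%s: %.0f ns -> %.0f ns (%+.1f%%)%n",
                    entry.getKey(), ns[0], ns[1], percentChange(ns[0], ns[1]));
        }
    }
}
```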

---------- BEGIN SOURCE ----------
https://github.com/rschwietzke/jmh-C2-compile

https://github.com/rschwietzke/jmh-C2-compile/blob/main/src/main/java/org/xceptance/B03a_ShortWarmupAndTest.java
https://github.com/rschwietzke/jmh-C2-compile/blob/main/src/main/java/org/xceptance/B03b_UnquotedWarmupAndTest.java
https://github.com/rschwietzke/jmh-C2-compile/blob/main/src/main/java/org/xceptance/B03c_QuotedWarmupAndTest.java
https://github.com/rschwietzke/jmh-C2-compile/blob/main/src/main/java/org/xceptance/B05f_QuotedWarmupAndUnquotedTest.java
---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
Rewriting the code under test changes the runtime behavior.

FREQUENCY : always



Comments
I've bisected it to JDK-8279888, both in performance, and in the trace mentioned by [~chagedorn].
16-05-2025

ILW = Performance regression in some JMH tests due to applying different loop opts, probably edge case, no known workaround = MLH = P4
14-07-2023

Performance drop is noticed first with JDK 19 while JDK 18 is faster:

JDK 18:
Benchmark                      Mode  Cnt   Score   Error  Units
B03a_ShortWarmupAndTest.parse  avgt   10  90.274 ± 2.857  ns/op

JDK 19:
Benchmark                      Mode  Cnt   Score   Error  Units
B03a_ShortWarmupAndTest.parse  avgt   10  97.998 ± 2.959  ns/op

When looking at applied optimization in C2, we see that in JDK 18, it first performs Parallel IV and then it can hoist a range check and do unrolling:

$ java18 -XX:+TraceLoopOpts -Xbatch -XX:CompileCommand=compileonly,*Csv*::parse -XX:CompileCommand=dontinline,*Blackhole::consume -XX:CompileCommand=dontinline,com.xceptance.common.util.SimpleArrayList::clear -jar target/benchmarks.jar -i 3 -wi 5 -f 0 org.xceptance.B03a_ShortWarmupAndTest.parse

Output of -XX:+TraceLoopOpts:

Parallel IV: 180 Loop: N1280/N937 limit_check profile_predicated predicated counted [int,int),+1 (-1 iters) has_sfpt strip_mined
Loop: N0/N0 has_sfpt
Loop: N1270/N902 limit_check profile_predicated predicated sfpts={ 902 }
Loop: N1279/N1278 limit_check profile_predicated predicated
Loop: N1280/N937 limit_check profile_predicated predicated counted [int,int),+1 (-1 iters) has_sfpt strip_mined
Loop: N0/N0 has_sfpt
Loop: N1270/N902 limit_check profile_predicated predicated sfpts={ 902 }
Loop: N1279/N1278 limit_check profile_predicated predicated sfpts={ 1281 }
Loop: N1280/N937 limit_check profile_predicated predicated counted [int,int),+1 (-1 iters) has_sfpt strip_mined
Predicate IC Loop: N1280/N937 limit_check profile_predicated predicated counted [int,int),+1 (5 iters) has_sfpt rce strip_mined
Predicate RC Loop: N1280/N937 limit_check profile_predicated predicated counted [int,int),+1 (5 iters) has_sfpt rce strip_mined
Predicate IC Loop: N1270/N902 limit_check profile_predicated predicated sfpts={ 902 }
Loop: N0/N0 has_sfpt
Loop: N1270/N902 limit_check profile_predicated predicated sfpts={ 902 }
Loop: N1279/N1278 limit_check profile_predicated predicated sfpts={ 1281 }
Loop: N1280/N937 limit_check profile_predicated predicated counted [int,int),+1 (5 iters) has_sfpt strip_mined
PreMainPost Loop: N1280/N937 limit_check profile_predicated predicated counted [int,int),+1 (5 iters) has_sfpt strip_mined
Unroll 2 Loop: N1280/N937 counted [int,int),+1 (5 iters) main has_sfpt strip_mined
Loop: N0/N0 has_sfpt
Loop: N1270/N902 limit_check profile_predicated predicated sfpts={ 902 }
Loop: N1444/N1452 limit_check profile_predicated predicated counted [int,int),+1 (4 iters) pre has_sfpt
Loop: N1279/N1278 sfpts={ 1281 }
Loop: N1594/N937 counted [int,int),+2 (5 iters) main has_sfpt strip_mined
Loop: N1387/N1395 counted [int,int),+1 (4 iters) post has_sfpt
Unroll 4 Loop: N1594/N937 counted [int,int),+2 (5 iters) main has_sfpt strip_mined

While in JDK 19, it is not able to perform Parallel IV and cannot apply any loop opts:

$ java19 -XX:+TraceLoopOpts -Xbatch -XX:CompileCommand=compileonly,*Csv*::parse -XX:CompileCommand=dontinline,*Blackhole::consume -XX:CompileCommand=dontinline,com.xceptance.common.util.SimpleArrayList::clear -jar target/benchmarks.jar -i 3 -wi 5 -f 0 org.xceptance.B03a_ShortWarmupAndTest.parse

Output of -XX:+TraceLoopOpts:

Loop: N0/N0 has_sfpt
Loop: N1230/N848 limit_check profile_predicated predicated counted [0,int),+1 (-1 iters) sfpts={ 831 804 }
Loop: N0/N0 has_sfpt
Loop: N1230/N848 limit_check profile_predicated predicated counted [0,int),+1 (-1 iters) sfpts={ 831 804 }
SplitIf
SplitIf
Loop: N0/N0 has_sfpt
Loop: N1230/N848 limit_check profile_predicated predicated counted [0,int),+1 (-1 iters) sfpts={ 831 804 }
Predicate IC Loop: N1230/N848 limit_check profile_predicated predicated counted [0,int),+1 (47 iters) sfpts={ 831 804 }
Loop: N0/N0 has_sfpt
Loop: N1230/N848 limit_check profile_predicated predicated counted [0,int),+1 (47 iters) sfpts={ 831 804 }
Loop: N0/N0 has_sfpt
Loop: N1230/N848 limit_check profile_predicated predicated counted [0,int),+1 (47 iters) sfpts={ 831 804 }
PredicatesOff

The missing application of these additional loop opts could explain why we are seeing this regression. We would need to find out why starting with JDK 19, we are not able to perform them anymore.
14-07-2023
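
For readers unfamiliar with the "Parallel IV" entry in the trace: it refers to C2 replacing a second induction variable, one that advances in lockstep with the loop's trip counter, by an expression of the trip counter itself. The sketch below shows that loop shape and the equivalent hand-written single-variable form. It is a hypothetical example only: the class and method names are invented, the code is not taken from the benchmark repository, and whether C2 applies the transformation to any given loop depends on the JDK version and the surrounding code.

```java
// Hypothetical illustration of a loop with a parallel induction variable.
final class ParallelIvSketch {

    /**
     * Copies src[from..to) into dst starting at index 0.
     * i is the trip counter; j advances by the same constant stride on every
     * iteration, so j == i - from holds at the top of each iteration.
     */
    static int copyRange(char[] src, int from, int to, char[] dst) {
        int j = 0;
        for (int i = from; i < to; i++) {
            dst[j] = src[i];
            j++;
        }
        return j;
    }

    /** The same loop with j rewritten by hand as a function of the trip counter i. */
    static int copyRangeSingleIv(char[] src, int from, int to, char[] dst) {
        for (int i = from; i < to; i++) {
            dst[i - from] = src[i];
        }
        return Math.max(0, to - from);
    }

    public static void main(String[] args) {
        char[] src = "a,b,c".toCharArray();
        char[] first = new char[3];
        char[] second = new char[3];
        int n1 = copyRange(src, 2, 5, first);
        int n2 = copyRangeSingleIv(src, 2, 5, second);
        System.out.println(n1 + " " + new String(first));
        System.out.println(n2 + " " + new String(second));
    }
}
```

Both methods produce the same result; with a single induction variable, every array index in the loop is expressed in terms of the trip counter, which is the form that later steps in the JDK 18 trace (range check hoisting, unrolling) operate on.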

Issue is reproduced. Significant performance change from JDK 17 to JDK 20 and JDK 21 is observed.
OS: Windows 10
===================================================
# VM version: JDK 17.0.7, Java HotSpot(TM) 64-Bit Server VM, 17.0.7+8-LTS-224
# Benchmark: org.xceptance.B05f_QuotedWarmupAndUnquotedTest.parse
# Warmup Iteration 1: 916.708 ns/op
# Warmup Iteration 2: 949.242 ns/op
# Warmup Iteration 3: 915.266 ns/op
Iteration 1: 1363.880 ns/op
Iteration 2: 1363.845 ns/op
Iteration 3: 1362.241 ns/op

# VM version: JDK 20.0.1, Java HotSpot(TM) 64-Bit Server VM, 20.0.1+9-29
# Benchmark: org.xceptance.B05f_QuotedWarmupAndUnquotedTest.parse
# Warmup Iteration 1: 886.903 ns/op
# Warmup Iteration 2: 845.185 ns/op
# Warmup Iteration 3: 861.557 ns/op
Iteration 1: 2179.506 ns/op
Iteration 2: 2188.361 ns/op
Iteration 3: 2145.915 ns/op
======================================================
# VM version: JDK 17.0.7, Java HotSpot(TM) 64-Bit Server VM, 17.0.7+8-LTS-224
# Benchmark: org.xceptance.B03a_ShortWarmupAndTest.parse
# Warmup Iteration 1: 126.419 ns/op
# Warmup Iteration 2: 111.357 ns/op
# Warmup Iteration 3: 108.033 ns/op
Iteration 1: 105.403 ns/op
Iteration 2: 107.246 ns/op
Iteration 3: 111.811 ns/op

# VM version: JDK 20.0.1, Java HotSpot(TM) 64-Bit Server VM, 20.0.1+9-29
# Benchmark: org.xceptance.B03a_ShortWarmupAndTest.parse
# Warmup Iteration 1: 173.503 ns/op
# Warmup Iteration 2: 167.434 ns/op
# Warmup Iteration 3: 166.300 ns/op
Iteration 1: 163.346 ns/op
Iteration 2: 163.038 ns/op
Iteration 3: 163.097 ns/op
==========================================================
# VM version: JDK 17.0.7, Java HotSpot(TM) 64-Bit Server VM, 17.0.7+8-LTS-224
# Benchmark: org.xceptance.B03b_UnquotedWarmupAndTest.parse
# Warmup Iteration 1: 583.004 ns/op
# Warmup Iteration 2: 487.016 ns/op
# Warmup Iteration 3: 480.868 ns/op
Iteration 1: 481.986 ns/op
Iteration 2: 479.383 ns/op
Iteration 3: 480.579 ns/op

# VM version: JDK 20.0.1, Java HotSpot(TM) 64-Bit Server VM, 20.0.1+9-29
# Benchmark: org.xceptance.B03b_UnquotedWarmupAndTest.parse
# Warmup Iteration 1: 650.866 ns/op
# Warmup Iteration 2: 614.327 ns/op
# Warmup Iteration 3: 593.359 ns/op
Iteration 1: 597.822 ns/op
Iteration 2: 595.568 ns/op
Iteration 3: 607.043 ns/op
=================================================
# VM version: JDK 17.0.7, Java HotSpot(TM) 64-Bit Server VM, 17.0.7+8-LTS-224
# Benchmark: org.xceptance.B03c_QuotedWarmupAndTest.parse
# Warmup Iteration 1: 993.691 ns/op
# Warmup Iteration 2: 925.224 ns/op
# Warmup Iteration 3: 909.576 ns/op
Iteration 1: 926.479 ns/op
Iteration 2: 972.614 ns/op
Iteration 3: 918.208 ns/op

# VM version: JDK 20.0.1, Java HotSpot(TM) 64-Bit Server VM, 20.0.1+9-29
# Benchmark: org.xceptance.B03c_QuotedWarmupAndTest.parse
# Warmup Iteration 1: 891.051 ns/op
# Warmup Iteration 2: 889.083 ns/op
# Warmup Iteration 3: 869.117 ns/op
Iteration 1: 962.605 ns/op
Iteration 2: 863.993 ns/op
Iteration 3: 881.668 ns/op

ILW = issue in GA build, reproducible with single test, no workaround available = MLM = P4
Moving it to dev team for further analysis.
13-07-2023
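
The per-iteration figures in the JMH output quoted above can be condensed into one mean per JDK by averaging the measured iterations and skipping the warmup lines. The following is a minimal sketch of that arithmetic; the class name, method name, and regular expression are illustrative assumptions, not part of JMH or the benchmark repository. For the B05f figures from this comment it gives about 1363.3 ns/op on JDK 17.0.7 and about 2171.3 ns/op on JDK 20.0.1, a factor of roughly 1.6.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper, not part of JMH or the benchmark repository.
final class JmhIterationMean {

    // Matches measured iterations only: "# Warmup Iteration" lines do not
    // start with "Iteration", so the line-start anchor excludes them.
    private static final Pattern ITERATION =
            Pattern.compile("^Iteration\\s+\\d+:\\s+([0-9.]+)\\s+ns/op", Pattern.MULTILINE);

    /** Mean of all measured iterations found in the log text, in ns/op. */
    static double meanNsPerOp(String log) {
        Matcher matcher = ITERATION.matcher(log);
        double sum = 0.0;
        int count = 0;
        while (matcher.find()) {
            sum += Double.parseDouble(matcher.group(1));
            count++;
        }
        if (count == 0) {
            throw new IllegalArgumentException("no measured iterations found");
        }
        return sum / count;
    }

    public static void main(String[] args) {
        // B05f figures copied from the comment above.
        String jdk17 = String.join("\n",
                "# Warmup Iteration 1: 916.708 ns/op",
                "Iteration 1: 1363.880 ns/op",
                "Iteration 2: 1363.845 ns/op",
                "Iteration 3: 1362.241 ns/op");
        String jdk20 = String.join("\n",
                "# Warmup Iteration 1: 886.903 ns/op",
                "Iteration 1: 2179.506 ns/op",
                "Iteration 2: 2188.361 ns/op",
                "Iteration 3: 2145.915 ns/op");

        double mean17 = meanNsPerOp(jdk17);
        double mean20 = meanNsPerOp(jdk20);
        System.out.printf("JDK 17.0.7: %.3f ns/op%n", mean17);
        System.out.printf("JDK 20.0.1: %.3f ns/op%n", mean20);
        System.out.printf("ratio: %.2f%n", mean20 / mean17);
    }
}
```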