JDK-8220374 : C2: LoopStripMining doesn't strip as expected
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 11,12,13
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2019-03-08
  • Updated: 2020-08-10
  • Resolved: 2019-03-14
Fixed in:
  • JDK 11: 11.0.4
  • JDK 12: 12.0.2
  • JDK 13: 13 b13
Description
During work on JDK-8219584, I noticed that we still get safepoint timeouts when running the simple loop below with -XX:+UseCountedLoopSafepoints -XX:LoopStripMiningIter=1000.
The OptoAssembly output shows that LoopStripMining has strip-mined the loop but uses the same exit condition for both the inner and the outer loop:
03b     cmpl    R11, #2147483647
04e     cmpl    R11, #2147483647
So the inner loop runs all the way to Integer.MAX_VALUE without a safepoint check, which leads to a timeout.
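
The difference between the expected and the observed loop shape can be sketched at source level. This is only an illustration: C2 performs strip mining on its intermediate representation, not on Java source, and the class name, the method names, the parameterized limit and the safepointPoll() counter below are hypothetical stand-ins, not HotSpot code. The sketch assumes the intended shape is an inner loop bounded by roughly LoopStripMiningIter iterations, with the safepoint poll in the outer loop.

```java
final class StripMiningSketch {

    // Hypothetical stand-in for the safepoint poll in the outer loop; it only counts calls.
    static long polls = 0;

    static void safepointPoll() {
        polls++;
    }

    // Same loop as TestLoop.test_loop, with the limit made a parameter for illustration.
    static int plain(int x, int limit) {
        int sum = 0;
        if (x != 0) {
            for (int y = 1; y < limit; ++y) {
                if (y % x == 0) ++sum;
            }
        }
        return sum;
    }

    // Expected shape: the inner loop runs at most stripIter iterations between polls.
    static int stripMined(int x, int limit, int stripIter) {
        if (stripIter <= 0) throw new IllegalArgumentException("stripIter must be positive");
        int sum = 0;
        if (x != 0) {
            int y = 1;
            while (y < limit) {
                // Inner limit is capped by the strip size; long arithmetic avoids int overflow.
                int innerLimit = (int) Math.min((long) y + stripIter, (long) limit);
                for (; y < innerLimit; ++y) {
                    if (y % x == 0) ++sum;
                }
                safepointPoll();
            }
        }
        return sum;
    }

    // Observed shape: inner and outer loop compare against the same limit,
    // so the inner loop consumes the whole range before the first poll.
    static int brokenStripMined(int x, int limit) {
        int sum = 0;
        if (x != 0) {
            int y = 1;
            while (y < limit) {
                for (; y < limit; ++y) {
                    if (y % x == 0) ++sum;
                }
                safepointPoll();
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        polls = 0;
        int expected = stripMined(3, 10_001, 1000);
        System.out.println("expected shape: sum=" + expected + " polls=" + polls);
        polls = 0;
        int broken = brokenStripMined(3, 10_001);
        System.out.println("observed shape: sum=" + broken + " polls=" + polls);
    }
}
```

Both variants compute the same sum, but over 10,000 iterations with a strip size of 1000 the expected shape polls 10 times while the observed shape polls once, after the entire range has been processed. With a limit of Integer.MAX_VALUE, that single late poll corresponds to the safepoint timeout described above.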


TestLoop.java:
public class TestLoop{

  public static int test_loop(int x) {
      int sum = 0;
      if (x != 0) {
          for (int y = 1; y < Integer.MAX_VALUE; ++y) {
              if (y % x == 0) ++sum;
          }
      }
      return sum;
  }

  public static void main(String args[]) {
    int sum = test_loop(3);
    System.out.println("sum: " + sum);
  }
}

Full command line:
jdk-jdk-fastdebug/images/jdk/bin/java -XX:+UnlockDiagnosticVMOptions -XX:+SafepointTimeout -XX:+SafepointALot -XX:+AbortVMOnSafepointTimeout -XX:SafepointTimeoutDelay=500 -XX:GuaranteedSafepointInterval=500 -XX:+PrintOptoAssembly -XX:-TieredCompilation -XX:+UseCountedLoopSafepoints -XX:LoopStripMiningIter=1000 -XX:LoopUnrollLimit=0 -XX:CompileCommand=compileonly,TestLoop::test_loop -Xcomp TestLoop

Full OptoAssembly:
000   B1: #     B8 B2 <- BLOCK HEAD IS JUNK   Freq: 1
000     # stack bang (136 bytes)
        pushq   rbp     # Save rbp
        subq    rsp, #16        # Create frame

00c     testl   RSI, RSI
00e     je,s   B8  P=0,100000 C=-1,000000
00e
010   B2: #     B3 <- B1  Freq: 0,9
010     # TLS is in R15
010     xorl    R8, R8  # int
013     movl    R11, #1 # int
        nop     # 7 bytes pad for loops and calls

020   B3: #     B7 B4 <- B2 B5 B4       Loop: B3-B4 inner  strip mined Freq: 90
020     movl    RAX, R11        # spill
023     cmpl    rax, 0x80000000 # irem
        jne,s   normal
        xorl    rdx, rdx
        cmpl    RSI, -1
        je,s    done
normal: cdql
        idivl   RSI
done:
034     testl   RDX, RDX
036     je,s   B7  P=0,100000 C=-1,000000
036
038   B4: #     B3 B5 <- B7 B3  Freq: 90
038     incl    R11     # int
03b     cmpl    R11, #2147483647
042     jl,s   B3       # loop end  P=0,900000 C=-1,000000
042
044   B5: #     B3 B6 <- B4  Freq: 9
044     movq    R10, [R15 + #296 (32-bit)]      # ptr
04b     testl  rax, [R10]       # Safepoint: poll for GC        # TestLoop::test_loop @ bci:26  L[0]=RSI L[1]=R8 L[2]=R11
        # OopMap{off=75}
04e     cmpl    R11, #2147483647
055     jl,s   B3  P=0,900000 C=-1,000000
055
057   B6: #     N1 <- B5 B8  Freq: 1
057     movl    RAX, R8 # spill
05a     addq    rsp, 16 # Destroy frame
        popq   rbp
        movq   rscratch1, poll_offset[r15_thread] #polling_page_address
        testl  rax, [rscratch1] # Safepoint: poll for GC

069     ret
069
06a   B7: #     B4 <- B3  Freq: 9
06a     incl    R8      # int
06d     jmp,s   B4
06d
06f   B8: #     B6 <- B1  Freq: 0,1
06f     xorl    R8, R8  # int
072     jmp,s   B6

Comments
Fix Request
This re-enables the accidentally disabled LoopStripMining (LSM), a latency-improving optimization that the low-latency garbage collectors rely on. LSM is opt-in per collector; G1, ZGC and Shenandoah currently enable it. The patch applies cleanly to 11u and 12u, passes the new regression tests (which fail without the product patch), and passes tier1 and tier2. I spot-checked TTSP (time-to-safepoint) benchmarks, and they improve significantly, as one would expect with working LSM. SPECjvm2008 does not show statistically significant regressions with G1 (the default collector).
28-03-2019

We aim to backport this to 12u and 11u once the change has had enough testing in jdk/jdk.
21-03-2019

ILW = Safepoint is not reached (in time) because the loop strip mining optimization fails (no safepoint check); regression test with -XX:+UseCountedLoopSafepoints -XX:LoopStripMiningIter=1000; no workaround = MMH = P3
12-03-2019