JDK-8191016 : C2 optimization causing longer delay in loops
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 8,9.0.1,10
  • Priority: P2
  • Status: Closed
  • Resolution: Won't Fix
  • OS: generic
  • CPU: x86_64
  • Submitted: 2017-11-07
  • Updated: 2017-11-12
  • Resolved: 2017-11-12
Description
FULL PRODUCT VERSION :
java version "1.8.0_162-ea"
Java(TM) SE Runtime Environment (build 1.8.0_162-ea-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.162-b01, mixed mode)

FULL OS VERSION :
Microsoft Windows [Version 10.0.16299.19]

A DESCRIPTION OF THE PROBLEM :
The attached code calls a function 1000000 times and does this in a loop of 5. The first time through the outer loop, the inner loop executes in < 60ms. But all subsequent runs (2-5), take almost 160ms. This is confusing. I thought once the code is JIT'd, it will run fast but in this case, it actually is slower. Not sure what the issue is. But it will good for the runtime team to check out if there is a weird pattern that causes this odd behavior.


class CsTest {
  private int caught = 0;
  private boolean x;
  public final boolean f() { x = false; return !g() || x; }
  public final boolean g() { return h(); }
  public final boolean  h() { if ((caught++ & 1) == 1) {x = true;return true; } return false; }

  public static void main(String[] args) {
    CsTest t = new CsTest();
    for (int j = 0; j < 5; j++) {
      long start = System.currentTimeMillis();
      for (int i  = 0; i < 100000000; i++) t.f();
      System.err.println("Time taken: " + (System.currentTimeMillis() - start) + "ms.");
    }
  }
}

THE PROBLEM WAS REPRODUCIBLE WITH -Xint FLAG: No

THE PROBLEM WAS REPRODUCIBLE WITH -server FLAG: Did not try

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Just compile and run the attached class!

class CsTest {
  private int caught = 0;
  private boolean x;
  public final boolean f() { x = false; return !g() || x; }
  public final boolean g() { return h(); }
  public final boolean  h() { if ((caught++ & 1) == 1) {x = true;return true; } return false; }

  public static void main(String[] args) {
    CsTest t = new CsTest();
    for (int j = 0; j < 5; j++) {
      long start = System.currentTimeMillis();
      for (int i  = 0; i < 100000000; i++) t.f();
      System.err.println("Time taken: " + (System.currentTimeMillis() - start) + "ms.");
    }
  }
}

EXPECTED VERSUS ACTUAL BEHAVIOR :
Expect the numbers printed (time taken) in  lines 2-5 to be <= the one on line one.

Actual:

Time taken: 62ms.
Time taken: 160ms.
Time taken: 156ms.
Time taken: 155ms.
Time taken: 162ms.
REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
Just compile and run the attached class!

class CsTest {
  private int caught = 0;
  private boolean x;
  public final boolean f() { x = false; return !g() || x; }
  public final boolean g() { return h(); }
  public final boolean  h() { if ((caught++ & 1) == 1) {x = true;return true; } return false; }

  public static void main(String[] args) {
    CsTest t = new CsTest();
    for (int j = 0; j < 5; j++) {
      long start = System.currentTimeMillis();
      for (int i  = 0; i < 100000000; i++) t.f();
      System.err.println("Time taken: " + (System.currentTimeMillis() - start) + "ms.");
    }
  }
}
---------- END SOURCE ----------


Comments
I agree with closing this bug as corner case. We are moving to use Graal as JIT compiler in a future and it generates reasonable code in this case. Original test performance with different compilers and jdk 10. Code generated by C1 compiler: $JAVA_HOME/bin/java -XX:TieredStopAtLevel=1 -XX:+TieredCompilation CsTest Time taken: 142ms. Time taken: 138ms. Time taken: 140ms. Time taken: 148ms. Time taken: 152ms. C2: $JAVA_HOME/bin/java -XX:TieredStopAtLevel=4 -XX:-TieredCompilation CsTest Time taken: 54ms. Time taken: 144ms. Time taken: 142ms. Time taken: 142ms. Time taken: 142ms. Tiered: $JAVA_HOME/bin/java -XX:TieredStopAtLevel=4 -XX:+TieredCompilation CsTest Time taken: 53ms. Time taken: 144ms. Time taken: 145ms. Time taken: 144ms. Time taken: 143ms. Tiered Graal (default mode): $JAVA_HOME/bin/java -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler -XX:+TieredCompilation CsTest Time taken: 586ms. Time taken: 170ms. Time taken: 124ms. Time taken: 68ms. Time taken: 68ms.
11-11-2017

Actually the generated loop should be empty because result of f() is not used. Unfortunately it is not easy with OSR compilation and we can't compile main() other way. There is very simple change - move inner loop into separate method to compile without OSR: static void test(CsTest t) { for (int i = 0; i < 100000000; i++) t.f(); } and you get: java -XX:-TieredCompilation CsTest Time taken: 66ms. Time taken: 74ms. Time taken: 41ms. Time taken: 41ms. Time taken: 40ms.
09-11-2017

This is expected behavior of the compilation system. Closing as "Not an Issue".
09-11-2017

The reporter wrote: "... I thought once the code is JIT'd, it will run fast but in this case, it actually is slower" In general, the code generated by the JIT will run "fast" but there is no guarantee that there aren't any drops in performance. In fact, C2 performs optimistic optimizations based on certain assumptions that may not hold during the programs execution and as a result we need to back up and deoptimize. The method will be re-compiled but our initial assumptions do not hold anymore and as a result performance may be worse. In this case, C2 optimistically assumes that the inner loop runs forever (because profiling determined that "i" is always smaller than 100000000) and adds an uncommon trap to verify this. The inner loop is heavily unrolled, resulting in peak performance. However, as soon as we hit the loop limit, we deoptimize and re-compile. The re-compiled version of the inner loop is not unrolled because the loop is not treated as counted loop anymore, resulting in lower performance. Here's the log that shows this: 567 31 3 CsTest::f (25 bytes) 568 32 3 CsTest::g (5 bytes) 568 34 4 CsTest::f (25 bytes) 568 33 2 CsTest::h (26 bytes) 569 31 3 CsTest::f (25 bytes) made not entrant 570 35 % 3 CsTest::main @ 22 (63 bytes) 572 36 % 4 CsTest::main @ 22 (63 bytes) 578 35 % 3 CsTest::main @ 22 (63 bytes) made not entrant Deoptimizing thread 0x00007f4a5801b800 DEOPT UNPACKING thread 0x00007f4a5801b800 vframeArray 0x00007f4a58459d50 mode 0 {method} {0x00007f4a116cf320} 'main' '([Ljava/lang/String;)V' in 'CsTest' - goto @ bci 37 sp = 0x00007f4a62812908 578 37 4 CsTest::main (63 bytes) Uncommon trap bci=26 pc=0x00007f4a50ad19a4, relative_pc=0x00000000000002e4, method=CsTest.main([Ljava/lang/String;)V, debug_id=0 Uncommon trap occurred in CsTest::main compiler=c2 compile_id=36 (@0x00007f4a50ad19a4) thread=11129 reason=unstable_if action=reinterpret unloaded_class_index=-1 debug_id=0 627 36 % 4 CsTest::main @ 22 (63 bytes) made not entrant DEOPT UNPACKING thread 0x00007f4a5801b800 vframeArray 0x00007f4a5845d890 mode 2 {method} {0x00007f4a116cf320} 'main' '([Ljava/lang/String;)V' in 'CsTest' - if_icmpge @ bci 26 sp = 0x00007f4a628128f8 Time taken: 60ms. 710 78 % 3 CsTest::main @ 22 (63 bytes) 716 79 % 4 CsTest::main @ 22 (63 bytes) 726 78 % 3 CsTest::main @ 22 (63 bytes) made not entrant Deoptimizing thread 0x00007f4a5801b800 DEOPT UNPACKING thread 0x00007f4a5801b800 vframeArray 0x00007f4a584913a0 mode 0 {method} {0x00007f4a116cf320} 'main' '([Ljava/lang/String;)V' in 'CsTest' - goto @ bci 37 sp = 0x00007f4a62812908 Time taken: 170ms. Time taken: 157ms. Time taken: 155ms. Time taken: 154ms.
09-11-2017

This is a C2 issue, i could confirm with below observations //Interpreter mode - No issue observed jdk/8u162/ea/b04/binaries/linux-x64/bin/java -Xint CsTest Time taken: 26987ms. Time taken: 26633ms. Time taken: 26484ms. Time taken: 26675ms. Time taken: 26898ms. //running with C1- No issue observed jdk/8u162/ea/b04/binaries/linux-x64/bin/java -XX:+TieredCompilation -XX:TieredStopAtLevel=1 CsTest Time taken: 228ms. Time taken: 218ms. Time taken: 217ms. Time taken: 217ms. Time taken: 237ms. //running with both C1+C2 default (-XX:+TieredCompilation) - second loop onwards takes 2X times more compared to first one jdk/8u162/ea/b04/binaries/linux-x64/bin/java CsTest Time taken: 128ms. Time taken: 248ms. Time taken: 242ms. Time taken: 230ms. Time taken: 230ms. //running with only C2 - second loop onwards takes 3X times more compared to first one. jdk/8u162/ea/b04/binaries/linux-x64/bin/java -XX:-TieredCompilation CsTest Time taken: 83ms. Time taken: 238ms. Time taken: 230ms. Time taken: 247ms. Time taken: 229ms. This issue is observed on latest version of 8 9 and 10 8u162 - Fail 9.0.1 GA - Fail 10 ea b26 - Fail
09-11-2017