JDK-8218721 : C1's CEE optimization produces safepoint poll with invalid debug information
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 8u201,11,11.0.2,12,13
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2019-01-29
  • Updated: 2019-10-04
  • Resolved: 2019-02-19
Release   Fix version     Status
JDK 11    11.0.5-oracle   Fixed
JDK 13    13 b09          Fixed
JDK 8     8u231           Fixed
Other     openjdk8u232    Fixed
Description
ADDITIONAL SYSTEM INFORMATION :
Primarily observed on:
macOS High Sierra 10.13.6 (17G4015)
MacBook Pro (Retina, 13-inch, Early 2015)
Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
8GB RAM

java version "1.8.0_201"
Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)

java version "9.0.1"
Java(TM) SE Runtime Environment (build 9.0.1+11)
Java HotSpot(TM) 64-Bit Server VM (build 9.0.1+11, mixed mode)

openjdk 11.0.2 2019-01-15
OpenJDK Runtime Environment 18.9 (build 11.0.2+9)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.2+9, mixed mode)

openjdk version "12-ea" 2019-03-19
OpenJDK Runtime Environment (build 12-ea+29)
OpenJDK 64-Bit Server VM (build 12-ea+29, mixed mode, sharing)


A DESCRIPTION OF THE PROBLEM :
A Java process occasionally terminates with Illegal Instruction (SIGILL), exit code 132, when running obfuscated code. We originally observed this issue on Windows 7 with Java 8, but most of our testing has been on macOS with Java 8. The issue appears in Java 8 through 12 (we have not tested 13). It has not been observed in Java 7. Even after examining the .crash file and the core file from macOS, it is not clear exactly why the termination happens (it happens on thread zero, but that appears to be part of an error handler).

Testing suggests that available computational resources may have an effect: the issue happens on a MacBook Pro with 2 cores, but not on a MacBook Pro with 4 cores (although it does happen in a 2-core VM on the 4-core Mac). Some experimentation with Java 8 suggested that the issue happened when compilation was queued, but not when compilation happened immediately.

With -Xint the issue does not happen. JIT compilation of one particular method, RandomizedIdentifierGenerator.b(), appears to affect the behavior. Among other methods, b() calls n.e() and n.h(). Excluding b() from JIT compilation prevents the error. Alternatively, preventing both e() and h() from being inlined into b() also prevents it.

The assembly produced when a run terminates unexpectedly sometimes contains a bad instruction (this is the level of detail that hsdis-amd64.dylib provides):

  0x000000011ad807bd: data16 xchg %ax,%ax
  0x000000011ad807c0: jmpq   0x000000011ad80f90  ;   {no_reloc}
  0x000000011ad807c5: add    %al,(%rax)
  0x000000011ad807c7: add    %al,(%rax)
  0x000000011ad807c9: add    %ch,%cl
  0x000000011ad807cb: lret   
  0x000000011ad807cc: (bad)  
  0x000000011ad807cd: add    %al,(%rax)         ;*new {reexecute=0 rethrow=0 return_oop=0}
                                                ; - java.lang.StringLatin1::charAt@10 (line 47)
                                                ; - java.lang.String::charAt@12 (line 693)
                                                ; - java.lang.Integer::parseInt@184 (line 650)

In testing the larger, original program, the issue would happen roughly 1% - 10% of the time. Unfortunately the frequency with the smaller reproduction is less, roughly 0.03% - 0.5%.


STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
The following is a link to a zip file that contains:
    reproduction.jar - obfuscated jar that reproduces the issue
    reproduction.sh - script that runs the project, records the results, and reports progress to the terminal
    short.pattern - file read by the project

    run-with-assembly.log - log of the failure with disassembly being recorded
    java_2019-01-29-160606_Mikes-MacBook-Pro-3.crash - macOS crash file

http://files.preemptive.com.s3.amazonaws.com/Support/jit_bug/reprod.zip?AWSAccessKeyId=AKIAJMVMID2GL2JYZZ6Q&Expires=1580940869&Signature=moVY4HPbV%2fgSph2fzseVbHHN7RI%3d

The issue can be observed by running 10k iterations of the reproduction project, which takes about 40 minutes:

    ./reproduction.sh

The test script provided defaults to running 10k iterations. The reproduction project generates a random sequence of identifiers (that is discarded), occasionally resetting internal structures. The included short.pattern file is a record of the number of identifiers generated before each reset. This file is read by the reproduction project, effectively replaying the function call sequences that lead to the issue in testing.

run-with-assembly.log is a record of the disassembly of compiled methods including instructions hsdis-amd64.dylib listed as "(bad)". java_2019-01-29-160606_Mikes-MacBook-Pro-3.crash, the macOS crash file, is included for this same run.


EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Java would run and then terminate with a zero return code every time.
ACTUAL -
Java terminates with exit code 132 on rare occasion.
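
For reference, exit code 132 is consistent with the SIGILL reading above: common shells such as bash report a child killed by signal N as 128 + N, and SIGILL is signal 4 on macOS and Linux. A minimal sketch of that arithmetic follows; the class and method names are illustrative and not part of the reproduction project.

```java
public class ExitCodeDecoder {
    /**
     * Returns the signal number encoded in a shell exit status of the
     * form 128 + N, or -1 if the status does not have that form.
     */
    static int signalFromExitStatus(int status) {
        return (status > 128 && status < 256) ? status - 128 : -1;
    }

    public static void main(String[] args) {
        System.out.println(signalFromExitStatus(132)); // 4  (SIGILL)
        System.out.println(signalFromExitStatus(139)); // 11 (SIGSEGV)
        System.out.println(signalFromExitStatus(0));   // -1 (normal exit)
    }
}
```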

---------- BEGIN SOURCE ----------
I do not know of a reproduction directly from source code.
---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
This issue can be worked around by instructing the JIT compiler to exclude b() from compilation.

FREQUENCY : rarely



Comments
Fix Request (8u) This fixes C1 miscompilation and keeps codebases in sync (I see 8u231). The patch applies with reshuffling to 8u. New testcase fails without product fix, and passes with it. Additionally, patched build passes tier1-like tests.
04-07-2019

I added the jdk11u-fix-request label which seemed to be forgotten...?
22-06-2019

Fix Request (11u) This fixes C1 miscompilation and keeps codebases in sync (I see 11.0.5-oracle). The patch applies cleanly to 11u. New testcase fails without product fix, and passes with it. Additionally, patched build passes tier1 and tier2 tests.
19-06-2019

The problem is that C1's Conditional Expression Elimination (CEE) replaces the if in block B0 by a safepoint goto to block B3:

  B0 (SV) [0, 10] -> B2 B1 sux: B2 B1 pred: B4
  empty stack
  inlining depth 0
  __bci__use__tid____instr____________________________________
  . 1    0    i5     a4._12 (I) f1
  . 5    0    i6     a4._16 (I) f2
    8    0    i7     1
    9    0    i8     i6 - i7
  . 10   0     9     if i5 > i8 then B2 else B1

  B1 (V) [13, 14] -> B3 sux: B3 pred: B0
  empty stack
  inlining depth 0
  __bci__use__tid____instr____________________________________
    13   0    i10    1
  . 14   0     11    goto B3
  stack [0:i10]

  B2 (V) [15, 16] -> B3 sux: B3 pred: B0
  empty stack
  inlining depth 0
  __bci__use__tid____instr____________________________________
    15   0    i12    0
  . 16   0     13    goto B3 (safepoint)
  stack [0:i12]

  B3 (V) [14, 14] pred: B1 B2
  Stack:
  0  i14 [ i10 i12]
  stack [0:i14]
  inlining depth 0
  __bci__use__tid____instr____________________________________
  . 14   0    i15    ireturn i14

The IR after CEE then looks like this:

  B0 (SV) [0, 10] -> B3 sux: B3 pred: B4
  empty stack
  inlining depth 0
  __bci__use__tid____instr____________________________________
  . 1    0    i5     a4._12 (I) f1
  . 5    0    i6     a4._16 (I) f2
    8    0    i7     1
    9    0    i8     i6 - i7
    10   0    i18    0
    10   0    i19    1
    10   0    i20    i5 > i8 ? i18 : i19
  . 10   0     21    goto B3 (safepoint)
  stack [0:i20]

  B3 (V) [14, 14] pred: B0
  stack [0:i20]
  inlining depth 0
  __bci__use__tid____instr____________________________________
  . 14   0    i15    ireturn i20

The debug information for the goto instruction, and therefore for the safepoint poll, is wrong. It refers to bci 10, which is the if instruction in the original bytecodes. This if has no safepoint and therefore no valid state_before. We crash after deoptimization because we try to continue execution at the if with an invalid state.
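
As a source-level analogy for the two IR shapes above, the following sketch writes the branch diamond and the conditional expression as two Java methods and checks that they compute the same value. It only illustrates the value-level rewrite. It does not model safepoints or debug information, and the class and method names are assumptions made for illustration.

```java
public class CeeSketch {
    // Shape before CEE: each arm of the if pushes a constant and
    // jumps to a common join block (B1/B2 -> B3 in the IR dump).
    static int beforeCee(int f1, int f2) {
        int result;
        if (f1 > f2 - 1) {   // bci 10: if i5 > i8 then B2 else B1
            result = 0;      // B2: i12
        } else {
            result = 1;      // B1: i10
        }
        return result;       // B3: ireturn
    }

    // Shape after CEE: the branch becomes a conditional expression
    // (i20 = i5 > i8 ? i18 : i19) followed by a single goto.
    static int afterCee(int f1, int f2) {
        return (f1 > f2 - 1) ? 0 : 1;
    }

    public static void main(String[] args) {
        int[] samples = {Integer.MIN_VALUE, -1, 0, 1, 2, Integer.MAX_VALUE};
        for (int a : samples) {
            for (int b : samples) {
                if (beforeCee(a, b) != afterCee(a, b)) {
                    throw new AssertionError("mismatch for " + a + ", " + b);
                }
            }
        }
        System.out.println("both shapes agree on all samples");
    }
}
```

The values agree, so the rewrite itself is sound; per the analysis above, the defect lies in which bytecode state the new goto's safepoint is tagged with.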
15-02-2019

The Java code is:

  public int test() {
      if (field1 <= (field2 - 1)) {
          return 1;
      } else {
          return 0;
      }
  }

The bytecodes are:

   0: aload_0
   1: getfield #36
   4: aload_0
   5: getfield #22
   8: iconst_1
   9: isub
  10: if_icmpgt 15
  13: iconst_1
  14: ireturn
  15: iconst_0
  16: goto 14

Javac would not generate a goto at 16 but this instead:

   0: aload_0
   1: getfield #2
   4: aload_0
   5: getfield #3
   8: iconst_1
   9: isub
  10: if_icmpgt 15
  13: iconst_1
  14: ireturn
  15: iconst_0
  16: ireturn

But I can reproduce the problem with jasm:
http://cr.openjdk.java.net/~thartmann/8218721/webrev.00/

The root cause is a safepoint poll in C1 compiled code that has invalid debug information, which leads to a corrupted stack after deoptimization.

Failure modes:

  # Internal Error (/oracle/jdk_jdk/open/src/hotspot/share/c1/c1_LinearScan.cpp:2382), pid=4912, tid=4924
  # assert(stack_end >= -Bytecodes::depth(code)) failed: must have non-empty expression stack at if bytecode

  # Internal Error (/oracle/jdk_jdk/open/src/hotspot/share/runtime/deoptimization.cpp:766), pid=10106, tid=10107
  # guarantee(false) failed: wrong number of expression stack elements during deopt

Update ILW = Safepoint in C1 compiled code has wrong debug info which leads to corrupted stack after deopt, easy to reproduce with regression test but should be rare in production due to javac not generating these bytecode patterns, disable C1 compilation of affected method = HMM = P2
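
One way to see why bci 10 is a poor place to resume is to track the expression-stack depth through the bytecodes listed above: if_icmpgt pops two ints, so an interpreter frame restarted at that bci needs two operands on its stack. The sketch below is a toy depth counter, not HotSpot code. The per-opcode deltas follow the JVM specification for int operands, and the class and method names are illustrative assumptions.

```java
public class StackDepthSketch {
    // Straight-line prefix of the method, bci 0 through bci 10.
    static final String[] PREFIX = {
        "aload_0", "getfield", "aload_0", "getfield", "iconst_1", "isub", "if_icmpgt"
    };
    static final int[] BCI = {0, 1, 4, 5, 8, 9, 10};

    // Net change in expression-stack depth caused by one opcode
    // (getfield is assumed to read an int field: pop 1, push 1).
    static int delta(String opcode) {
        switch (opcode) {
            case "aload_0":
            case "iconst_0":
            case "iconst_1":
                return 1;
            case "getfield":
            case "goto":
                return 0;
            case "isub":
            case "ireturn":
                return -1;
            case "if_icmpgt":
                return -2;
            default:
                throw new IllegalArgumentException("unknown opcode: " + opcode);
        }
    }

    // Stack depth just before the instruction at the given index.
    static int depthBefore(String[] opcodes, int index) {
        int depth = 0;
        for (int i = 0; i < index; i++) {
            depth += delta(opcodes[i]);
        }
        return depth;
    }

    public static void main(String[] args) {
        for (int i = 0; i < PREFIX.length; i++) {
            System.out.println(BCI[i] + ": " + PREFIX[i]
                    + " (depth before = " + depthBefore(PREFIX, i) + ")");
        }
    }
}
```

The counter reports a depth of 2 before if_icmpgt at bci 10, which matches the intent of the quoted assertion that an if bytecode must have a non-empty expression stack.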
14-02-2019

Executing the reproducer with latest JDK 13, I immediately hit:

  # Internal Error (/oracle/jdk_jdk/open/src/hotspot/share/c1/c1_LinearScan.cpp:2382), pid=4912, tid=4924
  # assert(stack_end >= -Bytecodes::depth(code)) failed: must have non-empty expression stack at if bytecode
11-02-2019

We have only the reproduction.jar file to reproduce the segmentation fault; it is not clear what code is generating the illegal instructions. Other tests were performed to reproduce the segmentation fault. Running "./reproduction.sh" executes the test for 10k iterations.

  1. Interpreter mode, -Xint <= No segmentation fault
  2. C1 only, -XX:+TieredCompilation -XX:TieredStopAtLevel=1 <= No segmentation fault
  3. C2 only, -XX:-TieredCompilation <= No segmentation fault
  4. Default options <= Segmentation fault
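
The four configurations above can be written down as JVM argument lists. The sketch below builds the command line for each one, using the invocation that appears in the reproduction.sh output (java $JVMARGS -jar reproduction.jar short.pattern). The class name, method names, and configuration labels are assumptions for illustration, and runOnce assumes a java executable on the PATH with the reproduction files in the working directory.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ReproConfigs {
    // JVM flags for each configuration tried in the comment above.
    static Map<String, List<String>> configurations() {
        Map<String, List<String>> configs = new LinkedHashMap<>();
        configs.put("interpreter", Arrays.asList("-Xint"));
        configs.put("C1 only", Arrays.asList("-XX:+TieredCompilation", "-XX:TieredStopAtLevel=1"));
        configs.put("C2 only", Arrays.asList("-XX:-TieredCompilation"));
        configs.put("default", new ArrayList<String>());
        return configs;
    }

    // Full command line for one run of the reproduction project.
    static List<String> commandLine(List<String> jvmArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.addAll(jvmArgs);
        cmd.addAll(Arrays.asList("-jar", "reproduction.jar", "short.pattern"));
        return cmd;
    }

    // Launches one iteration and returns the child's exit value.
    static int runOnce(List<String> jvmArgs) throws IOException, InterruptedException {
        return new ProcessBuilder(commandLine(jvmArgs)).inheritIO().start().waitFor();
    }

    public static void main(String[] args) {
        // Only prints the commands; call runOnce in a loop to execute them.
        configurations().forEach((name, jvmArgs) ->
                System.out.println(name + ": " + String.join(" ", commandLine(jvmArgs))));
    }
}
```

Because the failure is rare, a single clean run in any configuration says little; each one would need many iterations, as reproduction.sh does by default.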
11-02-2019

I was able to reproduce this issue on 13 ea b07.

  == Using:
  java version "13-ea" 2019-09-17
  Java(TM) SE Runtime Environment (build 13-ea+7)
  Java HotSpot(TM) 64-Bit Server VM (build 13-ea+7, mixed mode, sharing)
  JVMARGS=
  Iteration: 7002 (Fri Feb 8 06:09:52 PST 2019)
  Iteration: 7003 (Fri Feb 8 06:09:53 PST 2019)
  Iteration: 7004 (Fri Feb 8 06:09:53 PST 2019)
  reproduction.sh: line 51: 86315 Segmentation fault (core dumped) java $JVMARGS -jar reproduction.jar short.pattern
  Iteration: 7005 (Fri Feb 8 06:09:56 PST 2019)
  Iteration: 7006 (Fri Feb 8 06:09:57 PST 2019)
  Iteration: 7007 (Fri Feb 8 06:09:57 PST 2019)

This run generated a segmentation fault.
08-02-2019