JDK-8198909 : [Graal] compiler/codecache/stress/UnexpectedDeoptimizationTest.java crashed with SIGSEGV
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 11
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • Submitted: 2018-03-01
  • Updated: 2019-09-13
  • Resolved: 2018-06-25
JDK 11: 11 b20 (Fixed)
JDK 12: 12 (Fixed)
Related Reports
Duplicate :  
Relates :  
Relates :  
Description
compiler/codecache/stress/UnexpectedDeoptimizationTest.java executed with Graal as JIT
crashed with SIGSEGV

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f64928d61c7, pid=3697, tid=3722
#
# JRE version: Java(TM) SE Runtime Environment (11.0) (fastdebug build 11-internal+0-2018-02-28-2224130.ekaterina.pavlova.jdk.hs)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 11-internal+0-2018-02-28-2224130.ekaterina.pavlova.jdk.hs, mixed mode, tiered, jvmci, jvmci compiler, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0xedc1c7]  java_lang_Class::as_Klass(oop)+0xc7
#

---------------  T H R E A D  ---------------

Current thread (0x00007f648c3a3000):  JavaThread "JVMCI CompilerThread1" daemon [_thread_in_vm, id=3722, stack(0x00007f6454549000,0x00007f645464a000)]


Current CompileTask:
JVMCI: 407097 45082 % !   4       compiler.codecache.stress.Helper::callMethod @ 4 (64 bytes)

Stack: [0x00007f6454549000,0x00007f645464a000],  sp=0x00007f64546471b0,  free space=1016k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0xedc1c7]  java_lang_Class::as_Klass(oop)+0xc7
V  [libjvm.so+0x1036eae]  JVM_IsInterface+0x10e
J 2632  java.lang.Class.isInterface()Z java.base@11-internal (0 bytes) @ 0x00007f646eb3768d [0x00007f646eb37560+0x000000000000012d]
j  jdk.vm.ci.hotspot.HotSpotResolvedObjectTypeImpl.isInterface()Z+4 jdk.internal.vm.ci@11-internal

...

Steps to run the test:
> jtreg -vt -jdk:JDK_HS_fastdebug 
-vmoptions:-XX:MaxRAMPercentage=8 -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+TieredCompilation -XX:+UseJVMCICompiler -Djvmci.Compiler=graal 
compiler/codecache/stress/UnexpectedDeoptimizationTest.java


The failure occurred only once in automatic testing.
I tried to reproduce it but was unable to. Hopefully the attached hs_err file will help.


Comments
List of failed jck tests: api/java_util/Arrays/spliterators/index.html#SpliteratorFromDoubleSubArray_Empty api/java_util/WeakHashMap/EntrySetParallelStream.html#EntrySetParallelStream api/java_util/concurrent/atomic/AtomicReferenceArray/AccumulateAndGet.html#AccumulateAndGet
11-06-2018

Should be resolved by JDK-8204231
07-06-2018

Thanks for the heads up. I think this fix will take care of that case as well. Any construction of a HotSpotResolvedObjectTypeImpl will go through the logic to enqueue a reference to the Class and getResolvedJavaType uses that pathway.

private HotSpotResolvedObjectTypeImpl getSubklass() {
    return compilerToVM().getResolvedJavaType(this, config().subklassOffset, false);
}
30-05-2018

I believe reading the _subklass field also results in a weak link. There is a new InstanceKlass::holder_phantom() function that ciInstanceKlass is now using to make G1 happy.
30-05-2018

I'm confident of this fix as I've reproduced the crash with some extra logging that shows that while concurrent scanning is running we're reading the Java mirror from an MDO that is unloaded in the next GC. In the VerifyBeforeGC output below 75fae28d0 is the class which is dead.

$ grep 75fae28d0 ../../JTwork/compiler/codecache/stress/OverloadCompileQueueTest.jtr
../../JTwork/compiler/codecache/stress/OverloadCompileQueueTest.jtr:[37.235s][error][gc,verify] - private final strict 'javaClass' 'Ljava/lang/Class;' @16 a 'java/lang/Class'{0x000000075fae28d0} = 'compiler/codecache/stress/Helper$TestCaseImpl$$Lambda$704' (ebf5c51a)
../../JTwork/compiler/codecache/stress/OverloadCompileQueueTest.jtr:[37.235s][error][gc,verify] points to dead obj 0x000000075fae28d0 in region [0x000000075fa00000, 0x000000075fb00000)
../../JTwork/compiler/codecache/stress/OverloadCompileQueueTest.jtr:[37.239s][error][gc,verify] {0x000000075fae28d0} - klass: 'java/lang/Class'

In the log for the crash you can see that the mirror was read from raw memory, because the base was 0, which means it was read from an MDO.

$ grep 75fae28d0 hs_err_pid15950.log
hs_err_pid15950.log:Event: 23.920 Thread 0x00007f1590170800 At 365 reading mirror 0x000000075fae28d0 from Klass* 0x0000000800c77c40 with base 0x0000000000000000
hs_err_pid15950.log:Event: 23.923 Thread 0x00007f1590170800 At 365 reading mirror 0x000000075fae28d0 from Klass* 0x0000000800c77c40 with base 0x0000000000000000

So adding a G1BarrierSet::enqueue call in this path should resolve the issue. I think this is kind of like what has to be done when Reference.referent is read while marking is active. I'm rerunning the test with that fix and will leave it overnight to confirm that it resolves this issue.
30-05-2018
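
The failure mode described in the comment above can be sketched with a toy model of snapshot-at-the-beginning marking. This is not HotSpot code: the `Obj` class, the `weak` field, and the `mirrorSurvives` method are hypothetical names invented for illustration, and the model assumes that objects allocated during marking are treated as live without being scanned. It only shows why a value read through an untraced (weak) link has to be enqueued for the marker if it is then stored somewhere strong.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: none of these types exist in HotSpot or JVMCI.
final class SatbSketch {
    // A heap object with strong outgoing edges; the weak edge is never traced.
    static final class Obj {
        final String name;
        final List<Obj> strong = new ArrayList<>();
        Obj weak; // stands in for the profile -> Class mirror link
        Obj(String name) { this.name = name; }
    }

    // Simulates one marking cycle and reports whether the mirror ends up marked.
    static boolean mirrorSurvives(boolean enqueueOnWeakRead) {
        Obj root = new Obj("root");
        Obj profile = new Obj("profile");
        Obj mirror = new Obj("class mirror");
        root.strong.add(profile);
        profile.weak = mirror; // only weakly reachable when marking starts

        Set<Obj> marked = new HashSet<>();
        Deque<Obj> markQueue = new ArrayDeque<>();
        markQueue.add(root); // roots as seen at the start of marking

        // Mutator step while marking is active: read the mirror through the
        // weak link and store it in an object allocated after marking began.
        Obj type = new Obj("resolved type");
        Obj loaded = profile.weak;
        if (enqueueOnWeakRead) {
            markQueue.add(loaded); // the explicit barrier: tell the marker
        }
        type.strong.add(loaded);
        marked.add(type); // assumed: new objects are live but never scanned

        // Marker drains its queue, following strong edges only.
        while (!markQueue.isEmpty()) {
            Obj o = markQueue.poll();
            if (marked.add(o)) {
                markQueue.addAll(o.strong);
            }
        }
        return marked.contains(mirror);
    }

    public static void main(String[] args) {
        System.out.println("no barrier:   mirror marked = " + mirrorSurvives(false));
        System.out.println("with barrier: mirror marked = " + mirrorSurvives(true));
    }
}
```

In this model the mirror is left unmarked without the enqueue step even though a live object now points to it, which matches the "points to dead obj" symptom in the verification output; with the enqueue step it is marked.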

I don't think there's any way to tell. It's a pretty non-specific GC crash. Either one could result in that kind of failure.
30-05-2018

[~never] do you think that JDK-8202356 is a duplicate of this issue?
30-05-2018

I've run it 80 times with my proposed fix and it hasn't crashed so I think this might be it. I'm going to try to reproduce this crash but with some logging to catch the failing pathway.
30-05-2018

As noted in https://bugs.openjdk.java.net/browse/JDK-8201821 I reproduced a similar crash with compiler/codecache/stress/OverloadCompileQueueTest and the problem was that a jdk.vm.ci.hotspot.HotSpotResolvedObjectTypeImpl has a reference to a dead java.lang.Class. I suspect what's happening is that we're reading a Klass* from a profile, which is a weak link, and creating a HotSpotResolvedObjectTypeImpl by reading the java_mirror from the Klass*. If G1 marking is occurring during this phase maybe we are somehow missing that there are strong roots to the Class after this point? My guess is that getResolvedJavaType in jvmciCompilerToVM.cpp needs an explicit post barrier like the one in Reference.get. I'm going to see if I can confirm this is the problem.
29-05-2018
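
The comparison with Reference.get in the comment above refers to a guarantee that is visible at the Java level: once a referent has been read through a weak reference and is held strongly, the collector must not reclaim it. The sketch below only demonstrates that guarantee with the standard `java.lang.ref` API; the class and method names are made up for illustration, and it says nothing about how the VM implements the barrier.

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;

// Illustration only: shows the Java-level contract, not the VM mechanism.
final class WeakReadSketch {
    static boolean strongReadKeepsReferentAlive() {
        Object referent = new Object();
        WeakReference<Object> weak = new WeakReference<>(referent);

        // Read through the weak link; 'strong' is now an ordinary strong reference.
        Object strong = weak.get();
        Reference.reachabilityFence(referent); // referent was alive at the read

        System.gc(); // only a hint; a collection may or may not run

        // While 'strong' is held, the weak reference must still see the referent.
        boolean same = strong != null && weak.get() == strong;
        Reference.reachabilityFence(strong);
        return same;
    }

    public static void main(String[] args) {
        System.out.println("referent kept alive: " + strongReadKeepsReferentAlive());
    }
}
```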

Similar crash in compiler/codecache/stress/RandomAllocationTest.java

# A fatal error has been detected by the Java Runtime Environment:
#
#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00007ffe160dfbe1, pid=5472, tid=11988
#
# JRE version: Java(TM) SE Runtime Environment (11.0) (fastdebug build 11-internal+0-2018-05-11-0109061.ekaterina.pavlova.jdk.jdk)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 11-internal+0-2018-05-11-0109061.ekaterina.pavlova.jdk.jdk, mixed mode, tiered, jvmci, jvmci compiler, compressed oops, g1 gc, windows-amd64)
# Problematic frame:
# V  [jvm.dll+0x6ffbe1]  java_lang_Class::as_Klass+0xe1
#
11-05-2018

Maybe it only increases the likelihood. The test passed for me even with this option.
07-05-2018

Thanks Vladimir. Is re-running with -Djdk.test.lib.random.seed=6888852057882589151 guaranteed to reproduce the same crash or just increase its likelihood?
07-05-2018
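
On the seed question above: a fixed seed makes a pseudo-random generator repeat the same sequence, but it does not control GC or compilation timing, so it cannot by itself guarantee the same crash. The sketch below shows only the first half of that statement. It assumes the test library seeds a `java.util.Random` from the `jdk.test.lib.random.seed` property; the `SeedSketch` class and `draw` method are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

// Illustration only: same seed, same sequence of random values.
final class SeedSketch {
    // Draws n values from a generator created with the given seed.
    static long[] draw(long seed, int n) {
        Random random = new Random(seed);
        long[] values = new long[n];
        for (int i = 0; i < n; i++) {
            values[i] = random.nextLong();
        }
        return values;
    }

    public static void main(String[] args) {
        // Falls back to the seed quoted in the test output if the property is unset.
        long seed = Long.getLong("jdk.test.lib.random.seed", 6888852057882589151L);
        boolean repeatable = Arrays.equals(draw(seed, 5), draw(seed, 5));
        System.out.println("same seed gives same sequence: " + repeatable);
    }
}
```

The random choices made by the test are therefore repeatable, while the interleaving of compiler threads and concurrent marking that triggers the crash is not.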

The above URL link does not link to a hs_err file (looks like an "empty" Mesos page). Please attach any hs_err files directly to this issue for any subsequent crashes as I'm having no luck reproducing this crash locally and broken links don't provide much extra information.
07-05-2018

From first link we can get test output:
"To re-run test with same seed value please add "-Djdk.test.lib.random.seed=6888852057882589151" to command line."

Compiled method (nm) 160521 2971 n 0 java.lang.Class::isInterface (native)
 total in heap [0x00007f73c6d03810,0x00007f73c6d03c48] = 1080
 relocation [0x00007f73c6d039a0,0x00007f73c6d039e0] = 64
 main code [0x00007f73c6d039e0,0x00007f73c6d03c48] = 616

And I attached hs_err file.
07-05-2018

Hi [~dnsimon], request your help with this bug. Thank you.
02-03-2018

initial ILW = crash for UnexpectedDeoptimizationTest stress test; with Graal as jit, intermittent, reported once as of now; disable Graal = HMM = P2
02-03-2018