JDK-8308583 : SIGSEGV in GraphKit::gen_checkcast
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 21
  • Priority: P1
  • Status: Closed
  • Resolution: Fixed
  • Submitted: 2023-05-22
  • Updated: 2023-07-11
  • Resolved: 2023-05-26
JDK 21
21 b25 Fixed
Description
While testing a recent snapshot of JDK21 with Graal native image we started seeing a repeatable crash with a null pointer in GraphKit::gen_instanceof. 

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f0fffeee7ac, pid=7111, tid=7120
#
# JRE version: Java(TM) SE Runtime Environment Oracle GraalVM 21-dev+19.1 (21.0+19) (build 21+19-jvmci-23.1-b02)
# Java VM: Java HotSpot(TM) 64-Bit Server VM Oracle GraalVM 21-dev+19.1 (21+19-jvmci-23.1-b02, mixed mode, tiered, jvmci, compressed oops, compressed class ptrs, parallel gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x8457ac]  GraphKit::gen_instanceof(Node*, Node*, bool)+0x2ec

Current CompileTask:
C2: 198273 50775       4       com.oracle.svm.core.StaticFieldsSupport$StaticFieldBaseNode::lower (83 bytes)

Stack: [0x00007f0f8860a000,0x00007f0f8870b000],  sp=0x00007f0f88707580,  free space=1013k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x8457ac]  GraphKit::gen_instanceof(Node*, Node*, bool)+0x2ec
V  [libjvm.so+0xce25fd]  Parse::do_instanceof()+0x1bd
V  [libjvm.so+0xccf0b0]  Parse::do_one_block()+0x210
V  [libjvm.so+0xccf4b5]  Parse::do_all_blocks()+0xe5
V  [libjvm.so+0xcd1c05]  Parse::Parse(JVMState*, ciMethod*, float)+0x725
V  [libjvm.so+0x54e18b]  ParseGenerator::generate(JVMState*)+0x8b
V  [libjvm.so+0x54faf9]  PredictedCallGenerator::generate(JVMState*)+0x2f9
V  [libjvm.so+0x54faf9]  PredictedCallGenerator::generate(JVMState*)+0x2f9
V  [libjvm.so+0x7074d8]  Parse::do_call()+0x1f8
V  [libjvm.so+0xccf0b0]  Parse::do_one_block()+0x210
V  [libjvm.so+0xccf4b5]  Parse::do_all_blocks()+0xe5
V  [libjvm.so+0xcd1c05]  Parse::Parse(JVMState*, ciMethod*, float)+0x725
V  [libjvm.so+0x54e18b]  ParseGenerator::generate(JVMState*)+0x8b
V  [libjvm.so+0x61e8e5]  Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0xd95
V  [libjvm.so+0x54d2a0]  C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x120
V  [libjvm.so+0x625c37]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0xa87
V  [libjvm.so+0x626b48]  CompileBroker::compiler_thread_loop()+0x6a8
V  [libjvm.so+0x8d47e8]  JavaThread::thread_main_inner() [clone .part.0]+0xb8
V  [libjvm.so+0xe77646]  Thread::call_run()+0xa6
V  [libjvm.so+0xc9fd88]  thread_native_entry(Thread*)+0xd8

The problem is that the value on top of the expression stack is the C2 top node, which leads to a crash when trying to emit the checkcast. The top appears to be injected by a previous instanceof that uses maybe_cast_profiled_receiver and replace_in_map. The top is produced at https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/graphKit.cpp#L2890. It's unclear to me whether this code is never supposed to produce top because the callers guard against it, or whether the returned top value is supposed to be handled somehow.

Since this is occurring in the context of Graal and labsjdk, I can't give you something that will reproduce against master. I've added some debug code to GraphKit::type_check_receiver to dump the values involved:

receiver_type=bottom[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull *
recvx_type=org/graalvm/compiler/word/HostedWord (org/graalvm/word/WordBase,org/graalvm/word/ComparableWord,org/graalvm/word/UnsignedWord,org/graalvm/word/PointerBase,org/graalvm/word/SignedWord,org/graalvm/word/Pointer):exact *
 1859  Phi  === 1074 1835 1047  [[ 255 1867 1885 1898 1906 1908 1912 1923 1908 1928 1928 1937 ]]  #bottom[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull *
 1929  LoadNKlass  === _ 7 1928  [[ 1930 ]]  @bottom[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull+8 * [narrowklass], idx=4; #narrowklass: java/lang/Object: 0x00007fe6cb015bf8 *
 1931  ConP  === 0  [[ 1932 ]]  #precise org/graalvm/compiler/word/HostedWord: 0x00007fe6eaa8a680 (org/graalvm/word/WordBase,org/graalvm/word/ComparableWord,org/graalvm/word/UnsignedWord,org/graalvm/word/PointerBase,org/graalvm/word/SignedWord,org/graalvm/word/Pointer):Constant:exact *  Klass:precise org/graalvm/compiler/word/HostedWord: 0x00007fe6eaa8a680 (org/graalvm/word/WordBase,org/graalvm/word/ComparableWord,org/graalvm/word/UnsignedWord,org/graalvm/word/PointerBase,org/graalvm/word/SignedWord,org/graalvm/word/Pointer):Constant:exact *
 1930  DecodeNKlass  === _ 1929  [[ 1932 ]]  #java/lang/Object: 0x00007fe6cb015bf8 *  Klass:java/lang/Object: 0x00007fe6cb015bf8 *
 1932  CmpP  === _ 1930 1931  [[ 1933 ]]
 1933  Bool  === _ 1932  [[ 1934 ]] [eq]

The cmp doesn't fold because, while the receiver has a precise type, the type of the DecodeNKlass has been erased to Object. The type folding in the CheckCastPPNode works with the visible types, so it's able to see that the types are disjoint. How exactly is this code protected from this kind of problem?
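A toy model may make the mismatch easier to see. The sketch below is not HotSpot code: `Shape`, `ToyType`, `toy_join`, and `toy_compare_folds` are hypothetical names invented for illustration, and the real C2 type lattice is far richer. It only shows how a klass comparison against an erased type can stay undecided while a cast on the precise oop type collapses to top.

```cpp
#include <string>

// Hypothetical toy model; none of these names exist in HotSpot.
enum class Shape { Top, Array, Instance, AnyObject };

struct ToyType {
  Shape shape;
  std::string klass;  // only meaningful for Shape::Instance
};

// Intersection of two toy types, loosely analogous to the type a cast
// computes from its input type and its target type. Unrelated inputs
// collapse to Top (the empty set of values).
ToyType toy_join(const ToyType& a, const ToyType& b) {
  if (a.shape == Shape::Top || b.shape == Shape::Top) return {Shape::Top, ""};
  if (a.shape == Shape::AnyObject) return b;
  if (b.shape == Shape::AnyObject) return a;
  if (a.shape != b.shape) return {Shape::Top, ""};  // array vs. instance
  if (a.shape == Shape::Instance && a.klass != b.klass) return {Shape::Top, ""};
  return a;
}

// An equality test between a loaded klass and a wanted klass can be
// folded to "never equal" only when their toy types are disjoint.
bool toy_compare_folds(const ToyType& loaded, const ToyType& wanted) {
  return toy_join(loaded, wanted).shape == Shape::Top;
}
```

In this model, comparing an erased `AnyObject` klass with an exact `Instance` type does not fold, so the branch survives, whereas joining an `Array` oop type with that same `Instance` type yields `Top`. That is the combination described above: a live control path carrying a dead value.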

This is occurring in the context of labsjdk which is currently based on jdk21+19 but I don't see any C2 fixes which might address this problem.  I did try out 8303512 but that didn't help.

I can provide instructions for setting up a Graal build to reproduce this, and I can also test out any fixes. I added a guarantee that this code never produces a top return value and I'm currently running a mach5 gate with it, but I haven't seen any failures yet.
Comments
Changeset: 199b1bf5
Author: Roland Westrelin <roland@openjdk.org>
Date: 2023-05-26 07:03:35 +0000
URL: https://git.openjdk.org/jdk/commit/199b1bf5009120efd1fd37a1ddabc0c6fb84f62c
26-05-2023

Great, thanks for confirming and thanks to Roland for the quick fix.
25-05-2023

I can confirm that it appears to resolve the SIGSEGV crashes. I tested both product and fastdebug builds. Thanks for the quick fix [~roland]!
25-05-2023

[~dnsimon], [~never] could you verify that Roland's fix also resolves the SIGSEGV crashes that you were observing? Thanks.
25-05-2023

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/14123 Date: 2023-05-24 12:31:37 +0000
25-05-2023

This seems to be a regression from JDK-8297933 in JDK 21. I verified that with Tom's reproducer and build-search with -XX:+AbortVMOnCompilationFailure because we assert only after JDK-8303951.
24-05-2023

[~never] Thanks for the reproducer! I'll look into it. We can restrict compilation to "cc::test":

./java -Xbatch -XX:-TieredCompilation -XX:CompileCommand=printcompilation,cc::* -XX:CompileCommand=compileonly,cc::test cc.java
24-05-2023

Thanks for the reproducer, Tom. Looks similar to JDK-8308504 / JDK-8308392. We'll investigate this asap.
24-05-2023

I've attached a test case cc.java which creates similar conditions. It doesn't SEGV, I think because the control flow and hierarchy are slightly different, but it does cause type_check_receiver to replace the object with top. In fastdebug it dies with malformed control flow:

$ ~/Downloads/jdk-21/fastdebug/bin/java -showversion cc
java version "21-ea" 2023-09-19 LTS
Java(TM) SE Runtime Environment (fastdebug build 21-ea+23-LTS-1988)
Java HotSpot(TM) 64-Bit Server VM (fastdebug build 21-ea+23-LTS-1988, mixed mode, sharing)
dist dump
---------------------------------------------
   0   141  If  === 133 140  [[ 143 ]] P=1.000000, C=-1.000000 !jvms: cc$SubSnippetReflection::forObject @ bci:1 (line 29) cc::test @ bci:14 (line 64)
   1   143  IfFalse  === 141  [[ 148 ]] #0 !jvms: cc$SubSnippetReflection::forObject @ bci:1 (line 29) cc::test @ bci:14 (line 64)
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/System/Volumes/Data/mesos/work_dir/slaves/741e9afd-8c02-45c3-b2e2-9db1450d0832-S102723/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/52417fc5-ae11-4950-9454-86253738dc5f/runs/b868ccc5-005a-40ca-8ded-eac7b2aafcf5/workspace/open/src/hotspot/share/opto/compile.cpp:4003), pid=53021, tid=23555
#  assert(false) failed: malformed control flow
#
# JRE version: Java(TM) SE Runtime Environment (21.0+23) (fastdebug build 21-ea+23-LTS-1988)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 21-ea+23-LTS-1988, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, bsd-amd64)
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /Users/tkrodrig/hs_err_pid53021.log
#
# Compiler replay data is saved as:
# /Users/tkrodrig/replay_pid53021.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#
[1]    53021 abort      ~/Downloads/jdk-21/fastdebug/bin/java -showversion cc

It fails on current master as well. The callers of type_check_receiver should probably be using static_subtype_check to guard calls to this method.
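The caller-side guard suggested above can be sketched with a small stand-alone model. This is an illustration under stated assumptions, not the HotSpot implementation: `ToyCheck`, `ToyKlass`, `toy_static_check`, and `toy_should_emit_exact_check` are hypothetical names, and the toy ignores the class hierarchy and interfaces entirely. The idea is only that a three-way static answer is computed first, and the profile-driven exact check is emitted solely when static information cannot decide.

```cpp
#include <string>

// Hypothetical toy model; these names are illustrative, not HotSpot API.
enum class ToyCheck { AlwaysFalse, AlwaysTrue, NeedsRuntimeTest };

struct ToyKlass {
  std::string name;
  bool is_array;
  bool is_final;  // assumption: no subclasses can exist
};

// Classify statically whether a value of static type `have` can be
// exactly of class `want`. Deliberately conservative: anything the toy
// cannot decide is reported as NeedsRuntimeTest.
ToyCheck toy_static_check(const ToyKlass& have, const ToyKlass& want) {
  if (have.is_array != want.is_array) return ToyCheck::AlwaysFalse;
  if (have.name == want.name && have.is_final) return ToyCheck::AlwaysTrue;
  return ToyCheck::NeedsRuntimeTest;
}

// Caller-side guard: emit the exact-type check (and its cast) only when
// the static answer is inconclusive, so a cast between provably
// unrelated types is never created in the first place.
bool toy_should_emit_exact_check(const ToyKlass& have, const ToyKlass& want) {
  return toy_static_check(have, want) == ToyCheck::NeedsRuntimeTest;
}
```

With an array-typed receiver and an instance class as the profiled type, the toy guard reports `AlwaysFalse` and the exact check is skipped, which is the situation that produced top in this report.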
24-05-2023

I did try out the replay file but it died trying to initialize JVMCI, so I'm not sure if replay needs to do something special to work with JVMCI. It's always possible for a gvn.transform of CheckCastPPNode to produce top if the types are unrelated, so whose responsibility is it to avoid it in this call chain? type_check_receiver could simply avoid returning top:

diff --git a/src/hotspot/share/opto/graphKit.cpp b/src/hotspot/share/opto/graphKit.cpp
index 58c4191f7ff..18163f991bc 100644
--- a/src/hotspot/share/opto/graphKit.cpp
+++ b/src/hotspot/share/opto/graphKit.cpp
@@ -2886,8 +2886,10 @@ Node* GraphKit::type_check_receiver(Node* receiver, ciKlass* klass,
     if (!receiver_type->higher_equal(recvx_type)) { // ignore redundant casts
       // Subsume downstream occurrences of receiver with a cast to
       // recv_xtype, since now we know what the type will be.
-      Node* cast = new CheckCastPPNode(control(), receiver, recvx_type);
-      (*casted_receiver) = _gvn.transform(cast);
+      Node* cast = _gvn.transform(new CheckCastPPNode(control(), receiver, recvx_type));
+      if (!cast->is_top()) {
+        (*casted_receiver) = cast;
+      }
       // (User must make the replace_in_map call.)
     }
   }
23-05-2023

We are seeing replay logs being generated for these crashes but our CI is not currently preserving these logs. We will resolve this (GR-45830) and hopefully have some replay logs soon. I'm also not sure if this crash happens (or can happen) when generating libgraal. I'll discuss more with Tom.
23-05-2023

We need a reproducer (either a Graal build setup or a working compilation replay file) to investigate. Please also add the hs_err file.
23-05-2023

Hi [~never], have you tried to reproduce it with a replay file? That might be easier to investigate if Graal is not set up.
23-05-2023

ILW = Crash during C2 parsing, only seen with Graal so far, disable compilation of affected methods = HLM = P3
23-05-2023

For reference, this is my modified version of type_check_receiver:

// Profile-driven exact type check:
Node* GraphKit::type_check_receiver(Node* receiver, ciKlass* klass,
                                    float prob,
                                    Node* *casted_receiver) {
  assert(!klass->is_interface(), "no exact type check on interfaces");

  const TypeKlassPtr* tklass = TypeKlassPtr::make(klass, Type::trust_interfaces);
  Node* recv_klass = load_object_klass(receiver);
  Node* want_klass = makecon(tklass);
  Node* cmp = _gvn.transform(new CmpPNode(recv_klass, want_klass));
  Node* bol = _gvn.transform(new BoolNode(cmp, BoolTest::eq));
  IfNode* iff = create_and_xform_if(control(), bol, prob, COUNT_UNKNOWN);
  set_control( _gvn.transform(new IfTrueNode (iff)));
  Node* fail = _gvn.transform(new IfFalseNode(iff));

  if (!stopped()) {
    const TypeOopPtr* receiver_type = _gvn.type(receiver)->isa_oopptr();
    const TypeOopPtr* recvx_type = tklass->as_instance_type();
    assert(recvx_type->klass_is_exact(), "");

    if (!receiver_type->higher_equal(recvx_type)) { // ignore redundant casts
      // Subsume downstream occurrences of receiver with a cast to
      // recv_xtype, since now we know what the type will be.
      Node* cast = new CheckCastPPNode(control(), receiver, recvx_type);
      (*casted_receiver) = _gvn.transform(cast);
      if ((*casted_receiver)->is_top()) {
        tty->print("receiver_type="); receiver_type->dump(); tty->cr();
        tty->print("recvx_type="); recvx_type->dump(); tty->cr();
        receiver->dump();
        bol->dump(4);
      }
      guarantee(!(*casted_receiver)->is_top(), "top");
      // (User must make the replace_in_map call.)
    }
  }

  return fail;
}

It seems to me this code should be performing the higher_equal part of the logic first and then using the result of that to determine whether to emit any code.
22-05-2023