While testing a recent snapshot of JDK 21 with Graal native image, we started seeing a reproducible crash with a null pointer in GraphKit::gen_instanceof:
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f0fffeee7ac, pid=7111, tid=7120
#
# JRE version: Java(TM) SE Runtime Environment Oracle GraalVM 21-dev+19.1 (21.0+19) (build 21+19-jvmci-23.1-b02)
# Java VM: Java HotSpot(TM) 64-Bit Server VM Oracle GraalVM 21-dev+19.1 (21+19-jvmci-23.1-b02, mixed mode, tiered, jvmci, compressed oops, compressed class ptrs, parallel gc, linux-amd64)
# Problematic frame:
# V [libjvm.so+0x8457ac] GraphKit::gen_instanceof(Node*, Node*, bool)+0x2ec
Current CompileTask:
C2: 198273 50775 4 com.oracle.svm.core.StaticFieldsSupport$StaticFieldBaseNode::lower (83 bytes)
Stack: [0x00007f0f8860a000,0x00007f0f8870b000], sp=0x00007f0f88707580, free space=1013k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x8457ac] GraphKit::gen_instanceof(Node*, Node*, bool)+0x2ec
V [libjvm.so+0xce25fd] Parse::do_instanceof()+0x1bd
V [libjvm.so+0xccf0b0] Parse::do_one_block()+0x210
V [libjvm.so+0xccf4b5] Parse::do_all_blocks()+0xe5
V [libjvm.so+0xcd1c05] Parse::Parse(JVMState*, ciMethod*, float)+0x725
V [libjvm.so+0x54e18b] ParseGenerator::generate(JVMState*)+0x8b
V [libjvm.so+0x54faf9] PredictedCallGenerator::generate(JVMState*)+0x2f9
V [libjvm.so+0x54faf9] PredictedCallGenerator::generate(JVMState*)+0x2f9
V [libjvm.so+0x7074d8] Parse::do_call()+0x1f8
V [libjvm.so+0xccf0b0] Parse::do_one_block()+0x210
V [libjvm.so+0xccf4b5] Parse::do_all_blocks()+0xe5
V [libjvm.so+0xcd1c05] Parse::Parse(JVMState*, ciMethod*, float)+0x725
V [libjvm.so+0x54e18b] ParseGenerator::generate(JVMState*)+0x8b
V [libjvm.so+0x61e8e5] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0xd95
V [libjvm.so+0x54d2a0] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x120
V [libjvm.so+0x625c37] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xa87
V [libjvm.so+0x626b48] CompileBroker::compiler_thread_loop()+0x6a8
V [libjvm.so+0x8d47e8] JavaThread::thread_main_inner() [clone .part.0]+0xb8
V [libjvm.so+0xe77646] Thread::call_run()+0xa6
V [libjvm.so+0xc9fd88] thread_native_entry(Thread*)+0xd8
The problem is that the value on top of the expression stack is the C2 top node, which leads to a crash when trying to emit the checkcast. It appears that the top is injected by a previous instanceof that uses maybe_cast_profiled_receiver and replace_in_map. The top is produced at https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/graphKit.cpp#L2890. It's unclear to me whether this code is never supposed to produce top because the callers guard against it, or whether the returned top value is somehow supposed to be handled.
Since this is occurring in the context of Graal and labsjdk, I can't give you something that will reproduce against master. I've added some debug code to GraphKit::type_check_receiver to dump the values involved:
rreceiver_type=bottom[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull *
recvx_type=org/graalvm/compiler/word/HostedWord (org/graalvm/word/WordBase,org/graalvm/word/ComparableWord,org/graalvm/word/UnsignedWord,org/graalvm/word/PointerBase,org/graalvm/word/SignedWord,org/graalvm/word/Pointer):exact *
1859 Phi === 1074 1835 1047 [[ 255 1867 1885 1898 1906 1908 1912 1923 1908 1928 1928 1937 ]] #bottom[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull *
1929 LoadNKlass === _ 7 1928 [[ 1930 ]] @bottom[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull+8 * [narrowklass], idx=4; #narrowklass: java/lang/Object: 0x00007fe6cb015bf8 *
1931 ConP === 0 [[ 1932 ]] #precise org/graalvm/compiler/word/HostedWord: 0x00007fe6eaa8a680 (org/graalvm/word/WordBase,org/graalvm/word/ComparableWord,org/graalvm/word/UnsignedWord,org/graalvm/word/PointerBase,org/graalvm/word/SignedWord,org/graalvm/word/Pointer):Constant:exact * Klass:precise org/graalvm/compiler/word/HostedWord: 0x00007fe6eaa8a680 (org/graalvm/word/WordBase,org/graalvm/word/ComparableWord,org/graalvm/word/UnsignedWord,org/graalvm/word/PointerBase,org/graalvm/word/SignedWord,org/graalvm/word/Pointer):Constant:exact *
1930 DecodeNKlass === _ 1929 [[ 1932 ]] #java/lang/Object: 0x00007fe6cb015bf8 * Klass:java/lang/Object: 0x00007fe6cb015bf8 *
1932 CmpP === _ 1930 1931 [[ 1933 ]]
1933 Bool === _ 1932 [[ 1934 ]] [eq]
The CmpP doesn't fold because, while the receiver has a precise type, the type of the DecodeNKlass has been erased to java/lang/Object. The type folding in the CheckCastPPNode works with the visible types, so it's able to see that the types are disjoint. How exactly is this code protected from this kind of problem?
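To make the mismatch described above concrete, here is a minimal toy model in Java. It is not HotSpot code: ToyType, foldKlassCompare and castJoin are hypothetical names invented for illustration, and the folding rules are heavily simplified assumptions. It only shows how a klass compare that sees an erased java/lang/Object type can stay unfolded while a cast that sees the full receiver type collapses to an empty type (the stand-in for top).

```java
import java.util.Set;

// Toy model only: none of these names exist in HotSpot.
class FoldMismatchSketch {
    // Simplified stand-in for a compiler type: class name, exactness, interfaces.
    static final class ToyType {
        final String klass;
        final boolean exact;
        final Set<String> interfaces;

        ToyType(String klass, boolean exact, Set<String> interfaces) {
            this.klass = klass;
            this.exact = exact;
            this.interfaces = interfaces;
        }
    }

    enum Fold { ALWAYS_EQUAL, NEVER_EQUAL, UNKNOWN }

    // Assumed rule: a klass compare folds only when both sides are exact.
    static Fold foldKlassCompare(ToyType loaded, ToyType constant) {
        if (loaded.exact && constant.exact) {
            return loaded.klass.equals(constant.klass) ? Fold.ALWAYS_EQUAL : Fold.NEVER_EQUAL;
        }
        return Fold.UNKNOWN;
    }

    // Assumed rule: casting to an exact target is empty (null stands in for top)
    // when the target lacks an interface that the receiver type guarantees.
    static ToyType castJoin(ToyType receiver, ToyType target) {
        if (target.exact && !target.interfaces.containsAll(receiver.interfaces)) {
            return null;
        }
        return target;
    }

    public static void main(String[] args) {
        // Receiver as the cast sees it: an array-like type with its interfaces.
        ToyType receiver = new ToyType("bottom[]", false,
                Set.of("java/lang/Cloneable", "java/io/Serializable"));
        // Klass as the compare sees it: erased to java/lang/Object.
        ToyType erasedKlass = new ToyType("java/lang/Object", false, Set.of());
        // The exact class being tested against (interface list shortened).
        ToyType hostedWord = new ToyType("org/graalvm/compiler/word/HostedWord", true,
                Set.of("org/graalvm/word/WordBase", "org/graalvm/word/Pointer"));

        System.out.println("compare folds to: " + foldKlassCompare(erasedKlass, hostedWord));
        System.out.println("cast is top: " + (castJoin(receiver, hostedWord) == null));
    }
}
```

In this model the compare stays UNKNOWN, so the guarded branch is kept, while the cast inside that branch already yields the top stand-in.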
This is occurring in the context of labsjdk, which is currently based on jdk21+19, but I don't see any C2 fixes that might address this problem. I did try out 8303512, but that didn't help.
I can provide instructions for setting up a Graal build to reproduce this, and I can also test out any fixes. I added a guarantee that this code never produces a top return value and I'm currently running a mach5 gate with it, but I haven't seen any failures yet.