JDK-8303279 : C2: crash in SubTypeCheckNode::sub() at IGVN split if
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 17.0.6,18.0.1.1,21,22
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • Submitted: 2023-02-28
  • Updated: 2024-07-21
  • Resolved: 2023-07-11
Fixed versions:
  • JDK 17: 17.0.10-oracle (Fixed)
  • JDK 21: 21 (Fixed)
  • JDK 22: 22 b06 (Fixed)
Description
Originally reported at https://github.com/adoptium/containers/issues/336 and at  https://youtrack.jetbrains.com/issue/KT-54693/SIGSEGV-0xb-at-pc0x0000000000000000-C2-CompilerThread0

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000000000000000, pid=1, tid=14
#
# JRE version: OpenJDK Runtime Environment Temurin-17.0.6+10 (17.0.6+10) (build 17.0.6+10)
# Java VM: OpenJDK 64-Bit Server VM Temurin-17.0.6+10 (17.0.6+10, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, serial gc, linux-amd64)
# Problematic frame:
# C  0x0000000000000000
#
# Core dump will be written. Default location: /core.%e.1.%t
#
# JFR recording file will be written. Location: //hs_err_pid1.jfr
#
# If you would like to submit a bug report, please visit:
#   https://github.com/adoptium/adoptium-support/issues
#

---------------  S U M M A R Y ------------

Command Line: -Xms192m -Xmx192m -XX:ErrorFile=/jvm-dumps/hs_err_pid%p.log -javaagent:/opt/dd-java-agent.jar -Ddd.version=20230219.1 -Ddd.service=svc-ui-sync dev.r36.mercury.uisync.MainKt

Host: Intel(R) Xeon(R) CPU @ 2.20GHz, 4 cores, 728M, Ubuntu 22.04.1 LTS
Time: Fri Feb 24 23:40:19 2023 UTC elapsed time: 510178.339438 seconds (5d 21h 42m 58s)

---------------  T H R E A D  ---------------

Current thread (0x00007f30a806d090):  JavaThread "C2 CompilerThread0" daemon [_thread_in_native, id=14, stack(0x00007f30ac417000,0x00007f30ac517000)]


Current CompileTask:
C2:510178339 18521   !   4       io.grpc.kotlin.ServerCalls$serverCallListener$requests$1::invokeSuspend (285 bytes)

Stack: [0x00007f30ac417000,0x00007f30ac517000],  sp=0x00007f30ac512988,  free space=1006k

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000000000

Register to memory mapping:

RAX=0x0000000000000011 is an unknown value
RBX=0x00007f3078023228 points into unknown readable memory: 0x00007f30aeefebb8 | b8 eb ef ae 30 7f 00 00
RCX=0x0 is NULL
RDX=0x00007f30786db101 points into unknown readable memory: b0 6d 78 30 7f 00 00
RSP=0x00007f30ac512988 is pointing into the stack for thread: 0x00007f30a806d090
RBP=0x00007f30ac5129d0 is pointing into the stack for thread: 0x00007f30a806d090
RSI=0x00007f30ae0ca250: <offset 0x00000000003df250> in /opt/java/openjdk/lib/server/libjvm.so at 0x00007f30adceb000
RDI=0x00007f3078023228 points into unknown readable memory: 0x00007f30aeefebb8 | b8 eb ef ae 30 7f 00 00
R8 =0x00007f30786db1a0 points into unknown readable memory: 0x00007f30aeeff288 | 88 f2 ef ae 30 7f 00 00
R9 =0x00007f3078ae62d8 points into unknown readable memory: 0x00007f30aef9c0d0 | d0 c0 f9 ae 30 7f 00 00
R10=0x00007f30784623e8 points into unknown readable memory: 0x00007f30aef9c1a0 | a0 c1 f9 ae 30 7f 00 00
R11=0x00007f3078c0a4a0 points into unknown readable memory: 0x00007f30aef86bd8 | d8 6b f8 ae 30 7f 00 00
R12=0x00007f3078699160 points into unknown readable memory: 0x00007f30aef79aa0 | a0 9a f7 ae 30 7f 00 00
R13=0x00007f3079237610 points into unknown readable memory: 0x00007f30aef9c1a0 | a0 c1 f9 ae 30 7f 00 00
R14=0x00007f3078ae6320 points into unknown readable memory: 0x00007f30aef81d50 | 50 1d f8 ae 30 7f 00 00
R15=0x00007f30786db1a0 points into unknown readable memory: 0x00007f30aeeff288 | 88 f2 ef ae 30 7f 00 00

Comments
A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk17u-dev/pull/1580 Date: 2023-07-13 21:44:45 +0000
21-07-2024

Fix request 17u - fixes a c2 crash, change passes jtreg tiers 1, 2, 3 and 4
18-07-2023

A pull request was submitted for review. URL: https://git.openjdk.org/jdk21u/pull/9 Date: 2023-07-13 21:45:10 +0000
13-07-2023

A pull request was submitted for review. URL: https://git.openjdk.org/jdk21/pull/119 Date: 2023-07-13 05:47:04 +0000
13-07-2023

Changeset: caadad4f Author: Roland Westrelin <roland@openjdk.org> Date: 2023-07-11 15:59:17 +0000 URL: https://git.openjdk.org/jdk/commit/caadad4fdc78799dab2d492dba9b9f74b22d036e
11-07-2023

I didn't try replay file myself. There were multiple failures observed across different platforms during test run. All of the failures happened in `-Xcomp` mode.
28-06-2023

@vlivanov, I can't reproduce the crash with the replay_pid2467188.log file. Can you? Also, the test passes for me. Does it always crash for you or just occasionally?
28-06-2023

Failing test: serviceability/jvmti/DynamicCodeGenerated/DynamicCodeGeneratedTest.java
Logs: hs_err_pid2467188.log replay_pid2467188.log
# Internal Error (src/hotspot/share/opto/subtypenode.cpp:37), pid=2467188, tid=2467203
# assert(sub_t != Type::TOP && !TypePtr::NULL_PTR->higher_equal(sub_t)) failed: should be not null
Current CompileTask:
C2: 27178 6483 b 4 java.lang.invoke.StringConcatFactory::foldInLastMixers (313 bytes)
Stack: [0x00007f1805621000,0x00007f1805721000], sp=0x00007f180571c210, free space=1004k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x16ec008] SubTypeCheckNode::sub(Type const*, Type const*) const+0x5e8 (subtypenode.cpp:37)
V [libjvm.so+0x15112d7] PhaseIterGVN::transform_old(Node*)+0x317 (phaseX.cpp:1234)
V [libjvm.so+0x15083f8] PhaseIterGVN::optimize()+0x78 (phaseX.cpp:1045)
V [libjvm.so+0x9ebc48] Compile::inline_incrementally_cleanup(PhaseIterGVN&)+0x1b8 (compile.cpp:2067)
V [libjvm.so+0x9ebf15] Compile::inline_boxing_calls(PhaseIterGVN&)+0x135 (compile.cpp:2010)
V [libjvm.so+0x9efbcb] Compile::Optimize()+0x70b (compile.cpp:2252)
V [libjvm.so+0x9f263a] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1c4a (compile.cpp:851)
V [libjvm.so+0x84d7ce] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x10e (c2compiler.cpp:115)
V [libjvm.so+0x9fe460] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xa00 (compileBroker.cpp:2265)
V [libjvm.so+0x9ff2e8] CompileBroker::compiler_thread_loop()+0x618 (compileBroker.cpp:1944)
V [libjvm.so+0xeb0e4c] JavaThread::thread_main_inner()+0xcc (javaThread.cpp:719)
V [libjvm.so+0x178797a] Thread::call_run()+0xba (thread.cpp:217)
V [libjvm.so+0x148adbc] thread_native_entry(Thread*)+0x11c (os_linux.cpp:778)
27-06-2023

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/14678 Date: 2023-06-27 14:40:49 +0000
27-06-2023

I was thinking this, maybe, as a fix:
https://github.com/openjdk/jdk/compare/master...rwestrel:jdk:JDK-8303279
A SubTypeCheck node should never see a possibly null input. The crash happens at split if with a dead branch that's not been entirely destroyed yet. The patch delays split if in that case. I ran into a number of cases where the assert I added to SubTypeCheckNode::sub fires for things that can't be null but didn't have their type set to non null.
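The "delay split if" idea above can be illustrated with a toy model. This is only a sketch under assumptions: `ToyRegion`, `SplitIfDecision`, and `decide_split_if` are hypothetical names that do not exist in HotSpot, and the exact condition used by the linked patch is not quoted in this report.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a C2 Region node; NOT HotSpot's Node classes.
struct ToyRegion {
  // One flag per control input: true means that path is already dead (top).
  std::vector<bool> input_is_top;
};

enum class SplitIfDecision { Proceed, Delay };

// Assumed policy: if any control path into the region is dead, the Phi inputs
// that belong to that path may be stale (for example a NULL constant), so the
// transformation is postponed until the dead path has been cleaned up.
SplitIfDecision decide_split_if(const ToyRegion& region) {
  for (bool dead : region.input_is_top) {
    if (dead) {
      return SplitIfDecision::Delay;
    }
  }
  return SplitIfDecision::Proceed;
}
```

In this model a region with inputs `{top, live, live}` yields `Delay`, while a region whose inputs are all live yields `Proceed`.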
27-06-2023

Attached test case fails when run with:
java -XX:-TieredCompilation -XX:-BackgroundCompilation -XX:-UseOnStackReplacement -XX:+PrintCompilation -XX:CompileOnly=TestCrashAtIGVNSplitIfSubType::test -XX:CompileCommand=quiet -XX:+StressIGVN -XX:StressSeed=598200189 TestCrashAtIGVNSplitIfSubType
23-06-2023

Nice, great work Roland!
23-06-2023

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/14600 Date: 2023-06-21 17:25:38 +0000
21-06-2023

While browsing JBS, I found JDK-8303513 which seems similar to this issue (i.e. also caused by a SubTypeCheckNode with an input of the TOP constant node). While looking at `SubTypeCheckNode::Ideal()` I found that it already has exactly the same safeguard I proposed for `SubTypeCheckNode::sub()`, namely:
```
if (!super_t->isa_klassptr() || (!sub_t->isa_klassptr() && !sub_t->isa_oopptr())) {
  return NULL;
}
```
Because this seems to affect a lot of people, I've created a preliminary PR (https://github.com/openjdk/jdk/pull/14600) so others can check if this fix solves their problems. But I'm obviously open to better solutions (and digging deeper in order to find a more complete explanation and/or a better reproducer for the problem).
21-06-2023

I think we should attempt to extract a standalone (jasm) reproducer and investigate more thoroughly. Such issues usually depend on the sequence in which nodes are processed by IGVN, so -XX:+StressIGVN -XX:RepeatCompilation=... might help.
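Why the processing order matters can be sketched with a two-task toy worklist. This is an illustrative model only, assuming just two pending transformations; `Task` and `split_if_sees_stale_null` are hypothetical names, not HotSpot code.

```cpp
#include <cassert>
#include <vector>

// Two pending IGVN-style transformations in a toy worklist.
enum class Task { CleanupDeadPath, SplitIf };

// Returns true if SplitIf runs while the stale NULL input is still attached
// to the Phi, i.e. before the dead path has been cleaned up.
bool split_if_sees_stale_null(const std::vector<Task>& order) {
  bool stale_null_on_phi = true;  // state right after the null check folded
  for (Task task : order) {
    if (task == Task::CleanupDeadPath) {
      stale_null_on_phi = false;  // dead path removed, Phi input gone
    } else if (stale_null_on_phi) {
      return true;                // SplitIf observed the stale input
    }
  }
  return false;
}
```

With the order `{SplitIf, CleanupDeadPath}` the function returns true, and with `{CleanupDeadPath, SplitIf}` it returns false, which is the kind of ordering sensitivity a randomized worklist is meant to expose.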
21-06-2023

Thanks for your comments [~thartmann]. I've just uploaded the complete ideal graph (3992644_ideal.txt.gz) and inlining tree (3992644_inlining.txt) dumped from an instrumented JVM just before a crash which correspond to hs_err_pid3992644.log. Unfortunately I've never managed to get a crash when running with `-XX:PrintIdealGraphLevel` to get all the graphs before the crash. Do you think the proposed fix in `SubTypeCheckNode::sub()` is OK or do you think we need a more elaborate fix which prevents this situation in the first place?
21-06-2023

> I'm only not sure if the unusual graph which leads to this crash is caused by the *uncommon* bytecode generated by the Kotlin compiler or if it is the result of another problem in an earlier optimization stage?
[~simonis] It's hard to tell but most likely it's bytecode that would not be generated by javac that creates an uncommon C2 IR shape. From your example, it looks like a dying subgraph because the region input is top, probably because the null check was folded because the input is always null. Now we are in an intermittent state where the data path did not yet fold and we therefore observe a null input that should never happen. This looks like a regression from JDK-8238691 in JDK 15 (paging [~roland]).
21-06-2023

The problem is the following: `SubTypeCheckNode::sub()` expects that its `sub_t` input `Type` is either a Klasspointer (i.e. `Type::KlassPtr`) or an Ooppointer (i.e. `Type::OopPtr`, `Type::InstPtr` or `Type::AryPtr`). It only checks for a Klasspointer and if that's not the case it assumes an Ooppointer. However, in the crashing case, `sub_t` has the generic pointer type `Type::AnyPtr` so debug builds will run into an assertion and product builds will just crash. The `SubTypeCheckNode` in question has the following shape in `split_if()`:
```
Con (#top)
 |
 |  __IfTrue
 |/
 ||  __IfFalse
 |//
Region
 |  __ ConP (#NULL)
 | /
 | __/ _ Phi (Oop:kotlinx/coroutines/internal/LockFreeLinkedListNode:NotNull)
 || ___/
 ||| ____ Phi (Oop:kotlinx/coroutines/internal/LockFreeLinkedListNode:NotNull)
 ||||
 |///
Phi
 |   ConP (Klass:precise klass kotlinx/coroutines/channels/Send)
 |   |
  \ /
SubTypeCheck
```
`split_if()` then searches for the first constant input of the `SubTypeCheck`'s `Phi` node and finds `ConP (#NULL)`. It then calls `SubTypeCheckNode::sub()` with `sub_t` as `ConP (#NULL)`'s type which is `Type::AnyPtr` and crashes. I've verified that returning `bottom_type()` from `SubTypeCheckNode::sub` for the `(!sub_t->isa_klassptr() && !sub_t->isa_oopptr())` case fixes the crash (by instrumenting the VM to ensure that the compilation as well as the further program execution succeeds if we take the new branch). I'm only not sure if the unusual graph which leads to this crash is caused by the *uncommon* bytecode generated by the Kotlin compiler or if it is the result of another problem in an earlier optimization stage?
Unfortunately the attached replay files can't be used to reproduce the crash. I've tried both, using the original classes from `grpc-server-1.0-SNAPSHOT.jar` as well as using the classes dumped with the SA agent from a core file of the crash. Even when running with `-XX:ReplaySuppressInitializers=0` the inlining tree during replay is different from the original one during the crash (I've verified that with an instrumented VM which dumps the inlining tree before crashing). This may be caused by an issue of the replay functionality described in [this mail](https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2023-June/065936.html) on hotspot-compiler-dev.
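The guard described in this comment can be modelled in isolation. The following is a minimal sketch, not HotSpot code: `Base`, `ToyType`, and `guarded_sub` are hypothetical stand-ins for C2's type classes, and the string labels replace the real return types purely for illustration.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for the base kinds of C2's type lattice; NOT the real Type class.
enum class Base { Top, AnyPtr, OopPtr, InstPtr, AryPtr, KlassPtr };

struct ToyType {
  Base base;
  bool isa_klassptr() const { return base == Base::KlassPtr; }
  // Only OopPtr, InstPtr and AryPtr count as Java object pointers here,
  // matching the set named in the comment above.
  bool isa_oopptr() const {
    return base == Base::OopPtr || base == Base::InstPtr || base == Base::AryPtr;
  }
};

// Returns a label for the path a guarded sub() would take.
std::string guarded_sub(const ToyType& super_t, const ToyType& sub_t) {
  if (!super_t.isa_klassptr() ||
      (!sub_t.isa_klassptr() && !sub_t.isa_oopptr())) {
    return "bottom";  // conservative answer: nothing is known about the check
  }
  return sub_t.isa_klassptr() ? "klass-path" : "oop-path";
}
```

In this model an `AnyPtr` sub-type (the type reported for `ConP (#NULL)`) takes the `"bottom"` branch instead of being treated as an object pointer.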
20-06-2023

I managed to get the attached reproducer crashing with product and fastdebug builds of JDK 17, a product build of JDK 18 and a slowdebug build of JDK 21 although the crash frequency seems to be the highest with JDK 17. The stack traces for all versions look the same (except for product builds which have no stack trace in the hs_err file):
```
# Internal Error (/priv/simonisv/OpenJDK/Git/jdk17u-dev/src/hotspot/share/opto/type.hpp:1735), pid=3992644, tid=3993484
# assert(_base >= OopPtr && _base <= AryPtr) failed: Not a Java pointer
#
# JRE version: OpenJDK Runtime Environment (17.0.9) (fastdebug build 17.0.9-internal+0-adhoc.simonisv.jdk17u-dev)
...
Current CompileTask:
C2: 45915 5854 ! 4 io.grpc.kotlin.ServerCalls$serverCallListener$requests$1::invokeSuspend (285 bytes)
Stack: [0x00007ffef67e8000,0x00007ffef68e9000], sp=0x00007ffef68e3f20, free space=1007k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x195deb2] SubTypeCheckNode::sub(Type const*, Type const*) const+0x3b2
V [libjvm.so+0xe330e7] split_if(IfNode*, PhaseIterGVN*)+0x3f7
V [libjvm.so+0xe350d8] IfNode::Ideal_common(PhaseGVN*, bool) [clone .part.0]+0x828
V [libjvm.so+0xe401ea] IfNode::Ideal(PhaseGVN*, bool)+0x3a
V [libjvm.so+0x161ff78] PhaseIterGVN::transform_old(Node*)+0xb8
V [libjvm.so+0x16190ce] PhaseIterGVN::optimize()+0x7e
V [libjvm.so+0xa5303b] Compile::Optimize()+0x90b
V [libjvm.so+0xa54fb6] Compile::Compile(ciEnv*, ciMethod*, int, bool, bool, bool, bool, bool, DirectiveSet*)+0x1306
V [libjvm.so+0x87bc66] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x646
V [libjvm.so+0xa66daa] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xd1a
V [libjvm.so+0xa67b98] CompileBroker::compiler_thread_loop()+0x628
V [libjvm.so+0x1a191fc] JavaThread::thread_main_inner()+0x36c
V [libjvm.so+0x1a194cb] JavaThread::run()+0x25b
V [libjvm.so+0x1a1ed54] Thread::call_run()+0x104
V [libjvm.so+0x1597d8c] thread_native_entry(Thread*)+0x10c
...
```
See attached hs_err_pid3992644.log
```
# Internal Error (/priv/simonisv/OpenJDK/Git/jdk/src/hotspot/share/opto/type.hpp:2059), pid=1152816, tid=1154124
# assert(_base >= OopPtr && _base <= AryPtr) failed: Not a Java pointer
#
# JRE version: OpenJDK Runtime Environment (21.0) (slowdebug build 21-internal-adhoc.simonisv.jdk)
...
Current CompileTask:
C2: 91009 8214 ! 4 io.grpc.kotlin.ServerCalls$serverCallListener$requests$1::invokeSuspend (285 bytes)
Stack: [0x00007fff1306b000,0x00007fff1316c000], sp=0x00007fff13166fe0, free space=1007k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x61d636] Type::is_oopptr() const+0x4e (type.hpp:2059)
V [libjvm.so+0x14cee55] SubTypeCheckNode::sub(Type const*, Type const*) const+0x53 (subtypenode.cpp:37)
V [libjvm.so+0x14c66b0] SubNode::Value(PhaseGVN*) const+0xa6 (subnode.cpp:107)
V [libjvm.so+0xcdacb3] split_if(IfNode*, PhaseIterGVN*)+0x2ce (ifnode.cpp:111)
V [libjvm.so+0xce044c] IfNode::Ideal_common(PhaseGVN*, bool)+0x128 (ifnode.cpp:1438)
V [libjvm.so+0xce0496] IfNode::Ideal(PhaseGVN*, bool)+0x30 (ifnode.cpp:1448)
V [libjvm.so+0x1298244] PhaseGVN::apply_ideal(Node*, bool)+0x70 (phaseX.cpp:667)
V [libjvm.so+0x129a0fd] PhaseIterGVN::transform_old(Node*)+0x12d (phaseX.cpp:1196)
V [libjvm.so+0x12998df] PhaseIterGVN::optimize()+0x16b (phaseX.cpp:1045)
V [libjvm.so+0x93f89e] Compile::Optimize()+0xce0 (compile.cpp:2378)
V [libjvm.so+0x9385fa] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x16ca (compile.cpp:842)
V [libjvm.so+0x806ab4] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1a0 (c2compiler.cpp:118)
V [libjvm.so+0x958bc8] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xa04 (compileBroker.cpp:2265)
V [libjvm.so+0x9576fa] CompileBroker::compiler_thread_loop()+0x462 (compileBroker.cpp:1944)
V [libjvm.so+0x97b14a] CompilerThread::thread_entry(JavaThread*, JavaThread*)+0x84 (compilerThread.cpp:58)
V [libjvm.so+0xd434ce] JavaThread::thread_main_inner()+0x15c (javaThread.cpp:719)
V [libjvm.so+0xd43368] JavaThread::run()+0x258 (javaThread.cpp:704)
V [libjvm.so+0x15481ea] Thread::call_run()+0x1a8 (thread.cpp:217)
V [libjvm.so+0x1230036] thread_native_entry(Thread*)+0x1a5 (os_linux.cpp:778)
...
```
See attached hs_err_pid1152816.log
20-06-2023

I've attached a stripped down reproducer from https://github.com/corretto/corretto-17/issues/110 provided by Umut Kocasarac [1].
Building:
```
$ unzip grpc-test.zip
$ cd grpc-test
$ mvn clean package spring-boot:repackage
```
To build and run the test you need JDK 17+ and Maven 3.9.2+.
Running:
```
terminal1:$ java -XX:CompileCommand='PrintInlining,io.grpc.kotlin.ServerCalls$serverCallListener$requests$1::invokeSuspend' -XX:CICompilerCount=16 -Dport=8008 -jar grpc-server/target/grpc-server-1.0-SNAPSHOT.jar
...
>>> server started
...
>>> received 100011
>>> time diff PT1M32.67601825S
>>> rate 6207
...
```
```
terminal2:$ java -showversion -Dport=8008 -jar grpc-client/target/grpc-client-1.0-SNAPSHOT.jar
```
Usually the server started in `terminal1` will crash in about one out of ten runs. Once you get no more compiles of `ServerCalls$serverCallListener$requests$1::invokeSuspend` (indicated by the output of `>>> received ..` lines without interleaving compilations) in the first terminal, you can kill and restart the server and hope for a crash in the next run :)
[1] https://github.com/umutkocasarac
20-06-2023

This seems to be related to the following two Corretto issues:
https://github.com/corretto/corretto-17/issues/110
https://github.com/corretto/corretto-17/issues/57
Especially the second one has a pretty long discussion thread and some more hs_err files. For some people, excluding the following methods from JIT-compilation seems to have helped to mitigate the problem:
```
-XX:CompileCommand="exclude,kotlinx.coroutines.flow.AbstractFlow::collect"
-XX:CompileCommand="exclude,kotlinx.coroutines.flow.SafeFlow::collectSafely"
-XX:CompileCommand="exclude,kotlinx.coroutines.reactive.PublisherAsFlow::collectImpl"
-XX:CompileCommand="exclude,io.grpc.kotlin.ServerCalls*::*"
```
12-06-2023

Thanks, Martijn. It would also be interesting to know if this is a recent regression in JDK 17u or an old issue.
Pre-ILW = Crash during C2 compilation, intermittent and non-reproducible with Kotlin workload, no known workaround but disable compilation of affected method = HLM = P3
01-03-2023

I'll contact the Kotlin folks and see if they can produce a reproducer or at least that debug build.
UPDATE: I asked for more details at https://youtrack.jetbrains.com/issue/KT-54693/
01-03-2023

In addition, could the reporter(s) try to reproduce with a debug VM build? Given that this is with Kotlin and the reporter mentioned "we generate code, which is not possible to write in Java - we have non-canonical loops or jumps to catch blocks, which are likely to not be covered by tests", it could be a problem with the handling of irreducible loops (JDK-8280126, [~epeter]).
28-02-2023

There is not much we can do here without a reproducer. [~karianna], was a replay compilation file (replay_pid...log) generated that could be shared?
28-02-2023

Kotlin folks suspect it is a genuine C2 compiler crash - also see commentary here: https://youtrack.jetbrains.com/issue/KT-54693/SIGSEGV-0xb-at-pc0x0000000000000000-C2-CompilerThread0
28-02-2023