JDK-8287432 : C2: assert(tn->in(0) != __null) failed: must have live top node
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 11.0.15,17.0.3,18.0.1,19
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • OS: linux_ubuntu
  • CPU: x86
  • Submitted: 2022-05-27
  • Updated: 2024-02-15
  • Resolved: 2022-06-08
Fixed Versions
  • JDK 11: 11.0.17-oracle (Fixed)
  • JDK 17: 17.0.5-oracle (Fixed)
  • JDK 19: 19 b26 (Fixed)
  • JDK 8: 8u421 (Fixed)
Description
Running a specific test of the Deephaven project leads to the following segmentation fault: 

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f9d10dc3d8a, pid=39978, tid=39998
#
# JRE version: Java(TM) SE Runtime Environment (19.0+24) (build 19-ea+24-1832)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (19-ea+24-1832, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x593d8a]  PhaseAggressiveCoalesce::coalesce(Block*)+0x6a

HOW TO REPRODUCE ON JDK 19 (RELEASE BUILD)

(Note: these instructions run gradle itself on JDK 11. This can be achieved by setting the JAVA_HOME environment variable and/or passing the option -Dorg.gradle.java.home=$JAVA_HOME to all ./gradlew commands.)

1. git clone --depth 1 --branch nightly/phase-aggressive-sigsegv git@github.com:deephaven/deephaven-core.git
2. cd deephaven-core
3. printf 'org.gradle.java.installations.paths=$JDK19_RELEASE_HOME\n' >> gradle.properties
(optionally, run $ ./gradlew -q javaToolchains to verify that the JDK 19 build is recognized by gradle)
4. ./gradlew -PtestRuntimeVersion=18 -PforceTest=true engine-table:testOutOfBand --tests io.deephaven.engine.table.impl.QueryTableAggregationTest.testMedianByIncremental
(..)
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f9d10dc3d8a, pid=39978, tid=39998
#
# JRE version: Java(TM) SE Runtime Environment (19.0+24) (build 19-ea+24-1832)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (19-ea+24-1832, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x593d8a]  PhaseAggressiveCoalesce::coalesce(Block*)+0x6a
(..)
(if step 4 succeeds, re-run it a few times until the crash is triggered)
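
Since the crash is intermittent, the re-runs can be scripted. The following is a minimal sketch, not part of the original report: run_until_crash is a hypothetical helper, and it assumes the hs_err_pid*.log file is written somewhere beneath the directory the command is started from. Remove stale crash logs first, otherwise the loop stops after the first attempt.

```shell
#!/bin/sh
# Re-run a command until a HotSpot crash log (hs_err_pid*.log) shows up
# beneath the current directory, or until MAX_ATTEMPTS is reached.
# Usage: run_until_crash MAX_ATTEMPTS COMMAND [ARGS...]
# Prints the path of the first crash log found and returns 0;
# returns 1 if no crash log appeared within MAX_ATTEMPTS runs.
run_until_crash() {
    max_attempts=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        echo "attempt $attempt of $max_attempts" >&2
        # The command is expected to fail when the JVM crashes, so its exit
        # status is ignored. Its output goes to stderr so that stdout
        # carries only the crash log path.
        "$@" >&2 || true
        # Assumption: the crash log lands under the current directory.
        crash_log=$(find . -name 'hs_err_pid*.log' -print | head -n 1)
        if [ -n "$crash_log" ]; then
            echo "$crash_log"
            return 0
        fi
        attempt=$((attempt + 1))
    done
    return 1
}
```

For example, run_until_crash 20 followed by the full ./gradlew command from step 4 would retry up to 20 times; the cap of 20 is arbitrary.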

The error log and replay files are attached (hs_err_pid21380.log, replay_pid21380.log).

HOW TO REPLAY IT ON JDK 19 (DEBUG BUILD)

The issue seems to be hard to reproduce directly on a debug JDK build. Luckily, it can be replayed on a debug build from the replay file generated from the release build crash:

1. run steps 1-3 above
2. download the attached replay file (replay_pid21380.log)
3. build the classpath required to replay the crash, e.g. by extracting it from the gradle debug information:
3.1. ./gradlew --info --debug -PtestRuntimeVersion=18 -PforceTest=true engine-table:testOutOfBand --tests io.deephaven.engine.table.impl.QueryTableAggregationTest.testMedianByIncremental | grep "Using application classpath" | tail -1 > tmp
3.2. REPLAY_CLASSPATH=$(cat tmp | cut -d "[" -f 4- | cut -d "]" -f 1 | sed 's/, /:/g')
4. $JDK19_DEBUG_HOME/bin/java -XX:+ReplayCompiles -XX:+ReplayIgnoreInitErrors -XX:ReplayDataFile=replay_pid21380.log -cp "$REPLAY_CLASSPATH"
(..)
# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc:  SuppressErrorAt=/compile.cpp:1214
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/opt/mach5/mesos/work_dir/slaves/779adf21-f3e5-4e6a-a889-8cc0f9bc6fbb-S66914/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/f438564a-997a-4f93-8215-28dc0c0bef6d/runs/f41da18c-9f23-49a7-ab67-ad61fa19003a/workspace/open/src/hotspot/share/opto/compile.cpp:1214), pid=42338, tid=42351
#  assert(tn->in(0) != __null) failed: must have live top node
#
# JRE version: Java(TM) SE Runtime Environment (19.0+24) (fastdebug build 19-ea+24-1832)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 19-ea+24-1832, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0xaa884c]  Compile::verify_top(Node*) const+0x17c

The error log file is attached (hs_err_pid36556.log).
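
The classpath extraction in steps 3.1 and 3.2 can be checked in isolation. The sketch below wraps the same grep/tail/cut/sed pipeline in a hypothetical function, extract_classpath, that reads Gradle output on stdin. The exact shape of the Gradle debug line (timestamp, then two bracketed fields, then the bracketed classpath) is an assumption inferred from the field index used in step 3.2, not something the report shows.

```shell
#!/bin/sh
# Turn a Gradle debug line of the assumed form
#   <timestamp> [DEBUG] [<logger>] Using application classpath [a.jar, b.jar]
# into a colon-separated classpath: a.jar:b.jar
#
# Pipeline stages:
#   grep/tail  keep only the last "Using application classpath" line
#   cut "["    keep everything from the fourth "["-separated field on,
#              i.e. the text after the third "["
#   cut "]"    drop the closing "]" and anything after it
#   sed        replace the ", " separators with ":"
extract_classpath() {
    grep "Using application classpath" |
        tail -n 1 |
        cut -d "[" -f 4- |
        cut -d "]" -f 1 |
        sed 's/, /:/g'
}
```

With this helper, step 3.2 would become REPLAY_CLASSPATH=$(extract_classpath < tmp). If the local Gradle version formats its log lines differently, the field index passed to the first cut needs adjusting.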

ORIGINAL REPORT

Originally posted at: https://github.com/adoptium/adoptium-support/issues/516

The issue is exhibited from multiple methods, potentially involving array / vectorization optimizations. We've so far worked around it by setting up a compiler directives file with excludes, but that's rather fragile and we are finding more places that eventually hit this error.
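
For readers unfamiliar with the workaround, here is a minimal sketch of how such a directives file could be generated. It is not the file used by the reporters, which the report does not show: write_exclude_directive is a hypothetical helper, and the method pattern in the usage note is a placeholder.

```shell
#!/bin/sh
# Write a compiler directives file that excludes one method pattern from
# C2 compilation while leaving C1 untouched.
# Usage: write_exclude_directive METHOD_PATTERN OUTPUT_FILE
write_exclude_directive() {
    cat > "$2" <<EOF
[
  {
    match: "$1",
    c2: {
      Exclude: true
    }
  }
]
EOF
}
```

For example, write_exclude_directive "com/example/Hot.loop" directives.json produces a file that can be passed to the JVM with -XX:+UnlockDiagnosticVMOptions -XX:CompilerDirectivesFile=directives.json. Each additional crashing method needs its own entry, which is why this approach is fragile.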

Steps to reproduce
Currently, we are only able to reproduce using our junit test suite. We've also seen it in our running application, but we don't currently have a framework to easily reproduce that setup. I'm working on creating a more minimal reproduction. Some of our developers are able to reproduce the issue frequently, some are able to reproduce it infrequently, and others appear to not be able to reproduce it. I'm guessing there may be hardware or environmental issues at play. The issue is reproducible within the standard Github Actions runner environment.

Here's the branch that is meant to reproduce the issue - https://github.com/deephaven/deephaven-core/tree/nightly/phase-aggressive-sigsegv.

./gradlew -PtestRuntimeVersion=17 -PtestRuntimeVendor=adoptopenjdk -PforceTest=true engine-table:testOutOfBand --tests io.deephaven.engine.table.impl.QueryTableAggregationTest.testMedianByIncremental
The above command may need to be run multiple times (10+) to get the SIGSEGV. By default, it's set to run against Java 11 (the specific version depends on OS and gradle). On my local machine, I can reproduce much more consistently with Java 17 by setting -PtestRuntimeVersion=17. The nightly/phase-aggressive-sigsegv branch is also set up with a GH workflow that runs these specific tests.

Triaging info
The issue is reproducible on the latest versions of OpenJDK 11 and 17 (and has also been reproduced on earlier versions of 11 and 17).

# JRE version: OpenJDK Runtime Environment Temurin-11.0.15+10 (11.0.15+10) (build 11.0.15+10)
# Java VM: OpenJDK 64-Bit Server VM Temurin-11.0.15+10 (11.0.15+10, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x62619c]  PhaseAggressiveCoalesce::coalesce(Block*)+0x50c

# JRE version: OpenJDK Runtime Environment Temurin-17.0.3+7 (17.0.3+7) (build 17.0.3+7)
# Java VM: OpenJDK 64-Bit Server VM Temurin-17.0.3+7 (17.0.3+7, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x597885]  PhaseAggressiveCoalesce::coalesce(Block*)+0x65
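
When comparing crashes across JDK versions and machines, it helps to pull the problematic frame out of each crash log. A minimal sketch, assuming the logs use the header layout quoted above; problematic_frame is a hypothetical helper, not something from the original report:

```shell
#!/bin/sh
# For each given hs_err log, print "<file>: <frame>", where <frame> is the
# line that follows "# Problematic frame:" with its leading "# " removed.
# Usage: problematic_frame LOG_FILE...
problematic_frame() {
    for log in "$@"; do
        awk -v name="$log" '
            found { sub(/^# */, ""); print name ": " $0; exit }
            /^# Problematic frame:/ { found = 1 }
        ' "$log"
    done
}
```

Running problematic_frame hs_err_pid*.log in a directory of collected logs gives one line per crash, which makes it easy to see whether every crash lands in PhaseAggressiveCoalesce::coalesce.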
In GH CI, the environment seen so far:

Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz, 2 cores, 6G, Ubuntu 20.04.4 LTS
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz, 2 cores, 6G, Ubuntu 20.04.4 LTS
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz, 2 cores, 6G, Ubuntu 20.04.4 LTS
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz, 2 cores, 6G, Ubuntu 20.04.4 LTS
Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz, 2 cores, 6G, Ubuntu 20.04.3 LTS
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz, 2 cores, 6G, Ubuntu 20.04.3 LTS
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz, 2 cores, 6G, Ubuntu 20.04.3 LTS
I'm currently in the process of collecting more detailed information on our developers' machines.

Cross-posting our issue: deephaven/deephaven-core#2038

Comments
The newly developed test compiler/c2/TestRemoveMemBarPrecEdge.java passed in JDK 19 ATR and CI (on Ubuntu as well).
11-07-2022

A pull request was submitted for review. URL: https://git.openjdk.org/jdk11u-dev/pull/1190 Date: 2022-06-30 14:35:36 +0000
30-06-2022

Fix request [17u] I backport this for parity with 17.0.5-oracle. Tiny change, some of the typical C2 fix risk, we should fix it. Clean backport. Test passes with the fix and fails without it. SAP nightly testing passed.
30-06-2022

A pull request was submitted for review. URL: https://git.openjdk.org/jdk17u-dev/pull/515 Date: 2022-06-28 20:01:52 +0000
28-06-2022

A pull request was submitted for review. URL: https://git.openjdk.org/jdk18u/pull/165 Date: 2022-06-21 07:53:17 +0000
21-06-2022

Changeset: 78d37126 Author: Christian Hagedorn <chagedorn@openjdk.org> Date: 2022-06-08 14:12:09 +0000 URL: https://git.openjdk.java.net/jdk/commit/78d371266ae8a629db8176ced4d48e9521702cce
08-06-2022

A pull request was submitted for review. URL: https://git.openjdk.java.net/jdk/pull/9060 Date: 2022-06-07 12:16:00 +0000
07-06-2022

C->top() is getting disconnected from the graph by final_graph_reshaping(), which has a comment that says “A method with only infinite loops has no edges entering loops from root”, so maybe the problem is an infinite loop?
01-06-2022

ILW = Crash in C2 code generation, medium?, disable compilation of affected methods = HMM = P2
31-05-2022