JDK 11 | JDK 17 | JDK 19 | JDK 8 |
---|---|---|---|
11.0.17-oracle (Fixed) | 17.0.5-oracle (Fixed) | 19 b26 (Fixed) | 8u421 (Fixed) |
Running a specific test of the Deephaven project leads to the following segmentation fault:

    # A fatal error has been detected by the Java Runtime Environment:
    #
    # SIGSEGV (0xb) at pc=0x00007f9d10dc3d8a, pid=39978, tid=39998
    #
    # JRE version: Java(TM) SE Runtime Environment (19.0+24) (build 19-ea+24-1832)
    # Java VM: Java HotSpot(TM) 64-Bit Server VM (19-ea+24-1832, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
    # Problematic frame:
    # V [libjvm.so+0x593d8a] PhaseAggressiveCoalesce::coalesce(Block*)+0x6a

HOW TO REPRODUCE ON JDK 19 (RELEASE BUILD)

(Note: these instructions run gradle itself on JDK 11. This can be achieved by setting the JAVA_HOME environment variable and/or passing the option -Dorg.gradle.java.home=$JAVA_HOME to all ./gradlew commands.)

1. git clone --depth 1 --branch nightly/phase-aggressive-sigsegv git@github.com:deephaven/deephaven-core.git

2. cd deephaven-core

3. printf 'org.gradle.java.installations.paths=$JDK19_RELEASE_HOME\n' >> gradle.properties

   (optionally, run $ ./gradlew -q javaToolchains to verify that the JDK 19 build is recognized by gradle)

4. ./gradlew -PtestRuntimeVersion=18 -PforceTest=true engine-table:testOutOfBand --tests io.deephaven.engine.table.impl.QueryTableAggregationTest.testMedianByIncremental

       (..)
       #
       # A fatal error has been detected by the Java Runtime Environment:
       #
       # SIGSEGV (0xb) at pc=0x00007f9d10dc3d8a, pid=39978, tid=39998
       #
       # JRE version: Java(TM) SE Runtime Environment (19.0+24) (build 19-ea+24-1832)
       # Java VM: Java HotSpot(TM) 64-Bit Server VM (19-ea+24-1832, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
       # Problematic frame:
       # V [libjvm.so+0x593d8a] PhaseAggressiveCoalesce::coalesce(Block*)+0x6a
       (..)

   (if step 4 succeeds, re-run it a few times until the crash is triggered)

The error log and replay files are attached (hs_err_pid21380.log, replay_pid21380.log).
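Since step 4 may need several attempts before the crash shows up, the re-run can be scripted. The following is only a sketch: the helper name run_until_crash is made up for illustration, and it assumes that the crashing JVM writes its hs_err_pid*.log somewhere under the directory the script is started from and that no stale crash logs are lying around beforehand.

```shell
#!/usr/bin/env bash
# Sketch: re-run a command until a HotSpot crash log (hs_err_pid*.log) shows up
# somewhere under the current directory, or until max_attempts runs are used up.
# Assumption: no stale hs_err_pid*.log files are present before the first run.

run_until_crash() {
  local max_attempts="$1"
  shift
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    echo "attempt $attempt of $max_attempts" >&2
    # The test run is expected to fail when the JVM crashes, so ignore its status.
    "$@" || true
    if [ -n "$(find . -name 'hs_err_pid*.log')" ]; then
      echo "crash log found after $attempt attempt(s)" >&2
      return 0
    fi
    attempt=$((attempt + 1))
  done
  echo "no crash log after $max_attempts attempt(s)" >&2
  return 1
}

# Example (the gradle arguments are the ones from step 4 above):
# run_until_crash 10 ./gradlew -PtestRuntimeVersion=18 -PforceTest=true \
#   engine-table:testOutOfBand \
#   --tests io.deephaven.engine.table.impl.QueryTableAggregationTest.testMedianByIncremental
```

The function returns 0 as soon as a crash log exists and 1 if the attempt budget is exhausted, so it can be chained with the replay steps.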
HOW TO REPLAY IT ON JDK 19 (DEBUG BUILD)

The issue seems to be hard to reproduce directly on a debug JDK build. Luckily, it can be replayed on a debug build from the replay file generated from the release build crash:

1. run steps 1-3 above

2. download the attached replay file (replay_pid21380.log)

3. build the classpath required to replay the crash, e.g. by extracting it from the gradle debug information:

   3.1. ./gradlew --info --debug -PtestRuntimeVersion=18 -PforceTest=true engine-table:testOutOfBand --tests io.deephaven.engine.table.impl.QueryTableAggregationTest.testMedianByIncremental | grep "Using application classpath" | tail -1 > tmp

   3.2. REPLAY_CLASSPATH=$(cat tmp | cut -d "[" -f 4- | cut -d "]" -f 1 | sed 's/, /:/g')

4. $JDK19_DEBUG_HOME/bin/java -XX:+ReplayCompiles -XX:+ReplayIgnoreInitErrors -XX:ReplayDataFile=replay_pid21380.log -cp "$REPLAY_CLASSPATH"

       (..)
       # To suppress the following error report, specify this argument
       # after -XX: or in .hotspotrc: SuppressErrorAt=/compile.cpp:1214
       #
       # A fatal error has been detected by the Java Runtime Environment:
       #
       # Internal Error (/opt/mach5/mesos/work_dir/slaves/779adf21-f3e5-4e6a-a889-8cc0f9bc6fbb-S66914/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/f438564a-997a-4f93-8215-28dc0c0bef6d/runs/f41da18c-9f23-49a7-ab67-ad61fa19003a/workspace/open/src/hotspot/share/opto/compile.cpp:1214), pid=42338, tid=42351
       # assert(tn->in(0) != __null) failed: must have live top node
       #
       # JRE version: Java(TM) SE Runtime Environment (19.0+24) (fastdebug build 19-ea+24-1832)
       # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 19-ea+24-1832, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
       # Problematic frame:
       # V [libjvm.so+0xaa884c] Compile::verify_top(Node*) const+0x17c

The error log file is attached (hs_err_pid36556.log).
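To make the text processing in step 3.2 easier to follow, here is the same cut/sed pipeline applied to a single made-up log line. The sample line is hypothetical: it is only shaped so that the bracketed classpath list is the third bracket group on the line, which is what the "-f 4-" field selection in step 3.2 assumes. The real gradle debug output may differ in detail.

```shell
#!/usr/bin/env bash
# Hypothetical gradle --debug line: a timestamp, two bracketed fields
# (log level and logger name), then the bracketed classpath list.
sample='2022-06-01T10:00:00.000+0000 [DEBUG] [org.example.Logger] Using application classpath [/libs/a.jar, /libs/b.jar, /build/classes]'

# Same pipeline as step 3.2:
#   cut -d "[" -f 4-   keeps everything after the third "["
#   cut -d "]" -f 1    drops the closing "]" and anything after it
#   sed 's/, /:/g'     turns the ", " separators into ":" for -cp
to_classpath() {
  cut -d "[" -f 4- | cut -d "]" -f 1 | sed 's/, /:/g'
}

REPLAY_CLASSPATH=$(printf '%s\n' "$sample" | to_classpath)
echo "$REPLAY_CLASSPATH"
# prints /libs/a.jar:/libs/b.jar:/build/classes
```

If the log prefix on a given gradle version contains a different number of bracketed fields, the field index passed to the first cut would need to be adjusted accordingly.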
ORIGINAL REPORT

Originally posted at: https://github.com/adoptium/adoptium-support/issues/516

The issue is exhibited from multiple methods, potentially involving array / vectorization optimizations. We've so far worked around it by setting up a compiler directives file with excludes, but that's rather fragile and we are finding more places that eventually hit this error.

Steps to reproduce

Currently, we are only able to reproduce the issue using our junit test suite. We've also seen it in our running application, but we don't currently have a framework to easily reproduce that setup. I'm working on creating a more minimal reproduction.

Some of our developers are able to reproduce the issue frequently, some only infrequently, and others appear unable to reproduce it at all. I'm guessing there may be hardware or environmental issues at play. The issue is reproducible within the standard Github Actions runner environment.

Here's the branch that is meant to reproduce the issue: https://github.com/deephaven/deephaven-core/tree/nightly/phase-aggressive-sigsegv

    ./gradlew -PtestRuntimeVersion=17 -PtestRuntimeVendor=adoptopenjdk -PforceTest=true engine-table:testOutOfBand --tests io.deephaven.engine.table.impl.QueryTableAggregationTest.testMedianByIncremental

The above command may need to be run multiple times (10+) to get the SIGSEGV. By default, it's set to run against Java 11 (the specific version depends on OS and gradle). On my local machine, I can reproduce much more consistently with Java 17 by setting -PtestRuntimeVersion=17.

The nightly/phase-aggressive-sigsegv branch is also set up to run a GH workflow that runs these specific tests.

Triaging info

The issue is reproducible on the latest versions of OpenJDK 11 and 17 (and has also been reproduced on earlier versions of 11 and 17).
    # JRE version: OpenJDK Runtime Environment Temurin-11.0.15+10 (11.0.15+10) (build 11.0.15+10)
    # Java VM: OpenJDK 64-Bit Server VM Temurin-11.0.15+10 (11.0.15+10, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
    # Problematic frame:
    # V [libjvm.so+0x62619c] PhaseAggressiveCoalesce::coalesce(Block*)+0x50c

    # JRE version: OpenJDK Runtime Environment Temurin-17.0.3+7 (17.0.3+7) (build 17.0.3+7)
    # Java VM: OpenJDK 64-Bit Server VM Temurin-17.0.3+7 (17.0.3+7, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
    # Problematic frame:
    # V [libjvm.so+0x597885] PhaseAggressiveCoalesce::coalesce(Block*)+0x65

In GH CI, the environment seen so far:

    Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz, 2 cores, 6G, Ubuntu 20.04.4 LTS
    Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz, 2 cores, 6G, Ubuntu 20.04.4 LTS
    Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz, 2 cores, 6G, Ubuntu 20.04.4 LTS
    Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz, 2 cores, 6G, Ubuntu 20.04.4 LTS
    Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz, 2 cores, 6G, Ubuntu 20.04.3 LTS
    Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz, 2 cores, 6G, Ubuntu 20.04.3 LTS
    Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz, 2 cores, 6G, Ubuntu 20.04.3 LTS

I'm currently in the process of collecting more detailed information on our developers' machines.

Cross-posting our issue: deephaven/deephaven-core#2038
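For readers unfamiliar with the workaround mentioned in the original report (a compiler directives file with excludes), the following sketch shows what such a file can look like. The report does not list the methods that were actually excluded, so the class and method pattern here is purely hypothetical, as is the file name. The crashing frames above are in the C2 compiler (PhaseAggressiveCoalesce, and src/hotspot/share/opto/compile.cpp in the debug build), which is why the exclusion is placed in the c2 block.

```shell
#!/usr/bin/env bash
# Sketch: write a compiler directives file that keeps C2 from compiling
# one method. The match pattern is HYPOTHETICAL; the report does not say
# which methods were excluded in practice.
directives_file="c2-excludes.json"   # hypothetical file name

cat > "$directives_file" <<'EOF'
[
  {
    match: "com/example/HypotheticalAggregation.computeMedian*",
    c2: {
      Exclude: true
    }
  }
]
EOF

# The file is then handed to the JVM. CompilerDirectivesFile is a diagnostic
# option, so it has to be unlocked first:
#   java -XX:+UnlockDiagnosticVMOptions -XX:CompilerDirectivesFile="$directives_file" ...
```

As the reporter notes, this kind of exclude list is fragile: every additional method that triggers the crash needs its own entry, and excluded methods lose C2 optimization.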