JDK-8275610 : C2: Object field load floats above its null check resulting in a segfault
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 9,11,17,18
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: linux
  • CPU: x86_64
  • Submitted: 2021-10-15
  • Updated: 2022-07-12
  • Resolved: 2021-12-06
Fix Versions
  • JDK 11: 11.0.15-oracle (Fixed)
  • JDK 13: 13.0.11 (Fixed)
  • JDK 17: 17.0.3-oracle (Fixed)
  • JDK 18: 18 b27 (Fixed)
Related Reports
Duplicate :  
Relates :  
Relates :  
Description
ADDITIONAL SYSTEM INFORMATION :
Java 17 / ubuntu

This is a CI/CD env so I don't have full access to get exact version numbers for the OS.

A DESCRIPTION OF THE PROBLEM :
We've been building and testing Apache POI on different Java installs for years. On OpenJDK 17, the test runs regularly fail with a SIGSEGV failure.

REGRESSION : Last worked in version 16.0.2

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
* Check out https://github.com/apache/poi
* ./gradlew poi-ooxml:test
* warning - this can take quite a few mins to run

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Successful compile and test run
ACTUAL -
https://ci-builds.apache.org/job/POI/job/POI-DSL-1.17/38/ is one example build where this failed

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f5246b70be0, pid=21730, tid=21917
#
# JRE version: OpenJDK Runtime Environment (17.0+35) (build 17+35-2724)
# Java VM: OpenJDK 64-Bit Server VM (17+35-2724, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# J 34089 c2 com.sun.org.apache.xerces.internal.dom.CoreDocumentImpl.importNode(Lorg/w3c/dom/Node;ZZLjava/util/Map;)Lorg/w3c/dom/Node; java.xml@17 (1038 bytes) @ 0x00007f5246b70be0 [0x00007f5246b70320+0x00000000000008c0]
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/jenkins/jenkins-agent/workspace/POI/POI-DSL-1.17/poi-ooxml/hs_err_pid21730.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#

FREQUENCY : often



Comments
A pull request was submitted for review. URL: https://git.openjdk.java.net/jdk13u-dev/pull/312 Date: 2022-01-21 13:57:36 +0000
21-01-2022

Fix request (13u,15u) This backport of a crash fix is not exactly clean. The two options that differ between the two passes of the test are not supported in 13u and 15u, and neither is the "randomness" key in the test. Running the test without jtreg consistently crashes the VM before the fix.
21-01-2022

A pull request was submitted for review. URL: https://git.openjdk.java.net/jdk11u-dev/pull/784 Date: 2022-01-20 19:26:39 +0000
20-01-2022

Fix request [11u] I backport this for parity with 11.0.15-oracle. I had to adapt the test. Pre-submit tests failed.
20-01-2022

A pull request was submitted for review. URL: https://git.openjdk.java.net/jdk11u-dev/pull/773 Date: 2022-01-18 12:46:48 +0000
18-01-2022

Fix request [17u] I backport this for parity with 11.0.15-oracle. As this is in 11.0.15-oracle, I think it should also go to 17.0.3. Typical risk of a fix in C2. Clean backport. SAP nightly tests passed.
23-12-2021

A pull request was submitted for review. URL: https://git.openjdk.java.net/jdk17u-dev/pull/32 Date: 2021-12-22 16:44:00 +0000
22-12-2021

Changeset: 7c6f57fc Author: Christian Hagedorn <chagedorn@openjdk.org> Date: 2021-12-06 14:48:03 +0000 URL: https://git.openjdk.java.net/jdk/commit/7c6f57fcb1f1fcecf26f7b8046a5a41ca6d9c315
06-12-2021

Introduced by JDK-8143542, which eliminates back-to-back Ifs by using split-if. In this bug, two back-to-back null checks are optimized: the dominated null check If node is split through the region merging the if/else branches of the dominating null check If node. This separates the CastPP node from its null check If node. The CastPP ends up after a range check right before the dominating null check If node. Later, we remove that range check because an earlier range check before the null check already covers it. As a consequence, the CastPP is also rewired to the earlier range check. Later, C2 schedules a field load, which has the same control as the CastPP, before the null check. This results in a segfault.
26-11-2021
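
The code shape described in the analysis above can be sketched in Java as follows. This is an illustration only: it is not the regression test from the fix, not the Xerces or XMLBeans code, and it is not claimed to trigger the bug. The class `BackToBackNullChecks`, the `Node` type, and its fields are hypothetical names chosen for the sketch. It only shows a range check, two back-to-back null checks on the same value, and a field load that is valid only below the null check.

```java
// Illustrative sketch only; all names here are hypothetical.
final class BackToBackNullChecks {
    /** Made-up node type with a reference field and an int field. */
    static final class Node {
        Node firstChild;
        int kind;

        Node(Node firstChild, int kind) {
            this.firstChild = firstChild;
            this.kind = kind;
        }
    }

    /**
     * A range check, then two null checks on the same value, then a field
     * load that is only safe when the value is non-null.
     */
    static int childKind(Node parent, int[] table, int index) {
        int base = table[index];          // array access implies a range check
        Node child = parent.firstChild;

        int adjust;
        if (child == null) {              // dominating null check
            adjust = 0;
        } else {
            adjust = 1;
        }                                 // both branches merge here

        if (child == null) {              // dominated null check on the same value
            return -1;
        }
        return base + adjust + child.kind; // field load must stay below the null check
    }

    public static void main(String[] args) {
        int[] table = {10, 20, 30};
        Node leaf = new Node(null, 7);
        Node parent = new Node(leaf, 0);
        System.out.println(childKind(parent, table, 1)); // child is non-null
        System.out.println(childKind(leaf, table, 1));   // child is null, early return
    }
}
```

In correct compiled code the load of `child.kind` is never executed when `child` is null. The crash in this report corresponds to such a load being scheduled before the check.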

I was also able to reproduce this. Thanks [~fmatte] for your help! I noticed that the failing method was only C1 compiled in my runs. I added an additional loop to the failing test TestSignature::testNonSha1() and added -XX:-TieredCompilation which now reliably triggers the bug. Will have a closer look at why this is happening.
15-11-2021
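
The reproduction approach above (an extra loop plus -XX:-TieredCompilation) can be sketched generically. This is not the modified TestSignature::testNonSha1() test. `HotLoopHarness` and `workload` are hypothetical stand-ins, and the iteration count is an arbitrary assumption. The sketch calls a method often enough for the JIT to consider it hot and reports whether the JVM was started with tiered compilation disabled.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Illustrative sketch only; all names here are hypothetical.
final class HotLoopHarness {
    /** Stand-in workload; replace with the code path under investigation. */
    static int workload(int i) {
        return (i * 31) ^ (i >>> 3);
    }

    /** Calls the workload repeatedly so that the JIT considers it hot. */
    static long runHot(int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += workload(i);
        }
        return sum;
    }

    /** True when the JVM was started with -XX:-TieredCompilation. */
    static boolean tieredDisabled() {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        return jvmArgs.contains("-XX:-TieredCompilation");
    }

    public static void main(String[] args) {
        System.out.println("TieredCompilation disabled: " + tieredDisabled());
        System.out.println("checksum: " + runHot(100_000));
    }
}
```

Running this with `java -XX:-TieredCompilation HotLoopHarness.java` skips the C1 tiers, so hot methods that get compiled are compiled by C2.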

I am able to reproduce this issue on 17 b20 (fastdebug build)

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f7b61702308, pid=3951, tid=4221
#
# JRE version: Java(TM) SE Runtime Environment (17.0+20) (fastdebug build 17-ea+20-LTS-1743)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 17-ea+20-LTS-1743, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# J 33579 c2 com.sun.org.apache.xerces.internal.dom.CoreDocumentImpl.importNode(Lorg/w3c/dom/Node;ZZLjava/util/Map;)Lorg/w3c/dom/Node; java.xml@17-ea (1038 bytes) @ 0x00007f7b61702308 [0x00007f7b617013a0+0x0000000000000f68]
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /tank/fmatte/JI/9071681/poi/poi-ooxml/core.3951)
#
# An error report file with more information is saved as:
# /tank/fmatte/JI/9071681/poi/poi-ooxml/hs_err_pid3951.log
Java HotSpot(TM) 64-Bit Server VM warning: outputStream::do_vsnprintf output truncated -- buffer length is 2000 bytes but 9218 bytes are needed.
Java HotSpot(TM) 64-Bit Server VM warning: outputStream::do_vsnprintf output truncated -- buffer length is 2000 bytes but 11613 bytes are needed.
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#

Attaching the complete hs_error file hs_err_pid3951.log
12-11-2021

I had a closer look at https://ci-builds.apache.org/job/POI/job/POI-DSL-1.17/54/artifact/build/hs_err_pid24100.log. The assembly code around the crash looks like this:

  mov r10d,DWORD PTR [r11+0x3c]   # read _firstChild into r10
  mov ecx,DWORD PTR [r11+0x44]
  mov rdi,r11
  mov rax,r10
  mov r13d,0xf
  and r13d,DWORD PTR [r10+0xc]    # trying to read from r10=_firstChild which is 0x0 (null) -> crash when trying to read from 0xc (see siginfo: [..] si_addr: 0x000000000000000c)

R11=0x00000000cace1558 is an oop: org.apache.xmlbeans.impl.store.AttrXobj
{0x00000000cace1558} - klass: 'org/apache/xmlbeans/impl/store/AttrXobj'
 - ---- fields (total size 12 words):
 .....
 - '_firstChild' 'Lorg/apache/xmlbeans/impl/store/Xobj;' @60 NULL (0)   # 0x3c = 60
 .....

So, it looks like we are missing a null check. It was either removed by mistake or this access somehow floated above its null check. Looking at the assembly code immediately following the crashing instruction, it seems that the latter is true:

  and r13d,DWORD PTR [r10+0xc]    # floated above null check below
  test r10d,r10d
  jne 0x120

I guess it is highly dependent on the order in which optimizations take place and/or how C2 finally schedules the instructions. I will continue to try to reproduce this with a fastdebug build by using a combination of -XX:+StressIGVN/GCM/LCM.
12-11-2021
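
The address arithmetic in the assembly analysis above can be checked with a few lines of Java. The register values and displacements are copied from the comment. The class `FaultAddress` and the helper `effectiveAddress` are hypothetical names for this sketch, which only models an x86 memory operand of the form [base + displacement].

```java
// Illustrative sketch only; the class and method names are hypothetical.
final class FaultAddress {
    /** Effective address of an x86 memory operand of the form [base + displacement]. */
    static long effectiveAddress(long base, int displacement) {
        return base + displacement;
    }

    public static void main(String[] args) {
        long r11 = 0x00000000cace1558L; // AttrXobj oop reported in the log
        long r10 = 0x0L;                // _firstChild as loaded from [r11+0x3c], NULL

        // The oop dump lists _firstChild at offset @60, which is the 0x3c displacement.
        System.out.println("0x3c == 60: " + (0x3c == 60));

        // Address of the _firstChild field inside the AttrXobj.
        System.out.println("field address: 0x"
                + Long.toHexString(effectiveAddress(r11, 0x3c)));

        // Address touched by "and r13d,DWORD PTR [r10+0xc]" when r10 is null.
        System.out.println("faulting address: 0x"
                + Long.toHexString(effectiveAddress(r10, 0xc)));
    }
}
```

With r10 equal to zero, the second operand resolves to 0xc, which is the si_addr value reported in the siginfo.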

I could not reproduce this (yet) on Ubuntu 20.04. I did the following:

git clone https://github.com/apache/poi.git
git checkout 6b1a477997e8e09fbc2fd9f0277baa198ee1c3b4 # Job 54 mentioned above

Then repeated runs with
./gradlew clean
./gradlew build
and:
./gradlew clean
./gradlew poi-ooxml:test

Both times using JDK 17+35-2724, but I could not reproduce it. Was it confirmed that
* Check out https://github.com/apache/poi
* ./gradlew poi-ooxml:test
can reproduce the crash [~sswsharm]?
12-11-2021

ILW = crash in compiled code, not reproducible, disable compilation of problem method = HLM = P3
28-10-2021
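
The workaround named in the ILW assessment above, disabling compilation of the problem method, is commonly done with HotSpot's -XX:CompileCommand=exclude option. The sketch below only assembles such a flag for the method in the problematic frame. That this exact flag is what the assessment had in mind is an assumption, and `ExcludeWorkaround` and its methods are hypothetical helper names.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; the class and method names are hypothetical.
final class ExcludeWorkaround {
    /** Builds a CompileCommand option that excludes one method from JIT compilation. */
    static String excludeFlag(String className, String methodName) {
        return "-XX:CompileCommand=exclude," + className + "::" + methodName;
    }

    /** Builds a command line that starts a JVM with the exclusion applied. */
    static List<String> command(String javaBinary, String mainClass) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBinary);
        cmd.add(excludeFlag(
                "com.sun.org.apache.xerces.internal.dom.CoreDocumentImpl",
                "importNode"));
        cmd.add(mainClass);
        return cmd;
    }

    public static void main(String[] args) {
        // "MyApp" is a placeholder main class, not something from this report.
        System.out.println(String.join(" ", command("java", "MyApp")));
    }
}
```

An excluded method runs in the interpreter only, so this trades performance in that method for avoiding the miscompiled code.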

I have attached the crash-log-file to not have it aged out by subsequent CI builds on builds.apache.org
27-10-2021

Crash is in JIT'd code so moving to Compiler team.
27-10-2021

Additional Information from submitter:
===========================
https://ci-builds.apache.org/job/POI/job/POI-DSL-1.17/54/artifact/build/hs_err_pid24100.log might help to debug.
27-10-2021

Additional information requested from submitter:
========================================
I executed the test case on both Ubuntu and Windows 10 with JDK 17 but was not able to reproduce this issue.

Steps followed:
* Check out https://github.com/apache/poi
* ./gradlew poi-ooxml:test

Actual behavior: No crash is observed on either Ubuntu or Windows 10.

Could you please confirm whether these steps are correct for reproducing the issue? Also, https://ci-builds.apache.org/job/POI/job/POI-DSL-1.17/38/ does not contain the hs_err_pid.log file. Could you please share the complete hs_err_pid.log file so that it can be analyzed further?
============================================
20-10-2021