JDK-8218067 : C2 crashes in PhaseIdealLoop::split_up
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 9,10,11,12
  • Priority: P2
  • Status: Closed
  • Resolution: Duplicate
  • Submitted: 2019-01-30
  • Updated: 2019-05-02
  • Resolved: 2019-05-02
JDK 13
13 Resolved
Related Reports
Duplicate :  
Relates :  
Relates :  
Relates :  
Description
#  SIGSEGV (0xb) at pc=0x00007fad0dea1409, pid=10534, tid=10604
#
# JRE version: OpenJDK Runtime Environment (10.0.1+10) (build 10.0.1+10)
# Java VM: OpenJDK 64-Bit Server VM (10.0.1+10, mixed mode, tiered, compressed oops, concurrent mark sweep gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0xc48409]  PhaseIdealLoop::split_up(Node*, Node*, Node*) [clone .part.40]+0x619

Current CompileTask:
C2:1566370 54692  s!   4       org.apache.lucene.index.ConcurrentMergeScheduler::merge (280 bytes)

Stack: [0x00007facec32d000,0x00007facec42e000],  sp=0x00007facec428d60,  free space=1007k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0xc48409]  PhaseIdealLoop::split_up(Node*, Node*, Node*) [clone .part.40]+0x619
V  [libjvm.so+0xc4a464]  PhaseIdealLoop::do_split_if(Node*)+0x854
V  [libjvm.so+0xa1b957]  PhaseIdealLoop::split_if_with_blocks(VectorSet&, Node_Stack&)+0xf7
V  [libjvm.so+0xa15b26]  PhaseIdealLoop::build_and_optimize(bool, bool)+0xe16
V  [libjvm.so+0x63e511]  Compile::Optimize()+0x981
V  [libjvm.so+0x63fdbc]  Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool, bool, DirectiveSet*)+0x10ac
V  [libjvm.so+0x5654c2]  C2Compiler::compile_method(ciEnv*, ciMethod*, int, DirectiveSet*)+0x1d2
V  [libjvm.so+0x647b86]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0x3d6
V  [libjvm.so+0x648d1b]  CompileBroker::compiler_thread_loop()+0x28b
V  [libjvm.so+0xcd9a78]  JavaThread::thread_main_inner()+0x108
V  [libjvm.so+0xb5a6e2]  thread_native_entry(Thread*)+0xf2
C  [libpthread.so.0+0x76db]  start_thread+0xdb

Reported by the Lucene Team, see https://issues.apache.org/jira/browse/LUCENE-8668
https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-January/036524.html
Comments
Thanks Uwe, closing this as duplicate of JDK-8219448 then. I'll follow up about backporting in the comment section of JDK-8219448.
02-05-2019

Hi, FYI, I updated the preview builds of JDK 13: it's now ea-18 and the Shipilev fastdebug build of May 1. Previously the installed versions were from March 18th (ea-12 and the Shipilev build). According to the changelog, ea-12 was the first one that included JDK-8219448: [http://hg.openjdk.java.net/jdk/jdk/log?rev=reverse%28%22jdk-13%2B11%22%3A%3A%22jdk-13%2B12%22-%22jdk-13%2B11%22%29&revcount=1000]; since that version I have not seen any failures on the JDK-13 builds! Great! Would it be possible to have this in JDK-11 LTS? Although we can't get the Oracle builds, there are other communities supporting JDK-11 LTS.
01-05-2019

Great, thanks!
30-04-2019

Hi, I will install a new Shipilev nightly build and report back; I'm not sure which version we are currently on. But I have not seen any crashes or assertion failures anymore. Uwe
30-04-2019

Is this still showing up after JDK-8219448 was pushed?
30-04-2019

Hi, the problem with reproducing this issue (in contrast to JDK-8219448) is that it only seems to happen in Apache Solr. The Apache Solr tests are heavy integration tests, so debugging and reproducing is very hard. The test suite spawns hundreds of threads, and tests communicate via HTTP requests with spawned Solr servers (all in the same JVM). JDK-8219448 is easy to reproduce, as it happens in Lucene only, whose tests are very isolated "unit" tests.
02-03-2019

I have a fix for another split-if bug out on review (https://bugs.openjdk.java.net/browse/JDK-8219448). It is confirmed in 12 and 13, but may exist even earlier. I don't know if it is the same bug, but it could be. I will try another round of trying to reproduce this issue.
28-02-2019

Thanks for the updates, Uwe!
18-02-2019

Here are the error logs attached: hs_err_pid9437_fastdebug.log, replay_pid9437_fastdebug.log
17-02-2019

I got results on first run with fastdebug. In fact an assertion really failed:
[junit4] JVM J0: stdout was not empty, see: /home/jenkins/workspace/Lucene-Solr-master-Linux/solr/build/solr-core/test/temp/junit4-J0-20190217_133343_0086048383564395154177.sysout
[junit4] >>> JVM J0 emitted unexpected output (verbatim) ----
[junit4] # To suppress the following error report, specify this argument
[junit4] # after -XX: or in .hotspotrc: SuppressErrorAt=/split_if.cpp:116
[junit4] #
[junit4] # A fatal error has been detected by the Java Runtime Environment:
[junit4] #
[junit4] # Internal Error (/home/buildbot/worker/jdk12u-linux/build/src/hotspot/share/opto/split_if.cpp:116), pid=9437, tid=1773
[junit4] # Error: assert(bol->is_Bool()) failed
[junit4] #
[junit4] # JRE version: OpenJDK Runtime Environment (12.0) (fastdebug build 12-testing+0-builds.shipilev.net-openjdk-jdk12-b109-20190215-jdk-1229)
[junit4] # Java VM: OpenJDK 64-Bit Server VM (fastdebug 12-testing+0-builds.shipilev.net-openjdk-jdk12-b109-20190215-jdk-1229, mixed mode, sharing, tiered, compressed oops, serial gc, linux-amd64)
[junit4] # Problematic frame:
[junit4] # V [libjvm.so+0x17dd2ff] PhaseIdealLoop::split_up(Node*, Node*, Node*) [clone .part.42]+0x11cf
[junit4] #
[junit4] # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
[junit4] #
[junit4] # An error report file with more information is saved as:
[junit4] # /home/jenkins/workspace/Lucene-Solr-master-Linux/solr/build/solr-core/test/J0/hs_err_pid9437.log
[junit4] #
[junit4] # Compiler replay data is saved as:
[junit4] # /home/jenkins/workspace/Lucene-Solr-master-Linux/solr/build/solr-core/test/J0/replay_pid9437.log
[junit4] #
[junit4] # If you would like to submit a bug report, please visit:
[junit4] # http://bugreport.java.com/bugreport/crash.jsp
[junit4] #
[junit4] Current thread is 1773
[junit4] Dumping core ...
[junit4] <<< JVM J0: EOF ----
[junit4] JVM J0: stderr was not empty, see: /home/jenkins/workspace/Lucene-Solr-master-Linux/solr/build/solr-core/test/temp/junit4-J0-20190217_133343_00813155232221245443135.syserr
[junit4] >>> JVM J0 emitted unexpected output (verbatim) ----
[junit4] OpenJDK 64-Bit Server VM warning: increase O_BUFLEN in ostream.hpp -- output truncated
[junit4] <<< JVM J0: EOF ----
17-02-2019

Hi Tobias, the bug happened again with latest JDK 12 b32 RC build:
https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/23686/artifact/solr/build/solr-core/test/J1/hs_err_pid31273.log
OpenJDK 64-Bit Server VM (12+32, mixed mode, tiered, parallel gc, linux-amd64)
So it looks unrelated to JDK-8215757.
17-02-2019

I added JDK-12 (latest) fastdebug builds into the lucene test loop. Will report soon!
17-02-2019

Sorry for the delay: we are now running CI with the latest JDK-12 (b32). We will see if the issue still appears. In parallel I'll try to set up some fastdebug runs on CI, too.
15-02-2019

I have tried reproducing with replay_pid8792.log as well without success. Subscribing to Lucene-Solr-master-Linux failure feed now.
05-02-2019

I have tried reproducing the crash using the replay_pid27685.log, JDK 11 and solr build from checkout db57468242 together with the appropriate commandline. The java and solr versions match, no profile data is missing, the inlining matches, but the compile completes successfully anyway.
31-01-2019
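
For readers unfamiliar with compiler replay, the reproduction attempt described in the comment above relies on feeding a replay file back into HotSpot. The following is only a minimal sketch of how such a command line is typically assembled; the exact command used in this report is not given. `-XX:+UnlockDiagnosticVMOptions`, `-XX:+ReplayCompiles` and `-XX:ReplayDataFile` are HotSpot options, while the class `ReplayCommand`, its `build` method and the classpath value `solr-classes` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the report): assembles a JVM command line
// that asks HotSpot to replay a compilation recorded in a replay_pid*.log file.
public class ReplayCommand {

    static List<String> build(String javaBinary, String classpath, String replayFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBinary);
        // ReplayCompiles is a diagnostic option, so diagnostic options are unlocked first.
        cmd.add("-XX:+UnlockDiagnosticVMOptions");
        cmd.add("-XX:+ReplayCompiles");
        cmd.add("-XX:ReplayDataFile=" + replayFile);
        // The classpath must contain the same classes as the crashing run.
        cmd.add("-cp");
        cmd.add(classpath);
        return cmd;
    }

    public static void main(String[] args) {
        // "solr-classes" is a placeholder classpath, assumed for this sketch.
        List<String> cmd = build("java", "solr-classes", "replay_pid27685.log");
        System.out.println(String.join(" ", cmd));
    }
}
```

A replay only reproduces the crash if the JDK build, classes, profile data and inlining decisions match the original run, which is what the comment above checks.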

Hi Uwe, no, I haven't been able to reproduce it, but I based the priority on the information you've provided (-XX:-SplitIfBlocks should serve as a workaround because it disables the optimization that crashes). We can still lower/raise the priority when we get more information. I suspect the issue only shows up on one machine due to the specific CPU features (AVX, SSE, ..) that trigger some C2 intrinsics, or due to some specific timing when gathering profiles and triggering compilation. The crash looks similar to JDK-8215757, which we fixed recently in JDK 12 b28. Could you try with the latest JDK 12 [1] to see if this still reproduces? Running with a fastdebug build would also help to get a better understanding of what's going on. Thanks. [1] https://jdk.java.net/12/
31-01-2019

I just checked the problematic class: there were only a few changes in Lucene, so it's not introduced by a recent change in Lucene: https://github.com/apache/lucene-solr/commits/master/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java The problematic method is here, not sure which "if" statement is causing this: https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L497-L564 This is not in a tight inner loop like most of the Lucene failures before; it's the code that coordinates merges done by other threads. I can give more details face to face at FOSDEM, if needed.
30-01-2019

Hi Tobias, does this mean you were able to reproduce it? I can of course try to use the above option (-XX:-SplitIfBlocks) tomorrow or exclude ConcurrentMergeScheduler from compilation. I will post the corresponding Java code tomorrow. Uwe
30-01-2019

ILW = Crash during C2 compilation, reproduces with Lucene test suite, -XX:-SplitIfBlocks or exclude method from compilation = HMM = P2
30-01-2019
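
The two workarounds named in the ILW line above can be written as JVM arguments. The following is a minimal sketch, not something posted in the report: `-XX:-SplitIfBlocks` is the flag quoted in the comments, and `-XX:CompileCommand=exclude,...` is the usual HotSpot way to keep a single method out of JIT compilation. The class `SplitIfWorkaround` and its method names are hypothetical; the excluded method is the one shown in the CompileTask line of the description.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the report): produces the JVM arguments
// for the two workarounds mentioned in the ILW assessment.
public class SplitIfWorkaround {

    // Workaround 1: disable the C2 split-if optimization for all methods.
    static String disableSplitIf() {
        return "-XX:-SplitIfBlocks";
    }

    // Workaround 2: exclude only the affected method from compilation.
    static String excludeMethod(String className, String methodName) {
        return "-XX:CompileCommand=exclude," + className + "::" + methodName;
    }

    // Prepends one workaround flag to an existing list of JVM arguments.
    static List<String> withWorkaround(String flag, List<String> jvmArgs) {
        List<String> result = new ArrayList<>();
        result.add(flag);
        result.addAll(jvmArgs);
        return result;
    }

    public static void main(String[] args) {
        String exclude = excludeMethod(
                "org.apache.lucene.index.ConcurrentMergeScheduler", "merge");
        System.out.println(withWorkaround(disableSplitIf(), List.of("-Xmx1g")));
        System.out.println(withWorkaround(exclude, List.of("-Xmx1g")));
    }
}
```

Excluding one method limits the performance cost to that method, whereas disabling split-if affects every C2 compilation; which trade-off is acceptable depends on the workload.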