JDK-8225475 : Node budget asserts on x86_32/64
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 11,13
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • Submitted: 2019-06-07
  • Updated: 2022-03-14
  • Resolved: 2019-07-02
Version   Fix version      Status
JDK 11    11.0.16-oracle   Fixed
JDK 13    13 b28           Fixed
JDK 14    14               Fixed
Description
(provisional synopsis, feel free to edit)

Happens more or less reliably with a test like:

$ CONF=linux-x86-server-fastdebug make images run-test TEST=jdk/jshell/ToolSimpleTest.java

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/home/shade/jdk-jdk/src/hotspot/share/opto/loopnode.hpp:1409), pid=90700, tid=91215
#  assert(C->live_nodes() - live_at_begin <= 2 * _nodes_required) failed: Bad node estimate: actual = 2161 >> request = 959
#
# JRE version: OpenJDK Runtime Environment (13.0) (fastdebug build 13-internal+0-adhoc.shade.jdk-jdk)
# Java VM: OpenJDK Server VM (fastdebug 13-internal+0-adhoc.shade.jdk-jdk, mixed mode, tiered, g1 gc, linux-x86)
# Problematic frame:
# V  [libjvm.so+0xf2468a]  AutoNodeBudget::~AutoNodeBudget()+0xda
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/shade/jdk-jdk/build/linux-x86-server-fastdebug/test-support/jtreg_test_langtools_jdk_jshell_ToolSimpleTest_java/scratch/0/hs_err_pid90700.log
#
# Compiler replay data is saved as:
# /home/shade/jdk-jdk/build/linux-x86-server-fastdebug/test-support/jtreg_test_langtools_jdk_jshell_ToolSimpleTest_java/scratch/0/replay_pid90700.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#
---------------  T H R E A D  ---------------

Current thread (0xad6c9800):  JavaThread "C2 CompilerThread1" daemon [_thread_in_native, id=91215, stack(0xa967f000,0xa9700000)]


Current CompileTask:
C2:  20942 6049   !   4       com.sun.tools.javac.parser.JavaTokenizer::readToken (2155 bytes)

Stack: [0xa967f000,0xa9700000],  sp=0xa96fd180,  free space=504k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0xf2468a]  AutoNodeBudget::~AutoNodeBudget()+0xda
V  [libjvm.so+0xf21bdf]  IdealLoopTree::iteration_split_impl(PhaseIdealLoop*, Node_List&)+0x11f
V  [libjvm.so+0xf220db]  IdealLoopTree::iteration_split(PhaseIdealLoop*, Node_List&)+0x11b
V  [libjvm.so+0xf21fed]  IdealLoopTree::iteration_split(PhaseIdealLoop*, Node_List&)+0x2d
V  [libjvm.so+0xf43eb7]  PhaseIdealLoop::build_and_optimize(LoopOptsMode)+0x10b7
V  [libjvm.so+0x86f9c8]  PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x268
V  [libjvm.so+0x86c3d3]  Compile::Optimize()+0x4d3
V  [libjvm.so+0x86e65f]  Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool, bool, DirectiveSet*)+0x17df
V  [libjvm.so+0x6c3ace]  C2Compiler::compile_method(ciEnv*, ciMethod*, int, DirectiveSet*)+0x39e
V  [libjvm.so+0x87acee]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0x5ee
V  [libjvm.so+0x87bd92]  CompileBroker::compiler_thread_loop()+0x2f2
V  [libjvm.so+0x14cd409]  compiler_thread_entry(JavaThread*, Thread*)+0x59
V  [libjvm.so+0x14d5706]  JavaThread::thread_main_inner()+0x1c6
V  [libjvm.so+0x14dc189]  Thread::call_run()+0xf9
V  [libjvm.so+0x1133aa6]  thread_native_entry(Thread*)+0x136
C  [libpthread.so.0+0x627a]  start_thread+0xda

Started to happen recently, after JDK-8223363, JDK-8223502, and JDK-8224648 enabled the node budget verifications again.
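
The condition in the failed assert can be checked by hand against the reported numbers. The following is a minimal sketch, not HotSpot code: `within_budget` and `check_reported_values` are hypothetical helpers that only mirror the inequality from the assert message (`C->live_nodes() - live_at_begin <= 2 * _nodes_required`), with `actual` standing for the node growth and `request` for `_nodes_required`.

```cpp
#include <cassert>

// Hypothetical helper (not part of HotSpot): mirrors the inequality in the
// assert message. `actual` is the number of nodes added since the budget was
// requested, `request` is the requested node budget.
bool within_budget(unsigned actual, unsigned request) {
  return actual <= 2 * request;
}

// Both failures reported in this issue exceed twice the requested budget.
void check_reported_values() {
  assert(!within_budget(2161, 959));  // 2 * 959 = 1918 < 2161
  assert(!within_budget(443, 218));   // 2 * 218 = 436 < 443
}
```

In both reports the actual growth is only slightly more than twice the request, which fits an estimate that is a little too optimistic rather than wildly off.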
Comments
A pull request was submitted for review. URL: https://git.openjdk.java.net/jdk11u-dev/pull/883 Date: 2022-03-11 15:45:04 +0000
11-03-2022

Fix Request (11u): Should get backported for parity with 11.0.16-oracle. Applies cleanly. Nightly tests have passed.
11-03-2022

URL: http://hg.openjdk.java.net/jdk/jdk13/rev/38c73e24fa7b User: phedlin Date: 2019-07-02 07:50:33 +0000
02-07-2019

Update! Without the patch: 12/100 failures. With the patch: 0/500 failures. The patch does really seem to work.
27-06-2019
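
Whether a clean run is "just luck" can be estimated with a rough calculation. The sketch below assumes that test runs are independent and that the observed failure fraction is the true per-run failure probability; both are assumptions, and `prob_all_pass` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cmath>

// Probability that n independent runs all pass when each run fails with
// probability p. Independence between runs is an assumption of this sketch.
double prob_all_pass(double p, unsigned n) {
  assert(p >= 0.0 && p <= 1.0);
  return std::pow(1.0 - p, static_cast<double>(n));
}
```

At the 2/10 failure rate of the smaller experiment reported in this thread, ten clean runs would still occur by chance roughly one time in ten (`prob_all_pass(0.2, 10)` is 0.8^10), so that result alone is weak evidence. At the 12/100 rate seen without the patch, 500 consecutive clean runs would be vanishingly unlikely under the same assumptions.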

The issue is intermittent. Tallied up recent logs from my TR 2950X build node and jdk/jdk tree:
  • Failed 5 times: linux-x86-server-fastdebug, jdk/jshell/ToolSimpleTest.java
  • Failed 1 time: linux-x86_64-server-fastdebug, jdk/jshell/ToolRetainTest.java
  • Failed 1 time: linux-x86_64-server-fastdebug, jdk/jshell/ToolLocalSimpleTest.java
27-06-2019

Suggesting making (the ad-hoc) node budget estimate more pessimistic (in general).

diff -r 72bbc930d7b6 -r ffffa3a6e710 src/hotspot/share/opto/loopnode.cpp
--- a/src/hotspot/share/opto/loopnode.cpp Sat Jun 22 02:03:41 2019 +0200
+++ b/src/hotspot/share/opto/loopnode.cpp Tue Jun 25 11:43:36 2019 +0200
@@ -2494,9 +2494,11 @@
       }
     }
   }
-  // Add data (x1.5) and control (x1.0) count to estimate iff both are > 0.
+  // Add data and control count (x2.0) to estimate iff both are > 0. This is
+  // a rather pessimistic estimate for the most part, in particular for some
+  // complex loops, but still not enough to capture all loops.
   if (ctrl_edge_out_cnt > 0 && data_edge_out_cnt > 0) {
-    estimate += ctrl_edge_out_cnt + data_edge_out_cnt + data_edge_out_cnt / 2;
+    estimate += 2 * (ctrl_edge_out_cnt + data_edge_out_cnt);
   }

The above change alleviates the assert above (for DeoptimizeALot).
27-06-2019
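
The effect of the suggested change on the estimate can be compared side by side. This is a minimal sketch: the two formulas are taken from the removed and added lines of the diff, while the function names `old_increment`, `new_increment`, and `check_example` are hypothetical and do not exist in HotSpot.

```cpp
#include <cassert>

// Hypothetical helper: amount added to the estimate before the patch
// (control x1.0 plus data x1.5, using integer division as in the removed line).
unsigned old_increment(unsigned ctrl_edge_out_cnt, unsigned data_edge_out_cnt) {
  if (ctrl_edge_out_cnt > 0 && data_edge_out_cnt > 0) {
    return ctrl_edge_out_cnt + data_edge_out_cnt + data_edge_out_cnt / 2;
  }
  return 0;
}

// Hypothetical helper: amount added to the estimate after the patch
// (control and data both x2.0).
unsigned new_increment(unsigned ctrl_edge_out_cnt, unsigned data_edge_out_cnt) {
  if (ctrl_edge_out_cnt > 0 && data_edge_out_cnt > 0) {
    return 2 * (ctrl_edge_out_cnt + data_edge_out_cnt);
  }
  return 0;
}

// Made-up counts, purely for illustration: 4 control and 10 data out-edges.
void check_example() {
  assert(old_increment(4, 10) == 19);  // 4 + 10 + 10 / 2
  assert(new_increment(4, 10) == 28);  // 2 * (4 + 10)
}
```

The new increment is never smaller than the old one, and the gap grows with the control edge count, since control edges were previously weighted x1.0 and data edges x1.5.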

Additional witness:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (open/src/hotspot/share/opto/loopnode.hpp:1410), pid=10476, tid=11697
#  assert(C->live_nodes() - live_at_begin <= 2 * _nodes_required) failed: Bad node estimate: actual = 443 >> request = 218
#
# JRE version: Java(TM) SE Runtime Environment (14.0) (fastdebug build 14-internal+0-2019-06-17-1149288.pliden.null)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 14-internal+0-2019-06-17-1149288.pliden.null, mixed mode, tiered, z gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x11be347]  AutoNodeBudget::~AutoNodeBudget()+0x107
#

Reproduced by: (using DeoptimizeALot)

$ java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xms1G -Xmx4G -XX:+UnlockDiagnosticVMOptions -XX:+ZVerifyViews -Xlog:gc:gc.log -XX:+DeoptimizeALot -Dspecjvm.home.dir=$HGROOT/gc-test-suite/build/SPECjvm2008/SPECjvm2008 -jar $HGROOT/gc-test-suite/build/SPECjvm2008/SPECjvm2008/SPECjvm2008.jar -ikv --benchmarkThreads 8 --warmuptime 30 --iterationTime 60 mpegaudio
27-06-2019

Ran the suggested (pessimistic) patch above with linux-x86-server-fastdebug, jdk/jshell/ToolSimpleTest.java: without patch: 2/10 fails, with patch: 0/10 fails. Running longer now to verify if this was just luck.
27-06-2019

Indeed, slightly different profiling data produces differences in inlining (and loop complexity), an effect that is somewhat emphasised by the option DeoptimizeALot. A fully robust solution, removing the ad-hoc nature of rough estimates, would require us to provide either 1) exact estimates (e.g. by simulation), or 2) loop (body) cloning [*] that supports roll-back (e.g. via snapshots or a trail mechanism).

[*] It is essentially the loop cloning functionality that may push us over the edge (i.e. the node budget).
27-06-2019
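
The roll-back idea (option 2 above) can be illustrated with a toy model. This is only a sketch under strong simplifying assumptions: `ToyGraph` and `clone_with_rollback` are hypothetical names, a node is just an id in a vector, and nothing here reflects how HotSpot's node arena or loop cloning is actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy graph: a node is just an id in a vector. This is not
// HotSpot's node arena; it only illustrates the snapshot/roll-back idea.
struct ToyGraph {
  std::vector<int> nodes;
  std::size_t live() const { return nodes.size(); }
};

// Clone a loop body of `body_size` nodes `copies` times. If the growth exceeds
// `budget`, restore the snapshot taken before cloning and report failure.
bool clone_with_rollback(ToyGraph& g, std::size_t body_size,
                         std::size_t copies, std::size_t budget) {
  const std::size_t snapshot = g.live();
  for (std::size_t c = 0; c < copies; ++c) {
    for (std::size_t i = 0; i < body_size; ++i) {
      g.nodes.push_back(static_cast<int>(g.live()));
    }
  }
  if (g.live() - snapshot > budget) {
    g.nodes.resize(snapshot);  // roll back: drop every node added above
    assert(g.live() == snapshot);
    return false;
  }
  return true;
}
```

With roll-back available, an over-budget transformation is simply abandoned and the graph is left as it was, so the budget no longer depends on an up-front estimate being accurate.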

Have been running the ToolSimpleTest & ToolRetainTest (reported via mail by Aleksey to assert on x64) test-cases for four days+ (~100h and a few thousand runs) without triggering the assert (running locally only).
19-06-2019

Could you share your configuration and build options? -- Do you cross compile? (I'm unable to reproduce the assert with the mach5 build; linux-x86-debug.)
11-06-2019

I cross-compile using http://hg.openjdk.java.net/jdk/jdk/raw-file/tip/doc/building.html#creating-and-using-sysroots-with-qemu-deboostrap. Pretty sure the usual devkit would yield a similar binary. Configure log for my builds: https://builds.shipilev.net/openjdk-jdk/openjdk-jdk-latest-linux-x86-fastdebug.configure.log
07-06-2019

Have seen it in tier1 x86_32 so far. Not sure if x86_32 specific, though.
07-06-2019

32-bit only (?). Not run in regular Mach5 testing. Might be direct or indirect excessive (macro) node expansion that is not _obvious_ from the graph?
07-06-2019