JDK-8223502 : Node estimate for loop unswitching is not correct: assert(delta <= 2 * required) failed: Bad node estimate
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 11.0.16-oracle,13
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2019-05-07
  • Updated: 2022-03-11
  • Resolved: 2019-06-03
Fix Versions
  • JDK 11: 11.0.16-oracle (Fixed)
  • JDK 13: 13 b24 (Fixed)
  • JDK 14: 14 (Fixed)
Description
It seems the fix for JDK-8223389 was not complete. There is at least one failure like this:

$ jdk-jdk/build/linux-x86_64-server-fastdebug/images/jdk/bin/java -jar target/benchmarks.jar -foe true -f 1 -wi 5 -i 5 -t 1 -w 1s -r 1s --jvmArgs "-Xmx1g -Xms1g -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC" XmlTransform
...
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/home/shade/trunks/jdk-jdk/src/hotspot/share/opto/loopnode.hpp:1388), pid=27591, tid=27623
#  assert(delta <= 2 * required) failed: Bad node estimate (actual: 707, request: 335)
#
# JRE version: OpenJDK Runtime Environment (13.0) (fastdebug build 13-internal+0-adhoc.shade.jdk-jdk)
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 13-internal+0-adhoc.shade.jdk-jdk, mixed mode, sharing, tiered, compressed oops, shenandoah gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x11af362]  AutoNodeBudget::~AutoNodeBudget()+0x1e2
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P" (or dumping to /home/shade/core.27591)
#
# An error report file with more information is saved as:
# /home/shade/hs_err_pid27591.log
#
# Compiler replay data is saved as:
# /home/shade/replay_pid27591.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
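
The failing check compares the number of nodes actually created against twice the number requested. Below is a minimal sketch of that comparison using the figures from the log above. It assumes that "actual" in the message corresponds to delta and "request" to required. The class and method names are hypothetical, and this is not the HotSpot code, which is C++ in loopnode.hpp.

```java
public class NodeBudgetCheck {
    // Mirrors the asserted condition from the log: delta <= 2 * required.
    // Assumption: "actual" maps to delta and "request" maps to required.
    public static boolean withinBudget(long delta, long required) {
        return delta <= 2 * required;
    }

    public static void main(String[] args) {
        long actual = 707;   // "actual" from the crash log
        long request = 335;  // "request" from the crash log
        System.out.println("limit = " + (2 * request));
        System.out.println("within budget: " + withinBudget(actual, request));
    }
}
```

With these figures the limit is 670, so 707 created nodes exceed it and the assert fires.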

Comments
A pull request was submitted for review. URL: https://git.openjdk.java.net/jdk11u-dev/pull/874 Date: 2022-03-10 14:09:01 +0000
10-03-2022

URL: http://hg.openjdk.java.net/jdk/jdk/rev/ba171f871932 User: phedlin Date: 2019-06-03 09:11:25 +0000
03-06-2019

Note that JDK-8223911 has disabled the offending assert. Patric, please re-enable it when you fix the original issue.
15-05-2019

Asked to disable the assert with JDK-8223911.
14-05-2019

Okay. Patric, how far are you from a fix? If there is no reliable fix in sight, I'd like to propose a simple patch that disables the assert for a while to unbreak fastdebug testing.
14-05-2019

I'm looking at tightening the bound/estimate (we have the same issue in "peeling", with O(n^2) theoretical growth), but without having to compute "exact" numbers.
14-05-2019

I don't think increasing the constant multiplier is good enough. For loop unswitching (and possibly other loop optimizations that clone the loop body), C2 creates merge nodes outside the cloned loop bodies for every exit of the loop, and for each exit it has to merge all values that are live out of the loop but modified in the loop body. So in the worst case, the number of nodes grows by O(n^2), where n is the loop body size. In the common cases, using O(n^2) could be too pessimistic though.
14-05-2019
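
The growth argument in the comment above can be sketched with a toy model. It assumes the estimate is a plain constant factor times the body size (the real est_loop_clone_sz may include other terms) and that both the number of loop exits and the number of live-out values scale with the body size. All names below are hypothetical.

```java
public class UnswitchGrowthModel {
    // Assumed linear estimate: a constant factor times the loop body size.
    // The real est_loop_clone_sz may include other terms.
    public static long linearEstimate(long factor, long bodySize) {
        return factor * bodySize;
    }

    // Toy worst case: one cloned body plus one merge node for every
    // (loop exit, live-out value) pair.
    public static long worstCaseNodes(long bodySize, long exits, long liveOuts) {
        return bodySize + exits * liveOuts;
    }

    public static void main(String[] args) {
        for (long n : new long[] {8, 100, 1000}) {
            long exits = n / 4;     // assumption: exits scale with body size
            long liveOuts = n / 4;  // assumption: live-out values scale too
            long worst = worstCaseNodes(n, exits, liveOuts);
            System.out.printf("n=%d worst=%d est(3)=%d est(6)=%d%n",
                    n, worst, linearEstimate(3, n), linearEstimate(6, n));
        }
    }
}
```

In this toy model, a body of 100 nodes already yields 725 modeled nodes against an estimate of 600 with the factor doubled to 6, so a larger constant only moves the threshold at which the estimate is exceeded.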

Right. I am just looking for a workaround that we can apply to the downstream repositories we maintain. Having jdk/jdk fail intermittently in fastdebug mode is painful. Adding "maintainer-pain" tag.
14-05-2019

Any ideas how to fix it? I checked this dummy patch works for both the attached test and our original failures:
diff -r 76751d3faf7b src/hotspot/share/opto/loopUnswitch.cpp
--- a/src/hotspot/share/opto/loopUnswitch.cpp Tue May 14 09:12:06 2019 +0200
+++ b/src/hotspot/share/opto/loopUnswitch.cpp Tue May 14 11:18:34 2019 +0200
@@ -80,5 +80,5 @@
   // Too speculative if running low on nodes.
-  return phase->may_require_nodes(est_loop_clone_sz(3, _body.size()));
+  return phase->may_require_nodes(est_loop_clone_sz(6, _body.size()));
 }
14-05-2019

ILW = Assert failed due to bad node estimate (regression from JDK-8216137), easy to reproduce with regression test, no workaround = MMH = P3
13-05-2019

This is not a Shenandoah bug. I can reproduce it without Shenandoah with the attached test case and the following command line:
java -XX:-TieredCompilation -XX:-BackgroundCompilation -XX:-UseOnStackReplacement -XX:CompileOnly=LoopUnswitchingBadNodeBudget::test -XX:CompileCommand=dontinline,LoopUnswitchingBadNodeBudget::helper -XX:+UnlockExperimentalVMOptions -XX:-UseSwitchProfiling LoopUnswitchingBadNodeBudget
Internal Error at loopnode.hpp:1388, pid=23637, tid=23648
assert(delta <= 2 * required) failed: Bad node estimate (actual: 774, request: 263)
10-05-2019
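
For context on the optimization named in the title: loop unswitching hoists a loop-invariant test out of a loop by cloning the loop, one copy per branch outcome. The sketch below shows the source-level shape of that transformation by hand. It is not the attached test case and is not claimed to trigger the assert. The class and method names are hypothetical.

```java
public class UnswitchShapeSketch {
    // Before: the test on 'flag' is loop-invariant but evaluated on every iteration.
    public static int original(int[] a, boolean flag) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            if (flag) {
                sum += a[i];
            } else {
                sum -= a[i];
            }
        }
        return sum;
    }

    // After (written by hand): the test is hoisted and the loop is cloned,
    // so each copy contains only one side of the branch.
    public static int unswitched(int[] a, boolean flag) {
        int sum = 0;
        if (flag) {
            for (int i = 0; i < a.length; i++) {
                sum += a[i];
            }
        } else {
            for (int i = 0; i < a.length; i++) {
                sum -= a[i];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        System.out.println(original(data, true) == unswitched(data, true));
        System.out.println(original(data, false) == unswitched(data, false));
    }
}
```

Because the loop is duplicated, the compiler has to budget for roughly a second copy of the body plus the nodes that merge the two copies' results, which is what the node estimate in this report is meant to cover.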