JDK-8283466 : C2: missing skeleton predicates in peeled loop
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 19
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2022-03-21
  • Updated: 2022-06-20
  • Resolved: 2022-06-02
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 19
19 b26Fixed
Related Reports
Duplicate :  
Relates :  
Relates :  
Relates :  
Description
Synopsis is provisional, feel free to change it as you see fit. 

The attached fuzzer test fails:

$ build/linux-x86_64-server-fastdebug/images/jdk/bin/java -cp 0015/ -Xmx512m -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP -XX:+StressIGVN -Xcomp -XX:CompileOnly=Test -XX:-TieredCompilation -XX:StressSeed=95886967 Test

#  Internal Error (/home/shade/trunks/jdk/src/hotspot/share/opto/gcm.cpp:276), pid=1065282, tid=1065295
#  assert(false) failed: unscheduable graph
#
# JRE version: OpenJDK Runtime Environment (19.0) (fastdebug build 19-internal-adhoc.shade.jdk)
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 19-internal-adhoc.shade.jdk, compiled mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0xdce852]  assert_dom(Block*, Block*, Node*, PhaseCFG const*)+0x1c2

It is semi-intermittent without StressSeed setting.

Bisection points to JDK-8230382 as the first bad commit.

Copied from JDK-8283460 (duplicate bug):
Reopened as separate bug: JDK-8286638.

$ build/linux-x86_64-server-fastdebug/images/jdk/bin/java -cp 0029/ -Xmx512m -XX:+UnlockDiagnosticVMOptions -XX:+StressIGVN -Xcomp -XX:CompileOnly=Test -XX:-TieredCompilation Test
CompileCommand: compileonly Test.* bool compileonly = true
# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc: SuppressErrorAt=/gcm.cpp:766
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (/home/shade/trunks/jdk/src/hotspot/share/opto/gcm.cpp:766), pid=3733508, tid=3733521
# assert(!LCA_orig->dominates(pred_block) || early->dominates(pred_block)) failed: early is high enough
#
# JRE version: OpenJDK Runtime Environment (19.0) (fastdebug build 19-internal-adhoc.shade.jdk)
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 19-internal-adhoc.shade.jdk, compiled mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V [libjvm.so+0xdd1fe5] PhaseCFG::insert_anti_dependences(Block*, Node*, bool)+0x18a5

It is somewhat intermittent, and seems to rely on StressIGVN.
Comments
Changeset: 199832a7 Author: Emanuel Peter <epeter@openjdk.org> Date: 2022-06-02 06:49:23 +0000 URL: https://git.openjdk.java.net/jdk/commit/199832a7101ca9dbfe7744ca0a1c4ff11d8832f2
02-06-2022

A pull request was submitted for review. URL: https://git.openjdk.java.net/jdk/pull/8783 Date: 2022-05-19 08:56:22 +0000
30-05-2022

Update: I reran all reported tests here, none reproduces after fixing CmpU issue in JDK-8286638 and implementing skeleton predicate initialization for peeled loops. Going back to commits around 22 Mar 2022, applying my new fixes: Test2.java seems to be fixed with CmpU fix JDK-8286638. Test.java seems unrelated to peeling skeleton_predicate fix and to CmpU fix. Test.java does not reproduce after JDK-8284742, however it seems unlikely that this was a true fix.
19-05-2022

It is now clear that we have two bugs here: 0029.zip -> CmpU under/overflow analysis -> new bug JDK-8286638 0015.zip -> skeleton_predicate missing after peeling -> this bug for now
12-05-2022

rr ./java -Xmx512m -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP -XX:+StressIGVN -Xcomp -XX:CompileOnly=Test -XX:-TieredCompilation -XX:-LoopUnswitching -XX:StressSeed=147368847 Reduced0015_2.java Quick summary: We have a loop i = [1, x), -3 x: #int array access: a[i-1] RC is moved out of loop: we check if in range of a: init = 1 -> statically true limit = roughly x -> cannot know statically bc #int Peel Loop 1 iteration: RC out of loop Peeled loop with i=1 Loop i = [-2, x), -3 Already here: if we still had the RC now, and only now moved it out of the loop, the init i=-2 would not pass, and we could statically remove the loop. Pre-main-post-loop: pre-loop with 1 iteration RC out of loop Peeled loop with i=1 Pre-loop with i=-2 Main i = [-5, something), -3 post-loop (something) Problem: in the pre-loop we know that i=-2 But we cannot statically tell that we will never enter. But array access a[i-1] cannot happen, so it is optimized away starting at the corresponding ConvI2L , which only allows int:>=0 This rips away some data-flow, but the control-flow stays -> causes issues later. I can reason that we never enter the pre-loop, but the vm-code does not manage to do that. Why will we never enter it? if x>=1 then we never enter the loop at all. If we enter, then we check the limit in the RC that was moved before the loops. limit-stride+offset u< a.size() x- -3 + -1 = x + 2 u< a.size case distinction: x>=1 : we never enter loop at all because of loop guard x in {0,-1,-2} : pass limit RC, we enter loop, but only execute the peeled loop, never enter pre-main-post x < -2 : fail limit RC, don't enter loop The problem seems this: We only run the RC once, in a "generic" way that we cannot statically analyze. But we split the data-flow into separate loops, and ranges. These ranges are more specific and may turn out to be impossible. But we do not split the ctrl-flow (no duplication of the RC). With [~chagedorn] we have have been talking about skeleton-predicates. Apparently, these are copied from before a loop, to the cloned loops, eg if we do pre-main-post split. But it seems we don't do that in the peeling. We need to verify this, and see what the possible solutions could be.
11-05-2022

Reduced 0015.zip again, this reproduces, though with a slightly different assert: ./java -Xmx512m -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP -XX:+StressIGVN -Xcomp -XX:CompileOnly=Test -XX:-TieredCompilation Reduced0015_1.java # Internal Error (/home/emanuel/Documents/jdk/open/src/hotspot/share/opto/loopnode.cpp:5401), pid=235607, tid=235620 # assert(!had_error) failed: bad dominance -> reverted the ConvI2L changes, and this still reporduces. This could well be an unrelated bug. -------- and with different flags: ./java -Xmx512m -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP -XX:+StressIGVN -Xcomp -XX:CompileOnly=Test -XX:-TieredCompilation -XX:-LoopUnswitching Reduced0015_1.java # Internal Error (/home/emanuel/Documents/jdk/open/src/hotspot/share/opto/loopnode.cpp:5834), pid=322227, tid=322240 # assert(false) failed: Bad graph detected in build_loop_late
06-05-2022

I just tried different optimization flags with Reduced0015_2.java: ./java -Xmx512m -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP -XX:+StressIGVN -Xcomp -XX:CompileOnly=Test -XX:-TieredCompilation -XX:-LoopUnswitching Reduced0015_2.java # Internal Error (/home/emanuel/Documents/jdk/open/src/hotspot/share/opto/gcm.cpp:276), pid=320662, tid=320675 # assert(false) failed: unscheduable graph ------------------------------- ./java -Xmx512m -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP -XX:+StressIGVN -Xcomp -XX:CompileOnly=Test -XX:-TieredCompilation -XX:LoopMaxUnroll=0 -XX:LoopUnrollLimit=0 Reduced0015_2.java # Internal Error (/home/emanuel/Documents/jdk/open/src/hotspot/share/opto/loopnode.cpp:5401), pid=320940, tid=320953 # assert(!had_error) failed: bad dominance Node: with this one I'm less sure it is related to ConvI2L, given what I quickly saw
06-05-2022

Updated plan: Debug / fix Reduced0015_2.java Then see if it also fixes 0015.zip (I expect that), and Reduced0015_1.java (possibly a separate Bug). Then, we see if the fixes are really separate, and if they are separate enough, we create separate bugs / change-sets.
03-05-2022

Before I had cached a half-reduced result of 0015.zip: ./java -Xmx512m -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP -XX:+StressIGVN -Xcomp -XX:CompileOnly=Test -XX:-TieredCompilation Reduced0015_2.java # Internal Error (/home/emanuel/Documents/jdk/open/src/hotspot/share/opto/gcm.cpp:1423), pid=256502, tid=256515 # assert(false) failed: graph should be schedulable This still looks like the original Bug reported through 0015.zip, and is somehow related to the ConvI2L refactoring (at least only occurs after it).
03-05-2022

Plan: Debug 0015.zip, see if related to 0029.zip. Once both are fixed, go back in commit history, see if Test.java and Test2.java are fixed by these issues.
02-05-2022

Implementation: const TypeInt* cmp1 = sub(tr1, t2)->is_int(); const TypeInt* cmp2 = sub(tr2, t2)->is_int(); // compute union, so that cmp handles all possible results from the two cases return cmp1->meet(cmp2); After this change these do not throw asserts: 0029.zip, Test.java, Test2.java (though probably the last 2 don't reporduce because of other changes in the last 3 weeks). But 0015.zip still fails: ./java -cp 0015/ -Xmx512m -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP -XX:+StressIGVN -Xcomp -XX:CompileOnly=Test -XX:-TieredCompilation -XX:StressSeed=578621535 Test # Internal Error (/home/emanuel/Documents/jdk/open/src/hotspot/share/opto/gcm.cpp:276), pid=360398, tid=360411 # assert(false) failed: unscheduable graph
29-04-2022

With the help of [~thartmann], we tracked it to CmpUNode::Value. There, we analyze if the in1 is an AddI. We detect that this AddI may have 2 ranges: tr1: int:<=-1:www tr2: int:max (underflow: minint-1) We then check how these ranges compare to in2: t2: int:>=0 For this we compute: const Type* cmp1 = sub(tr1, t2); -> TypeInt::CC_GT = [1] const Type* cmp2 = sub(tr2, t2); -> TypeInt::CC_GE = [0...1] But then, we only do something with this result if cmp1 == cmp2. https://github.com/openjdk/jdk/blame/master/src/hotspot/share/opto/subnode.cpp#L832 However, I wonder if we can not just take the union of cmp1 and cmp2, which would be [0...1] = [GE] Then, the output node [655 Bool] checks for [lt], which we could know is never true. We could conclude that the Range-check never passes. This would then also kill the control-flow, in parallel to the data-flow that is already killed with ConvI2L.
28-04-2022

I have a better description now: Scenario: type i: [minint...0] access to c[i-1] Range-check: int index = AddI(i, -1) -> type index: [minint-1 ... -1] -> underflow -> int CmpU(index, c.size()) [lt] -> checks index>=0 and index<c.size() Consequence: range-check cannot statically determine that the access is never ok. Sad, because we can manually tell that the range [minint-1 ... -1] should not pass the range check. Data-flow: long index = ConvI2L( AddI(i, -1) ) -> type of ConvI2L: [0...maxint-1] -> why do we know this? Because this is before an array access. We assume range-check guarantees index in range [0...c.size()-1], and c.size()<=maxint. Then there is a push_thru_add, and we get: long index = AddL( ConvI2L(i), -1) -> type of new ConvI2L: [1...maxint-1] - because we correct the lo by 1 for the add. Somehow we do not adjust hi, in my opinion it should now be maxint, to correct by 1. Consequence: if hi is maxint or maxint-1, there is no overflow. Then, we statically detect that: type i: [minint...0] type ConvI2L: [1...maxint-1] -> filter results in TOP -> data-flow is eliminated sucessfully. Result in the end: data-flow collapses, while control-flow (range-check) does not collapse. This leads to issues described above.
28-04-2022

2 Additions: Why does this only happen with the ConvI2L refactoring I did earlier? Before the refactoring, we used to: 1: expand the type to input type (in Ideal) 2: filter input and output type (in Value) Since we expanded before filtering, we just expanded to input type, then filtering filtered input-type and input-type - so we kept the ConvI2L. After refactoring: We do filtering and expansion both in Value. We filter first, and immediately return TOP. The ConvI2L collapses the data-flow. Second point: Somehow, the range-check expands the type to int, because of underflow (-1), but ConvI2L has no such under/overflow issue, and keeps the smaller range. I need to investigate why there is this difference.
28-04-2022

Update: I am a little closer to understanding now: We have loop unrolling 2x, and are working with i<=0 -- [min...0]. We have an array access to c[i-1] . This array access requires a range-check. Unfortunately, the range-check cannot be constant folded and removed: The input value to the range-check is first range [min...0], but because access is at i-1, we get underflow for min, and so the input for range check is expanded to #int . However, the dataflow after the range-check is eliminated like this: We first do have a CastII pinning the data-flow to the range-check. But after we are in post_loop_opts_phase, we remove the CastII. Then, the data-flow starts with a ConvI2L node, which has type [1...max-1] (this is because it leads to a memory access, hence range [0..max-1], the lower bound 1 is because of the -1 offset which is calculated in below, technical detail). The input type to this node however is <=0. So now that the CastII node is gone, we can filter the input range with the range of ConvI2L and see that the result is empty -> TOP. While the control-flow stays put (range-check), the data-flow erodes (ConvI2L). Eventually, some memory-Phi node dies below a loadF. When we try to compute the LCA for all outputs to loadF, we do not see the old memory-Phi we should see, but one that has much larger scope. Then the LCA ends up much higher than loadF, which messes up the asserts.
28-04-2022

What I see in this case: [260 LoadF] used to have [617 Phi] as output, which is eventually removed because it gets top due to [644 ConvI2L] becoming TOP (chain reaction eats away many nodes, including loads and stores in the loop body). When we calculate LCA for [260 LoadF], we now do not see [617 Phi], put its output node [526 Phi], which has a much wider scope, pushing up the LCA above [260 LoadF] - which does not make sense. I suspect that we maybe should not even enter this loop, because if we already removed the loads / stores, then we actually won't ever do any "work". Looking at Reduced0029.java, the loop will actually never be entered. I fear that we allowed the data-flow to deteriorate while the control-flow stays standing.
27-04-2022

Picking this up 3 weeks later, some of the tests do not reproduce anymore, sadly. However, this one reproduces the same way: ./java -XX:+UnlockDiagnosticVMOptions -XX:+StressIGVN -Xcomp -XX:CompileOnly=Test -XX:-TieredCompilation Reduced0029.java
26-04-2022

# .../src/hotspot/share/opto/gcm.cpp:1423), pid=14603, tid=14616 # assert(false) failed: graph should be schedulable # in PhaseCFG::schedule_late Has same pattern like all others. Before we first expand the type of ConvI2L to input (not TOP). After my refactoring we first filter input and type of ConvI2L, which results in TOP. May be the same bug, or just the same change that reveals multiple bugs.
01-04-2022

i reduced 0015.zip down to less lines of java. See Reduced0015.java. It seems to consistently reproduce. ./java -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP -XX:+StressIGVN -Xcomp -XX:CompileOnly=Test -XX:-TieredCompilation Reduced0015.java # Internal Error (/home/emanuel/Documents/fork2-jdk/open/src/hotspot/share/opto/gcm.cpp:1211), pid=274092, tid=274105 # assert(false) failed: graph should be schedulable # in PhaseCFG::hoist_to_cheaper_block
01-04-2022

Spotted another similar occurrence with a fuzzer test that starts to fail after JDK-8230382 (attached as Test2.java) which might also be related: $ java -Xmx1G -Xcomp -Xbatch -XX:CompileOnly=Test -XX:CompileCommand=quiet -XX:-TieredCompilation -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressIGVN -XX:StressSeed=785926479 Results in: assert(false) failed: graph should be schedulable
01-04-2022

TODO: reduce Christians fuzzer test with this: # assert(n->is_Root() || n->is_Region() || n->is_Phi() || n->is_MachMerge() || def_block->dominates(block)) failed: uses must be dominated by definitions $ java -Xmx1G -Xcomp -Xbatch -XX:-TieredCompilation -XX:CompileOnly=Test -XX:CompileCommand=quiet Test.java
01-04-2022

I reduced Christian's fuzzer test crash: java -Xmx1G -Xcomp -Xbatch -XX:CompileOnly=Test -XX:CompileCommand=quiet -XX:-TieredCompilation -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressIGVN -XX:StressSeed=814401927 Test.java Currently running to further simplify it, but for now look at ReduceX.java ./java -Xcomp -Xbatch -XX:CompileOnly=Test -XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressIGVN ReduceX.java # Internal Error (/home/emanuel/Documents/fork2-jdk/open/src/hotspot/share/opto/gcm.cpp:473), pid=274260, tid=274273 # assert(early->dominates(LCA)) failed: early is high enough
01-04-2022

I reduced 0029.zip down to a few lines of java. See Reduced0029.java. Reproduces in about 20% of runs. ./java -XX:+UnlockDiagnosticVMOptions -XX:+StressIGVN -Xcomp -XX:CompileOnly=Test -XX:-TieredCompilation Reduced0029.java # Internal Error (/home/emanuel/Documents/fork2-jdk/open/src/hotspot/share/opto/gcm.cpp:766), pid=274185, tid=274198 # assert(!LCA_orig->dominates(pred_block) || early->dominates(pred_block)) failed: early is high enough
01-04-2022

Same behavior of test in JDK-8283460, therefore I closed that one as a duplicate of this one.
24-03-2022

I narrowed it down to half of JDK-8230382: only apply src/hotspot/share/opto/castnode.cpp -> CastIINode The reported tests do not fail only apply src/hotspot/share/opto/convertnode.cpp -> ConvI2LNode All the reported tests fail
24-03-2022

Found a similar looking fuzzer crash (attached as Test.java, hs_err_pid6905.log) that starts to fail after JDK-8230382: $ java -Xmx1G -Xcomp -Xbatch -XX:CompileOnly=Test -XX:CompileCommand=quiet -XX:-TieredCompilation -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressIGVN -XX:StressSeed=814401927 Test.java Another assertion is triggered with these flags, also starts to fail after JDK-8230382 (hs_err_pid6767): # assert(n->is_Root() || n->is_Region() || n->is_Phi() || n->is_MachMerge() || def_block->dominates(block)) failed: uses must be dominated by definitions $ java -Xmx1G -Xcomp -Xbatch -XX:-TieredCompilation -XX:CompileOnly=Test -XX:CompileCommand=quiet Test.java
24-03-2022

Ah yes, I forgot to attach mine. It is in 0015.zip now.
22-03-2022

[~shade] Might it be that you forgot to attach the fuzzer test? [~dlong] Ok, I can look into this. Could well be that something got reordered and messed up. Or that it uncovers a bug that was already there.
22-03-2022

ILW = assert failure with fuzzer; intermittent; no workaround = MMH = P3
21-03-2022

[~epeter], could you investigate if this is related to the JDK-8230382 changes?
21-03-2022