JDK-8280126 : C2: detect and remove dead irreducible loops
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 8,11,11.0.12-oracle,17,18,19,20,21
  • Priority: P3
  • Status: In Progress
  • Resolution: Unresolved
  • Submitted: 2022-01-13
  • Updated: 2023-01-12
  • Fix Version: JDK 21 (Unresolved)
Description
ADDITIONAL SYSTEM INFORMATION :
OS version:
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.3 LTS
Release:        20.04
Codename:       focal

We used the following version:

java version "11.0.12" 2021-07-20 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.12+8-LTS-237)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.12+8-LTS-237, mixed mode)

java version "17.0.1" 2021-10-19 LTS
Java(TM) SE Runtime Environment (build 17.0.1+12-LTS-39)
Java HotSpot(TM) 64-Bit Server VM (build 17.0.1+12-LTS-39, mixed mode, sharing)

openjdk version "11-internal" 2018-09-25
OpenJDK Runtime Environment (fastdebug build 11-internal+0-adhoc.minghai.jdk11)
OpenJDK 64-Bit Server VM (fastdebug build 11-internal+0-adhoc.minghai.jdk11, mixed mode)

A DESCRIPTION OF THE PROBLEM :
The attached fuzzer test gets trapped in a dead loop with "-Xcomp" (compiled mode), but it passes with "-Xint" (interpreted mode) or without JVM parameters (mixed mode).

ERROR MESSAGES/STACK TRACES THAT OCCUR :
When we run the test on jdk17.0.1 and jdk11.0.12 in compiled mode (with "-Xcomp"), it gets trapped in a dead loop. But when we run the test on jdk17.0.1 and jdk11.0.12 in mixed mode or interpreted mode (with "-Xint"), it passes successfully.

Besides, when run on jdk11-internal in mixed mode or interpreted mode, it passes successfully.
But when run on jdk11-internal in compiled mode, it crashes with the following messages:
933   ConI    ===  0  [[ 4764  1447  409  1486  3173  4798  4803  4805  4756  304  6451  4769  4795  4796  410  4797  4793  6188  4763  412  5535  305  6053  5642  6055  6469  4770  5005  5006  4403  413  7005  6008  4403  5777  412  4403  411  411  6188  380  6007  5780  5009  5052  5640  303  6046  1405  5589  412  461  303  305  3729  5531  7151  4732  411  6188  305  410  1450  6140  413  3027  4403  4699  6188  411  413  6769  3119  6726  5385  496  5772  5036  339  5638  6019  304  6147  5384  303  6427  1322  304  1836  411  362  410  1399  305  4403  411  412  5390  411  5533  6188  6142  6532  6470  6064  411  6472  6645  411  6188  336  6188  6188  6035  6188  6188  4743  3730  6723  6019  4403  334  6452  4747  5536  3054  5538  5522  412  341  5530  305  5383  413  5470  304  6188  5512  303  6019  5523  304  4403  305  335  412  303  4403  6188  4403  304  303  304  4403  303  6720  6625  6724  381  412  305  304  412  412  304  412  412  303  413  305  305  6639  5525  5637  305  413  303  304  303  305  303  304  4403  6026  337  338  7022  342  340  3076  7033 ]]  #int:1 !jvms: JSON::parseObject @ bci:477
 3054   AddI    === _  3054  933  [[ 6428  6026  7008  339  1415  342  335  335  336  336  6019  3054  334  334  340  3091  7033  341  5501  338  337  7022 ]]  !orig=[7007],... !jvms: JSON::parseObject @ bci:821
# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc:  SuppressErrorAt=/phaseX.cpp:885
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/home/minghai/jdk11/src/hotspot/share/opto/phaseX.cpp:885), pid=22769, tid=22780
#  assert(no_dead_loop) failed: dead loop detected
#
# JRE version: OpenJDK Runtime Environment (11.0) (fastdebug build 11-internal+0-adhoc.minghai.jdk11)
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 11-internal+0-adhoc.minghai.jdk11, compiled mode, tiered, compressed oops, g1 gc, linux-amd64)
# Core dump will be written. Default location: core.22769 (may not exist)
#
# An error report file with more information is saved as:
# /mnt/c/Users/Minghai/hs_err_pid22769.log
#
# Compiler replay data is saved as:
# /mnt/c/Users/Minghai/replay_pid22769.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#
Current thread is 22780
Dumping core ...
zsh: abort (core dumped)   -Xcomp -cp  org.junit.runner.JUnitCore

The log files hs_err_pid22769.log and replay_pid22769.log are also attached.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1. extract the bug.zip
2. in directory "bug", run this command:
java -cp ./bugFiles:./util:./junit.jar:./hamcrest.jar:./target/classes:./target/test-classes org.junit.runner.JUnitCore com.alibaba.fastjson.deserializer.issue1463.TestIssue1463

You may add "-Xcomp" or "-Xint" to get different results.


---------- BEGIN SOURCE ----------
to be attached in bug.zip
---------- END SOURCE ----------


Comments
A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/11764 Date: 2022-12-22 08:40:48 +0000
22-12-2022

Update: I am now working on a test collection: TestIrreducibleLoops.jasm
test_001: Q3.jasm - relatively simple irreducible loop
test_002: Q1.jasm - triggers split_fall_in
test_003: N.jasm - cut off both entries at the same time, would trigger CFG assert
test_004: A2.jasm - triggers PhiNode::merge_through_phi "sanity" assert because of dead Phi-loop
test_005: K.jasm - infinite irreducible loop -> bailout
test_006: D2.jasm - triggers data dead-loop assert with self-referencing AddI node - StressIGVN can make it intermittent, as the order of decay matters
test_007: old entries lose "backedge" controls, and are collapsed. Internal nodes take over as irreducible loop entry.
test_008: same, but collapse happens already during parsing, not loop-opts like test_007.
test_009: same as test_007, except that there are additional if-statements that are no longer in loop when old entries lose "backedge" control. New entries are below the if-section.
Update (Nov 15):
test_010: reducible loop in irreducible loop, with same loop head.
test_011: irreducible loop entry detection is not consistent, depends on DF traversal order.
test_012: WIP
15-11-2022

I found a T.java that produces an irreducible loop, which is then later cut off, and the data internally collapses in a bad way, such that we create a data-dead-loop -> assert. In production, this leads to a SIGSEGV. See T.jasm, and the image series in the zip.

Analysis: I have a "killswitch" (red), the "if (x == 0)". This cuts the backedge from before the OSR. I wanted an "empty_loop" to collapse during OSR-compilation. Since it contains a backedge, I have to make sure it is visited once before OSR compilation, hence LOOP3 (triggers OSR, green) is in an extra if-statement. There, we also set x=1, which activates the "killswitch". It then takes a while until the "empty_loop" properly falls apart - eventually during IGVN it does, and rips away the entries to the irreducible loop (we always have y==0, return directly). We have the data for "i" -> it quickly collapses when the control flow collapses (pink). We can see that the control flow of LOOP1 is already very close to dying too -> 252 SafePoint will soon disappear and remove 357 Loop; eventually the whole code would be deleted - except that we process "310 LShiftI" (i *= 2) first, which is already a self-loop.

Solution: we have to make sure to detect when the irreducible loop is disconnected, and aggressively remove control-flow.

What this shows: it is possible to create difficult irreducible loops with just pure Java, via OSR. Unfortunately, I still need to restrict the trap-limit; I will need to investigate if that is really necessary. It seems to take a lot to trigger such a scenario - but it is possible.

Reproduce it like this:
java -XX:CompileCommand=compileonly,T::test -XX:-TieredCompilation -XX:PerMethodTrapLimit=0 T.java
I usually run it like this:
java -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -XX:CompileCommand=compileonly,T::test -XX:-TieredCompilation -XX:-RenumberLiveNodes -XX:PerMethodTrapLimit=0 -XX:+TraceLoopOpts -Xbatch T.java

Results:
(product)
openjdk 17.0.4 2022-07-19
OpenJDK Runtime Environment (build 17.0.4+8-Ubuntu-120.04)
OpenJDK 64-Bit Server VM (build 17.0.4+8-Ubuntu-120.04, mixed mode, sharing)
# SIGSEGV (0xb) at pc=0x00007fb7d02ab8fa, pid=1412770, tid=1412783
(debug)
0 Root === 0 316 [[ 0 1 3 249 21 182 26 238 30 212 34 209 38 67 81 83 87 88 91 92 206 139 142 153 164 370 425 503 538 571 573 575 609 ]]
91 ConI === 0 [[ 311 167 166 310 252 179 ]] #int:1
310 LShiftI === _ 310 91 [[ 311 310 252 289 167 293 273 302 ]] !orig=[251],263 !jvms: T::test @ bci:151 (line 10)
# Internal Error (/home/emanuel/Documents/fork6-jdk/open/src/hotspot/share/opto/phaseX.cpp:943), pid=1413030, tid=1413043
# assert(no_dead_loop) failed: dead loop detected
#
# JRE version: Java(TM) SE Runtime Environment (20.0) (slowdebug build 20-internal-2022-11-09-0613525.emanuel...)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (slowdebug 20-internal-2022-11-09-0613525.emanuel..., mixed mode, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V [libjvm.so+0x115478c] PhaseGVN::dead_loop_check(Node*)+0x180

Edit: The trap-limit is necessary here, else we get an uncommon_trap at the loop-exit of LOOP3, since the loop-exit was never taken before OSR:
122 CallStaticJava === 117 98 119 8 9 (121 1 1 101 1 102 103 104 1 109 ) [[ 123 ]] # Static uncommon_trap(reason='unstable_if' action='reinterpret' debug_id='0') void ( int ) C=0.000100 T::test @ bci:65 (line 19) reexecute !jvms: T::test @ bci:65 (line 19)
Maybe we can somehow avoid hitting any of the profiling unc_traps. But I think it is quite difficult.
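For illustration, a minimal Java sketch of the shape described above; this is a hypothetical reconstruction, not the attached T.java, and the names (T, test, osr) are invented:

// Hypothetical reconstruction of the pattern described above, not the
// attached T.java itself. LOOP3 runs long enough to trigger OSR and then
// sets x = 1, activating the "killswitch": in the OSR compilation the
// (x == 0) backedge is dead, the empty loop can collapse, and the
// irreducible remainder is cut off.
public class T {
    static int test(boolean osr) {
        int x = 0;
        int i = 1;
        if (osr) {
            for (int j = 0; j < 100_000; j++) { } // LOOP3: runs long enough to trigger OSR
            x = 1;                                // activates the "killswitch"
        }
        while (true) {
            for (int k = 0; k < 1; k++) { }       // empty_loop: collapses during compilation
            i *= 2;                               // becomes the self-looping 310 LShiftI
            if (x != 0) break;                    // killswitch: cuts the backedge
        }
        return i;
    }

    public static void main(String[] args) {
        System.out.println(test(true));
    }
}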
An easier, but probably contrived, way is to exhaust the traps.

Edit2: T2.java
We can exhaust the unc_traps. But it is a bit contrived.
java -XX:CompileCommand=compileonly,T::test -XX:-TieredCompilation -XX:PerMethodTrapLimit=5 T.java
I have 5 "decoy" for-loops. Each one triggers OSR, and sets an "unstable_if" at the loop-exit. Once the OSR execution exits the respective loop, the unc_trap is triggered and we decompile. Then the next for-loop OSR compiles. Until we exhaust the unc_trap limit for "unstable_if". The default trap limit is 100, and given this example we could extend it easily by copying the for-loops 100 times. Maybe there is another solution, but I did not get a simple if / switch to work for that.

Edit3: T3.java
java -XX:PerMethodTrapLimit=10 T3.java
We can exhaust the trap limit, and do not need the number of "decoy" loops to be exactly equal to the trap limit; the trap limit just needs to be lower than the number of "decoy" loops. -Xcomp seems required, but that can probably also be circumvented. -Xbatch may be required if the "decoy" loops do not perform enough iterations. Note: we can easily duplicate the "decoy" loops more, and exhaust the default trap limit of 100.
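As illustration only, a hedged sketch of the "decoy loop" idea; the attached T2.java/T3.java may differ, and the class and method names here are invented:

// Each decoy loop OSR-compiles and then fires an "unstable_if" uncommon
// trap at its never-before-taken exit, burning one slot of
// -XX:PerMethodTrapLimit; duplicating the pattern exhausts the limit.
public class T2 {
    static long test() {
        long s = 0;
        for (int a = 0; a < 100_000; a++) { s += a; } // decoy 1: OSR compiles, traps at exit
        for (int b = 0; b < 100_000; b++) { s += b; } // decoy 2
        for (int c = 0; c < 100_000; c++) { s += c; } // decoy 3
        // ... copy more decoy loops until the trap limit is exhausted
        return s;
    }

    public static void main(String[] args) {
        System.out.println(test());
    }
}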
11-11-2022

Update to previous Work Status report: https://bugs.openjdk.org/browse/JDK-8280126?focusedCommentId=14532876&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14532876

Fuzzing found more of the same 3 asserts:
CFG -> assert(!in->is_CFG()) failed: CFG Node with no controlling input?
sanity -> assert(cached_vbox != __null) failed: sanity
dead-loop -> assert(no_dead_loop) failed: dead loop detected

All of these seem to have to do with dead irreducible loops. However, the sanity assert is somewhat less related, as it seems to be an optimization that makes assumptions that are not really given. The CFG and the dead-loop assert can be prevented if we track the irreducible regions properly, and do a connectivity check every time we lose a control input. In addition, fuzzing seems to hit my own assert if I do not track the irreducible regions correctly (e.g. Q1.jasm). Fuzzing also found 3 unrelated asserts, bugs reported in: JDK-8296412, JDK-8296389, JDK-8296318

Good test candidates:
N.jasm - triggers CFG assert, with simple irreducible loop prepended by empty loop, in PhaseIdealLoop::verify
A2.jasm - triggers merge_through_phi sanity assert, with irreducible loop in loop, right after parsing
K.jasm - bailout bad CFG. K3 infinite irreducible loop, under loop with modulo condition that collapses during loop-opts.
D2.jasm - triggers "dead loop detected". Not as simple as maybe hoped for, but maybe hard to do simpler. Empty loop plus irreducible loop with nested loop. Irreducible loop dies from inside via data to control to data.
Q1.jasm - triggers my assert that checks that we do not find irreducible regions that we did not previously mark. This Q1.jasm creates a new Region in split_fall_in that must be marked as irreducible if the previous region was marked irreducible.

What I still wish for:
- maybe: dead_loop assert in a simpler way - maybe with StressIGVN? (really not so important - but I need to run with StressIGVN during fuzzing)
- Edit, done: infinite irreducible loop
- maybe: OSR irreducible graph case - but probably not relevant
- Edit, most likely not possible:
  - irreducible loop where one entry falls quickly, the other only later
  - irreducible loop that gets converted into an int-counted loop, maybe then becomes dead also
  (the previous two ideas could be done with the empty_loop and the unrolling template)
  (why it is not possible: once we lose all but one entry, the graph becomes reducible -> collapses normally)
- maybe: trigger merge_through_phi outside an irreducible loop - probably not possible
10-11-2022

Not all irreducible loop heads are necessarily "irreducible loop entries". See Q3.jasm as an example:
43 Region === 43 _ 39 [[ 43 41 ]] #irreducible-entry !jvms: Q3::test @ bci:14
44 Region === 44 _ 38 [[ 44 24 ]] #irreducible-entry !jvms: Q3::test @ bci:22
60 Region === 60 _ 56 [[ 60 58 ]] #irreducible !jvms: Q3::test @ bci:25
08-11-2022

FYI: Fuzzing found another assert triggered. Looks like a separate bug. Filed it as JDK-8296412.
04-11-2022

R.jasm (see file and pictures, including series in zip file)

# Internal Error (/home/emanuel/Documents/fork2-jdk/open/src/hotspot/share/opto/node.cpp:830), pid=1965983, tid=1965996
# assert(idx < _cnt) failed: oob
# V [libjvm.so+0x1095f68] Node::del_req(unsigned int)+0x26

Happens in PhaseCFG::convert_NeverBranch_to_Goto.

Context: Normal case: the "live"/"succ" projection is added as an output of NeverBranch before the "dead" projection leading to Halt. Details: During DFS, usually we go from Halt to NeverBranch. Then via Region/Loop, take the backedge, and find the "live"/"succ" edge. We already have its inputs (NeverBranch), thus we can now post-visit the live edge, and attach it to the NeverBranch first. Later, once we have processed the whole infinite loop, we post-visit out of NeverBranch to the "dead" projection edge, which we attach second.

In R.jasm: Abnormal order: the "dead" projection is attached to NeverBranch first, and the "live"/"succ" projection is added second. Details: In our pathological case, we go in through the shared 122 Halt, and first visit the 166 NeverBranch of the peeled iteration, and visit all of what is above. But there is no backedge, so we will not find the "live" edge, and we post-visit the "dead" edge first (171 CProj). Then, we take the second branch of the Halt, and visit the peeled loop. From there, we finally find the "live" projection (167 CProj) of the peeled iteration's NeverBranch, and attach it second. A further detail: 159 Halt is visited even later, which is somewhat relevant, because if it was visited before 122 Halt, then we would first attach the "live" 167 CProj, and avoid the bug.

Why is the order of the "live"/"succ" projection vs the "dead" one relevant? It determines in what order we later DFS-traverse the nodes when scheduling them into blocks, and also determines the order of successor blocks. It seems that the code in PhaseCFG::convert_NeverBranch_to_Goto expects the "succ" projection and block to have idx=0. On first appearance the code looks like it should also handle the inverted case where "succ" has idx=1, but the code has a bug: we overwrite the slot of the _succs array where the "dead" block reference is stored before we read it off.

This looks somewhat related to irreducible loops, but maybe this bug is reproducible without irreducible loops. In this case, the decaying irreducible loop creates a situation with multiple NeverBranch nodes in a subgraph, and a special order of creation which seems relevant for the specific DFS traversal during matching, which leads to the inversion, and the eventual assert in the DFS traversal during scheduling.

Update: I found a case without irreducible loops! Bug filed under JDK-8296389
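As a toy model of the overwrite-before-read problem described above (this is not the actual PhaseCFG code; the class and method names are invented):

class SuccOrder {
    // If the "succ" projection sits at index 1, copying it to index 0
    // before saving the "dead" block clobbers the reference we still need.
    static void makeSuccFirst(Object[] succs, int succIdx) {
        // Buggy order would be: succs[0] = succs[succIdx]; // overwrites "dead"
        Object dead = succs[1 - succIdx]; // read the dead block first
        succs[0] = succs[succIdx];
        succs[1] = dead;
    }

    public static void main(String[] args) {
        Object[] succs = { "dead", "succ" }; // inverted order, as in R.jasm
        makeSuccFirst(succs, 1);
        System.out.println(succs[0] + ", " + succs[1]); // succ, dead
    }
}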
04-11-2022

I have started a jasm-byte-code-fuzzer project. It spat out some jasm files that trigger asserts that are already reported here. But I found a new case. P.jasm (see attached, plus pictures)

# assert(is_loop || block->find_node(def) < j) failed: uses must follow definitions

Analysis: The assert checks that if both use and def are in the same block, we have def < use (def comes before use). There are some exceptions where we do not check this. First, we do not check this if we have a LoopNode as the block head, since the backedge might cause def/use to be in a loop (def-use-...-def). There is another special case, "tight loop with no LoopNode". I am currently not sure what exactly this could be. Maybe it can even be removed. In this specific example, the block-region can be traced back to an irreducible-loop-entry/head. Possibly this is a new case we have not handled, and we may have to fix / special-case it in the assert.

More info: We only run PhaseIdealLoop::build_loop_tree once, and find the NeverBranch case (no loop exit). Thus, in this run, the found loops are not attached to the loop tree, and the Regions are not transformed to LoopNodes. Q: is this really infinite-loop specific?

Found a non-irreducible case for this: JDK-8296318
03-11-2022

I am working on a fix that tags the Regions "irreducible" if we find out at some point (during parsing, in ciTypeFlow) that they are either the head or a "secondary entry" of an irreducible loop. Whenever such a tagged node loses any control input, we must check if the region is still connected to Root. This solution assumes that we can also detect new Region nodes that are created "irreducible", and we must tag them too. I found a Q1.jasm (attached, with 2 pictures). We see that in split_fall_in, we have a loop-head Region node that is tagged "irreducible", and we move multiple of its fall-in controls away into a separate region. It is now possible that this new separate region takes over as the entry to the irreducible loop. In Q1.jasm, this is the case. TODO: There may be more such cases; I need to investigate.
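For illustration, a minimal sketch of that connectivity check over a toy node model (the class and field names here are hypothetical, not HotSpot's Node API):

import java.util.*;

class ReachCheck {
    static class Node {
        final String name;
        final List<Node> outputs = new ArrayList<>();
        boolean irreducible; // tagged during parsing, and on new Regions that take over an entry
        Node(String name) { this.name = name; }
    }

    // Walk the graph from Root along the outputs (as build_loop_tree does)
    // and test whether the tagged region is still reachable; if not, the
    // irreducible loop is dead and must be removed aggressively.
    static boolean reachableFromRoot(Node root, Node region) {
        Deque<Node> work = new ArrayDeque<>();
        Set<Node> visited = new HashSet<>();
        work.push(root);
        visited.add(root);
        while (!work.isEmpty()) {
            Node n = work.pop();
            if (n == region) {
                return true;
            }
            for (Node out : n.outputs) {
                if (visited.add(out)) {
                    work.push(out);
                }
            }
        }
        return false; // disconnected: dead irreducible loop
    }

    public static void main(String[] args) {
        Node root = new Node("Root"), entry = new Node("Region1"), head = new Node("Region2");
        head.irreducible = true;
        root.outputs.add(entry);
        entry.outputs.add(head); // Root -> Region1 -> Region2
        System.out.println(reachableFromRoot(root, head)); // true
        entry.outputs.remove(head); // the tagged region loses its last entry
        System.out.println(reachableFromRoot(root, head)); // false
    }
}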
02-11-2022

A2.jasm (see file and 2 attached pictures). We seem to be generating some sort of loop/irreducible subgraph that is still connected at PHASE_AFTER_PARSING. But during IGVN (PHASE_ITER_GVN1), we lose control, and the graph disconnects without collapsing. We trigger this assert because we have a dead phi-loop:

# assert(cached_vbox != __null) failed: sanity

Interesting about this assert is that it triggers so early, right after parsing. The question is whether this really depends on the irreducibility in the graph. I think it has something to do with us inserting more Phi nodes when there is an irreducible loop.
28-10-2022

Good test candidates:
N.jasm - triggers CFG assert, with simple irreducible loop prepended by empty loop, in PhaseIdealLoop::verify
A2.jasm - triggers merge_through_phi sanity assert, with irreducible loop in loop, right after parsing
K.jasm - bailout bad CFG. K3 infinite irreducible loop, under loop with modulo condition that collapses during loop-opts.
D2.jasm - triggers "dead loop detected". Not as simple as maybe hoped for, but maybe hard to do simpler. Empty loop plus irreducible loop with nested loop. Irreducible loop dies from inside via data to control to data.

What I still need/want:
- infinite irreducible loop
- irreducible loop where one entry falls quickly, the other only later
- irreducible loop that gets converted into an int-counted loop, maybe then becomes dead also
- maybe: dead_loop assert in a simpler way
- maybe: trigger merge_through_phi outside an irreducible loop - probably not possible
- maybe: OSR irreducible graph case - but probably not relevant
28-10-2022

I attached D2.jasm, and pictures (see zip for the 10 interesting steps).

# assert(no_dead_loop) failed: dead loop detected

For a self-referencing AddI node.

Code:
LOOP_Y: empty_loop (violet)
Condition that turns out to be false after LOOP_Y collapses -> LOOP_3 is cut off
LOOP_3: irreducible loop, with an inner loop

Analysis: LOOP_Y collapses and cuts off LOOP_3 in PhaseIdealLoop. During IGVN, we see that LOOP_3 dies from the inside out, which leads to some bad states. For one, we have an AddI that references itself, but also a 140 If that has no IfTrue anymore. This happens because the data-nodes for register 0 eventually collapse, as they realize that they have no inputs. 106 Phi finds that its unique input is TOP. As the inner loop of LOOP_3 has a condition depending on register 0, the dying data-flow makes the control-flow die as well. Eventually, 192 Region dies, which then also removes 193 Phi from register 12, and 159 Phi realizes that 175 AddI is its unique input and subsumes itself with that 175 AddI. Now we have a self-referencing AddI, which leads to the dead-loop assert the next time the AddI is processed on the worklist.
28-10-2022

Just found JDK-6742111, a case where we seem to generate irreducible loops during an optimization. Not sure if it is still true. And there is a comment in the code:

// disable assert until issue with split_flow_path is resolved (6742111)
// assert(!_has_irreducible_loops || C->parsed_irreducible_loop() || C->is_osr_compilation(),
//        "shouldn't introduce irreducible loops");

It seems to be the only use of parsed_irreducible_loop. Problem with this: If we have no irreducible loop in the graph, and then suddenly an optimization creates irreducibility, we might not realize it until PhaseIdealLoop::build_loop_tree. And then who knows if we might not end up with dead code.
26-10-2022

N.jasm (see file attached and 2 graph pictures). LOOP_Y (orange) collapses during PhaseIdealLoop, as an empty_loop. This means that 87 If is always false, so during the IGVN of PhaseIdealLoop, LOOP_3 (green) loses control. But LOOP_3 does not collapse, and in PhaseIdealLoop::verify we detect it as dead code:

# assert(!in->is_CFG()) failed: CFG Node with no controlling input?

Why does LOOP_3 (green) not collapse? We have an irreducible loop (head at 104 Region, second entry at 103 Region). Thus, we have no LoopNode, and when one of those regions loses control, we decide that there is no "unsafe case". As far as we understand, the definition of "safe dead loops" depends on having only reducible loops.

Result of conversation: We need to know if we detect all irreducible loops. If yes, we could try one of these solutions:
1) Have a global flag "have any irreducible loops". If true, we always do a reachability check if any Region loses any control input. Simple solution, but may be expensive, even for regions that are not part of irreducible loops.
2) Find a way to mark all Regions that are implicated by the irreducible loop, and only do the reachability check there.
3) Bailout would not be very desirable.
4) An eventual goal should be to eliminate irreducibility from C2.

One task worth investing in: See if we can reproduce dead-loop asserts elsewhere (outside PhaseIdealLoop) with similar code.
26-10-2022

I am currently playing around to get more examples, and to make them more readable. My working hypothesis was that we do not handle irreducible loops well during PhaseIdealLoop and its IGVN runs. Now I found a K.jasm where we bail out with reason "unhandled CFG detected during loop optimization". The comments around that location tell me that PhaseIdealLoop::build_loop_tree_impl is not happy that it found a second loop entry. I attached the K.jasm, plus a CFG snapshot in picture form. Now I need to find out for which irreducible loops we bail out, and why, and why we do not bail out in the other cases I found.

Update: we indeed first enter the "green loop" via 116, and later also via the second entry 108. We have already post-traversed the whole "green loop" from 116, so when we look at it again from 108, we detect that we have a second entry -> bailout. I wonder if this bailout is necessary? Could we not just "undo" the loops here?
20-10-2022

Summary:
1. I think we have two bugs. I have an H.jasm where we do not hit the "sanity" assert, because we never generate a dead-loop that consists only of Phis.
2. I have multiple examples now where we can see what happens if we produce a data-dead-loop during IGVN and the dead-loop is not removed:
2a) This can trigger asserts during IGVN, e.g. if we are left with a singular AddI node that references itself.
2b) I can also extend it to 3-hop cycles of, for example, MulF, ConvF2I, ConvI2F. This is not caught by the dead-loop assert, because we only catch 2-hop cycles at most.
2c) Further, there are cases where we are left with a Phi node that has an AddI as left input and an AddI as right input, but both of those have the Phi node as input -> we have an :infinity:-shaped double dead-loop. This is not detected.
2d) I can create a more complex loop with multiple Phis and AddIs, which is also a dead-loop.
Cases 2b-d are all not detected during IGVN, and run into dead-code detection in PhaseIdealLoop::verify:
# assert(!in->is_CFG()) failed: CFG Node with no controlling input?
The problem is that we assume that there is no dead code after IGVN.
---
I think in "normal" cases (the ones where we do not have some ugly irreducible loop), we do not have Phi nodes that dead-loop in this way (e.g. the :infinity: shape). Normally, the dead-loop is expected to collapse. If the loop has no dependency on the data-dead-loop (e.g. no If in the loop depends on that data), we have no problem: the data has no use and is removed (not sure yet how the control is also removed, but that seems to happen somehow). If we do have some If that depends on that data-dead-loop, it seems we do need the data-dead-loop to collapse / be replaced with TOP. This causes the control to collapse as well. If the control does not collapse, we eventually have some control connection down to Root, and PhaseIdealLoop::verify will find that connection and assert, because it sees that this is dead code.
12-10-2022

H.jasm -> similar to D.jasm, but I can avoid hitting the "sanity" assert. This shows that we most likely do have two separate bugs here.

java -jar ~/Documents/asmtools-7.0-build/release/lib/asmtools.jar jasm H.jasm
./java -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -XX:+PrintOptoAssembly -XX:-UseLoopPredicate -XX:-UseProfiledLoopPredicate -XX:+TraceLoopOpts -XX:+PrintInlining -Xcomp -XX:CompileCommand=compileonly,H::test -XX:-TieredCompilation -XX:-LoopUnswitching H

Loop: N0/N0 has_sfpt
Loop: N167/N175 IRREDUCIBLE sfpts={ 175 }
Loop: N260/N258 limit_check sfpts={ 232 239 }
0 Root === 0 251 209 [[ 0 1 3 22 23 48 55 98 99 116 237 ]]
237 ConI === 0 [[ 238 ]] #int:2
238 AddI === _ 238 237 [[ 244 239 175 238 205 205 ]] !orig=172,[191],216 !jvms: H::test @ bci:64
...
# Internal Error (/home/emanuel/Documents/fork2-jdk/open/src/hotspot/share/opto/phaseX.cpp:943), pid=1847547, tid=1847560
# assert(no_dead_loop) failed: dead loop detected
12-10-2022

G.jasm -> IGVN leaves a dead-loop with multiple Phi and AddI nodes -> PhaseIdealLoop::verify finds dead code, asserts.

java -jar ~/Documents/asmtools-7.0-build/release/lib/asmtools.jar jasm G.jasm
./java -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -XX:+PrintOptoAssembly -XX:-UseLoopPredicate -XX:-UseProfiledLoopPredicate -XX:+TraceLoopOpts -XX:+PrintInlining -Xcomp -XX:CompileCommand=compileonly,G::test -XX:-TieredCompilation -XX:-LoopUnswitching G

# Internal Error (/home/emanuel/Documents/fork2-jdk/open/src/hotspot/share/opto/loopnode.cpp:5284), pid=1840533, tid=1840546
# assert(!in->is_CFG()) failed: CFG Node with no controlling input?

https://bugs.openjdk.org/secure/attachment/101053/G.jasm.crash.graph.png
12-10-2022

Summary so far:
A.jasm -> Phi loop -> bad assumption in PhiNode::merge_through_phi (Phi nodes are dead_loop_safe, so phi-dead-loops are allowed; the sanity assert is incorrect)
D.jasm -> last Phi with a unique input of an AddI, which collapses to a self-looping AddI -> dead-loop assert
E.jasm -> Phi with left-loop and right-loop (2 AddIs that loop back to the Phi) -> dead-loop not detected -> fail in PhaseIdealLoop::verify
F.jasm -> last Phi with unique input of a 3-hop loop (MulF -> ConvF2I -> ConvI2F -> MulF). The dead-loop assert does not trigger because the cycle is too long, but PhaseIdealLoop::verify finds dead code, asserts
10-10-2022

I just added an F.jasm, where the optimization finally leads to a 3-hop data-dead-loop. See the relevant subgraph here: https://bugs.openjdk.org/secure/attachment/101004/F.jasm.crash.graph.png

We do not hit the
# assert(no_dead_loop) failed: dead loop detected
because that assert only detects 1-hop and 2-hop cycles. So like in E.jasm, after PhaseIdealLoop (incl. IGVN) we have not removed the HaltNode with the dead-loop above it, which leads to the assert
# assert(!in->is_CFG()) failed: CFG Node with no controlling input?
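For illustration, a rough model of what such a short-cycle check can catch, over a toy node type; this is not the actual phaseX.cpp code, just a sketch of the 1-hop/2-hop limitation:

import java.util.*;

class ShortCycleCheck {
    static class Node {
        final String name;
        final List<Node> in = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Only self-references and 2-hop input cycles are caught.
    static boolean shortDeadLoop(Node n) {
        for (Node a : n.in) {
            if (a == n) {
                return true; // 1-hop: node is its own input
            }
            if (a != null && a.in.contains(n)) {
                return true; // 2-hop cycle
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Node mulF = new Node("MulF"), f2i = new Node("ConvF2I"), i2f = new Node("ConvI2F");
        mulF.in.add(i2f);
        i2f.in.add(f2i);
        f2i.in.add(mulF); // the 3-hop cycle from F.jasm
        System.out.println(shortDeadLoop(mulF)); // false: the cycle slips through
    }
}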
10-10-2022

It seems to me that the issue with E.jasm is that we have two AddI nodes in the dead data-loop at the end, so the Phi node thinks we have a valid entry / backedge loop, but both are "backedges", so we still have a dead-loop. Please see https://bugs.openjdk.org/secure/attachment/101001/E.jasm.crash.graph.png

After PhaseIdealLoop (we only ran PHASE_PHASEIDEALLOOP1), where we also run IGVN, we leave behind this dead-loop (both data and control are dead-loops - in this case we also have an endless loop [NeverBranch], but that is irrelevant). Then, we run PhaseIdealLoop::verify(igvn); and find a HaltNode for which has_node(n) is false. This is because in PhaseIdealLoop::build_loop_tree we do a traversal from Root down the outputs, so we never find this dead-loop. And in verify, we assume that every control node has been visited (i.e. we cannot have dead-loop code).
09-10-2022

It seems that the issue is that in PhiNode::is_data_loop (a better name: is_dead_data_loop), we check if we have a Loop, and if we have a Loop, there is the possibility that we have a unique input because the entry is TOP, and the backedge holds the now unique data input. We have a dead data loop. But in our case, we have a unique input because both the entry and the loopback data are this AddI node. Basically, we have a double loop (loop inside loop), but in total it is all dead. The difficulty is that there are legitimate ways for the Phi node to have two identical inputs for entry and loopback, for example if the loop-modification to this "variable" collapses to the identity function. Currently, we decide that since we do not have different values for entry and loopback, this is not dead, and allow the Phi node to be replaced with the Add node, leaving this Add node to self-reference, and trigger the dead-loop assert. We need to find a better condition for dead-loops. As far as I understand, it is bad if a Phi node is removed last in a loop. Maybe what is really bad is if we leave a data-node loop without any Phi node. So a loop with only "unsafe" nodes. Maybe we cannot avoid a bigger traversal?
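For illustration, one possible shape of such a "bigger traversal", as an untested sketch over a toy node model (not a proposed patch; all names here are invented): search the input graph for a cycle through the node that contains no dead-loop-safe node.

import java.util.*;

class UnsafeCycleCheck {
    static class Node {
        final String name;
        final boolean deadLoopSafe; // e.g. Phi nodes
        final List<Node> in = new ArrayList<>();
        Node(String name, boolean deadLoopSafe) {
            this.name = name;
            this.deadLoopSafe = deadLoopSafe;
        }
    }

    // Is there an input cycle through start that contains no dead-loop-safe
    // node? A live loop carries its value through a Phi, so a data cycle
    // made only of "unsafe" nodes can never produce a value.
    static boolean onUnsafeCycle(Node start) {
        return dfs(start, start, new HashSet<>());
    }

    static boolean dfs(Node start, Node cur, Set<Node> seen) {
        for (Node in : cur.in) {
            if (in == null || in.deadLoopSafe) {
                continue; // a Phi on the path breaks the "unsafe only" cycle
            }
            if (in == start) {
                return true;
            }
            if (seen.add(in) && dfs(start, in, seen)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Node mulF = new Node("MulF", false), f2i = new Node("ConvF2I", false), i2f = new Node("ConvI2F", false);
        mulF.in.add(i2f);
        i2f.in.add(f2i);
        f2i.in.add(mulF); // dead 3-hop cycle with no Phi
        System.out.println(onUnsafeCycle(mulF)); // true: the traversal finds it
    }
}

Note that this heuristic would still not flag Phi-containing dead loops like the :infinity: shape from E.jasm, so it could at best complement, not replace, a reachability check.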
07-10-2022

In search of other (simpler or more general) cases, I have found a very similar jasm code that triggers a different assert, and does not even ever visit the PhiNode::is_data_loop function.

java -jar ~/Documents/asmtools-7.0-build/release/lib/asmtools.jar jasm E.jasm
./java -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -XX:-UseLoopPredicate -XX:-UseProfiledLoopPredicate -XX:+TraceLoopOpts -XX:+PrintInlining -Xcomp -XX:CompileCommand=compileonly,E::test -XX:-TieredCompilation -XX:-LoopUnswitching E

# Internal Error (/home/emanuel/Documents/fork2-jdk/open/src/hotspot/share/opto/loopnode.cpp:5281), pid=1643825, tid=1643838
# assert(!in->is_CFG()) failed: CFG Node with no controlling input?

I wonder if this is the same or a different bug? Note that this E.jasm would also first trigger the assert in PhiNode::merge_through_phi. So I simply use the bailout code from above, which returns nullptr instead of asserting "sanity".
07-10-2022

I added the following change to avoid that "sanity" assert:

@@ -2536,6 +2536,10 @@ Node* PhiNode::merge_through_phi(Node* root_phi, PhaseIterGVN* igvn) {
       stack.pop();
     }
   }
+  if (cached_vbox == nullptr) {
+    // Found dead-loop of Phi nodes. Phi nodes are called dead-loop-safe, so this is expected.
+    return nullptr; // no optimization - TODO: do we remove the dead-loop ???
+  }
   assert(cached_vbox != NULL, "sanity");
   const TypeInstPtr* btype = cached_vbox->box_type();
   const TypeVect* vtype = cached_vbox->vec_type();

With that change, I found a D.jasm such that I get that second assert:
# assert(no_dead_loop) failed: dead loop detected

Assemble:
java -jar ~/Documents/asmtools-7.0-build/release/lib/asmtools.jar jasm D.jasm
Run:
java -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -XX:-UseLoopPredicate -XX:-UseProfiledLoopPredicate -XX:+TraceLoopOpts -XX:+PrintInlining -Xcomp -XX:CompileCommand=compileonly,D::test -XX:-TieredCompilation -XX:-LoopUnswitching D

Note: without the VM code change, we would still first hit the "sanity" assert.
05-10-2022

I found A.jasm, a reduction of the class file, which triggers a different assert:

# Internal Error (/home/emanuel/Documents/fork2-jdk/open/src/hotspot/share/opto/cfgnode.cpp:2539), pid=16034, tid=16048
# assert(cached_vbox != __null) failed: sanity
#
# JRE version: Java(TM) SE Runtime Environment (20.0) (slowdebug build 20-internal-2022-09-21-0845014.emanuel...)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (slowdebug 20-internal-2022-09-21-0845014.emanuel..., compiled mode, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V [libjvm.so+0x7b9687] PhiNode::merge_through_phi(Node*, PhaseIterGVN*)+0x31d

Assemble it to a class file like this:
./java -jar ~/Documents/asmtools-7.0-build/release/lib/asmtools.jar jasm A.jasm
And run it like this:
./java -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -XX:-UseLoopPredicate -XX:-UseProfiledLoopPredicate -XX:+TraceLoopOpts -XX:+PrintInlining -Xcomp -XX:+TraceOptoParse -XX:CompileCommand=compileonly,A::test -Xcomp -XX:-TieredCompilation A

The problem seems to be this: During IGVN, a dead loop is generated. We have a dead-loop of 3 Phis that only reference each other in their inputs. But Phi nodes are considered is_dead_loop_safe. For unsafe dead-loops, we decide to remove them immediately, but safe dead-loops remain in the graph. This seems mostly an optimization, so that we do not always have to traverse the graph with is_unreachable_from_root. PhiNode::merge_through_phi assumes there cannot be a dead-loop of Phis, but here I have an example. We will probably have to discuss this, but maybe we can bail out of the optimization in PhiNode::merge_through_phi instead of asserting. Since it is a dead loop, further optimizations would be useless anyway. Maybe we could even call a cleanup of the graph in this case. I will now reduce the class file for the original assert:

# assert(no_dead_loop) failed: dead loop detected

Note: I have struggled with the jasm decompiler because it uses class file version "64:0". The bug here I could reproduce with a smaller version number, "49:0".
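For illustration, a toy model of that shape (hypothetical Node type, not HotSpot's): three Phis forming an input cycle with no outside data input, which IGVN leaves alone because Phis are dead-loop-safe.

import java.util.*;

class DeadPhiBall {
    static class Node {
        final String name;
        final List<Node> in = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    public static void main(String[] args) {
        Node p1 = new Node("Phi1"), p2 = new Node("Phi2"), p3 = new Node("Phi3");
        p1.in.add(p2);
        p2.in.add(p3);
        p3.in.add(p1); // closed cycle: no value from outside ever enters
        System.out.println("dead phi cycle: Phi1 <- Phi2 <- Phi3 <- Phi1");
    }
}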
29-09-2022

Deferring this to JDK 20 for now because it's a long standing issue and we are getting closer to the fork. If the fix is ready in time for JDK 19, the fix version needs to be reset.
18-05-2022

I have enough other work for now. If nobody takes it, I might take it back at a later point.
03-05-2022

I investigated why 3330 gets ctrl input TOP. We find a dominating If with an identical test. Then, we realize that after that dominating If, we took
3259 IfTrue === 3258 [[ 3314 ]]
but right now we just used
3316 IfFalse === 3314 [[ 3330 ]]
which means 3330 is not reachable. Hence, it seems that disconnecting it is a good move. The difficulty with this bug is that we have many paths that get eliminated; finding the real source of the issue feels like searching for a needle in a haystack.
23-03-2022

Another possibility: the ctrl flow should erode (TOP propagated down) before the phi's become a dead loop.
22-03-2022

What I have found so far: We crash in PhiNode::merge_through_phi, because there is a phi-spaghetti ball which has no data input (6 Phi nodes). This triggers the sanity-check assert. Then, I implemented a crawler that recursively visits all ctrl nodes (where is_CFG() is true). About 40-50 ctrl nodes up (If, ProjFalse etc.), I find one that has the control input TOP:
3330 If === 1 3329
It could well be valid that this is TOP. The node 3330 is also put on the worklist, but it is not processed before we run into the assert. My hypothesis: we should only do PhiNode::merge_through_phi if we know that nowhere up in the control flow there are still nodes on the worklist that may collapse. Because the whole phi-spaghetti ball may die from TOP inputs that should propagate down eventually.
22-03-2022

I implemented some helper functions to analyze the Phi structure and the ctrl structure:

// Call this from debugger:
// Starting at a phi node, traverse inputs (not control)
// Traverse phi nodes recursively
// Print and count all non-phi, non-NULL inputs
int traverse_phi(const int idx, bool printNonPhi) {
  Node* root_phi = find_node(idx);
  if( root_phi == NULL ) {
    tty->print("traverse_phi: node idx not found: %d\n",idx);
    return 0;
  }
  if( !root_phi->isa_Phi() ) {
    tty->print("traverse_phi: input is not a phi node:\n");
    root_phi->dump();
    return 0;
  }
  Node_Stack stack(1);
  VectorSet visited;
  int cnt = 0;
  stack.push(root_phi, 1); // ignore control
  visited.set(root_phi->_idx);
  while (stack.is_nonempty()) {
    Node* n = stack.node();
    uint idx = stack.index();
    if (idx < n->req()) {
      stack.set_index(idx + 1);
      Node* in = n->in(idx);
      if (in == NULL) {
        continue; // ignore dead path
      } else if (in->isa_Phi()) {
        if (!visited.test_set(in->_idx)) {
          stack.push(in, 1); // ignore control
          tty->print("# phi found:\n");
          in->dump();
        }
      } else {
        if (!visited.test_set(in->_idx) && printNonPhi) {
          tty->print("# %d: interesting input found in node %d _in[%d]\n",cnt++,n->_idx,idx);
          in->dump();
        }
      }
    } else {
      stack.pop();
    }
  }
  return cnt;
}

// Call this from debugger:
// Starting at a region node, traverse inputs (only control)
// Traverse region nodes recursively
int traverse_cfg(const int idx, bool printNonCfg = false) {
  Node* root_cfg = find_node(idx);
  if( root_cfg == NULL ) {
    tty->print("traverse_cfg: node idx not found: %d\n",idx);
    return 0;
  }
  if( !root_cfg->is_CFG() ) {
    tty->print("traverse_cfg: input is not a cfg node:\n");
    root_cfg->dump();
    return 0;
  }
  Node_Stack stack(1);
  VectorSet visited;
  int cnt = 0;
  stack.push(root_cfg, 0); // also take control
  visited.set(root_cfg->_idx);
  while (stack.is_nonempty()) {
    Node* n = stack.node();
    uint idx = stack.index();
    if (idx < n->req()) {
      stack.set_index(idx + 1);
      Node* in = n->in(idx);
      if (in == NULL) {
        continue; // ignore dead path
      } else if (in->is_CFG() ) {
        if (!visited.test_set(in->_idx)) {
          stack.push(in, 0); // also take control
          tty->print("# cfg found:\n");
          in->dump();
        }
      } else {
        if (!visited.test_set(in->_idx) && printNonCfg) {
          tty->print("# %d: interesting input found in node %d _in[%d]\n",cnt++,n->_idx,idx);
          in->dump();
        }
      }
    } else {
      stack.pop();
    }
  }
  return cnt;
}
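For reference (assuming these helpers are compiled into a debug libjvm alongside find_node), they can be invoked from gdb at a breakpoint with, e.g., "call traverse_phi(3330, true)" or "call traverse_cfg(3330, false)".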
22-03-2022

What I tried so far: I disassembled the JSON.class with jdis, and immediately reassembled it with jasm. Unfortunately, this round-trip is not lossless, and it does not produce an identical file (only about 3/4 the size). If I put in that new class file, I get this error instead:

--------------------------
Time: 0.067
There was 1 failure:
1) testIssue1463(com.alibaba.fastjson.deserializer.issue1463.TestIssue1463)
java.lang.NoClassDefFoundError: serializer/SerializeWriter
at com.alibaba.fastjson.deserializer.issue1463.TestIssue1463.doubleDeserialization(TestIssue1463.java:37)
at com.alibaba.fastjson.deserializer.issue1463.TestIssue1463.testIssue1463(TestIssue1463.java:28)
... 31 trimmed
Caused by: java.lang.ClassNotFoundException: serializer.SerializeWriter
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
... 34 more

FAILURES!!!
Tests run: 1, Failures: 1
--------------------

However, if I disassemble it with jdec and reassemble it with jcoder, the files are identical. Unfortunately, the jdec output is not human-readable; the byte-code is displayed in hex format only. I tried to jdis - jasm - jdec the class file. This also produced hex-format byte-code. However, it is not identical (though similar), and if pasted into the other file this leads to issues (I assume some symbols or offsets have changed). It is thus not clear how to reduce the class file. We may have to painstakingly debug the graph of the given class file.
10-02-2022

Additional information from the submitter that may be useful: This fuzzer is specially designed to test the JIT compilers in the JVM. It mutates class files (changes some code inside a file), runs the mutated files, and expects different behaviors of the JVM under interpreted mode and compiled mode. This fuzzer is based on byte code. That is, the fuzzer reads in the byte code and mutates it directly, so I cannot provide the source code that is mutated, since the fuzzer generates byte code instead of source code. Maybe decompilation tools are helpful. The original source code is from Alibaba's fastjson: https://github.com/alibaba/fastjson. In the fuzz test, only one function in one specific class is mutated: com/alibaba/fastjson/JSON.parseObject(String, Type, ParserConfig, ParseProcess, int, Feature...). The mutated file that is generated by the fuzzer is: bug/bugFiles/com/alibaba/fastjson/JSON.class. It is certain that this function causes the bug. To reproduce the bug, just replace the original com/alibaba/fastjson/JSON.class with this mutated one, then run the test com.alibaba.fastjson.deserializer.issue1463.TestIssue1463 through JUnit. You could run this command in bug/:

java -Xcomp -cp ./bugFiles:./util:./junit.jar:./hamcrest.jar:./target/classes:./target/test-classes org.junit.runner.JUnitCore com.alibaba.fastjson.deserializer.issue1463.TestIssue1463

You can see that in this command, ./bugFiles is first in the classpath, in order to replace the original file with the mutated one. Removing the '-Xcomp' or replacing it with '-Xint' leads to different results, as indicated in the report.
10-02-2022

This issue is reproducible in jdk8 also
09-02-2022

I just ran with -Xcomp and hit the assert. I have jdk19 b04 (slightly older version):

/tank/fmatte/JAVA/jdk19/jdk-19-ea+04/fastdebug/bin/java -Xcomp -cp ./bugFiles:./util:./junit.jar:./hamcrest.jar:./target/classes:./target/test-classes org.junit.runner.JUnitCore com.alibaba.fastjson.deserializer.issue1463.TestIssue1463
JUnit version 4.13.2
.
341 ConI === 0 [[ 1339 1163 5194 1159 1153 7033 7034 7021 7022 111 110 7023 7024 7025 5195 7026 7077 153 118 110 110 2998 7135 117 7136 6849 7018 7094 7098 152 146 7089 2952 603 153 2951 814 1910 110 5196 2953 633 153 6830 7099 7100 5196 3055 153 427 2747 3193 110 7117 2953 152 153 2952 1152 3136 2996 3049 110 7123 7041 2952 2951 111 5231 2951 5196 153 5196 2952 7040 391 109 2919 3000 153 797 2917 2952 2997 634 7116 2999 5196 2753 5195 2524 111 4378 2953 5232 2952 5196 111 5195 462 153 7124 6821 6833 108 111 6834 160 7075 111 2953 7076 2916 111 111 111 403 1904 7115 6906 2951 5196 6935 2951 2953 7035 2951 111 7114 7728 2952 5196 152 111 2951 153 153 2953 153 2953 7754 7229 7000 2952 7118 165 2951 2952 2953 5260 110 2951 6812 153 2951 2953 5196 6810 358 6990 2918 111 153 2953 2952 2951 7039 6811 2953 2952 110 106 6934 2915 2753 2968 6934 2914 ]] #int:1 !jvms: JSON::parseObject @ bci:240 (line 368)
2524 AddI === _ 2524 341 [[ 2996 160 7509 2919 4408 2916 2997 2997 2998 2998 152 2524 2996 2763 2918 2917 1920 2999 3000 ]] !orig=[7508],... !jvms: JSON::parseObject @ bci:748 (line 368)
# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc: SuppressErrorAt=/phaseX.cpp:943
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (/opt/mach5/mesos/work_dir/slaves/a2dc162d-743b-4800-9e92-31f85abb45b1-S136873/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/1a465b75-bc3e-436d-8ce8-843258d7f72f/runs/4c2d264d-5066-447e-8e7e-d0ab59cf74c8/workspace/open/src/hotspot/share/opto/phaseX.cpp:943), pid=11039, tid=11052
# assert(no_dead_loop) failed: dead loop detected
#
# JRE version: Java(TM) SE Runtime Environment (19.0+4) (fastdebug build 19-ea+4-144)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 19-ea+4-144, compiled mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V [libjvm.so+0x16935e6] PhaseGVN::dead_loop_check(Node*) [clone .part.0]+0x156
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /tank/fmatte/bugs/8280126/bug/core.11039)
#
# An error report file with more information is saved as:
# /tank/fmatte/bugs/8280126/bug/hs_err_pid11039.log
#
# Compiler replay data is saved as:
# /tank/fmatte/bugs/8280126/bug/replay_pid11039.log
#
# If you would like to submit a bug report, please visit:
# https://bugreport.java.com/bugreport/crash.jsp
#
08-02-2022

I can also reproduce this with the latest JDK when using -XX:+UnlockExperimentalVMOptions -XX:-EnableVectorReboxing. Otherwise, I hit the assert which [~dlong] posted above. It is probably the same issue and we just fail earlier because the graph is already broken.
26-01-2022

The builds at https://jdk.java.net/18/ are not debug builds, which explains why I didn't see an assert. I do get the assert with this build:
# JRE version: Java(TM) SE Runtime Environment (18.0+32) (fastdebug build 18-ea+32-2068)
21-01-2022

With jdk19 I get:
assert(cached_vbox != __null) failed: sanity
called from PhiNode::merge_through_phi. I haven't been able to reproduce it in jdk18 b31.
21-01-2022

ILW = same as JDK-8275326 = P3
20-01-2022

Moving from hotspot/runtime -> hotspot/compiler for initial triage.
20-01-2022

Requested the bug.zip file from the submitter.
20-01-2022

Please ask for the bug.zip file mentioned in the bug report.
20-01-2022

Additional information from the submitter: I have tried JDK 18 of the following version:
openjdk version "18-ea" 2022-03-22
OpenJDK Runtime Environment (build 18-ea+31-2049)
OpenJDK 64-Bit Server VM (build 18-ea+31-2049, mixed mode, sharing)
However, it seems that the problem still exists. The test ran correctly in mixed mode and interpreted mode, but got trapped in a dead loop in compiled mode.
20-01-2022

Requested the submitter to verify the fix by downloading the latest version of JDK 18 at https://jdk.java.net/18/. If the issue still exists, requested the bug.zip mentioned in the reproducing steps.
18-01-2022

This looks very similar to JDK-8275326. There is no source to validate. Could you please check with the latest jdk18 version to confirm the fix?
14-01-2022