JDK-8288981 : C2: Fix issues with skeleton predicates
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 11,17,18,19,20,21,22,23,24
  • Priority: P3
  • Status: In Progress
  • Resolution: Unresolved
  • Submitted: 2022-06-22
  • Updated: 2024-08-09
Fix Version: JDK 24 (Unresolved)
Sub Tasks
JDK-8303497
JDK-8305634
JDK-8305635
JDK-8305636
JDK-8305637
JDK-8305638
JDK-8325746
JDK-8327109
JDK-8327110
JDK-8327111
JDK-8330004
JDK-8330386
JDK-8331168
JDK-8334571
JDK-8334650
JDK-8335257
Description
Since the introduction of skeleton predicates in JDK-8193130 to protect against over-unrolling a main loop, many related bugs have been found and fixed - also for loop optimizations other than the main loop of pre/main/post (e.g. post loop, loop unswitching, loop peeling). The most recent fix (JDK-8283466) addressed a sub-problem for loop peeling, while some loop-peeling cases remain unfixed (JDK-8278420); these are, however, rare in practice. 

Some recent changes (JDK-8230382) increase the probability of hitting these remaining problems with fuzzing but it's still hard to trigger this in practice. 

JDK-8286805 investigates stressing of various loop optimizations, including loop peeling. It randomly applies loop optimizations even though the normal policy would not allow them. The idea is to only force a loop optimization when the policy rejected it for efficiency reasons (i.e. applying it would still be correct, just not profitable). This has revealed many cases that can be traced back to missing skeleton predicates. 

To be able to reduce the noise in JDK-8286805, we should fix the missing skeleton predicate cases. To do that, we first want to redesign and refactor the skeleton predicates code and then fix the missing cases separately (like JDK-8278420). 

This RFE concerns the redesign and refactoring of skeleton predicates. Due to the many fixes that went in, the code has become quite complex. On top of that, the naming scheme for all the different predicates is sometimes ambiguous and inconsistent.


In this RFE, we want to implement/perform the following ideas:
- Find a good naming scheme to distinguish the different kinds of predicates. In the following, the currently existing names are used for lack of a better scheme as of today.
- Group skeleton predicates together and place them right above a loop head:
  - Easier to find them again. 
  - They should always be around until we are done with loop opts. This also allows us to rewire data nodes together with the cloning of the skeleton predicates.
  - They can still be found even after the empty predicates with the Conv2B/Opaque1 nodes are cleaned up (see the call of Compile::cleanup_loop_predicates()).
  - The ordering of the skeleton predicates is most likely irrelevant.
- Use Halt nodes instead of an uncommon trap (UCT) when inserting the empty predicates to simplify the copying of skeleton predicates (they never trap), since we then do not need to find a valid UCT (which might not always exist, e.g. after a peeled iteration).
- Introduce a new OpaqueNode to better distinguish skeleton predicates from other predicates. This makes it easier to find them again and to reason about them when analyzing a graph.
- Refactor code:
  - A lot of code looks similar and should be reused instead of copied.
  - Better method naming (e.g. cloning vs. copying).

The redesign should fix JDK-8288941. Make sure to double-check this before integrating the redesign.


Skeleton Predicate Background
=============================
This section describes the basic idea of skeleton predicates.

At the time we create a range check predicate RCP for a range check of an array access A inside a single (counted) loop L, the predicate covers the range of L. During runtime, we will trap before executing a single iteration of L if an array access would be outside of the allowed range. So, from the runtime perspective, it does not matter if we have a single loop L or if we split L into multiple sub loops that will together cover all the iterations of L.
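As a rough illustration of what a hoisted range check predicate buys, here is a Java-level sketch (illustrative only; C2 performs this on its IR, not on source code, and the method names are made up for this example):

```java
// Java-level sketch of range check predication (illustrative only; C2 does
// this on its intermediate representation, not on source code).
public class RangeCheckPredicateSketch {

    // Before predication: every iteration performs an implicit range check
    // for a[i], as required by the Java language semantics.
    static int sumChecked(int[] a, int n) {
        int s = 0;
        for (int i = 0; i < n; i++) {
            s += a[i]; // implicit check: 0 <= i < a.length
        }
        return s;
    }

    // After predication (conceptually): a single check RCP above the loop
    // covers the whole iteration range [0, n). If it fails, we trap before
    // a single iteration runs (in C2 this deoptimizes rather than throws);
    // inside the loop, no per-iteration check is needed anymore.
    static int sumPredicated(int[] a, int n) {
        if (n > a.length) { // RCP: stand-in for C2's trap/deoptimization
            throw new ArrayIndexOutOfBoundsException("predicate failed");
        }
        int s = 0;
        for (int i = 0; i < n; i++) {
            s += a[i]; // no range check needed here
        }
        return s;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4};
        System.out.println(sumChecked(a, 4));    // 10
        System.out.println(sumPredicated(a, 4)); // 10
    }
}
```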

However, during C2 compilation we face problems when splitting L into different sub loops, let's say L1 and L2, while only having the single predicate RCP above all loops. In C2, data and control are separate. Data for the array access A (which uses the iv phi) could use the type information of the updated loop bounds/updated iv phi of each individual loop L1 and L2. We could find that in L2, for example, an array access would be out of bounds. The data is then replaced by TOP (done by Cast/Conv nodes that narrow a range and are replaced by TOP if the range becomes empty). This means that the loop L2 is never entered, because the predicate RCP would trap during runtime. Therefore, the entire loop L2 should be removed - but that's not done: there is no check before L2 that could fold and let the loop die. We end up with a broken graph where some nodes died while others are still alive. To solve this, we insert a dummy skeleton predicate (which never fails at runtime because we would already trap with RCP) for each sub loop of L that covers a part of the entire range of L. This allows L2 to die when we find that an array access is out of bounds.

There are two places where we can split a loop into multiple sub loops. We need to insert a skeleton predicate for each sub loop:
- Peeling: Split loop into "Peeled Iteration" + "Peeled Loop"
- Pre/Main/Post: Split loop into "Pre Loop" + "Main Loop" + "Post Loop"
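The pre/main/post split can be pictured at the Java level as follows. This is an illustrative sketch only: the split bounds and unroll factor are invented for this example, and C2 performs the transformation on its IR. The point is that each sub loop covers a part of the original iteration range, so the single RCP above them still protects all of them at runtime, while each sub loop needs its own skeleton predicate at compile time to be able to fold away.

```java
// Java-level sketch of splitting one counted loop L into pre/main/post
// sub loops (illustrative only; bounds and unroll factor are invented).
public class SkeletonPredicateSketch {

    // Original loop L: one counted loop over the full range [0, a.length).
    static int sum(int[] a) {
        int s = 0;
        for (int i = 0; i < a.length; i++) {
            s += a[i];
        }
        return s;
    }

    // Conceptual pre/main/post split of L. Each sub loop covers a part of
    // L's range; together they cover all iterations of L, so the single
    // RCP above them is still sufficient at runtime. Skeleton predicates
    // are the compile-time-only checks inserted above each sub loop so a
    // sub loop whose iterations would all be out of bounds can die cleanly.
    static int sumSplit(int[] a) {
        int s = 0;
        int n = a.length;
        int preEnd = Math.min(n, 2);        // pre loop: first few iterations
        int mainEnd = n - (n - preEnd) % 4; // main loop: unrolled by 4 (example)
        for (int i = 0; i < preEnd; i++) {            // pre loop
            s += a[i];
        }
        for (int i = preEnd; i < mainEnd; i += 4) {   // main loop (unrolled)
            s += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
        }
        for (int i = mainEnd; i < n; i++) {           // post loop
            s += a[i];
        }
        return s;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4, 5, 6, 7, 8, 9};
        System.out.println(sum(a));      // 45
        System.out.println(sumSplit(a)); // 45
    }
}
```

The same picture applies to peeling: the peeled iteration plus the peeled loop together cover the range of the original loop, and each needs its own skeleton predicate.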

Comments
We are deferring this again to JDK 23.
13-11-2023

Given the complexity and the risk factor of the new on-going implementation, which is not stable yet, we are deferring this to JDK 22. The plan is to only integrate the clean-up RFEs JDK-8305634, JDK-8305635, JDK-8305636, JDK-8305637 in JDK 21 and the final fix JDK-8305638 in JDK 22.
16-05-2023

Attached TestPreMainPostingMainLoop.java, which fails after JDK-8303951 because we miss skeleton predicates after pre/main/posting a main loop:
$ java -Xcomp -XX:CompileOnly=TestPreMainPostingMainLoop TestPreMainPostingMainLoop.java
05-04-2023

The fix for this should also undo the problem listing done by JDK-8303497.
02-03-2023

Here's a log file snippet from the jdk-21+11-835-tier8 sighting:

applications/javafuzzer/BigTest.java
----------System.out:(53/3742)----------
Using JRuby executable: /opt/mach5/mesos/work_dir/jib-master/install/org/jruby/jruby-dist/9.2.12.0/jruby-dist-9.2.12.0-bin.zip/jruby-9.2.12.0/bin/jruby
For random generator using seed: 3189920980516317563
To re-run test with same seed value please add "-Djdk.test.lib.random.seed=3189920980516317563" to command line.
Starting JavaFuzzer: '/bin/bash /opt/mach5/mesos/work_dir/jib-master/install/com/oracle/jpg/bigapps/javafuzzer/javafuzzer/1.0/javafuzzer-1.0.zip/mrt.sh -R /opt/mach5/mesos/work_dir/slaves/91e16c40-06d4-468a-9fc3-7198a5bb7d5a-S118291/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/75b43dde-1b20-4b2b-b180-7e3f63a5bec8/runs/c73666cb-6d06-484a-8a56-dc4096c67661/testoutput/test-support/jtreg_closed_test_hotspot_jtreg_applications_javafuzzer_BigTest_java/scratch/0 -NT 300 -NP 12 -A -conf config.yml'
[2023-02-23T14:56:03.829878780Z] Gathering output for process 432
[2023-02-23T23:01:56.349477716Z] Waiting for completion for process 432
[2023-02-23T23:01:56.349649649Z] Waiting for completion finished for process 432
Output and diagnostic info for process 432 was saved into 'pid-432-output.log'
Summary of the JavaFuzzer run:
------------------------------
Host: ol7-x64-298366
Tests: 12 x 300
Args: -conf config.yml
Started at: Thu Feb 23 14:56:03 UTC 2023
r8-  300: 190 passed, 0 crashes, 0 fails, 0 hangs, 0 incorrect tests, 110 Reference Java failures
r10- 300: 189 passed, 0 crashes, 0 fails, 0 hangs, 0 incorrect tests, 111 Reference Java failures
r6-  300: 185 passed, 0 crashes, 0 fails, 0 hangs, 0 incorrect tests, 115 Reference Java failures
r5-  300: 181 passed, 0 crashes, 0 fails, 0 hangs, 0 incorrect tests, 119 Reference Java failures
r4-  300: 190 passed, 0 crashes, 0 fails, 0 hangs, 0 incorrect tests, 110 Reference Java failures
r7-  300: 189 passed, 0 crashes, 0 fails, 0 hangs, 0 incorrect tests, 111 Reference Java failures
r3-  300: 192 passed, 0 crashes, 0 fails, 0 hangs, 0 incorrect tests, 108 Reference Java failures
r11- 300: 178 passed, 0 crashes, 0 fails, 0 hangs, 0 incorrect tests, 122 Reference Java failures
r9-  300: 190 passed, 0 crashes, 0 fails, 0 hangs, 0 incorrect tests, 110 Reference Java failures
r1-  300: 179 passed, 1 crashes, 0 fails, 0 hangs, 0 incorrect tests, 120 Reference Java failures
r2-  300: 191 passed, 0 crashes, 0 fails, 0 hangs, 0 incorrect tests, 109 Reference Java failures
r12- 300: 191 passed, 0 crashes, 0 fails, 0 hangs, 0 incorrect tests, 109 Reference Java failures
Finished at: Thu Feb 23 23:01:56 UTC 2023
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (/opt/mach5/mesos/work_dir/slaves/91e16c40-06d4-468a-9fc3-7198a5bb7d5a-S123964/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/31a9704f-679e-4e16-8d24-a366ba036075/runs/cabc4065-f3f9-47ff-b17f-66de229cfbb6/workspace/open/src/hotspot/share/opto/gcm.cpp:1423), pid=14534, tid=14547
# assert(false) failed: graph should be schedulable
#
# JRE version: Java(TM) SE Runtime Environment (21.0+11) (fastdebug build 21-ea+11-LTS-835)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 21-ea+11-LTS-835, compiled mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0xf0383a]  PhaseCFG::schedule_late(VectorSet&, Node_Stack&)+0xc1a
#
# Core dump will be written. Default location: Core dumps may be processed with "/opt/core.sh %p" (or dumping to /tmp/fuzzer.tmp.d12ohrb3zu/core.14534)
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#
[2023-02-23T23:01:56.535408130Z] Waiting for completion for process 432
[2023-02-23T23:01:56.535574080Z] Waiting for completion finished for process 432
----------System.err:(13/728)----------
java.lang.RuntimeException: assertEquals: expected 1 to equal 2
	at jdk.test.lib.Asserts.fail(Asserts.java:594)
	at jdk.test.lib.Asserts.assertEquals(Asserts.java:205)
	at jdk.test.lib.Asserts.assertEquals(Asserts.java:189)
	at applications.javafuzzer.JavaFuzzerRunner.main(JavaFuzzerRunner.java:245)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:578)
	at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:312)
	at java.base/java.lang.Thread.run(Thread.java:1623)
JavaTest Message: Test threw exception: java.lang.RuntimeException
JavaTest Message: shutting down test
result: Failed. Execution failed: `main' threw exception: java.lang.RuntimeException: assertEquals: expected 1 to equal 2

Here's the crashing thread's stack:

---------------  T H R E A D  ---------------
Current thread (0x00007f35b434ccc0): JavaThread "C2 CompilerThread0" daemon [_thread_in_native, id=14547, stack(0x00007f359c143000,0x00007f359c244000)]
Current CompileTask:
C2: 605 10 % b 4 Test::mainTest @ 124 (570 bytes)
Stack: [0x00007f359c143000,0x00007f359c244000], sp=0x00007f359c23ea30, free space=1006k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0xf0383a]  PhaseCFG::schedule_late(VectorSet&, Node_Stack&)+0xc1a  (gcm.cpp:1423)
V  [libjvm.so+0xf040ed]  PhaseCFG::global_code_motion()+0x36d  (gcm.cpp:1520)
V  [libjvm.so+0xf08361]  PhaseCFG::do_global_code_motion()+0x51  (gcm.cpp:1642)
V  [libjvm.so+0xb282e9]  Compile::Code_Gen()+0x299  (compile.cpp:2912)
V  [libjvm.so+0xb2dfd9]  Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x19f9  (compile.cpp:866)
V  [libjvm.so+0x93a037]  C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x4e7  (c2compiler.cpp:113)
V  [libjvm.so+0xb3b49c]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0xa7c  (compileBroker.cpp:2237)
V  [libjvm.so+0xb3c288]  CompileBroker::compiler_thread_loop()+0x608  (compileBroker.cpp:1916)
V  [libjvm.so+0x10941e6]  JavaThread::thread_main_inner()+0x206  (javaThread.cpp:710)
V  [libjvm.so+0x1a99b80]  Thread::call_run()+0x100  (thread.cpp:224)
V  [libjvm.so+0x1738ec3]  thread_native_entry(Thread*)+0x103  (os_linux.cpp:737)

I would have linked this failure to the following bug:
JDK-8296077: applications/javafuzzer/BigTest.java fails "assert(false) failed: graph should be schedulable" after JDK-8281429
but JDK-8296077 was closed as a duplicate of this bug (JDK-8288981).
23-02-2023

ILW = same as JDK-8288941 = P3
22-07-2022

I changed this from an RFE to a bug, because we have other bugs that were closed as a duplicate of this one.
22-07-2022