JDK-8285820 : C2: LCM prioritizes locally dependent CreateEx nodes over projections after 8270090
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 11,17,18,19
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2022-04-28
  • Updated: 2022-07-03
  • Resolved: 2022-05-12
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 11 JDK 17 JDK 19
11.0.17-oracleFixed 17.0.5-oracleFixed 19 b23Fixed
Related Reports
Relates :  
Relates :  
Description
Before JDK-8270090, CreateEx nodes *initially ready* were given priority over Proj nodes [1], but CreateEx nodes with input dependencies within the block ("locally-dependent") were given the same priority as Proj nodes [2]. That is, the ranking between CreateEx and Proj nodes was:

1. Initially ready CreateEx nodes
2. Proj nodes and other CreateEx nodes (tie)

JDK-8270090 altered this ranking, giving *all* CreateEx nodes priority over Proj nodes, regardless of whether they are ready initially or not [3]. This was done based on the wrong assumption that all CreateEx nodes would be initially ready. The new ranking has triggered the following test failure, specific to x86_32 using -XX:+UseShenandoahGC, but might affect other platforms and configurations:

$ CONF=linux-x86-server-fastdebug make run-test TEST=java/util/concurrent/tck/JSR166TestCase.java TEST_VM_OPTS="-XX:+UseShenandoahGC"

# Internal Error (/home/shade/trunks/jdk/src/hotspot/cpu/x86/gc/shenandoah/shenandoahBarrierSetAssembler_x86.cpp:817), pid=3039050, tid=3039063
# assert(res != __null) failed: need result register

Current CompileTask:
C2: 2791 1631 4 java.util.concurrent.LinkedTransferQueue::xfer (236 bytes)

Stack: [0xb183f000,0xb1900000], sp=0xb18fcd00, free space=759k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x14d12a0] ShenandoahBarrierSetAssembler::cmpxchg_oop(MacroAssembler*, RegisterImpl*, Address, RegisterImpl*, RegisterImpl*, bool, RegisterImpl*, RegisterImpl*)+0x710
V [libjvm.so+0x391091] compareAndSwapP_shenandoahNode::emit(CodeBuffer&, PhaseRegAlloc*) const+0x281
V [libjvm.so+0x1346f81] PhaseOutput::scratch_emit_size(Node const*)+0x491
V [libjvm.so+0x1113969] MachNode::size(PhaseRegAlloc*) const+0x69
V [libjvm.so+0x133d1db] PhaseOutput::shorten_branches(unsigned int*)+0x34b
V [libjvm.so+0x134fde5] PhaseOutput::Output()+0xad5
V [libjvm.so+0x96d98d] Compile::Code_Gen()+0x4dd
V [libjvm.so+0x972a6a] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x141a
V [libjvm.so+0x796d6f] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x89f
V [libjvm.so+0x9844cd] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x15bd
V [libjvm.so+0x985311] CompileBroker::compiler_thread_loop()+0x7b1
V [libjvm.so+0x9af41b] CompilerThread::thread_entry(JavaThread*, JavaThread*)+0x2b
V [libjvm.so+0x16ee8f8] JavaThread::thread_main_inner()+0x5b8
V [libjvm.so+0x16f10a2] JavaThread::run()+0x402
V [libjvm.so+0x16fa55e] Thread::call_run()+0xfe
V [libjvm.so+0x1324ff3] thread_native_entry(Thread*)+0x123
C [libpthread.so.0+0x7635] start_thread+0xf5

[1] https://github.com/openjdk/jdk/blob/beedae1141b6b650dc4cedf1f038afc1c8b460dd/src/hotspot/share/opto/lcm.cpp#L1057-L1070
[2] https://github.com/openjdk/jdk/blob/beedae1141b6b650dc4cedf1f038afc1c8b460dd/src/hotspot/share/opto/lcm.cpp#L531-L538
[3] https://github.com/openjdk/jdk/blob/8ebea443f333ecf79d6b0fc725ededb231e83ed5/src/hotspot/share/opto/lcm.cpp#L532-L537
Comments
Fix request [11u] I backport this for parity with 11.0.17-oracle. Required follow-up to 8270090. Clean backport. SAP nighltly testing passed.
29-06-2022

A pull request was submitted for review. URL: https://git.openjdk.org/jdk11u-dev/pull/1179 Date: 2022-06-24 09:55:51 +0000
24-06-2022

Fix request [17u] I backport this for parity with 17.0.5-oracle. Required follow-up. Clean backport. SAP nightly testing passed.
09-06-2022

A pull request was submitted for review. URL: https://git.openjdk.java.net/jdk17u-dev/pull/442 Date: 2022-06-08 10:29:12 +0000
08-06-2022

Changeset: 89392fb1 Author: Roberto CastaƱeda Lozano <rcastanedalo@openjdk.org> Date: 2022-05-12 07:05:38 +0000 URL: https://git.openjdk.java.net/jdk/commit/89392fb15e9652b7b562b3511f79bda725c5499c
12-05-2022

Thanks for the heads-up [~marchof], I do not expect the integration of virtual threads to affect this fix, but we will see after JDK-8286365 is addressed.
11-05-2022

FYI: Due to JDK-8286365 master cannot be built any more for arm32.
10-05-2022

A pull request was submitted for review. URL: https://git.openjdk.java.net/jdk/pull/8568 Date: 2022-05-06 10:42:52 +0000
10-05-2022

Great, thanks Marc! I will soon submit a pull request for review (https://github.com/openjdk/jdk/pull/8568).
06-05-2022

JaCoCo build is green with fastdebug build of your branch!
06-05-2022

Thank you both!
05-05-2022

[~rcastanedalo] The JaCoCo build is already running: https://pici.beachhub.io/#/jdk-8285820-jacoco
05-05-2022

[~rcastanedalo] JDK build is running https://pici.beachhub.io/#/jdk-8285820
05-05-2022

[~marchof] The fix to JDK-8270090 caused failures in other configurations, so it has to be partially reverted. The parts that have to be reverted should not reintroduce the issue seen in JDK-8270090 (wrong relative ranking between projections and CheckCastPP nodes), but would be good to test it. Would it be possible to test building JaCoCo on ARM32 using a fastdebug build of https://github.com/robcasloz/jdk/tree/JDK-8285820? (i.e. same as https://pici.beachhub.io/#/JDK-8270090/ and https://pici.beachhub.io/#/JDK-8270090-jacoco/ but for my personal JDK-8285820 branch). Thanks!
05-05-2022

I decided to go with the lower-risk approach of partially reverting JDK-8270090 by now, the proposed solution is implemented in https://github.com/robcasloz/jdk/tree/JDK-8285820. I will file a RFE for reworking the scheduling of initially-ready CreateEx nodes in the future. [~shade] Testing on windows-x64, linux-x64, linux-aarch64, and macosx-x64 looks good. Could you please re-run your linux-x86+UseShenandoahGC tests on the above branch?
05-05-2022

I tested the original Shenandoah reproducer and it passes. Once PR is open, GHA would test x86_32 tier1, and I can do tier1 with Shenandoah too.
05-05-2022

Thanks [~marchof]! The build succeeded, would it be possible to test building JaCoCo using this JDK build? (similarly to https://pici.beachhub.io/#/JDK-8270090-jacoco).
05-05-2022

Updating priority after the pre-investigation: ILW = C2 assertion failure; observed running Shenandoah GC on x86 but might be generic; disable compilation of affected method = MMM = P3
04-05-2022

Yes, that makes sense. So before JDK-8270090 we added initially ready CreateEx node to the beginning of the worklist, and that always resolved the tie against subsequent Proj in favor of CreateEx. If we do my hack, that would mean initially ready CreateEx nodes *may* lose that tie and Proj would be selected first -- and so it would require reinstating adding the initially ready CreateEx at the beginning of worklist. I agree that's quite fragile, but at least it gets us back to state before JDK-8270090. If you can come up with better way to do this, it would be nice. This confirms the bug is more or less generic, it just so happens to manifest on Shenandoah x86_32. Feel free to change the synopsis accordingly.
03-05-2022

I think I finally see what's going on. Before JDK-8270090, only CreateEx nodes *initially ready* were given priority over Proj nodes [1], but CreateEx nodes with input dependencies within the block where given the same priority as Proj nodes [2]. That is, the ranking between CreateEx and Proj nodes was: 1. Initially ready CreateEx nodes 2. Proj nodes and other CreateEx nodes (tie) JDK-8270090 indeed altered this ranking, giving *all* CreateEx nodes priority over Proj nodes, regardless of whether they are ready initially or not [3]. This was done based on the seemingly wrong assumption that all CreateEx nodes would be initially ready. I think the ranking between CreateEx nodes and Proj nodes should be, for clarity: 1. Initially ready CreateEx nodes 2. Proj nodes 3. Other CreateEx nodes Alternatively, we could partially revert JDK-8270090 as you suggest in your patch above, also reintroducing the code that manipulates the ready list order to raise the priority of initially ready CreateEx nodes [4]. This would be less disruptive but I find it brittle and a bit convoluted to rely on the ready list order for correctness. What do you think? [1] https://github.com/openjdk/jdk/blob/beedae1141b6b650dc4cedf1f038afc1c8b460dd/src/hotspot/share/opto/lcm.cpp#L1057-L1070 [2] https://github.com/openjdk/jdk/blob/beedae1141b6b650dc4cedf1f038afc1c8b460dd/src/hotspot/share/opto/lcm.cpp#L531-L538 [3] https://github.com/openjdk/jdk/blob/8ebea443f333ecf79d6b0fc725ededb231e83ed5/src/hotspot/share/opto/lcm.cpp#L532-L537 [4] https://github.com/openjdk/jdk/blob/beedae1141b6b650dc4cedf1f038afc1c8b460dd/src/hotspot/share/opto/lcm.cpp#L1063-L1066
03-05-2022

Thanks for the analysis [~shade]. The intention of JDK-8270090 w.r.t. CreateEx nodes was to preserve the original prioritization, where they were given the highest priority by placing them at the beginning of the ready list before running the list scheduling algorithm (see the last change in https://github.com/openjdk/jdk/pull/7988/files), so it is still unclear to me why this fails after, but not before, JDK-8270090. I don't have access to a x86_32 system, would it be possible to get the output for this test case using "-XX:+TraceOptoPipelining -XX:+Verbose"? Thanks!
02-05-2022

Unfortunately, TraceOptoPipelining output is huge, even if I limit it to the problematic method with CompileCommand. I can try and manage the dump later. But here is what I gather. Before the patch, we inserted CreateEx at the top of the list, and selected Proj anyway ("Projections always win"). Meaning, before the JDK-8270090, Proj nodes had priority over CreateEx. After JDK-8270090, it is not really relevant if we inserted CreateEx on the top or not, because we would roll over the entire worklist and return CreateEx if it is anywhere on the list, right? Meaning, after JDK-8270090 invert the priority of CreateEx over Proj: now CreateEx would be selected first.
02-05-2022

Looks to me the Shenandoah CAS node was optimized away. The hack below solves the failure, by partially reverting the JDK-8270090 that made CreateEx chosen over the projection (those are used to pin atomic ops, afaics, see BarrierSetC2::pin_atomic_op). [~rcastanedalo], I wonder if CreateEx handling is now incorrect. diff --git a/src/hotspot/share/opto/lcm.cpp b/src/hotspot/share/opto/lcm.cpp index 3bbc46f4e1a..0bad9b05166 100644 --- a/src/hotspot/share/opto/lcm.cpp +++ b/src/hotspot/share/opto/lcm.cpp @@ -529,7 +529,7 @@ Node* PhaseCFG::select( Node *n = worklist[i]; // Get Node on worklist int iop = n->is_Mach() ? n->as_Mach()->ideal_Opcode() : 0; - if (iop == Op_CreateEx) { + if (n->is_Proj() || iop == Op_CreateEx) { // CreateEx must start the block (after Phi and Parm nodes which are // pre-scheduled): select it right away. worklist.map(i,worklist.pop());
28-04-2022

ILW = C2 assertion failure; running Shenandoah GC on x86; disable compilation of affected method or use default GC = MLM = P4
28-04-2022

Bisection points to JDK-8270090.
28-04-2022