JDK-8332920 : C2: Partial Peeling is wrongly applied for CmpU with negative limit
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 18,21,22,23,24
  • Priority: P2
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2024-05-24
  • Updated: 2024-06-25
  • Resolved: 2024-06-11
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 21 JDK 23 JDK 24 Other
21.0.5-oracleFixed 23Fixed 24 b02Fixed naResolved
Related Reports
Blocks :  
Relates :  
Relates :  
Description
The following code executes incorrectly, stoppin after about 800'000 iterations and performing the last `System.out.printf`. The exact number of iterations changes from run to run, probably depending on compilation timing.

But there is more: commenting out the lines after the loop and executing throws a "/ by zero" ArithmeticException on L.12.

In both cases, it should instead report `n=-2147483647, q=0, q1=1` and exit with code 22, as when executed in interpreter mode with -Xint.

Tried on macOS 14.5/AArch64 and Ubuntu 22.04.4/x64 with JDK 21 and 22. Executes correctly on JDK 17.

```
import static java.lang.Integer.*;

public class Buggy {
    public static void main(String[] args) {
        int N = -1;
        int d = MIN_VALUE + 1;

        long i = 0;
        int n = 0;
        for (; compareUnsigned(n, N) < 0; ++n, ++i) {
            int q = divideUnsigned(n, d + 1);
            int q1 = divideUnsigned(n, d);
            if (q1 != q) {
                System.out.printf("n=%d, q=%d, q1=%d%n", n, q, q1);
                System.exit(22);
            }
        };
        int q = divideUnsigned(n, d + 1);
        int q1 = divideUnsigned(n, d);
        if (q1 != q) {
            System.out.printf("n=%d, q=%d, q1=%d", n, q, q1);
            System.exit(22);
        }
        System.out.printf("counted=%d, expected=%d%n", i, (N & 0xFFFF_FFFFL) + 1);
    }
}
```
Comments
[jdk21u-fix-request] Approval Request from Martin Should get backported for parity with 21.0.5-oracle. Applies cleanly, but the test needs an adaptation which has been reviewed by the original author. Tier 1-4 have passed.
22-06-2024

A pull request was submitted for review. URL: https://git.openjdk.org/jdk21u-dev/pull/728 Date: 2024-06-14 15:38:59 +0000
14-06-2024

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/19653 Date: 2024-06-11 11:48:55 +0000
11-06-2024

Fix was pushed to JDK 24 while this main issue was targeted to JDK 23. Reset this issue to fixed in JDK 24 and copied the Robo Duke entry here.
11-06-2024

Changeset: ef101f1b Author: Christian Hagedorn <chagedorn@openjdk.org> Date: 2024-06-11 11:32:12 +0000 URL: https://git.openjdk.org/jdk/commit/ef101f1bf20f2813f855af4bc4eb317565175208
11-06-2024

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/19522 Date: 2024-06-03 12:39:05 +0000
03-06-2024

Thanks Christian!
03-06-2024

Thanks everyone for analyzing and narrowing the problem down and attaching simplified reducers! [~djelinski] is right I think, we should not apply this optimization if limit is negative. So, I think it is not directly related to JDK-8276162 but it just revealed the issue. Before that, I'm not sure how to get an unsigned comparison with a negative value or if it's even possible. The compareUnsigned() intrinsic was also only introduced later with JDK-8283726. About the stride, I think it does not matter if the iv phi overflows or not. I've prepared a patch with some more tests and also included the mentioned test cases above. To be sure to not hit the CCP assert found by [~thartmann], I've filed JDK-8333366 to get this fixed first (just sent out a PR for it). The fix is straight forward. Once it is in, we can also get this in. I will run some testing over the weekend and if it results look good, I can prepare a PR. I think we can still make it before the fork (but given it's a P2, we would still have more time).
31-05-2024

[~djelinski] Yes, the logic in PhaseIdealLoop::insert_cmpi_loop_exit seems flawed. -XX:-PartialPeelAtUnsignedTests will disable it and serves as a workaround. Looking at the code again, I just noticed that I fixed something similar a few years ago with JDK-8251535. I think the fix is incomplete.
30-05-2024

[~qamai] that's an interesting pointer! Started going through the code, and immediately found a few comments that look suspicious. These two are only true if limit is non-negative, otherwise the and/or predicates need to be swapped: https://github.com/openjdk/jdk/blob/bc7d9e3d0bc663bbbeb068889082da4a9f0fa8de/src/hotspot/share/opto/loopopts.cpp#L3042 https://github.com/openjdk/jdk/blob/bc7d9e3d0bc663bbbeb068889082da4a9f0fa8de/src/hotspot/share/opto/loopopts.cpp#L3046 This one is only reliable if stride =+-1, for larger strides the counter might wrap: https://github.com/openjdk/jdk/blob/bc7d9e3d0bc663bbbeb068889082da4a9f0fa8de/src/hotspot/share/opto/loopopts.cpp#L3047 Also, this optimization can be turned off with -XX:-PartialPeelAtUnsignedTests. I haven't checked if that helps yet. On a side note, the Intel optimization manual recommends to prefer unsigned loop variables over signed. That recommendation dates back to pre-Nehalem days and doesn't matter much on modern hardware, but still suggests that replacing unsigned with signed loops might not always be the right thing to do.
30-05-2024

Thanks, [~qamai]. Sure, I'll re-assign.
30-05-2024

I believe this is due to our policy of preferring signed comparisons to unsigned ones during partial peeling being incorrect [1]. The solution may be just not doing that? [~thartmann] The solution to me seems to have unknown impacts and I'm not really familiar with loop opts, so please re-assign the issue as this has a high priority. [1]: https://github.com/openjdk/jdk/blob/bc7d9e3d0bc663bbbeb068889082da4a9f0fa8de/src/hotspot/share/opto/loopopts.cpp#L2989
29-05-2024

Running below version of the test with OSR disabled triggers an assert in PhaseCCP (it's a separate issue though because the test also fails without PhaseCCP): public static boolean test() { for (int i = MAX_VALUE - 50_000; compareUnsigned(i, -1) < 0; ++i) { if (compareUnsigned(MIN_VALUE, i) < 0) { return true; } } return false; } public static void main(String[] args) { for (int i = 0; i < 10_000; ++i) { if (!test()) throw new RuntimeException("Test failed"); } } java -XX:-TieredCompilation -XX:CompileCommand=quiet -XX:CompileCommand=compileonly,Test::test* -Xbatch -XX:-UseOnStackReplacement Test.java Missed Value optimization: dist dump --------------------------------------------- 1 22 ConI === 0 [[ 49 96 60 98 38 ]] #int:-1 1 89 AddI === _ 73 27 [[ 98 90 96 73 ]] !jvms: Test::test @ bci:22 (line 18) 0 96 CmpU3 === _ 89 22 [[ 106 ]] !jvms: Test::test @ bci:5 (line 18) Current type: int:-1 Optimized type: int:-1..0 # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/workspace/open/src/hotspot/share/opto/phaseX.cpp:1834), pid=2347935, tid=2347949 # assert(!failure) failed: PhaseCCP not at fixpoint: analysis result may be unsound. # # JRE version: Java(TM) SE Runtime Environment (23.0+25) (fastdebug build 23-ea+25-2051) # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 23-ea+25-2051, mixed mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0x152922d] PhaseCCP::analyze()+0x63d
29-05-2024

[~qamai] It might well be that JDK-8276162 just triggers an existing issue. Do you plan to investigate further or should we re-assign? Thanks!
29-05-2024

> Commenting out these 2 lines seems to fix the issue [~djelinski] That essentially disables JDK-8276162. > the above assembly snippet is unique to the cmpU3 node. However, IGVisualizer shows that the only cmpU3 node was removed very early, in the RemoveUseless step. How did it manage to generate code? The CmpU3 is converted to a CmpU that is then incorrectly optimized. The assembly snippet is not unique to a CmpU3.
29-05-2024

Here is an even simpler test: public static boolean test() { for (int i = MAX_VALUE - 50_000; compareUnsigned(i, -1) < 0; ++i) { if (compareUnsigned(MIN_VALUE, i) < 0) { return true; } } return false; } public static void main(String[] args) { for (int i = 0; i < 2; ++i) { if (!test()) throw new RuntimeException("Test failed"); } } java -XX:-TieredCompilation -XX:CompileCommand=quiet -XX:CompileCommand=compileonly,Test::test* Test.java Also reproduces when turning off some optimizations via -XX:LoopUnrollLimit=0 -XX:-UseLoopPredicate -XX:-RangeCheckElimination
29-05-2024

Try this function: public static int test() { int N = -1; int d = MIN_VALUE + 1; int n = 0; for (; compareUnsigned(n, N) < 0; ++n) { int q = divideUnsigned(n, d + 1); int q1 = divideUnsigned(n, d); if (q1 != q) { return 0; } } return 1; } After partial peeling, the exit test seems to be converted to n < N, which seems completely wrong. This leads to all iterations other than the first one, which was peeled out already, removed.
28-05-2024

the above assembly snippet is unique to the cmpU3 node. However, IGVisualizer shows that the only cmpU3 node was removed very early, in the RemoveUseless step. How did it manage to generate code?
27-05-2024

fwiw, the generated code is failing because the compareUnsigned clobbers the dst register. Tried the PrintAssembly, got: 0x00007ce190b40e78: cmp %ebx,%r14d 0x00007ce190b40e7b: mov $0xffffffff,%ebp 0x00007ce190b40e80: jb 0x00007ce190b40e8a 0x00007ce190b40e82: setne %bpl 0x00007ce190b40e86: movzbl %bpl,%ebp ;*invokestatic compareUnsigned {reexecute=0 rethrow=0 return_oop=0} ; - Test::main@13 (line 10) EBP is also used as one of the constant divisors in the loop, and is not saved/restored around the compareUnsigned; as soon as this code is executed, the loop starts dividing by the wrong number.
27-05-2024

Commenting out these 2 lines seems to fix the issue: https://github.com/openjdk/jdk/blob/be1d374bc54d43aae3b3c1feace22d38fe2156b6/src/hotspot/share/opto/subnode.cpp#L876-L877
27-05-2024

[~qamai], could you please have a look?
27-05-2024

ILW = Wrong result with C2 compiled code, reproducible with test using unsigned compare, no workaround but disable compilation of affected method = HMM = P2
27-05-2024

This is a regression from JDK-8276162 in JDK 18.
27-05-2024

One way to overcome the issue and have correct behavior is to split the loop in two halves, like so ``` for (; compareUnsigned(n, N >>> 1) < 0; ++n, ++i) { int q = divideUnsigned(n, d + 1); int q1 = divideUnsigned(n, d); if (q1 != q) { System.out.printf("n=%d, q=%d, q1=%d%n", n, q, q1); System.exit(22); } }; for (; compareUnsigned(n, N) < 0; ++n, ++i) { int q = divideUnsigned(n, d + 1); int q1 = divideUnsigned(n, d); if (q1 != q) { System.out.printf("n=%d, q=%d, q1=%d%n", n, q, q1); System.exit(22); } }; ```
25-05-2024