JDK-8306302 : C2 Superword fix: use VectorMaskCmp and VectorBlend instead of CMoveVF/D
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 17,20,21
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • CPU: x86_64
  • Submitted: 2023-04-18
  • Updated: 2023-08-03
  • Resolved: 2023-05-24
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 21
21 b25Fixed
Related Reports
Duplicate :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Description
I found this during work of JDK-8306088 / JDK-8304720.

With appended Test.java run like this:

./java -Xcomp -XX:CompileCommand=compileonly,Test::test -XX:+TraceLoopOpts -XX:+UseVectorCmov -XX:+UseCMoveUnconditionally -XX:+TraceSuperWord -XX:MaxVectorSize=32 -XX:+TraceNewVectors Test.java

I get:

# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/home/emanuel/Documents/fork7-jdk/open/src/hotspot/share/opto/node.hpp:392), pid=21814, tid=21831
#  assert(i < _max) failed: oob: i=1, _max=1
#
# JRE version: Java(TM) SE Runtime Environment (21.0) (slowdebug build 21-internal-LTS-2023-04-17-1039212.emanuel...)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (slowdebug 21-internal-LTS-2023-04-17-1039212.emanuel..., compiled mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x416445]  Node::in(unsigned int) const+0x41

Note: the body of the loop can contain the equals check, or non-equals - it fails for both:
dataFc[i] = (dataFa[i] == dataFb[i]) ? dataFa[i] : dataFb[i];
dataFc[i] = (dataFa[i] != dataFb[i]) ? dataFa[i] : dataFb[i];

My guess is that something is wrong with the "EQ/NEQ tests (cmpOpUCF2)" in x86.
Comments
Changeset: beb75e65 Author: Emanuel Peter <epeter@openjdk.org> Date: 2023-05-24 07:00:27 +0000 URL: https://git.openjdk.org/jdk/commit/beb75e651f1e4a9bd21f611f9abc7ca28afbae31
24-05-2023

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/13493 Date: 2023-04-17 13:14:37 +0000
09-05-2023

I know what the problem is with Test2.java The code in this function is wrong: https://github.com/openjdk/jdk/commit/0485593fbc4a3264b79969de192e8e7d36e5b590#diff-7070c036c7d88ba4a8467e404d8d88aee646b97bf7bacc8b73cbc93f3ef11d2dR2097 To understand this, we must know that: CmpF a b Is based on the java byte-code "fcmpl". It returns -1 if any operand is NaN. -1: b larger or any NaN 0: a equals b 1: a is larger In our example java code, we have: r[i] = (a[i] > b[i]) ? a[i] : b[i]; This means we would like to have a[i], iff "a[i] GT_OQ b[i]". (O for ordered means that if there is a NaN, we have false -> b[i]) Inversely, we could also say we want b[i] iff "a[i] LE_UQ b[i]". (Unordered means that if there is a NaN, we get true -> b[i]) Note: LE_UQ == NGT_UQ (Q stands for Quiet, S for signaling. The difference is if we want to signal for a jump, or not signal because we are computing a mask) javac turns this into that: load a[i] load b[i] fcmpl if_le: load b[i] else: load a[i] store r[i] So if cmpl returns -1 or 0, we take b[i], and we only take a[i] if it is larger (and no NaN). Note: the "le" does not mean we compare "a le b". It means that the return code of "fcmpl" is "le = [-1, 0]". The only difference here is with NaN's. If we have a NaN: "a <= b" -> false (any comparison with NaN is false) "a > b" -> false CmpF a b [le] -> true (fcmpl returns -1, and that is in le) The code in "cmpOp_vcmppd" is wrong, here an example: Imagine we had a pattern like this: CmpF a b Bool [le] We turn this into a: CMoveVF a b [le] The "cmpOp_vcmppd" now maps the "copnd == le": int cond = (Assembler::Condition)($copnd$$cmpcode); This is somehow metaprogrammed to go from "cmpcode" -> "ccode" -> "COND_INTER" in "cmpOp_vcmppd". So we map "le" -> "less_equal" -> 0x2 In "enum ComparisonPredicateFP", this equates to "LE_OS". But form what I stated above, it should be "LE_UQ == NGT_UQ". Generally, I'm not sure why "cmpOp_vcmppd" has a mix of S and Q. Generally, I think the person who wrote it thought that "Bool le" translates to "LE_OQ" directly, and forgot that "le" is on the return code of CmpF. Once I replcaed the code there with "NGT_UQ", my Test2.java passes. Let's play the same game with a few more examples: "eq": "CmpF a b == 0": equal, no NaN : EQ_OQ (that was correct) "lt": less-than or NaN: LT_UQ (was wrong, had LT_OS) "le": less-equal or NaN: LE_UQ (was wrong, had LE_OS) "ne": not-equal or NaN: NEQ_UQ (was wrong, had NEQ_OQ) "ge": greater or equal, no NaN: GE_OQ (was close enough with GE_OS) "gt": greater, no NaN: GT_OQ (was close enough with GT_OS) This was a great exercise. But I will delete this code anyway, with the removal of "CMoveVF/D". But I will have to handle similar cases when going from "Cmp Bool" -> "VectorMaskCmp", since the vector compare turns all compares into OQ (except NE), see "booltest_pred_to_comparison_pred_fp".
04-05-2023

I found a new issue: SuperWord gets the comparisons with NaN wrong (UseVectorCmov). ./java -Xbatch -XX:CompileCommand=compileonly,Test2::test -XX:+TraceLoopOpts -XX:+UseVectorCmov -XX:+UseCMoveUnconditionally -XX:+TraceSuperWord -XX:+TraceNewVectors -XX:+Verbose -XX:UseAVX=2 -XX:MaxVectorSize=32 Test2.java This seems also to happen with JDK17, have not tested it further back.
04-05-2023

Thanks [~fgao] for the hints. I will have a look at it. It would be nice if we could even extend the logic to use different arguments for the "VectorMaskCmp" than for the "VectorBlend". I will see what I can do. I'll have to learn a bit more about how the matcher rules work.
28-04-2023

Hi [~epeter], thanks for your analysis! The restructure in matching here, https://github.com/openjdk/jdk/blob/af4d5600e37ec6d331e62c5d37491ee97cad5311/src/hotspot/share/opto/matcher.cpp#L2394, is to follow the original design in x86 here: https://github.com/openjdk/jdk/blob/af4d5600e37ec6d331e62c5d37491ee97cad5311/src/hotspot/cpu/x86/x86.ad#L5970. Well, as we can see, the info of BoolNode and BoolTest mask are duplicate whether from semantics side or ideal graph side. For aarch64, no Cmp is needed here and only a constant would be enough: https://github.com/openjdk/jdk/blob/af4d5600e37ec6d331e62c5d37491ee97cad5311/src/hotspot/cpu/aarch64/aarch64_vector.ad#L5881. So, I don't know if x86 could construct the info of current BoolNode based on the constant in the backend. If yes, then we can drop the restructuring in matching and keep a constant only, like: `match(Set dst (CMoveVF cond (Binary src1 src2)));`. Thus, the error here won't exist naturally, I suppose, and the code would be clean as well. Also, the semantics of `CMoveVF` is perfectly composed of `VectorMaskCmp` + `VectorBlend`. It's fine for aarch64 to implement `CMoveVF` by `VectorMaskCmp` + `VectorBlend` separately. I don't know if it's okay for x86. If yes, we can implement it in the mid-end and clean rules for `CMoveVF` in the backend. Thanks.
19-04-2023

This is in the way of adding more CMove tests for SuperWord: https://github.com/openjdk/jdk/pull/13493
18-04-2023

The "modified node is not on IGVN._worklist" assert first reproduces after JDK-8255665 in JDK 16.
18-04-2023

The assert gets triggered in the code from this change: JDK-8285973 On this line: https://github.com/openjdk/jdk/commit/c1db70d827f7ac81aa6c6646e2431f672c71c8dc#diff-e5266a3774f26ac663dcc67e0be18608b1735f38c0576673ce36e0cd689bab4aR4309 [~sviswanathan] [~qamai] it seems you two are familiar with this code. Would you want to look at it? From what I see, the issue is this: The problematic line wants to find a Cmp above the Bool, and compare its inputs. But we have no Cmp there, just a constant, that we have set during matching: https://github.com/openjdk/jdk/blob/af4d5600e37ec6d331e62c5d37491ee97cad5311/src/hotspot/share/opto/matcher.cpp#L2394 JDK-8289422 is also related [~fgao] [~pli].
18-04-2023

ILW = Assert during C2 compilation, reproducible with targeted test and UseVectorCmov (non-default), disable superword or compilation of affected method = HLM = P3
18-04-2023

With JDK 16 - JDK 19, we hit: # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (workspace/open/src/hotspot/share/opto/phaseX.cpp:1074), pid=3222333, tid=3222346 # fatal error: modified node is not on IGVN._worklist # # JRE version: Java(TM) SE Runtime Environment (16.0+25) (fastdebug build 16-ea+25-1626) # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 16-ea+25-1626, compiled mode, sharing, tiered, compressed oops, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0x15eb5b0] PhaseIterGVN::init_verifyPhaseIterGVN()+0x140
18-04-2023