JDK-8329077 : C2 SuperWord: MoveF2I cases do not vectorize
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 23,24,25
  • Priority: P4
  • Status: In Progress
  • Resolution: Unresolved
  • Submitted: 2024-03-26
  • Updated: 2025-07-04
Description
This came up in:
https://mail.openjdk.org/pipermail/panama-dev/2024-March/020327.html

"But then why does this one:

    public void scalarSegmentArray(Data state) {
        final MemorySegment input = state.inputSegment;
        final double[] output = state.outputArray;
        for(int i = 0; i < SIZE; i++) {
            output[i] += input.getAtIndex(JAVA_DOUBLE, i);
        }
    }

not vectorize? input and output can't overlap because one is off heap
and the other is on heap. It seems for doubles the MemorySegment API
reads a double in 2 steps: use getLongUnaligned() and then convert the
result to double with Double.longBitsToDouble(). The vectorizer doesn't
support vectorization of that long to double move."

"MemorySegment reads double in the same way as other (pre-existing) 
VarHandles - e.g. as longs which are then converted to doubles (w/o 
using Unsafe::getDouble directly)."
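The two-step double read described above can be sketched with plain JDK calls. This is only an illustration of the bit-level round trip (raw long bits, then reinterpret as double), not the actual VarHandle implementation, which uses Unsafe::getLongUnaligned internally:

```java
public class DoubleReadDemo {
    public static void main(String[] args) {
        // Step 1: obtain the raw 64 bits as a long (the VarHandle reads these
        // from memory with getLongUnaligned; here we just fabricate the bits).
        long bits = Double.doubleToRawLongBits(1.5);

        // Step 2: reinterpret the long bits as a double (MoveL2D in the C2 IR).
        double d = Double.longBitsToDouble(bits);

        System.out.println(Long.toHexString(bits)); // 3ff8000000000000
        System.out.println(d);                      // 1.5
    }
}
```

It is this second step, the long-to-double reinterpretation, that the vectorizer currently rejects.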

"I prefer to see vectorizer handling "MoveX2Y (LoadX mem)" 
=> "VectorReinterpret (LoadVector mem)" well and then introduce rules to 
strength-reduce it to mismatched access."
Comments
FYI I'm currently working on the implementation
04-07-2025

[~galder] Are you still interested in this issue?
02-06-2025

[~galder] I changed the title from "C2: MemorySegment double accesses don't vectorize" to "C2 SuperWord: MoveF2I cases do not vectorize", because it is not really about MemorySegment but about core SuperWord functionality, as the Test.java shows.
07-01-2025

[~roland] I had asked you on Slack on April 23, without any response. You can keep it, if you want.
07-05-2024

Well wait, that actually was no surprise: ConvF2I actually converts the numbers, but MoveF2I just reinterprets them!

Ok, what are the corresponding Vector operations? For ConvF2I, we pick it in VectorCastNode::opcode. Maybe I would have to make a special exception there, so that I am not producing a VectorCastF2X, but a VectorReinterpret.

The only tricky thing is that VectorCastNode::implemented only seems to pass the dst_type to Matcher::match_rule_supported_auto_vectorization, and not the src_type. Not sure what to do about that... Maybe I have to create a new "VectorReinterpretNode::opcode / ::implemented" method, in parallel to those for the casts.

How can we check if they are implemented? I see some checks in src/hotspot/share/opto/vectorIntrinsics.cpp:

    arch_supports_vector(Op_VectorReinterpret, mem_num_elem, mem_elem_bt, VecMaskNotUsed)

I'd have to look more closely at how that one gets processed...
07-05-2024

Hoping it's ok is definitely not enough. Asking is the minimum. So no, assigning to yourself a bug that was assigned to someone else, without any form of prior discussion, is not ok. I'm not sure what made you think this was not actively being worked on. Please don't do that. In this particular case, though, we hadn't spent a lot of time on our side working on it (but we could have), so you can keep it.
07-05-2024

Looking into x64 AD files:

    instruct convF2I_reg_reg(rRegI dst, regF src, rFlagsReg cr) %{
      match(Set dst (ConvF2I src));
      effect(KILL cr);
      format %{ "convert_f2i $dst, $src" %}
      ins_encode %{
        __ convertF2I(T_INT, T_FLOAT, $dst$$Register, $src$$XMMRegister);
      %}
      ins_pipe(pipe_slow);
    %}

    instruct MoveF2I_stack_reg(rRegI dst, stackSlotF src) %{
      match(Set dst (MoveF2I src));
      effect(DEF dst, USE src);
      ins_cost(125);
      format %{ "movl $dst, $src\t# MoveF2I_stack_reg" %}
      ins_encode %{
        __ movl($dst$$Register, Address(rsp, $src$$disp));
      %}
      ins_pipe(ialu_reg_mem);
    %}

    instruct MoveF2I_reg_stack(stackSlotI dst, regF src) %{
      match(Set dst (MoveF2I src));
      effect(DEF dst, USE src);
      ins_cost(95); // XXX
      format %{ "movss $dst, $src\t# MoveF2I_reg_stack" %}
      ins_encode %{
        __ movflt(Address(rsp, $dst$$disp), $src$$XMMRegister);
      %}
      ins_pipe(pipe_slow);
    %}

    instruct MoveF2I_reg_reg(rRegI dst, regF src) %{
      match(Set dst (MoveF2I src));
      effect(DEF dst, USE src);
      ins_cost(85);
      format %{ "movd $dst,$src\t# MoveF2I" %}
      ins_encode %{
        __ movdl($dst$$Register, $src$$XMMRegister);
      %}
      ins_pipe( pipe_slow );
    %}

Well, ConvF2I actually does a call to C2_MacroAssembler::convertF2I, which is not just a simple no-op: it checks the value and does some cleanup. If I read the stubs right, this is for the NaN values. But I think MoveF2I directly does the move, with no cleanup... so these have different semantics.
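The semantic difference is visible at the Java level, too: ConvF2I backs the (int) cast, which converts the value and maps NaN to 0 (hence the extra check-and-cleanup in convertF2I), while MoveF2I backs Float.floatToRawIntBits, which only reinterprets the bits. A minimal sketch:

```java
public class ConvVsMove {
    public static void main(String[] args) {
        float f = -1.5f;
        // ConvF2I: arithmetic conversion, truncates toward zero.
        System.out.println((int) f); // -1
        // MoveF2I: raw bit pattern of -1.5f (sign, exponent, mantissa).
        System.out.println(Integer.toHexString(Float.floatToRawIntBits(f))); // bfc00000
        // NaN shows why ConvF2I needs the extra checks: the cast maps NaN to 0.
        System.out.println((int) Float.NaN); // 0
    }
}
```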
07-05-2024

Trying to figure out what the difference is between "MoveF2I" and "ConvF2I" and their analogues for I/L and F/D. Use of "MoveF2I" is quite limited:

    case vmIntrinsics::_floatToRawIntBits:
      result = new MoveF2INode(arg);
      break;
    case vmIntrinsics::_floatToIntBits: {
      // ... adds some branching for the NaN ...
      phi->init_req(2, _gvn.transform(new MoveF2INode(arg)));

    Node* MoveI2FNode::Identity(PhaseGVN* phase) {
      if (in(1)->Opcode() == Op_MoveF2I) {
        return in(1)->in(1);
      }
      return this;
    }

    const Type* MoveF2INode::Value(PhaseGVN* phase) const {
      const Type *t = phase->type( in(1) );
      if( t == Type::TOP ) return Type::TOP;
      if( t == Type::FLOAT ) return TypeInt::INT;
      const TypeF *tf = t->is_float_constant();
      JavaValue v;
      v.set_jfloat(tf->getf());
      return TypeInt::make( v.get_jint() );
    }

    bool LibraryCallKit::inline_vector_reduction() {
      ...
      switch (elem_bt) {
        case T_BYTE:
        case T_SHORT:
        case T_INT: {
          bits = gvn().transform(new ConvI2LNode(value));
          break;
        }
        case T_FLOAT: {
          value = gvn().transform(new MoveF2INode(value));
          bits = gvn().transform(new ConvI2LNode(value));
          break;
        }
        case T_DOUBLE: {
          bits = gvn().transform(new MoveD2LNode(value));
          break;
        }

    bool PhaseIdealLoop::intrinsify_fill(IdealLoopTree* lpt) {
      ...
      // Convert float/double to int/long for fill routines
      if (t == T_FLOAT) {
        store_value = new MoveF2INode(store_value);
        _igvn.register_new_node_with_optimizer(store_value);
      } else if (t == T_DOUBLE) {
        store_value = new MoveD2LNode(store_value);
        _igvn.register_new_node_with_optimizer(store_value);
      }

So far, it almost looks like these nodes basically do the same thing... why do we have them? Let's also look into the backend, to see what they produce.
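The NaN branching that _floatToIntBits adds around the MoveF2I node corresponds to the spec of Float.floatToIntBits, which collapses every NaN to the canonical bit pattern, while floatToRawIntBits is a pure bit move. A small sketch of the observable difference:

```java
public class NanBits {
    public static void main(String[] args) {
        // floatToIntBits canonicalizes all NaNs to 0x7FC00000 -- this is
        // the extra branching the intrinsic wraps around MoveF2I.
        System.out.println(Integer.toHexString(Float.floatToIntBits(Float.NaN))); // 7fc00000
        // For non-NaN values the two methods agree; the raw variant is
        // a bare MoveF2I with no NaN check.
        System.out.println(Float.floatToIntBits(2.0f) == Float.floatToRawIntBits(2.0f)); // true
    }
}
```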
07-05-2024

I'll work on this on the side, hope that is ok.

    ./java -XX:CompileOnly=Test::test5 -Xbatch -XX:+TraceLoopOpts -XX:CompileCommand=TraceAutoVectorization,Test::test*,SW_REJECTIONS -XX:+TraceNewVectors Test.java
07-05-2024

Here is an extract of some IR tests I have in a draft elsewhere, feel free to use it here:

    @Test
    @IR(counts = {IRNode.STORE_VECTOR, "= 0"},
        applyIfPlatform = {"64-bit", "true"},
        applyIfCPUFeatureOr = {"sse4.1", "true", "asimd", "true"})
    // In theory, one would expect this to be a simple 4byte -> 4byte conversion.
    // But there is a CmpF and CMove here because we check for isNaN. Plus a MoveF2I.
    //
    // Would be nice to vectorize: Missing support for CmpF, CMove and MoveF2I.
    static Object[] test5(int[] a, float[] b) {
        for (int i = 0; i < a.length; i++) {
            a[i] = Float.floatToIntBits(b[i]);
        }
        return new Object[]{ a, b };
    }

    @Test
    @IR(counts = {IRNode.STORE_VECTOR, "= 0"},
        applyIfPlatform = {"64-bit", "true"},
        applyIfCPUFeatureOr = {"sse4.1", "true", "asimd", "true"})
    // Missing support for MoveF2I
    static Object[] test6(int[] a, float[] b) {
        for (int i = 0; i < a.length; i++) {
            a[i] = Float.floatToRawIntBits(b[i]);
        }
        return new Object[]{ a, b };
    }

    @Test
    @IR(counts = {IRNode.STORE_VECTOR, "= 0"},
        applyIfPlatform = {"64-bit", "true"},
        applyIfCPUFeatureOr = {"sse4.1", "true", "asimd", "true"})
    // Missing support for MoveI2F
    static Object[] test7(int[] a, float[] b) {
        for (int i = 0; i < a.length; i++) {
            b[i] = Float.intBitsToFloat(a[i]);
        }
        return new Object[]{ a, b };
    }

    @Test
    @IR(counts = {IRNode.STORE_VECTOR, "= 0"},
        applyIfPlatform = {"64-bit", "true"},
        applyIfCPUFeatureOr = {"sse4.1", "true", "asimd", "true"})
    // Missing support for CmpD, CMove and MoveD2L
    static Object[] test8(long[] a, double[] b) {
        for (int i = 0; i < a.length; i++) {
            a[i] = Double.doubleToLongBits(b[i]);
        }
        return new Object[]{ a, b };
    }

    @Test
    @IR(counts = {IRNode.STORE_VECTOR, "= 0"},
        applyIfPlatform = {"64-bit", "true"},
        applyIfCPUFeatureOr = {"sse4.1", "true", "asimd", "true"})
    // Missing support for MoveD2L
    static Object[] test9(long[] a, double[] b) {
        for (int i = 0; i < a.length; i++) {
            a[i] = Double.doubleToRawLongBits(b[i]);
        }
        return new Object[]{ a, b };
    }

    @Test
    @IR(counts = {IRNode.STORE_VECTOR, "= 0"},
        applyIfPlatform = {"64-bit", "true"},
        applyIfCPUFeatureOr = {"sse4.1", "true", "asimd", "true"})
    // Missing support for MoveL2D
    static Object[] test10(long[] a, double[] b) {
        for (int i = 0; i < a.length; i++) {
            b[i] = Double.longBitsToDouble(a[i]);
        }
        return new Object[]{ a, b };
    }
23-04-2024

This really does seem to be a missing feature of SuperWord. I also don't see these vectorized, for the same reason:

    for (int i = 0; i < a.length; i++) {
        a[i] = Double.doubleToRawLongBits(b[i]);
    }

    for (int i = 0; i < a.length; i++) {
        b[i] = Double.longBitsToDouble(a[i]);
    }

Note that this also blocks:

    for (int i = 0; i < a.length; i++) {
        a[i] = Double.doubleToLongBits(b[i]);
    }

But there also seems to be an additional issue with CMove / CmpF here. I have not investigated further. And exactly the same issues exist for Float.
23-04-2024

Yes, I added it to my list in: JDK-8317424: C2 SuperWord Umbrella: improvements
27-03-2024

Converting this to an enhancement. [~epeter], you might want to link this to the other superword enhancements.
27-03-2024