See the following function in Test.java of test/compiler/6431242:
public void test_copy_longs_store_reversed(long[] dst, long[] src) {
    for (int i = 0; i < src.length; i++) {
        dst[i] = Long.reverseBytes(1 + src[i]);
    }
}
-XX:+PrintOptoAssembly output:
02c B3: # B16 B4 <- B2 B6 Loop: B3-B6 inner stride: not constant pre of N164 Freq: 13469.2
02c + SLL R_L5,#2,R_L2
030 + CMP R_L5,R_L1 ! unsigned
034 BPuge icc_U,B16 P=0.000001 C=-1.000000
034 ADD R_I2,R_L2,R_L0
034
03c B4: # B15 B5 <- B3 Freq: 13469.2
03c + LDUW [R_L0 + #12],R_L0
040 + ADD R_L0,#1,R_L3
044 + STW R_L3,[SP + #64] !stk
048 + LDUWA [SP + #64], R_L4 !asi
050 LDUW [R_I1 + #8],R_L3 ! range
054 NullCheck R_I1
054
054 B5: # B14 B6 <- B4 Freq: 13469.1
054 + CMP R_L5,R_L3 ! unsigned
058 BPuge icc_U,B14 P=0.000001 C=-1.000000
058
060 B6: # B3 B7 <- B5 Freq: 13469.1
060 + ADD R_I1,R_L2,R_L2
064 + ADD R_L5,#1,R_L5
068 + CMP R_L5,#1
06c BPlt ccr,B3 ! Loop end P=0.888820 C=7131.000000
06c STW R_L4,[R_L2 + #12]
The problem is in the pre- and post-loops, where the generated code still spills the value to the stack first (STW to [SP + #64]) and then reads it back byte-reversed (LDUWA with the asi). The expected result is to take the addI(P) result and store it byte-reversed directly into the destination.
test_copy_ints_store_reversed and test_copy_longs_store_reversed have the same result.
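The semantics that any fused "add, then byte-reversed store" sequence has to preserve can be checked with a small standalone harness. This is only a sketch, not the actual Test.java: the class name ReverseStoreSketch, the helper reverseBytesByHand, and the iteration count are assumptions made for illustration. It repeats the loop shape from the report for both ints and longs and compares each stored element against a hand-written byte reversal.

```java
public class ReverseStoreSketch {
    // Same loop shape as the method quoted in the report (long variant).
    static void copyLongsStoreReversed(long[] dst, long[] src) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = Long.reverseBytes(1 + src[i]);
        }
    }

    // Int variant with the same loop shape.
    static void copyIntsStoreReversed(int[] dst, int[] src) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = Integer.reverseBytes(1 + src[i]);
        }
    }

    // Hypothetical reference helper: reverses the 8 bytes of v with shifts,
    // so the check does not depend on the intrinsic being tested.
    static long reverseBytesByHand(long v) {
        long r = 0;
        for (int b = 0; b < 8; b++) {
            r = (r << 8) | (v & 0xFF);
            v >>>= 8;
        }
        return r;
    }

    public static void main(String[] args) {
        long[] src = {0L, 1L, 0x0102030405060707L, -1L, Long.MAX_VALUE};
        long[] dst = new long[src.length];

        // Repeat the call so the JIT has a chance to compile the loop
        // (the count is an arbitrary assumption, not a tuned threshold).
        for (int iter = 0; iter < 20000; iter++) {
            copyLongsStoreReversed(dst, src);
        }

        for (int i = 0; i < src.length; i++) {
            long expected = reverseBytesByHand(1 + src[i]);
            if (dst[i] != expected) {
                throw new AssertionError("mismatch at index " + i);
            }
        }
        System.out.println("ok");
    }
}
```

Running it with -XX:+PrintOptoAssembly on an affected build should show whether the pre- and post-loops still go through the stack slot; the harness itself only verifies that the stored values are correct.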