The ARMv8-A architecture employs a weakly ordered memory model [1], so some ARMv8 processors may reorder stores. We caught this issue while running jcstress,
e.g. org.openjdk.jcstress.tests.defaultValues.arrays.small.plain.ByteTest,
with ParallelGC and C1 combined:
java -XX:TieredStopAtLevel=1 -XX:CompileOnly=ByteTest::actor1 -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -XX:+UseParallelGC ByteTest
We observe either an ArrayIndexOutOfBoundsException or a wrong answer from the test:
java.lang.ArrayIndexOutOfBoundsException: 0
at org.openjdk.jcstress.tests.defaultValues.arrays.small.plain.ByteTest.actor2(ByteTest.java:55)
at org.openjdk.jcstress.tests.defaultValues.arrays.small.plain.ByteTest_jcstress.actor2(ByteTest_jcstress.java:193)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
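We believe the root cause is that the store performed by putfield became visible before the stores that complete the newarray bytecode, so actor2 read a reference to an array that was not yet fully initialized. The bytecode of actor1 is: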
public void actor1();
Code:
0: aload_0
1: iconst_4
2: newarray byte
4: putfield #2 // Field arr:[B
7: return
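To make the race concrete, here is a plain-Java approximation of the test's two actors. Only the shape of actor1 (newarray byte of length 4, then putfield arr) comes from the bytecode above; the State class, the body of actor2 and the run/main harness are hypothetical. This is not jcstress: without its harness machinery the two threads overlap on the same object far less often, so the sketch is much less likely to hit the window, even when run with the flags above (-XX:TieredStopAtLevel=1 -XX:+UseParallelGC).

```java
public class ByteTestSketch {
    // Hypothetical holder: one plain (non-volatile) field, matching
    // "putfield #2 // Field arr:[B" in the bytecode above.
    static final class State {
        byte[] arr;
    }

    static final int NOT_PUBLISHED = Integer.MIN_VALUE;

    // Same shape as actor1: newarray byte (length 4), then putfield arr.
    static void actor1(State s) {
        s.arr = new byte[4];
    }

    // Hypothetical actor2: read the reference once, then read element 0.
    static int actor2(State s) {
        byte[] a = s.arr;
        if (a == null) {
            return NOT_PUBLISHED;
        }
        return a[0]; // a correct JVM always sees length 4 and the default value 0
    }

    // Runs one writer and one reader over fresh State objects and classifies what
    // the reader saw: counts[0] = null, counts[1] = default value 0, counts[2] = anomaly.
    static int[] run(int iterations) {
        State[] states = new State[iterations];
        for (int i = 0; i < iterations; i++) {
            states[i] = new State();
        }
        int[] counts = new int[3];
        Thread writer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                actor1(states[i]);
            }
        });
        Thread reader = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                try {
                    int v = actor2(states[i]);
                    if (v == NOT_PUBLISHED) {
                        counts[0]++;
                    } else if (v == 0) {
                        counts[1]++;
                    } else {
                        counts[2]++; // non-default element value: wrong answer
                    }
                } catch (ArrayIndexOutOfBoundsException e) {
                    counts[2]++; // the failure mode in the stack trace above
                }
            }
        });
        writer.start();
        reader.start();
        try {
            writer.join();
            reader.join(); // join() makes the reader's counts visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] c = run(100_000);
        System.out.printf("null=%d default=%d anomalies=%d%n", c[0], c[1], c[2]);
    }
}
```

On a JVM without this bug the anomaly count should always be 0; a nonzero count means the reader saw the reference before the array's length or contents were visible.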
G1 is trickier to reproduce with, because G1 emits a membar(Assembler::StoreLoad) in its write_barrier_post. The clearest example comes from ParallelGC, shown below. Addresses 0x0000ffff893382fc to 0x0000ffff89338308 are the slow path of allocate_array; please refer to LIR_Assembler::emit_alloc_array (c1_LIRAssembler_aarch64.cpp).
0x0000ffff893382c8: dmb ishst ;*newarray {reexecute=0 rethrow=0 return_oop=0}
; - ByteTest::actor1@2 (line 5)
0x0000ffff893382cc: lsr x8, x0, #3
0x0000ffff893382d0: str w8, [x1, #12]
0x0000ffff893382d4: lsr x0, x1, #9
0x0000ffff893382d8: mov x1, #0x3000 // #12288
0x0000ffff893382dc: movk x1, #0xa3b9, lsl #16
0x0000ffff893382e0: movk x1, #0xffff, lsl #32
0x0000ffff893382e4: strb wzr, [x0, x1, lsl #0] ;*putfield arr {reexecute=0 rethrow=0 return_oop=0}
; - ByteTest::actor1@4 (line 5)
0x0000ffff893382e8: ldp x29, x30, [sp, #48]
0x0000ffff893382ec: add sp, sp, #0x40
0x0000ffff893382f0: ldr x8, [x28, #264]
0x0000ffff893382f4: ldr wzr, [x8] ; {poll_return}
0x0000ffff893382f8: ret
0x0000ffff893382fc: adrp x8, 0x0000ffff88901000 ; {runtime_call new_type_array Runtime1 stub}
0x0000ffff89338300: add x8, x8, #0x300
0x0000ffff89338304: blr x8 ; ImmutableOopMap {c_rarg1=Oop }
;*newarray {reexecute=0 rethrow=0 return_oop=0}
; - ByteTest::actor1@2 (line 5)
0x0000ffff89338308: b 0x0000ffff893382cc
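The branch at 0x0000ffff89338308 rejoins the fast path at 0x0000ffff893382cc, after the dmb ishst at 0x0000ffff893382c8, so the slow path never executes that barrier before the reference is stored into arr (str w8, [x1, #12]). C2 does not have this problem.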
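The control-flow argument can be restated as a toy model. This is a hypothetical sketch, not HotSpot code: each instruction of the listing is collapsed into one of a few kinds, the inline (fast-path) allocation that precedes 0x0000ffff893382c8 is represented by a made-up label inline_alloc, and, following the reasoning above, the stores that initialize the array in the runtime stub are treated as not yet fenced when control returns. The model walks one path and reports whether a dmb ishst executes between the array's initializing stores and the store that publishes the reference.

```java
import java.util.List;

public class C1PathModel {
    // Only the instruction kinds that matter for store ordering.
    enum Op { INLINE_ALLOC, SLOW_ALLOC, DMB_ISHST, PUBLISH, GOTO, RET }

    static final class Insn {
        final String label;  // address from the listing, or a hypothetical name
        final Op op;
        final String target; // branch target label for GOTO, otherwise null

        Insn(String label, Op op, String target) {
            this.label = label;
            this.op = op;
            this.target = target;
        }

        Insn(String label, Op op) {
            this(label, op, null);
        }
    }

    // Collapsed version of the listing; slowPathRejoin is where the slow path branches back to.
    static List<Insn> layout(String slowPathRejoin) {
        return List.of(
            new Insn("inline_alloc", Op.INLINE_ALLOC),    // hypothetical: fast-path allocation, not shown in the listing
            new Insn("0x0000ffff893382c8", Op.DMB_ISHST),  // dmb ishst
            new Insn("0x0000ffff893382cc", Op.PUBLISH),    // lsr x8, x0, #3; str w8, [x1, #12]
            new Insn("0x0000ffff893382f8", Op.RET),        // card mark and epilogue collapsed into RET
            new Insn("0x0000ffff893382fc", Op.SLOW_ALLOC), // runtime_call new_type_array Runtime1 stub
            new Insn("0x0000ffff89338308", Op.GOTO, slowPathRejoin));
    }

    static int indexOf(List<Insn> code, String label) {
        for (int i = 0; i < code.size(); i++) {
            if (code.get(i).label.equals(label)) {
                return i;
            }
        }
        throw new IllegalArgumentException("unknown label " + label);
    }

    // True if, along the path starting at entry, no initializing store is still
    // unfenced when the publishing store executes.
    static boolean publishIsOrdered(List<Insn> code, String entry) {
        int pc = indexOf(code, entry);
        boolean unfencedInit = false;
        for (int steps = 0; steps < 100; steps++) {
            Insn insn = code.get(pc);
            switch (insn.op) {
                case INLINE_ALLOC, SLOW_ALLOC -> unfencedInit = true;
                case DMB_ISHST -> unfencedInit = false;
                case PUBLISH -> {
                    if (unfencedInit) {
                        return false;
                    }
                }
                case RET -> {
                    return true;
                }
                case GOTO -> {
                    pc = indexOf(code, insn.target);
                    continue;
                }
            }
            pc++;
        }
        throw new IllegalStateException("no RET reached");
    }

    public static void main(String[] args) {
        List<Insn> c1 = layout("0x0000ffff893382cc");      // as in the listing
        List<Insn> variant = layout("0x0000ffff893382c8"); // hypothetical: rejoin at the dmb
        System.out.println("fast path ordered: " + publishIsOrdered(c1, "inline_alloc"));           // true
        System.out.println("slow path ordered: " + publishIsOrdered(c1, "0x0000ffff893382fc"));     // false
        System.out.println("variant ordered:   " + publishIsOrdered(variant, "0x0000ffff893382fc")); // true
    }
}
```

The variant that rejoins at 0x0000ffff893382c8 only illustrates what the ordering requires; it is not a claim about how HotSpot should be or was fixed.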
Reference:
[1] https://developer.arm.com/docs/100941/0100/the-memory-model