JDK-8234977 : [Aarch64] C1 lacks a membar store in slowpath of allocate_array
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 11
  • Priority: P2
  • Status: Closed
  • Resolution: Duplicate
  • OS: generic
  • CPU: aarch64
  • Submitted: 2019-11-29
  • Updated: 2023-07-21
  • Resolved: 2023-07-21
JDK 14: Resolved
Related Reports
Duplicate :  
Description
The ARMv8-A architecture employs a weakly ordered memory model [1], so some ARMv8 processors may reorder stores. We caught this issue while running jcstress, e.g.
org.openjdk.jcstress.tests.defaultValues.arrays.small.plain.ByteTest

With ParallelGC and C1 combined:
java -XX:TieredStopAtLevel=1 -XX:CompileOnly=ByteTest::actor1 -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -XX:+UseParallelGC  ByteTest

we observe an ArrayIndexOutOfBoundsException or a wrong answer from the test:
java.lang.ArrayIndexOutOfBoundsException: 0
	at org.openjdk.jcstress.tests.defaultValues.arrays.small.plain.ByteTest.actor2(ByteTest.java:55)
	at org.openjdk.jcstress.tests.defaultValues.arrays.small.plain.ByteTest_jcstress.actor2(ByteTest_jcstress.java:193)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

We believe the root cause is that the putfield store became visible before the stores made by the newarray bytecode had completed, so actor2 observed an oop in an inconsistent state.
  public void actor1();
    Code:
       0: aload_0
       1: iconst_4
       2: newarray       byte
       4: putfield      #2                  // Field arr:[B
       7: return
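
For orientation, the race can be sketched at source level roughly as follows. This is not the jcstress test source, which is not reproduced in this report; the class name PublishRaceSketch, the State holder, and the countBadOutcomes helper are hypothetical names chosen for illustration. The sketch assumes only what the bytecode above shows: a writer that allocates a 4-element byte array and stores it into a plain field, and a reader that loads the field and indexes element 0.

```java
import java.util.concurrent.atomic.AtomicInteger;

class PublishRaceSketch {
    /** Shared state with a plain (non-volatile) array field, as in the bytecode above. */
    static class State {
        byte[] arr;
    }

    /** Writer: newarray followed by putfield, with no synchronization. */
    static void actor1(State s) {
        s.arr = new byte[4];
    }

    /** Reader: returns -1 if the array is not yet published, else element 0. */
    static int actor2(State s) {
        byte[] a = s.arr;
        return (a == null) ? -1 : a[0];
    }

    /** Races actor1 against actor2 and counts outcomes a correct JVM must never produce. */
    static int countBadOutcomes(int iterations) {
        AtomicInteger bad = new AtomicInteger();
        for (int i = 0; i < iterations; i++) {
            State s = new State();
            Thread writer = new Thread(() -> actor1(s));
            Thread reader = new Thread(() -> {
                try {
                    int r = actor2(s);
                    if (r != -1 && r != 0) {
                        bad.incrementAndGet(); // element was not the default value
                    }
                } catch (ArrayIndexOutOfBoundsException e) {
                    bad.incrementAndGet(); // array observed in an inconsistent state
                }
            });
            writer.start();
            reader.start();
            try {
                writer.join();
                reader.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return bad.get();
    }

    public static void main(String[] args) {
        System.out.println("bad outcomes: " + countBadOutcomes(1000));
    }
}
```

The only legal results for the reader are "not yet published" (-1 here) or the default value 0. A loop like this is far less effective at provoking the reordering than jcstress, so a clean run proves little; it only shows the shape of the test.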

G1 is a tricky case to reason about because it emits a membar(Assembler::StoreLoad) in its write_barrier_post. ParallelGC gives the clearest example, shown below. The instructions from 0x0000ffff893382fc through 0x0000ffff89338308 are the slow path of allocate_array; please refer to LIR_Assembler::emit_alloc_array in c1_LIRAssembler_aarch64.cpp.

  0x0000ffff893382c8:   dmb ishst                       ;*newarray {reexecute=0 rethrow=0 return_oop=0}
                                                            ; - ByteTest::actor1@2 (line 5)
  0x0000ffff893382cc:   lsr x8, x0, #3
  0x0000ffff893382d0:   str w8, [x1, #12]
  0x0000ffff893382d4:   lsr x0, x1, #9
  0x0000ffff893382d8:   mov x1, #0x3000                 // #12288
  0x0000ffff893382dc:   movk    x1, #0xa3b9, lsl #16
  0x0000ffff893382e0:   movk    x1, #0xffff, lsl #32
  0x0000ffff893382e4:   strb    wzr, [x0, x1, lsl #0]       ;*putfield arr {reexecute=0 rethrow=0 return_oop=0}
                                                            ; - ByteTest::actor1@4 (line 5)
  0x0000ffff893382e8:   ldp x29, x30, [sp, #48]
  0x0000ffff893382ec:   add sp, sp, #0x40
  0x0000ffff893382f0:   ldr x8, [x28, #264]
  0x0000ffff893382f4:   ldr wzr, [x8]                   ;   {poll_return}
  0x0000ffff893382f8:   ret
  0x0000ffff893382fc:   adrp    x8, 0x0000ffff88901000      ;   {runtime_call new_type_array Runtime1 stub}
  0x0000ffff89338300:   add x8, x8, #0x300
  0x0000ffff89338304:   blr x8                          ; ImmutableOopMap {c_rarg1=Oop }
                                                            ;*newarray {reexecute=0 rethrow=0 return_oop=0}
                                                            ; - ByteTest::actor1@2 (line 5)
  0x0000ffff89338308:   b   0x0000ffff893382cc

The branch at 0x0000ffff89338308 returns to 0x0000ffff893382cc and therefore skips the dmb ishst at 0x0000ffff893382c8. This is not the case for C2.
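
The ordering that the dmb ishst gives on the fast path can be approximated at the Java level with VarHandle.storeStoreFence() (available since Java 9). The sketch below is an analogy only: FencedPublishSketch and publish are hypothetical names, this is not how HotSpot emits the barrier, and application code does not need such a fence because the JVM itself must guarantee that default values are visible.

```java
import java.lang.invoke.VarHandle;

class FencedPublishSketch {
    byte[] arr; // plain field, as in the test

    void publish(int length) {
        byte[] fresh = new byte[length]; // array header and zeroed body are written here
        VarHandle.storeStoreFence();     // stores above may not be reordered with stores below
        arr = fresh;                     // publishing store, corresponding to the putfield
    }
}
```

The fence sits between the initializing stores and the publishing store, which is the same position the dmb ishst occupies ahead of the putfield sequence in the listing above.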

Reference: 
[1] https://developer.arm.com/docs/100941/0100/the-memory-model
Comments
JDK-8233839
04-12-2019

As Martin Doerr pointed out, the destructor of ThreadInVMfromJava, which is introduced by JRT_ENTRY, inserts a full memory fence. A membar(StoreStore) for the continuation is therefore unnecessary. The problem we observed is a duplicate of JDK-8233839.
04-12-2019

The attachment hotspot_pid308_jdk13.log is the log from a real jcstress run. Its actor1 is the same as in ByteTest.java. The log contains two compiled versions of actor1: the C2 version correctly jumps back to the membar, while the C1 version misses it.
29-11-2019