JDK-8007898 : Incorrect optimization of Memory Barriers in Matcher::post_store_load_barrier()
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: hs24, hs25
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2013-02-09
  • Updated: 2013-09-04
  • Resolved: 2013-07-16
The Version table provides details of the release in which this issue/RFE will be or has been addressed.

Unresolved: Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed: Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

JDK 7: 7u40 (Fixed)
JDK 8: 8 (Fixed)
Other: hs24 (Fixed)
Description
Hi,

We contributed a change fixing the memory ordering in taskqueue (8006971).
In the course of that discussion, Aleksey Shipilev pointed us to the
concurrency torture tests.

While running these tests we found one test for which, in our eyes,
Matcher::post_store_load_barrier() generates wrong IR.  Nevertheless,
the test passes with OpenJDK, because lcm establishes a proper ordering.

We would like to know whether this happens by accident, or whether
this is designed into lcm (and OptoScheduling).

I have written a self-contained test program, similar to "DekkerTest",
which tests a volatile read after a volatile write.

   class TestData {
       volatile int a;
       volatile int b;
   }

   int actor1(TestData t) {
       t.a = 1;
       return t.b;
   }

(Whole program under http://cr.openjdk.java.net/~goetz/c2_membar_issue/)
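
For reference, here is a minimal sketch of what such a Dekker-style
harness can look like. The class and method names are illustrative, not
taken from the linked program, and a naive loop like this is far less
effective at provoking the race than a dedicated harness:

   public class DekkerSketch {
       static class TestData {
           volatile int a;
           volatile int b;
       }

       static int actor1(TestData t) {
           t.a = 1;      // volatile store
           return t.b;   // volatile load; must not pass the store
       }

       static int actor2(TestData t) {
           t.b = 1;
           return t.a;
       }

       public static void main(String[] args) throws InterruptedException {
           for (int i = 0; i < 100_000; i++) {
               TestData t = new TestData();
               int[] r = new int[2];
               Thread t1 = new Thread(() -> r[0] = actor1(t));
               Thread t2 = new Thread(() -> r[1] = actor2(t));
               t1.start(); t2.start();
               t1.join();  t2.join();
               // Under sequential consistency at least one thread must
               // observe the other's store, so (0, 0) is forbidden.
               if (r[0] == 0 && r[1] == 0) {
                   System.out.println("SC violation at iteration " + i);
                   return;
               }
           }
       }
   }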

Below you can see a part of the ideal graph generated for
TestRAW$runner1::run with actor1 inlined.
MemBars 329 and 332 order Store 328 and Load 339.
Matcher::post_store_load_barrier() turns 329 and 332 into
unnecessary_MemBar_volatile, so that the Store and the Load are no
longer ordered by them.

Lcm places Load 339 after MemBar 335, which results in correct code.
Unfortunately, there is no edge that forces Load 339 after MemBar 335.

In other words, the ideal graph allows the schedule
328 StoreI - 339 LoadI - 335 MemBarVolatile, and the outcome of lcm
seems to be accidental.
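
To make the consequence concrete: that schedule leaves no StoreLoad
barrier between the volatile store and the volatile load, so hardware
store buffering (e.g., on x86) can let the load complete before the
store becomes globally visible. Observably, this is as if actor1 had
been rewritten as follows (an illustration only, not generated code),
which is exactly the volatile reordering the Java Memory Model forbids:

   int actor1_asIfReordered(TestData t) {
       int r = t.b;   // load effectively passes the store
       t.a = 1;
       return r;
   }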


328    StoreI  ===  324  325  327  58  [[ 329  329  332  335 ]]  @TestRAW$TestData+12 *, name=a, idx=10; Volatile!  Memory: @TestRAW$TestData:NotNull+12 *, name=a, idx=10; !jvms: TestRAW::actor1 @ bci:2 TestRAW$runner1::run @ bci:40

329    MemBarVolatile  ===  324  1  328  1  1  328  [[ 330  331 ]]  !jvms: TestRAW::actor1 @ bci:2 TestRAW$runner1::run @ bci:40

330    Proj    ===  329  [[ 332 ]] #0 !jvms: TestRAW::actor1 @ bci:2 TestRAW$runner1::run @ bci:40

332    MemBarVolatile  ===  330  1  325  1  1  328  [[ 333  334 ]]  !jvms: TestRAW::actor1 @ bci:2 TestRAW$runner1::run @ bci:40

333    Proj    ===  332  [[ 335 ]] #0 !jvms: TestRAW::actor1 @ bci:2 TestRAW$runner1::run @ bci:40

334    Proj    ===  332  [[ 326  335  339 ]] #2  Memory: @BotPTR *+bot, idx=Bot; !jvms: TestRAW::actor1 @ bci:2 TestRAW$runner1::run @ bci:40

335    MemBarVolatile  ===  333  1  334  1  1  328  [[ 336  337 ]]  !jvms: TestRAW::actor1 @ bci:2 TestRAW$runner1::run @ bci:40

339    LoadI   ===  315  334  338  [[ 351  340  358 ]]  @TestRAW$TestData+16 *, name=b, idx=11; Volatile! #int !jvms: TestRAW::actor1 @ bci:6 TestRAW$runner1::run @ bci:40

(Whole ideal graph under http://cr.openjdk.java.net/~goetz/c2_membar_issue/)

The question we now have is the following: is there a mechanism
(e.g., in local code motion) that guarantees that the MemBarVolatile
gets scheduled before the LoadI?

If we change the local code motion, we can get the schedule
StoreI - LoadI - MemBarVolatile, causing this test to fail.
Such lcm changes may make sense for architectures with many registers,
which may benefit from longer live ranges.

Kind regards,
Martin and Götz
Comments
7-critical-request justification: This was found by SAP with JDK 7. Small, safe changes. The fix is pushed into hs25/jdk8 and will be tested (see https://jbs.oracle.com/bugs/browse/JDK-8020483). Aleksey Shipilev ran performance tests and there was no regression.
13-07-2013

For the record, here is the code generated for a volatile store (with only the volatile store in the code).

Before the fix:

49  MemBarRelease  ===  5  1  28  1  1  [[ 50  51 ]]  !jvms: Thread::<init> @ bci:26
50  Proj  ===  49  [[ 56  55 ]] #0 !jvms: Thread::<init> @ bci:26
51  Proj  ===  49  [[ 59  55 ]] #2  Memory: @BotPTR *+bot, idx=Bot; !jvms: Thread::<init> @ bci:26
55  StoreI  ===  50  51  54  32  [[ 56  56  59 ]]  @java/lang/Thread+56 *, name=threadStatus, idx=8; Volatile!  Memory: @java/lang/Thread:NotNull+56 *, name=threadStatus, idx=8; !jvms: Thread::<init> @ bci:26
56  MemBarVolatile  ===  50  1  55  1  1  55  [[ 57  58 ]]  !jvms: Thread::<init> @ bci:26
57  Proj  ===  56  [[ 59 ]] #0 !jvms: Thread::<init> @ bci:26
58  Proj  ===  56  [[ 52 ]] #2  Memory: @java/lang/Thread+56 *, name=threadStatus, idx=8; !jvms: Thread::<init> @ bci:26
59  MemBarVolatile  ===  57  1  51  1  1  55  [[ 60  61 ]]  !jvms: Thread::<init> @ bci:26
60  Proj  ===  59  [[ 24 ]] #0 !jvms: Thread::<init> @ bci:26
61  Proj  ===  59  [[ 52 ]] #2  Memory: @BotPTR *+bot, idx=Bot; !jvms: Thread::<init> @ bci:26
52  MergeMem  ===  _  1  61  1  1  1  1  1  58  [[ 24 ]]  { - - - - - N58:java/lang/Thread+56 * }  Memory: @BotPTR *+bot, idx=Bot; !jvms: Thread::<init> @ bci:26
24  SafePoint  ===  60  6  52  8  9  10  11  12  10  32  1  1  1  1  1  [[]]  SafePoint  !orig=22,19

After the fix:

49  MemBarRelease  ===  5  1  28  1  1  [[ 50  51 ]]  !jvms: Thread::<init> @ bci:26
50  Proj  ===  49  [[ 56  55 ]] #0 !jvms: Thread::<init> @ bci:26
51  Proj  ===  49  [[ 52  55 ]] #2  Memory: @BotPTR *+bot, idx=Bot; !jvms: Thread::<init> @ bci:26
55  StoreI  ===  50  51  54  32  [[ 52 ]]  @java/lang/Thread+56 *, name=threadStatus, idx=8; Volatile!  Memory: @java/lang/Thread:NotNull+56 *, name=threadStatus, idx=8; !jvms: Thread::<init> @ bci:26
52  MergeMem  ===  _  1  51  1  1  1  1  1  55  [[ 56 ]]  { - - - - - N55:java/lang/Thread+56 * }  Memory: @BotPTR *+bot, idx=Bot; !jvms: Thread::<init> @ bci:26
56  MemBarVolatile  ===  50  1  52  1  1  [[ 57  58 ]]  !jvms: Thread::<init> @ bci:26
57  Proj  ===  56  [[ 24 ]] #0 !jvms: Thread::<init> @ bci:26
58  Proj  ===  56  [[ 59 ]] #2  Memory: @BotPTR *+bot, idx=Bot; !jvms: Thread::<init> @ bci:26
59  MergeMem  ===  _  1  58  1  [[ 24 ]]  { - }  Memory: @BotPTR *+bot, idx=Bot; !jvms: Thread::<init> @ bci:26
24  SafePoint  ===  57  6  59  8  9  10  11  12  10  32  1  1  1  1  1  [[]]  SafePoint  !orig=22,19
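
For orientation, a minimal analogue of the source shape behind the dumps above (the Thread.<init> assignment to the volatile threadStatus field); the class name here is illustrative. In both dumps C2 brackets the StoreI with a MemBarRelease before it and a MemBarVolatile after it; the fix only changes how the membars are wired into the memory graph:

   class VolatileStoreExample {
       volatile int status;   // stands in for java.lang.Thread.threadStatus

       VolatileStoreExample(int s) {
           // compiles to: MemBarRelease; StoreI (Volatile!); MemBarVolatile
           status = s;
       }
   }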
11-07-2013

I did not look at it yet, busy with a P2 for 7u40.
09-07-2013

JDK 8b96-ea, Solaris 10, 2x8x2 SandyBridge:

$ jdk1.8.0/fastdebug/bin/java -XX:+StressLCM -XX:+StressGCM -jar jcstress.jar -t ".*seqcst.*"
...
org.openjdk.jcstress.tests.seqcst.volatiles.L1_L1_S1_L2__S2_L1__Test: Observed forbidden state: [0, 0, 0, 0]
org.openjdk.jcstress.tests.seqcst.volatiles.L1_L1_S2_L1__S1_L2__Test: Observed forbidden state: [0, 0, 0, 0]
org.openjdk.jcstress.tests.seqcst.volatiles.L1_L1__S1_L2__S2_L1__Test: Observed forbidden state: [0, 0, 0, 0]
org.openjdk.jcstress.tests.seqcst.volatiles.L1_L1__S1_L2__S2_L1__Test: Observed forbidden state: [1, 1, 0, 0]
org.openjdk.jcstress.tests.seqcst.volatiles.L1_L1__S1_L2__S2_L1__Test: Observed forbidden state: [0, 1, 0, 0]
org.openjdk.jcstress.tests.seqcst.volatiles.L1_L2_S1_L2__S2_L1__Test: Observed forbidden state: [0, 0, 0, 0]
...

A grand total of 557 tests failed, all of them with the same pattern S2_L1, meaning the "store to volatile X2" is executed before the "load from volatile X1". A quick glance at the failing test scenarios shows that the possible executions involve reordering S2 and L1, leading to the scenario described in the original issue.
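
For readers unfamiliar with these test names, the failing S2_L1 pattern boils down to the same shape as the original report. A sketch in today's jcstress annotation API (the 2013 harness predates these annotations; the class name is illustrative):

   import org.openjdk.jcstress.annotations.Actor;
   import org.openjdk.jcstress.annotations.Expect;
   import org.openjdk.jcstress.annotations.JCStressTest;
   import org.openjdk.jcstress.annotations.Outcome;
   import org.openjdk.jcstress.annotations.State;
   import org.openjdk.jcstress.infra.results.II_Result;

   @JCStressTest
   @Outcome(id = "0, 0", expect = Expect.FORBIDDEN,
            desc = "Both loads miss both stores: sequential consistency violated")
   @Outcome(expect = Expect.ACCEPTABLE, desc = "All other interleavings")
   @State
   public class S2L1SketchTest {
       volatile int x1, x2;

       @Actor
       public void actor1(II_Result r) {
           x1 = 1;       // S1
           r.r1 = x2;    // L2
       }

       @Actor
       public void actor2(II_Result r) {
           x2 = 1;       // S2
           r.r2 = x1;    // L1 -- reordering S2 and L1 yields (0, 0)
       }
   }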
09-07-2013

Vladimir, any updates on this bug? I have plenty of tests in jcstress [1] failing in a similar way under -XX:+StressLCM -XX:+StressGCM on x86_64.

[1] http://openjdk.java.net/projects/code-tools/jcstress/
09-07-2013

On 2/11/13 2:03 AM, Doerr, Martin wrote:
> Hi Vladimir,
>
> thanks for filing the bug.
> Following the memory edges in addition will indeed fix the problem.
>
> However, we could end up emitting the MemBarVolatile twice.
> 332 would become a necessary MemBar because of the Proj (334) and LoadI (339),
> which fixes the problem. But the MemBarVolatile 335 is still there, and
> the current implementation would not recognize it as unnecessary.

Martin,

Another issue here is the aliasing of membars. The reason we have 3 membars in this part of the code is that they are aliased to different memory slices (see the memory Proj nodes below), but post_store_load_barrier() treats them as equal barriers. Which is correct, since in hardware they are. Membars are generated in the Parse::do_put_xxx() method. And I need to ask around why we don't use one fat membar there.

Thanks,
Vladimir

328    StoreI  ===  324  325  327  58  [[ 329  329  332  335 ]]  @TestRAW$TestData+12 *, name=a, idx=10; Volatile!  Memory: @TestRAW$TestData:NotNull+12 *, name=a, idx=10; !jvms: TestRAW::actor1 @ bci:2 TestRAW$runner1::run @ bci:40
329    MemBarVolatile  ===  324  1  328  1  1  328  [[ 330  331 ]]  !jvms: TestRAW::actor1 @ bci:2 TestRAW$runner1::run @ bci:40
331    Proj    ===  329  [[ 326 ]] #2  Memory: @TestRAW$TestData+12 *, name=a, idx=10; !jvms: TestRAW::actor1 @ bci:2 TestRAW$runner1::run @ bci:40
332    MemBarVolatile  ===  330  1  325  1  1  328  [[ 333  334 ]]  !jvms: TestRAW::actor1 @ bci:2 TestRAW$runner1::run @ bci:40
334    Proj    ===  332  [[ 326  335  339 ]] #2  Memory: @BotPTR *+bot, idx=Bot; !jvms: TestRAW::actor1 @ bci:2 TestRAW$runner1::run @ bci:40
335    MemBarVolatile  ===  333  1  334  1  1  328  [[ 336  337 ]]  !jvms: TestRAW::actor1 @ bci:2 TestRAW$runner1::run @ bci:40
337    Proj    ===  335  [[ 326 ]] #2  Memory: @TestRAW+16 *, name=start_trigger, idx=5; !jvms: TestRAW::actor1 @ bci:2 TestRAW$runner1::run @ bci:40
339    LoadI   ===  315  334  338  [[ 351  340  358 ]]  @TestRAW$TestData+16 *, name=b, idx=11; Volatile! #int !jvms: TestRAW::actor1 @ bci:6 TestRAW$runner1::run @ bci:40
13-02-2013

You are right, and I think the problem is the following wrong assumption in Matcher::post_store_load_barrier():

   if (x->is_MemBar()) {
     // We must retain this membar if there is an upcoming volatile
     // load, which will be preceded by acquire membar.
     if (xop == Op_MemBarAcquire)
       return false;

MemBarAcquire follows a volatile load, it does not precede one. So we need to fix post_store_load_barrier() to follow not only control edges but also memory edges.
09-02-2013