The fix for 6385730 modified LoadKlass to always use the original memory state as a way of improving commoning of the LoadKlass calls. In some very limited cases this can confuse the anti-dependence logic, resulting in unschedulable graphs. In addition to using immutable memory, there's no need for LoadKlass and LoadRange to have anti-dependence edges. The salient features that seem required to trigger the problem are that the intrinsic for currentThread returns a non-null instance and that the existence of multiple subclasses triggers a chain of LoadKlass nodes for the checkcast. Here's the test case.
public class ct extends Thread {
    static class ct0 extends ct {
        public void message() {
            // System.out.println("message");
        }
        public void run() {
            message();
            ct0 ct = (ct0) Thread.currentThread();
        }
    }
    static class ct1 extends ct0 {
        public void message() {
            // System.out.println("message");
        }
    }
    static class ct2 extends ct0 {
        public void message() {
            // System.out.println("message");
        }
    }
    static {
        new ct0();
        new ct1();
        new ct2();
    }
    public static void main(String[] args) throws Exception {
        for (int i = 0; i < 100000; i++) {
            Thread t = null;
            switch (i % 3) {
                case 0: t = new ct0(); break;
                case 1: t = new ct1(); break;
                case 2: t = new ct2(); break;
            }
            t.start();
            t.join();
        }
    }
}
And here's the unschedulable block after gcm:
B2: # B4 B3 <- B1 Freq: 1
16 Start === 16 1 [[ 16 15 17 18 19 20 27 0 24 ]] #{0:control, 1:abIO, 2:memory, 3:rawptr:BotPTR, 4:return_address, 5:ct$ct0:NotNull *} !jvms: ct$ct0::run @ bci:7
15 MachProj === 16 [[ 13 ]] #0/unmatched !jvms: ct$ct0::run @ bci:-1
17 MachProj === 16 [[ 13 ]] #1/unmatched !jvms: ct$ct0::run @ bci:-1
18 MachProj === 16 [[ 13 9 22 ]] #2/unmatched Memory: @BotPTR *+bot, idx=Bot; !jvms: ct$ct0::run @ bci:-1
19 MachProj === 16 [[ 13 4 2 29 31 ]] #3 !jvms: ct$ct0::run @ bci:-1
20 MachProj === 16 [[ 13 ]] #5 Oop:ct$ct0:NotNull * !jvms: ct$ct0::run @ bci:-1
27 MachProj === 16 [[ 4 29 31 ]] #4 !jvms: ct$ct0::run @ bci:-1
12 MachProj === 13 [[ 11 ]] #0/unmatched !jvms: ct$ct0::run @ bci:1
21 MachProj === 13 [[ 11 4 29 31 33 ]] #1/unmatched !jvms: ct$ct0::run @ bci:1
9 loadKlass === _ 18 22 [[ 8 13 ]] * Klass: *
22 loadKlass === _ 18 23 [[ 9 ]] klass java/lang/Thread: 0x080de410 * Klass:klass java/lang/Thread: 0x080de410 * !jvms: ct$ct0::run @ bci:7
23 loadP === _ 26 24 [[ 22 4 ]] java/lang/Thread:NotNull * Oop:java/lang/Thread:NotNull * !jvms: ct$ct0::run @ bci:4
26 MachProj === 13 [[ 23 4 29 31 ]] #2/unmatched Memory: @BotPTR *+bot, idx=Bot; !jvms: ct$ct0::run @ bci:1
13 CallDynamicJavaDirect === 15 17 18 19 0 20 0 0 | 9 [[ 14 12 21 26 ]] Dynamic ct$ct0::message # void ( ct$ct0:NotNull * ) ct$ct0::run @ bci:1 !jvms: ct$ct0::run @ bci:1
14 MachProj === 13 [[]] #10005/fat
0 Con === 16 [[]] #top
24 tlsLoadP === 16 [[ 25 23 ]]
25 MachProj === 24 [[]] #1
11 Catch === 12 21 [[ 10 32 ]] !jvms: ct$ct0::run @ bci:1
10 CatchProj === 11 [[ 37 ]] #0@bci -1 !jvms: ct$ct0::run @ bci:1
32 CatchProj === 11 [[ 39 ]] #1@bci -1 !jvms: ct$ct0::run @ bci:1
The problematic nodes are these:
24 tlsLoadP === 16 [[ 25 23 ]]
9 loadKlass === _ 18 22 [[ 8 13 ]] * Klass: *
22 loadKlass === _ 18 23 [[ 9 ]] klass java/lang/Thread: 0x080de410 * Klass:klass java/lang/Thread: 0x080de410 * !jvms: ct$ct0::run @ bci:7
13 CallDynamicJavaDirect === 15 17 18 19 0 20 0 0 | 9 [[ 14 12 21 26 ]] Dynamic ct$ct0::message # void ( ct$ct0:NotNull * ) ct$ct0::run @ bci:1 !jvms: ct$ct0::run @ bci:1
26 MachProj === 13 [[ 23 4 29 31 ]] #2/unmatched Memory: @BotPTR *+bot, idx=Bot; !jvms: ct$ct0::run @ bci:1
23 loadP === _ 26 24 [[ 22 4 ]] java/lang/Thread:NotNull * Oop:java/lang/Thread:NotNull * !jvms: ct$ct0::run @ bci:4
The precedence edge from 13 to 9 is what creates the cycle. The loadP gets scheduled into the same block as the call because it consumes the memory of the call. After schedule_local, the nodes following the call will be cloned into the successors. It seems like we're getting into trouble because they happen to be in the same block at this point. I don't really understand why we're putting precedence edges on calls in the first place, since the whole point of the memory graph is to guarantee proper ordering, and anything that consumes all of memory should be completely safe.
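To make the cycle concrete, here is a rough sketch that copies the input edges of the problematic nodes from the dump above into a small map and runs a depth-first search over them. It is illustration only and not HotSpot code: the class SchedCycle and its methods are hypothetical names, only the edges relevant to the problem are included, and the control self-edge on the Start node (16) is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; none of this is HotSpot code.
class SchedCycle {
    // Node id -> ids of the nodes it depends on, copied from the dump of B2.
    static Map<Integer, List<Integer>> problemEdges() {
        Map<Integer, List<Integer>> in = new HashMap<>();
        in.put(13, List.of(18, 9));  // call: memory input 18, precedence edge "| 9"
        in.put(9,  List.of(18, 22)); // loadKlass
        in.put(22, List.of(18, 23)); // loadKlass
        in.put(23, List.of(26, 24)); // loadP: memory from 26, address from 24
        in.put(26, List.of(13));     // MachProj: memory projection of the call
        in.put(24, List.of(16));     // tlsLoadP
        in.put(18, List.of(16));     // MachProj: memory projection of Start
        in.put(16, List.of());       // Start (self-edge omitted)
        return in;
    }

    // Returns the nodes on some dependency cycle, or an empty list if there is none.
    static List<Integer> findCycle(Map<Integer, List<Integer>> in) {
        Set<Integer> done = new HashSet<>();
        List<Integer> path = new ArrayList<>();
        for (Integer start : in.keySet()) {
            List<Integer> cycle = dfs(start, in, done, path);
            if (cycle != null) return cycle;
        }
        return List.of();
    }

    private static List<Integer> dfs(Integer n, Map<Integer, List<Integer>> in,
                                     Set<Integer> done, List<Integer> path) {
        int at = path.indexOf(n);
        if (at >= 0) return new ArrayList<>(path.subList(at, path.size())); // back edge
        if (done.contains(n)) return null; // already fully explored, no cycle through it
        path.add(n);
        for (Integer dep : in.getOrDefault(n, List.of())) {
            List<Integer> cycle = dfs(dep, in, done, path);
            if (cycle != null) return cycle;
        }
        path.remove(path.size() - 1); // remove by index: leaving n
        done.add(n);
        return null;
    }

    public static void main(String[] args) {
        System.out.println("cycle with precedence edge: " + findCycle(problemEdges()));
        Map<Integer, List<Integer>> noPrec = new HashMap<>(problemEdges());
        noPrec.put(13, List.of(18)); // same graph without the precedence edge to 9
        System.out.println("cycle without it: " + findCycle(noPrec));
    }
}
```

With the precedence edge in place the search reports the five nodes 13, 9, 22, 23 and 26. Dropping that one edge from the call leaves the same nodes acyclic, which is consistent with the observation that the precedence edge alone makes the block unschedulable.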