JDK-6857159 : local schedule failed with checkcast of Thread.currentThread()
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 7
  • Priority: P3
  • Status: Closed
  • Resolution: Fixed
  • OS: solaris_9
  • CPU: sparc
  • Submitted: 2009-07-02
  • Updated: 2015-01-28
  • Resolved: 2011-03-08
JDK 6           JDK 7           Other
6u18 (Fixed)    7 (Fixed)       hs16 (Fixed)
Related Reports
Relates :  
Relates :  
Description
The fix for 6385730 modified LoadKlass to always use the original memory state as a way of improving commoning of the LoadKlass calls.  In some very limited cases this can confuse the anti-dependence logic, resulting in unschedulable graphs.  In addition to using immutable memory, there's no need for LoadKlass and LoadRange to have anti-dependence edges.  The salient features that seem required to trigger the problem are the fact that the intrinsic for currentThread returns a non-null instance and that the existence of multiple subclasses triggers a chain of loadKlasses for the checkcast.  Here's the test case.

public class ct extends Thread {
    static class ct0 extends ct {
        public void message() {
            // System.out.println("message");                                                                 
        }

        public void run() {
            message();
            ct0 ct = (ct0) Thread.currentThread();
        }
    }
    static class ct1 extends ct0 {
        public void message() {
            // System.out.println("message");                                                                 
        }
    }
    static class ct2 extends ct0 {
        public void message() {
            // System.out.println("message");                                                                 
        }
    }
    static {
        new ct0();
        new ct1();
        new ct2();
    }

    public static void main(String[] args) throws Exception {
        for (int i = 0; i < 100000; i++) {
            Thread t = null;
            switch (i % 3) {
            case 0: t = new ct0(); break;
            case 1: t = new ct1(); break;
            case 2: t = new ct2(); break;
            }
            t.start();
            t.join();
        }
    }
}

And here's the unschedulable block after GCM (global code motion):

B2: #   B4 B3 <- B1  Freq: 1
 16     Start   ===  16  1  [[ 16  15  17  18  19  20  27  0  24 ]]  #{0:control, 1:abIO, 2:memory, 3:rawptr:BotPTR, 4:return_address, 5:ct$ct0:NotNull *} !jvms: ct$ct0::run @ bci:7
 15     MachProj        ===  16  [[ 13 ]] #0/unmatched !jvms: ct$ct0::run @ bci:-1
 17     MachProj        ===  16  [[ 13 ]] #1/unmatched !jvms: ct$ct0::run @ bci:-1
 18     MachProj        ===  16  [[ 13  9  22 ]] #2/unmatched  Memory: @BotPTR *+bot, idx=Bot; !jvms: ct$ct0::run @ bci:-1
 19     MachProj        ===  16  [[ 13  4  2  29  31 ]] #3 !jvms: ct$ct0::run @ bci:-1
 20     MachProj        ===  16  [[ 13 ]] #5  Oop:ct$ct0:NotNull * !jvms: ct$ct0::run @ bci:-1
 27     MachProj        ===  16  [[ 4  29  31 ]] #4 !jvms: ct$ct0::run @ bci:-1
 12     MachProj        ===  13  [[ 11 ]] #0/unmatched !jvms: ct$ct0::run @ bci:1
 21     MachProj        ===  13  [[ 11  4  29  31  33 ]] #1/unmatched !jvms: ct$ct0::run @ bci:1
 9      loadKlass       === _  18  22  [[ 8  13 ]]  *  Klass: *
 22     loadKlass       === _  18  23  [[ 9 ]] klass java/lang/Thread: 0x080de410 *  Klass:klass java/lang/Thread: 0x080de410 * !jvms: ct$ct0::run @ bci:7
 23     loadP   === _  26  24  [[ 22  4 ]] java/lang/Thread:NotNull *  Oop:java/lang/Thread:NotNull * !jvms: ct$ct0::run @ bci:4
 26     MachProj        ===  13  [[ 23  4  29  31 ]] #2/unmatched  Memory: @BotPTR *+bot, idx=Bot; !jvms: ct$ct0::run @ bci:1
 13     CallDynamicJavaDirect   ===  15  17  18  19  0  20  0  0  | 9  [[ 14  12  21  26 ]] Dynamic  ct$ct0::message # void ( ct$ct0:NotNull * ) ct$ct0::run @ bci:1 !jvms: ct$ct0::run @ bci:1
 14     MachProj        ===  13  [[]] #10005/fat
 0      Con     ===  16  [[]]  #top
 24     tlsLoadP        ===  16  [[ 25  23 ]]
 25     MachProj        ===  24  [[]] #1
 11     Catch   ===  12  21  [[ 10  32 ]]  !jvms: ct$ct0::run @ bci:1
 10     CatchProj       ===  11  [[ 37 ]] #0@bci -1  !jvms: ct$ct0::run @ bci:1
 32     CatchProj       ===  11  [[ 39 ]] #1@bci -1  !jvms: ct$ct0::run @ bci:1

The problematic nodes are these:

 24     tlsLoadP        ===  16  [[ 25  23 ]]
 9      loadKlass       === _  18  22  [[ 8  13 ]]  *  Klass: *
 22     loadKlass       === _  18  23  [[ 9 ]] klass java/lang/Thread: 0x080de410 *  Klass:klass java/lang/Thread: 0x080de410 * !jvms: ct$ct0::run @ bci:7
 13     CallDynamicJavaDirect   ===  15  17  18  19  0  20  0  0  | 9  [[ 14  12  21  26 ]] Dynamic  ct$ct0::message # void ( ct$ct0:NotNull * ) ct$ct0::run @ bci:1 !jvms: ct$ct0::run @ bci:1
 26     MachProj        ===  13  [[ 23  4  29  31 ]] #2/unmatched  Memory: @BotPTR *+bot, idx=Bot; !jvms: ct$ct0::run @ bci:1
 23     loadP   === _  26  24  [[ 22  4 ]] java/lang/Thread:NotNull *  Oop:java/lang/Thread:NotNull * !jvms: ct$ct0::run @ bci:4

The precedence edge on 13 pointing to 9 is what creates the cycle.  The loadP gets scheduled into the same block as the call because it consumes the memory of the call.  After schedule_local, the nodes following the call will be cloned into the successors.  It seems like we're getting into trouble because they happen to be in the same block at this point.  I don't really understand why we're putting precedence edges on calls in the first place, since the whole point of the memory graph is to guarantee proper ordering and anything that consumes all of memory should be completely safe.
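
The cycle can be reproduced outside the compiler with a toy model. The sketch below is not HotSpot code: it only copies the node numbers and the in-block input edges of the six problematic nodes from the dump, treats inputs that come from the Start projections (16, 18) as already available, and runs a plain topological sort. The class and method names (LocalScheduleSketch, block, schedule) are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model, not HotSpot code. Node numbers are taken from the dump above.
class LocalScheduleSketch {

    static List<Integer> deps(Integer... ids) {
        return Arrays.asList(ids);
    }

    // Maps each node to the in-block nodes that must be scheduled before it.
    static Map<Integer, List<Integer>> block(boolean withPrecedenceEdge) {
        Map<Integer, List<Integer>> inputs = new LinkedHashMap<>();
        inputs.put(24, deps());                                // tlsLoadP
        inputs.put(13, withPrecedenceEdge ? deps(9) : deps()); // call, "| 9" is the precedence edge
        inputs.put(26, deps(13));                              // memory projection of the call
        inputs.put(23, deps(26, 24));                          // loadP
        inputs.put(22, deps(23));                              // loadKlass
        inputs.put(9, deps(22));                               // loadKlass
        return inputs;
    }

    // Kahn's algorithm: returns a valid order, or null if the edges form a cycle.
    static List<Integer> schedule(Map<Integer, List<Integer>> inputs) {
        Map<Integer, Integer> pending = new LinkedHashMap<>();
        Map<Integer, List<Integer>> users = new LinkedHashMap<>();
        for (Map.Entry<Integer, List<Integer>> e : inputs.entrySet()) {
            pending.put(e.getKey(), e.getValue().size());
            for (Integer in : e.getValue()) {
                users.computeIfAbsent(in, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        Deque<Integer> ready = new ArrayDeque<>();
        for (Map.Entry<Integer, Integer> e : pending.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }
        List<Integer> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            Integer n = ready.remove();
            order.add(n);
            for (Integer u : users.getOrDefault(n, Collections.<Integer>emptyList())) {
                if (pending.merge(u, -1, Integer::sum) == 0) {
                    ready.add(u);
                }
            }
        }
        return order.size() == inputs.size() ? order : null;
    }

    public static void main(String[] args) {
        System.out.println("with precedence edge:    " + schedule(block(true)));
        System.out.println("without precedence edge: " + schedule(block(false)));
    }
}
```

With the precedence edge only node 24 can ever become ready, so schedule returns null; without it every node gets placed, with the call ahead of the loadP and the two loadKlass nodes.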

Comments
EVALUATION http://hg.openjdk.java.net/jdk7/hotspot-gc/hotspot/rev/f9094a5e1c8a
27-07-2009

EVALUATION http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/f9094a5e1c8a
22-07-2009

EVALUATION I'm looking at a compilation bailout that results from a failure to schedule a chain of loadKlass nodes with an extra precedence edge. This is bug 6857159. This is the cycle that causes the problem:

 24     tlsLoadP        ===  16  [[ 25  23 ]]
 9      loadKlass       === _  18  22  [[ 8  13 ]]  *  Klass: *
 22     loadKlass       === _  18  23  [[ 9 ]] klass java/lang/Thread: 0x080de410 *  Klass:klass java/lang/Thread: 0x080de410 * !jvms: ct$ct0::run @ bci:7
 13     CallDynamicJavaDirect   ===  15  17  18  19  0  20  0  0  | 9  [[ 14  12  21  26 ]] Dynamic  ct$ct0::message # void ( ct$ct0:NotNull * ) ct$ct0::run @ bci:1 !jvms: ct$ct0::run @ bci:1
 26     MachProj        ===  13  [[ 23  4  29  31 ]] #2/unmatched  Memory: @BotPTR *+bot, idx=Bot; !jvms: ct$ct0::run @ bci:1
 23     loadP   === _  26  24  [[ 22  4 ]] java/lang/Thread:NotNull *  Oop:java/lang/Thread:NotNull * !jvms: ct$ct0::run @ bci:4

The code corresponds to a checkcast of Thread.currentThread() to a type in the middle of a hierarchy of classes. The test case is in the bug report.

The first loadKlass loads the klass from the header; it's in its own alias class and is marked as not rewritable, which causes the anti-dep code to just skip it. The loadKlass with the precedence edge is a load from the superclass display list used by checkcast. Since it's not in its own alias class we can't set the is_rewritable flag to false, so we will actually look for anti-deps. Because it uses the same memory input as the call, we end up creating an anti-dep edge between them even though the loadP is actually dependent on the output of the call.

So maybe LoadKlass should return false for needs_anti_dependence_check? The memory edges on them should always guarantee correct placement. The other thing that seems wrong is that the loadP getting the current thread from TLS should use immutable_memory instead of the current memory, since it's an unchanging value. Either of these changes actually fixes the problem, but I think skipping the anti-dep check is the more correct one.

Prior to the fix for 6385730 the loadKlass nodes would have used the same memory as the loadP, so it would have worked correctly.
22-07-2009
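
To make the proposed change concrete, here is a heavily simplified sketch of how the extra edge comes about and how it goes away when the loadKlass opts out of the check. It is a toy model under stated assumptions, not the C2 implementation: the Node class, its fields and insertAntiDependences are hypothetical, and the real anti-dependence pass considers alias classes and block placement that are left out here. Only the node numbers and memory inputs (18 for both loadKlass nodes and the call, 26 for the loadP) come from the dump.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only; names and fields are hypothetical, not HotSpot's.
class AntiDepSketch {

    static final class Node {
        final int id;
        final String name;
        final int memoryInput;                  // node number of the memory state consumed
        final boolean mayWriteMemory;           // true for the call in this model
        final boolean needsAntiDependenceCheck; // stands in for needs_anti_dependence_check
        final List<Integer> precedence = new ArrayList<>();

        Node(int id, String name, int memoryInput,
             boolean mayWriteMemory, boolean needsAntiDependenceCheck) {
            this.id = id;
            this.name = name;
            this.memoryInput = memoryInput;
            this.mayWriteMemory = mayWriteMemory;
            this.needsAntiDependenceCheck = needsAntiDependenceCheck;
        }
    }

    // For every load that asks for the check, each possible writer consuming the
    // same memory state gets a precedence edge so that it is placed after the load.
    static void insertAntiDependences(List<Node> nodes) {
        for (Node load : nodes) {
            if (!load.needsAntiDependenceCheck) {
                continue;
            }
            for (Node other : nodes) {
                if (other != load && other.mayWriteMemory
                        && other.memoryInput == load.memoryInput) {
                    other.precedence.add(load.id);
                }
            }
        }
    }

    // Returns the precedence inputs that end up on the call (node 13).
    static List<Integer> callPrecedence(boolean loadKlassNeedsCheck) {
        Node call = new Node(13, "CallDynamicJavaDirect", 18, true, false);
        List<Node> nodes = Arrays.asList(
            new Node(22, "loadKlass", 18, false, false),              // not rewritable, already skipped
            new Node(9, "loadKlass", 18, false, loadKlassNeedsCheck), // superclass display load
            new Node(23, "loadP", 26, false, true),                   // consumes the call's memory
            call);
        insertAntiDependences(nodes);
        return call.precedence;
    }

    public static void main(String[] args) {
        System.out.println("check enabled:  " + callPrecedence(true));
        System.out.println("check disabled: " + callPrecedence(false));
    }
}
```

In this model the call picks up a precedence input on node 9 only while the loadKlass still requests the check, which matches the "| 9" edge in the dump; once the check is skipped the call has no precedence inputs and ordering is left to the memory edges.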

EVALUATION I think the root of the problem is that we can't tell that the loadKlass that loads from the mirror is immutable, so we don't skip the dependence computation.
02-07-2009