JDK-6843752 : missing code for an anti-dependent Phi in GCM
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 6u13,6u14
  • Priority: P3
  • Status: Closed
  • Resolution: Fixed
  • OS: windows_xp
  • CPU: x86
  • Submitted: 2009-05-21
  • Updated: 2018-03-16
  • Resolved: 2011-03-08
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 6 JDK 7 Other
6u18Fixed 7Fixed hs16Fixed
Related Reports
Duplicate :  
Relates :  
Relates :  
Relates :  
Description
FULL PRODUCT VERSION :
java version "1.6.0_12"
Java(TM) SE Runtime Environment (build 1.6.0_12-b04)
Java HotSpot(TM) Server VM (build 11.2-b01, mixed mode)

FULL OS VERSION :
Microsoft Windows XP [Version 5.1.2600]


A DESCRIPTION OF THE PROBLEM :
We've found one of our programs suddenly started failing in a simple loop; depending on the exact code it would loop infinitely or cause a null pointer exception.  This happened when upgrading to 1.6.0 update 10.

We've now managed to extract the loop and trigger the bug with a simple test program, and found that:

Java 1.6.0 update 6 and update 7 works OK.
Java 1.6.0 update 10 to 13 fails.

It always fails with the "-server" option, but works with "-client".
Adding an assert for the pointer that is null, makes the program work when running with the "-ea" option.
Using the "-Xint" option also makes the program work OK.


THE PROBLEM WAS REPRODUCIBLE WITH -Xint FLAG: No

THE PROBLEM WAS REPRODUCIBLE WITH -server FLAG: Yes

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
run the attached reproduction program using the "-server" option.

EXPECTED VERSUS ACTUAL BEHAVIOR :
expected:

C:\Documents and Settings\arnej\break>java -server -ea BreakJava
successfully performed 1000000 cases
successfully performed 2000000 cases
successfully performed 3000000 cases
successfully performed 4000000 cases
successfully performed 5000000 cases
successfully performed 6000000 cases
successfully performed 7000000 cases
[...etc...]


actual:

C:\Documents and Settings\arnej\break>java -server BreakJava
ERROR: crashed during case 27456
java.lang.NullPointerException
        at BreakJava.removeItems(BreakJava.java:53)
        at BreakJava.perform(BreakJava.java:65)
        at BreakJava.main(BreakJava.java:78)

C:\Documents and Settings\arnej\break>java -server BreakJava
ERROR: crashed during case 10496
java.lang.NullPointerException
        at BreakJava.removeItems(BreakJava.java:53)
        at BreakJava.perform(BreakJava.java:65)
        at BreakJava.main(BreakJava.java:78)

C:\Documents and Settings\arnej\break>java -server BreakJava
ERROR: crashed during case 10816
java.lang.NullPointerException
        at BreakJava.removeItems(BreakJava.java:53)
        at BreakJava.perform(BreakJava.java:65)
        at BreakJava.main(BreakJava.java:78)

C:\Documents and Settings\arnej\break>java -server BreakJava
ERROR: crashed during case 37440
java.lang.NullPointerException
        at BreakJava.removeItems(BreakJava.java:53)
        at BreakJava.perform(BreakJava.java:65)
        at BreakJava.main(BreakJava.java:78)

ERROR MESSAGES/STACK TRACES THAT OCCUR :
java.lang.NullPointerException
        at BreakJava.removeItems(BreakJava.java:53)
        at BreakJava.perform(BreakJava.java:65)
        at BreakJava.main(BreakJava.java:78)


REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
public class BreakJava {

    Item list;

    static class Item {
        public Item    next;
        public Item    prev;
        public boolean remove;

        Item(boolean r) { remove = r; }
    }

    private void linkIn(Item item) {
        Item head = list;
        if (head == null) {
            item.next = item;
            item.prev = item;
            list = item;
        } else {
            item.next = head;
            item.prev = head.prev;
            head.prev.next = item;
            head.prev = item;
        }
    }

    private void linkOut(Item item) {
        Item head = list;
        if (item.next == item) {
            list = null;
        } else {
            item.prev.next = item.next;
            item.next.prev = item.prev;
            if (head == item) {
                list = item.next;
            }
        }
        item.next = null;
        item.prev = null; // is this the null pointer we are seeing?
    }

    private void removeItems() {
        Item item = list;
        if (item == null) {
            return;
        }
        Item last = item.prev;
        assert(last != null);
        boolean done = false;
        while (!done) {
            // the original code "done = (item == last);" triggered an infinite loop
            // and was changed slightly in order to produce an exception instead.
            done = (item.next == last.next);
            item = item.next;
            if (item.prev.remove) {
                linkOut(item.prev);
            }
        }
    }

    public void perform(int numItems) {
        for (int i = 0; i < numItems; i++) {
            linkIn(new Item(i == 0));
        }
        removeItems();
        list = null;
    }

    static public void main(String[] args) {
        int caseCnt = 0;
        BreakJava bj = new BreakJava();
        try {
            for (;;) {
                int numItems = (++caseCnt % 2);
                if ((caseCnt % 64) == 0) {
                    numItems = 5;
                }
                bj.perform(numItems);
                if ((caseCnt % 1000000) == 0) {
                    System.out.println("successfully performed " + caseCnt + " cases");
                }
            }
        } catch (Exception e) {
            System.out.println("ERROR: crashed during case " + caseCnt);
            e.printStackTrace(System.out);
        }
    }
}

---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
using an array-based container instead of a double-linked list is a workaround in this case, but we're a bit worried about other loops using similar constructs that might trigger the same bug.

Release Regression From : 6u7
The above release value was the last known release where this 
bug was not reproducible. Since then there has been a regression.

Comments
PUBLIC COMMENTS Problem: Changes for 6470497 (part G) incorrectly removed the code for anti-dependent PHI pinned below load's 'early' block. As result the load placed below this phi into the block with lowest frequency and a wrong value is loaded. In the test case the load placed inside loop which modifies the loaded value to NULL. As result we got NULL exception. Solution: Don't place a load below anti-dependent PHI. Added the regression test modified to show the problem in latest VM (the loop body is executed less time then the code before the loop).
28-05-2009

EVALUATION http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/1851e1fb420e
28-05-2009

EVALUATION The regression introduced by changes: 6470497 (part G) C2 cleanups before larger changes which incorrectly removed the code for anti-dependent PHI pinned below load's 'early' block.
23-05-2009

SUGGESTED FIX src/share/vm/opto/gcm.cpp Fri May 22 18:20:43 2009 -0700 @@ -595,6 +595,9 @@ Block* PhaseCFG::insert_anti_dependences assert(!LCA_orig->dominates(pred_block) || early->dominates(pred_block), "early is high enough"); must_raise_LCA = true; + } else { + // anti-dependent upon PHI pinned below 'early', no edge needed + LCA = early; // but can not schedule below 'early' } } }
23-05-2009

EVALUATION Fix for 6384206 may hided the problem. The regression was introduced with next changes: 6510732: Compute consistent block frequencies But even that change may only expose a pre-existing problem. The root of the exception is GCM moved the next load inside the loop Item last = item.prev; which is wrong since during call to linkOut() item.prev = null. The comment in the test is correct - it is the null which produce the exception on the next loop iteration. 025 B2: # B3 <- B1 Freq: 3267.82 025 MOV EDI,EAX 027 MOV [ESP + #8],EAX 02b XOR EDX,EDX 02d 02d B3: # B25 B4 <- B2 B28 Loop: B3-B28 Freq: 3667.9 02d TEST EDX,EDX 02f Jne B25 P=0.471259 C=4645.000000 02f 035 B4: # B33 B5 <- B3 Freq: 1939.37 035 MOV [ESP + #0],ECX 038 MOV EBX,[EAX + #16] ! Field Test$Item.prev <<<<<<<<<<<<<< incorrectly placed inside the loop 03b MOV [ESP + #4],EBX 03f MOV EBX,[EBX + #12] ! Field Test$Item.next <<<<<<<<<<<<<< EBX == NULL 042 NullCheck EBX With 6384206 fix it is: 025 B2: # B3 <- B1 Freq: 3267.82 025 MOV ESI,[EDI + #16] ! Field Test$Item.prev <<<<<<<<<< correctly placed before the loop 028 XOR EBP,EBP 02a 02a B3: # B22 B4 <- B2 B20 Loop: B3-B20 Freq: 6534.11 02a TEST EBP,EBP 02c Jne B22 P=0.471259 C=4645.000000 02c 032 B4: # B27 B5 <- B3 Freq: 3454.85 032 MOV EAX,[ESI + #12] ! Field Test$Item.next 035 NullCheck ESI But we could be simply "lucky" since the load is placed into the lowest-fequency block. B2:Freq > B4:Freq in the failing case and B2:Freq < B4:Freq in the passing case. Note, the load has the control edge before the loop: 31 IfTrue === 28 [[ 325 37 ]] #1 !jvms: Test::removeItems @ bci:6 37 LoadP === 31 7 36 [[ 330 98 154 280 330 ]] @Test$Item+16 *, name=prev, idx=5; #Test$Item * Oop:Test$Item * !orig=[52] !jvms: Test::removeItems @ bci:11 325 Loop === 325 31 427 [[ 325 324 321 59 322 323 61 66 379 ]] !orig=[55] !jvms: Test::removeItems @ bci:35
22-05-2009

EVALUATION It seems this is a duplicate of 6384206, which is integrated in HS14b06. At least starting with this changeset the bug is fixed, but Tom should comment on this one and confirm that it's a duplicate.
21-05-2009