JDK-7024475 : loop doesn't terminate when compiled
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: hs21
  • Priority: P3
  • Status: Closed
  • Resolution: Fixed
  • OS: solaris_10
  • CPU: x86
  • Submitted: 2011-03-04
  • Updated: 2022-09-26
  • Resolved: 2011-04-25
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 7 Other
7Fixed hs21Fixed
Related Reports
Relates :  
Relates :  
Relates :  
Description
The following test periodically hangs when run with -server.  If you run a debug might you might need to increase the time it was before it considers it to be hung.  I'm still trying to narrow which release it appeared it.  6u14 is currently the first release where I've seen it hang.

import java.net.*;
import org.jsuffixarrays.*;

public class test extends Thread {
    public static void main(String[] args) throws Exception {
        int hung = 0;
        for (int i = 0; i < 10000; i++) {
            URLClassLoader apploader = (URLClassLoader)test.class.getClassLoader();
            URLClassLoader ucl = new URLClassLoader(apploader.getURLs(), apploader.getParent());
            Class c = ucl.loadClass("test");
            Thread t = (Thread)c.newInstance();
            t.start();
            for (int s = 0; s < 10; s++) {
                Thread.sleep(500);
                if (!t.isAlive()) {
                    break;
                }
            }
            if (t.isAlive()) {
                hung++;
            }
            System.out.println("ok " + (i + 1 - hung) + " hung " + hung);
        }
    }
    public void run() {
        DivSufSortTest t = new DivSufSortTest();
        t.setupForConstraints();
        t.invariantsOnRandomLargeAlphabet();
        t.sameResultWithArraySlice();
        t.invariantsOnRandomSmallAlphabet();
    }
}
Hi Tom.

Apologies for not providing a broader context, it follows.

> Is it hanging, as in making no progress, or not terminating?  

It is hanging as in: it consumes 100% of a single core, never terminates (when executed without debugging), in the debugging mode it does not terminate until a JVM breakpoint is set inside the loop below, then it breaks out to the debugger and you can step-by-step execute the code or re-run it and, magically, it will terminate.

> Does it terminate if given a lot more time?  It's not GC bound?  By debugger do you mean gdb or a Java debugger?

No, it never terminates... at least not in a reasonable time. Normally this loop ends in 2-3 seconds and when hung it does not terminate after 30 minutes or so (the longest we waited to kill it on the build server).

To me It looks like it JITs into something that causes an endless loop. I also checked with jrockit and J9 and these never hung, so it's probably hotspot.

Talking about the debugger, that was a Java debugger, I didn't inspect JIT dumps of what this method compiles to yet, didn't have the time and wanted to check if it's a known issue first. 

I'll try to reproduce it with JIT code dumps, I'll let you know if I succeed.

Dawid

On Thu, Mar 3, 2011 at 9:37 PM, Tom Rodriguez <###@###.###> wrote:

On Mar 3, 2011, at 4:39 AM, Dawid Weiss wrote:

> Hi. We see an infrequent, but very annoying VM hangups in purely
> computational code, as in here:
>
> http://builds.carrot2.org/build/result/viewBuildResultsFailedTests.action?buildKey=JSA-JSAHEAD-JOB1&buildNumber=8
>
> The loop in question is like this:
>
>            for (c0 = ALPHABET_SIZE - 2, j = m; 0 < j; --c0)
>            {
>                for (c1 = ALPHABET_SIZE - 1; c0 < c1; j = i, --c1)
>                {
>                    i = bucket_B[(c0) * ALPHABET_SIZE + (c1)];
>                    if (1 < (j - i))
>                    {
>                        ssSort(PAb, i, j, buf, bufsize, 2, n, SA[i] == (m - 1));
>                    }
>                }
>            }
>
> Unfortunately I cannot provide an always-halting example, but the bug
> seems to be JVM-related because:
>
> 1) once the VM hangs in debugging mode, breaking out to the debugger
> and stepping through the code terminates normally (so a deopt. I
> assume),

Is it hanging, as in making no progress, or not terminating?  Does it terminate if given a lot more time?  It's not GC bound?  By debugger do you mean gdb or a Java debugger?

tom

>
> 2) the freeze is non-deterministic, while the test is (regardless of
> the 'randomness', it always starts from the same seed),
>
> 3) we could reproduce this occasionally under different VMs and
> different OSs (64-bit linux, 64-bit Windows).
>
> Has there been any bug in the JIT that might be causing this? Thanks
> for pointers if you recognize an evil code pattern above.
>
> Dawid
>
> P.S. The above code is a translation from another person's C snippet,
> so I can't easily explain why the loop is built this way or why c0 is
> passed between the inner and outer loop :).

Comments
EVALUATION 7024475: loop doesn't terminate when compiled Reviewed-by: kvn The code which replaces an empty loop with the final computed value only works correctly if the the loop is guaranteed to execute once since it replaces the conditionally executed loop body with a value. Loop transformations like peeling generally result in zero trip guards being generated but they aren't required to be there. The fix is to make sure we have a guard by peeling the loop first. This results in a small redundant test in some cases. I added a check for obvious ones, which occurred in scimark SOR. Main and post loops are constructed with zero trip guards so I assume one exists in those cases. Tested with new test case, original failing program, refworkload and full CTW. I also improved a few debugging features. Node::find could get into an infinite loop walking debug_orig, so it should be check for a cycle. I made some other efficiency improvements before I realized that the problem was a cycle. I went ahead and kept those changes. IdealGraphPrinter should print out a few more debug fields and should dump the block structure if it exists. I changed the printing after beautify loops so it only happens once instead of after each recursive call. I added an OSROnlyBCI flag to control which OSR bcis can be compiled. I changed the assert in loop verification to allow verifying when the igvn worklist isn't empty.
28-03-2011

EVALUATION http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/1927db75dd85
27-03-2011

EVALUATION Note that this test case fails with every 1.6.0 release.
24-03-2011

EVALUATION This appears to be a long standing issue with IdealLoopTree::policy_do_remove_empty_loop. That codes assumes the counted loop has a zero trip guard and will incorrectly change value produced by the loop if it does not. Most normally written loops will get a zero trip guard as a result of peeling. In the case from the bug report the nested loop with complex bounds and an OSR entry point creates a really unusual loop shape. The RPO parsing changes allowed us to optimize it slightly better too. With help from Vladimir I was able to build a test case that failed for normal compiles. public class Test { static int i; static int x1; static int[] bucket_B; static void test(Test test, int i, int c0, int j, int c1) { for (;;) { if (c1 > c0) { if (c0 > 253) { throw new InternalError("c0 = " + c0); } int index = c0 * 256 + c1; if (index == -1) return; i = bucket_B[index]; if (1 < j - i && test != null) x1 = 0; j = i; c1--; } else { c0--; if (j <= 0) break; c1 = 255; } } } public static void main(String args[]) { Test t = new Test(); bucket_B = new int[256*256]; for (int i = 1; i < 256*256; i++) { bucket_B[i] = 1; } for (int n = 0; n < 100000; n++) { test(t, 2, 85, 1, 134); } } } The fix is most likely to peel the loop before removing it to ensure it has a guard though in the common case this introduces a little redundancy.
24-03-2011

EVALUATION The seems to appear with the RPO parsing changes 6384206. I don't think it's the changes themselves but RPO parsing certainly exposed other issues when they went in.
07-03-2011