Bug ID: JDK-7147733 The 'for each' loop breaks thread scheduling in mixed mode for 1.5, 1.6 and 1.7

JDK-7147733 : The 'for each' loop breaks thread scheduling in mixed mode for 1.5, 1.6 and 1.7

Type: Bug
Component: hotspot
Sub-Component: compiler
Affected Version: 8,9

Priority: P4
Status: Resolved
Resolution: Cannot Reproduce
OS: windows_7
CPU: x86

Submitted: 2012-02-22
Updated: 2019-09-13
Resolved: 2019-02-01

Versions (Unresolved/Resolved/Fixed)

The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.

Other
tbdResolved

Related Reports

Relates :

JDK-7160161 - Missed safepoint in non-Counted loop

Description

FULL PRODUCT VERSION :
java version "1.6.0_27"
Java(TM) SE Runtime Environment (build 1.6.0_27-b07)
Java HotSpot(TM) 64-Bit Server VM (build 20.2-b06, mixed mode)


java version "1.7.0_03"
Java(TM) SE Runtime Environment (build 1.7.0_03-b05)
Java HotSpot(TM) 64-Bit Server VM (build 22.1-b02, mixed mode)

java version "1.5.0_22"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_22-b03)
Java HotSpot(TM) Client VM (build 1.5.0_22-b03, mixed mode, sharing)




FULL OS VERSION :
Microsoft Windows [Version 6.1.7601]

EXTRA RELEVANT SYSTEM CONFIGURATION :
Win7 amd64
Intel Core 2 Quad Q9300


A DESCRIPTION OF THE PROBLEM :
There are two threads. Thread A performs number crunching and is MIN_PRIORITY, the other thread B sends keep-alives to stdout each 100ms and is MAX_PRIORITY.

In a cold start VM in mixed mode if thread A uses 'for each' loop then it blocks thread B from sending keep-alives until thread A finishes or calls yield().

If thread A uses ordinary 'for (int i ....)' loop instead of 'for each' loop everything seems to work fine.


THE PROBLEM WAS REPRODUCIBLE WITH -Xint FLAG: No

THE PROBLEM WAS REPRODUCIBLE WITH -server FLAG: Yes

REGRESSION.  Last worked in version 1.4.2

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
The 'Source code for an executable test case' field contains the test case.

There are two parameters:
 - slp - tells the test to wait before starting number crunching
 - iter - specifies amount of numbers to crunch

Tests runs okay (no errors) on my machine if slp > 4000 or iter < 10000 (apprx.)
Test fails (prints 'Test Failed') on my machine if slp < 3900 and iter > 20000 (apprx)

In order to reproduce, run the test provided in 'Source code" field

EXPECTED VERSUS ACTUAL BEHAVIOR :
It is expected that thread B would run in parallel with thread B as there are spare CPU cores.


Expected when fails
when slp < 3900 and iter > 30000 (apprx)
 test prints 'Test failed'
 prints stacktrace 'RuntimeException'
 there is an interval between consequent 'woken up...' longer than 200 ms (apprx 10 seconds on my machine)

Expected when passes
when slp > 4000 or iter < 10000 (apprx)
 test finishes without any errors
 all time intervals between consequent 'woken up...' are shorter than 1000 ms (usually about 100 ms)

ERROR MESSAGES/STACK TRACES THAT OCCUR :
VM did not crash. It exhibits incorrect behavior.

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Random;

class Calculator {
    private static Random rndGen = new Random();

	
	public void doCalc(int iter) {
		 System.out.println("calculation started");
		 
	      long start = System.currentTimeMillis();
		  Double[] portfolio = new Double[5000];

	      Random rnd = new Random();
		      
	      for (int i = 0; i < portfolio.length; i++) {
	          portfolio[i] = new Double(50000 * rnd.nextDouble());
	      }
      
	      double[] losses = new double[iter];
	      
	      for (int i = 0; i < iter; i++) {
		      for (Double crd : portfolio) {
	              if (rndGen.nextDouble() >= 1 - rndGen.nextDouble()) {
	                  losses[i] += 1 + Math.sqrt(crd);
	              } else {
	                  losses[i] -= 1 + Math.sqrt(crd);
	              }
		      }
	      }
	      
	      
	      System.out.println("calculation finished in " + (System.currentTimeMillis() - start) + " ms");
	}
}

public class TestSleep {
	
	

	/**
	 * @param args
	 */
	public static void main(String[] args) throws InterruptedException {

		// change slp to >4000 for test to run ok
		int slp = Integer.parseInt(args[0]);
		
		// tune this parameter for computation to run more than 4000 ms
		int iter = Integer.parseInt(args[1]);

		
		Thread thread = new Thread( new Runnable() {
			SimpleDateFormat fmt = new SimpleDateFormat("kk:mm:ss.SSS");

			public void run() {
				while (true) {
					
					long time = System.currentTimeMillis();
					
					try {
						Thread.sleep(100);
 						
					} catch (InterruptedException e) {
						e.printStackTrace();
						return;
					}
										
					String now = fmt.format(Calendar.getInstance().getTime());
					System.out.println("[" + now + "] woken up ok...");
					
					if (System.currentTimeMillis() - time > 1000) {
						System.err.println("Test failed");
						throw new RuntimeException("Test failed");
					}
				}
			}
		});

		Thread.currentThread().setPriority(Thread.MIN_PRIORITY);
					
		thread.setPriority(Thread.MAX_PRIORITY);
		thread.setDaemon(true);
		thread.start();
		
		Thread.sleep(slp);
		
		new Calculator().doCalc(iter);
		
		
		// let the 'woken up' thread print some output
		Thread.sleep(1000);
	}

}

---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
Workarounds:
1) don't use 'for each' loops
2) use interpreted mode only
3) run the thread that prints keep-alives for more than 4000 ms before starting intensive computations
4) perform some computations in the thread that prints keep-alives before starting another thread for computation
5) use JRockit VM jdk1.6.0_29-R28.2.2-4.1.0, it works fine.

Actually, 1) cannot be done because there are 3rd party libraries, 2) is very slow, 3) and 4) highly depend on the application and other threads in VM, 5) is not an option since JRockit is a commercial product expected to be discontinued.

Comments

I wasn't able to reproduce it either with newer or older JDK versions (recent 6u, 7u, 12). So, closing the bug.
01-02-2019
Hi Vladimir, if you have some time, can you please check if this is still an issue or not? Thank you! Best regards, Zoltan
21-02-2017
Tom Rodriguez previously wrote: Missing polls in the loop would only matter if we were trying to safepoint for some reason. The test itself doesn't cause any safe points but GuaranteedSafepointInterval will force one every second. It likely that we eliminate the poll in the foreach loop but there are other polls in Random.next that we can't eliminate that should ensure that every pass through the loop has a poll. All their reports are against windows, so maybe that's a factor somehow? Vladimir writes: C2 inlinines everything, no call in the code. It is OSR compilation for inner loop "for (Double crd : portfolio)". But all loops are not counted so all have safepoints. Also bug reported failure with Client VM (if it is true): Java HotSpot(TM) Client VM (build 1.5.0_22-b03, mixed mode, sharing)
25-05-2016
-XX:+SafepointTimeout might show something if you set the timeout interval correctly. There are a number of things that could give insight here if we could reproduce it - such as a pstack whilst "hung". But without being able to reproduce ...
25-05-2016
Low priority issue that has been there since 2012. Deferring to JDK 10.
25-05-2016
ILW=Bad behaviour, Low likelihood, No workaroud=MLH=P4
24-03-2014
Looks like a compiler inlining report.
16-01-2014
PUBLIC COMMENTS FreqInlineSize sets the maximum size of frequently executed methods to be inlined. By setting it to zero, no frequently executed methods will be inlined. It would be interesting to see a trace of which frequently executed methods gets inlined on a passing and failing run. However TraceFrequencyInlining is only available in debug builds. The problem may not be directly related to inlining, it may simply be that by disabling specific inlining decisions that the timing changes sufficiently to avoid the problem.
01-03-2012
PUBLIC COMMENTS The submitter has provided additional information: - nanoTime makes no difference. On the real failure the cessation of the heartbeat messages is noticeable to an operator. So clock issues are not the cause - They investigated the issue further and found that setting -XX:FreqInlineSize=0 makes the test pass. - When the heartbeat stops VisualVm can not attach to the VM This latter information would indicate a potential JIT issue. The inability to attach via VisualVM could indicate the system can't reach a safepoint, which in turn indicates a thread may be executing a loop in which there is no safepoint poll. However this is just speculation. I can not reproduce the issue on my system with SE 7 b147 32-bit client/server; SE 8 b23 32-bit client/server and SE8 b02 64-bit server. Typical Usage: java TestSleep 0 30000 Typical output shows 100ms intervals as expected: ... [14:58:10.685] woken up ok... [14:58:10.785] woken up ok... [14:58:10.885] woken up ok... [14:58:10.985] woken up ok... calculation finished in 18059 ms [14:58:11.085] woken up ok... Even tried running multiple instances at once - still unable to reproduce.
01-03-2012
PUBLIC COMMENTS The test program reports failure if a Thread.sleep(100) appears to take more then 1000ms as measured by System.currentTimeMillis(). But the measurement is taken across the following code: long time = System.currentTimeMillis(); Thread.sleep(100); String now = fmt.format(Calendar.getInstance().getTime()); System.out.println("[" + now + "] woken up ok..."); if (System.currentTimeMillis() - time > 1000) { System.err.println("Test failed"); throw new RuntimeException("Test failed"); } Could the submitter please test with currentTimeMillis() replaced by System.nanoTime(). A possible explanation here is an unstable time source causing false failures to be reported. The submitter is expecting that the thread will always get a chance to execute such that the code never reports a delay of more than 1000. But there are numerous potential interactions that could invalidate that assumption, including external load. In terms of the test program itself JIT compilation and GC are potential sources of interference - though verbose GC reporting does not indicate that GC occurs. Certainly there is nothing specific to the for-each construct that should cause such anomalies - particularly when applied to an array! But a change in construct can certainly affect the relative timing. It is possible that there is some interaction with JIT compilation that causes the unexpected delay - for example if the JIT decides to compile code internal to fmt.format(Calendar.getInstance().getTime()); then that might delay the thread long enough to trigger the failure report. However I have been unable to reproduce the problem on my quad-core i5 Windows 7 system. I've also traced PrintCompilation and experimented with the compile threshold but to no avail.
25-02-2012