JDK-8174075 : Random performance dropouts during parallel execution
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 8,9
  • Priority: P3
  • Status: Closed
  • Resolution: Duplicate
  • OS: generic
  • CPU: generic
  • Submitted: 2017-02-06
  • Updated: 2017-02-08
  • Resolved: 2017-02-07
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 9
9Resolved
Related Reports
Duplicate :  
Description
FULL PRODUCT VERSION :
Java(TM) SE Runtime Environment (build 1.8.0_92)
Java(TM) SE Runtime Environment (build 1.8.0_121)

FULL OS VERSION :
Windows 10 professional Version	10.0.14393 Build 14393
Linux  2.6.32-358.11.1.el6.x86_64 #1 SMP Wed Jun 12 03:34:52 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

A DESCRIPTION OF THE PROBLEM :
When running the attached source code we experience random performance dropouts. 

Running the same code snippet on jre-7, or with "-Xint" flag or inlining the "gradient" method, fixed the problem.

Also we found out that the overall runtime of our code went up by a factor of 4 when switching from JRE-7 to JRE-8. Any explanation for this?


THE PROBLEM WAS REPRODUCIBLE WITH -Xint FLAG: No

THE PROBLEM WAS REPRODUCIBLE WITH -server FLAG: Yes

REGRESSION.  Last worked in version 7u76

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run the attached code on JRE-8 and inspect the output.

EXPECTED VERSUS ACTUAL BEHAVIOR :
There should be no performance drop on any of the threads running in parallel. In particular the numerical computation should not influence the Logger thread in out example.
ERROR MESSAGES/STACK TRACES THAT OCCUR :
Instead we observe this output:

Start
........
-> levels took: 2.2 sec.
...........................
-> levels took: 6.6 sec.
............................................
-> levels took: 11.0 sec.
..................................
-> levels took: 8.3 sec.
..................
-> levels took: 10.7 sec.

Log-131 took: 6.3 sec., expected 250 ms.
....
-> levels took: 13.1 sec.

Log-136 took: 12.1 sec., expected 250 ms.
........................................................
Log-193 took: 1.5 sec., expected 250 ms.

-> levels took: 15.5 sec.
....
Log-198 took: 16.9 sec., expected 250 ms.
-> levels took: 17.9 sec.

............
Log-211 took: 17.3 sec., expected 250 ms.
-> levels took: 20.3 sec.

...........................................................................................
-> levels took: 22.7 sec.
.........................................................
-> levels took: 25.1 sec.
Log-360 took: 10.8 sec., expected 250 ms.
..............................................................................................................
-> levels took: 27.4 sec.


REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
/**
 * @author Marc Flesch
 */
public class MSSTest {

	public static void main(String[] args) {
		submit(new MSS(), "MSS");
		submit(new BusyLog(250), "Log");
	}

	private static void submit(Runnable r, String name) { new Thread(r, name).start();}
	private static  String dt(long t1, long t2) { return ((t2 - t1) / 100) / 10.0  + " sec."; }

	private static final class BusyLog implements Runnable {
		private final long timeout;

		BusyLog(long timeout) {
			this.timeout = timeout;
		}

		@Override
		public void run() {
			int counter = -1;
			while (true) {
				long t1 = System.currentTimeMillis();
				while(System.currentTimeMillis() - t1 < timeout) {
					Math.log(Float.NaN);
				}
				long t2 = System.currentTimeMillis();
				counter++;
				String dt = dt(t1, t2);
				if (dt.equals(dt(0, timeout))) {
					System.out.print(".");
				} else {
					System.err.println("\nLog-" + counter + " took: " + dt + ", expected " + timeout + " ms.");
				}
			}
		}
	}

	private static class MSS implements Runnable {

		public void run() {
			System.out.println("Start");
			int binCount = 2000000;
			float mean = 0.76f;
			float sigma = 0.75f;
			float[] levels = new float[binCount];

			for (int binWindow = 2; binWindow < 64; binWindow+=4) {
				int n = levels.length;
				float[] gradient = new float[n];
				float inv2BinWidth = 1f / (binWindow * binWindow);
				float mean4 = mean / 4;
				float sigma2 = 4 / (sigma * sigma);
				float ms2 = mean / (sigma * sigma);
				int window = 3 * binWindow;
				float[] exponents = new float[window + 1];
				for (int i = 0; i <= window; i++) {
					exponents[i] = (float) (i * Math.exp(-0.5 * i * i * inv2BinWidth));
				}
				long t1 = System.currentTimeMillis();
				gradient(levels, n, gradient, mean4, sigma2, ms2, window, exponents);
				long t2 = System.currentTimeMillis();
				System.out.println("\n-> levels took: " + dt(t1, t2));
			}
			System.err.println("Done.");
			System.exit(0);
		}

		private void gradient(float[] levels, int n, float[] gradient, float mean4, float sigma2, float ms2, int window, float[] exponents) {
			for (int i = 0; i < n; i++) {
				float invB;
				int d = 0;
				if (levels[i] < mean4) {
					invB = sigma2;
				} else {
					invB = ms2 / levels[i];
				}
				for (int j = i + 1; j < n; j++) {
					d++;
					float inv_i;
					if (levels[j] < mean4) {
						inv_i = sigma2;
					} else {
						inv_i = ms2 / levels[j];
					}
					float r = levels[j] - levels[i];
					float val = -0.5f * r * r;
					gradient[i] += exponents[d] * Math.exp(val * invB);
					gradient[j] -= exponents[d] * Math.exp(val * inv_i);
					if (d == window) {
						break;
					}
				}
			}
		}
	}
}

---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
If you could provide a work around for us that would be great. Thanks.


Comments
I was just curious so played with this. Didn't expect anyone else to pick it up so quickly :)
07-02-2017

Interesting that in 9 latest builds it runs much, much faster overall and pauses are much shorter when they occur. It is also no longer fixed by disabling TieredCompilation.
07-02-2017

Yes, it's JDK-6869327. I just verified that it's fixed with -XX:+UseCountedLoopSafepoints. David, please assign bugs to yourself if you are looking into them to avoid double-investigation.
07-02-2017

I'm unclear what good output for this test looks like but presumably there should not be any "Log-" lines. Using 1.8.0-b132 (JDK 8 GA) Running with -Xint seems okay. Running with -client also seems okay. Running with -server has the problem and based on slow response to ctrl-C I would suspect we are hitting a long safepoint pause. Running with -server -Xcomp delays the onset of the problem. Running with -server -XX:-TieredCompilation does not have the problem. This looks like it might be JDK-6869327.
07-02-2017

I guess it is introduced by 1.8.0-ea-b18 1.8.0-ea-b17 - Pass 1.8.0-ea-b18 - Fail
07-02-2017

This issue is reproducible in both 8 and 9. It is an regression introduced in JDK8 ea build, below are the results 7u45 - No performance dropout 8u b132 - Performance dropout 8u121 - Performance dropout 9 ea b154 - Performance dropout Below is the result == -sh-4.1$ /opt/java/jdk-9_ea-154/bin/javac MSSTest.java -sh-4.1$ /opt/java/jdk-9_ea-154/bin/java MSSTest Start . -> levels took: 0.3 sec. ... -> levels took: 0.8 sec. ...... -> levels took: 1.4 sec. ..... -> levels took: 2.0 sec. Log-15 took: 0.8 sec., expected 250 ms. ... Log-19 took: 1.2 sec., expected 250 ms. -> levels took: 2.0 sec. ............. -> levels took: 3.2 sec. .............. -> levels took: 3.6 sec. .................. -> levels took: 4.5 sec. ...................... -> levels took: 5.3 sec. ...................... -> levels took: 5.5 sec. .................. Log-127 took: 1.8 sec., expected 250 ms. -> levels took: 6.2 sec. ............................ -> levels took: 7.2 sec. ............ -> levels took: 6.2 sec. Log-168 took: 3.4 sec., expected 250 ms. ................... Log-188 took: 1.8 sec., expected 250 ms. -> levels took: 6.6 sec. ... Log-192 took: 6.2 sec., expected 250 ms. -> levels took: 7.0 sec. .................................. -> levels took: 8.7 sec. Done. ==
07-02-2017

This problem shows up with JDK 8 b18 because TieredCompilation was enabled by default (see JDK-2218997). It's not a regression. [~dholmes], the performance improvement that you are seeing starts with JDK 9 b88 and is due to JDK-8132207 which improved the performance of Math.exp() which is heavily used by the benchmark. Closing this as duplicate of JDK-6869327. The customer should use -XX:+UseCountedLoopSafepoints to avoid performance drops. But this increases overall execution time: With XX:+UseCountedLoopSafepoints: real 1m16.525s user 2m31.167s sys 0m0.389s With XX:-UseCountedLoopSafepoints: real 1m10.251s user 2m3.003s sys 0m0.543s
07-02-2017