Duplicate :
|
|
Duplicate :
|
|
Relates :
|
|
Relates :
|
Name: gm110360 Date: 09/22/2004 FULL PRODUCT VERSION : java version "1.5.0-rc" Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0-rc-b63) Java HotSpot(TM) Server VM (build 1.5.0-rc-b63, mixed mode) ADDITIONAL OS VERSION INFORMATION : Microsoft Windows 2000 [Version 5.00.2195] EXTRA RELEVANT SYSTEM CONFIGURATION : 1.7GHz Xeon processor (dual processor but test is single threaded) A DESCRIPTION OF THE PROBLEM : This is refinement of the bug I filed earlier with an internal review ID of: 311233 Using double math just after a JNI call results in a dramatic performance slowdown (factor of 5 in the test case below) under 1.5.0rc-server. This slowdown does not occur under 1.4.2-server or any client version. I've included a simple program below that demonstrates the problem. VTune reports an extreme level of Streaming SIMD Input Assists in this case, though I don't know why this should occur STEPS TO FOLLOW TO REPRODUCE THE PROBLEM : Create a simple JNI function that simply always returns the value 1.0f and compile it into a dll. Then run the program below to time various operations. Notice that mixing double math operations and JNI calls results in a dramatic and unexplained performance loss by roughly a factor of 5! EXPECTED VERSUS ACTUAL BEHAVIOR : EXPECTED - These are the results I get when running under 1.4.2-server avg 2.4 ns total 4.70E-2 s for assign (~ 4.0 cycles) avg 2.3 ns total 4.60E-2 s for mult (~ 3.9 cycles) avg 162.5 ns total 3.25E0 s for JNI (~ 276.2 cycles) avg 175.0 ns total 3.50E0 s for JNI and mult (~ 297.5 cycles) avg 2.4 ns total 4.70E-2 s for assign (~ 4.0 cycles) avg 2.4 ns total 4.70E-2 s for mult (~ 4.0 cycles) avg 162.5 ns total 3.25E0 s for JNI (~ 276.2 cycles) avg 168.0 ns total 3.36E0 s for JNI and mult (~ 285.6 cycles) ACTUAL - These are the results I get under 1.5.0rc-server java version "1.5.0-rc" Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0-rc-b63) Java HotSpot(TM) Server VM (build 1.5.0-rc-b63, mixed mode) avg 2.4 ns total 4.70E-2 s for assign (~ 4.0 cycles) avg 2.4 ns total 4.70E-2 s for mult (~ 4.0 cycles) avg 194.5 ns total 3.89E0 s for JNI (~ 330.6 cycles) avg 892.2 ns total 1.78E1 s for JNI and mult (~ 1516.7 cycles) avg 2.4 ns total 4.70E-2 s for assign (~ 4.0 cycles) avg 2.4 ns total 4.70E-2 s for mult (~ 4.0 cycles) avg 193.0 ns total 3.86E0 s for JNI (~ 328.0 cycles) avg 890.6 ns total 1.78E1 s for JNI and mult (~ 1514.0 cycles) Note that JNI calls have gotten 1.2x more expensive and mixing JNI calls and double multiplies has gotten 5.3x more expensive under 1.5.0rc-server as compared to 1.4.2-server REPRODUCIBILITY : This bug can be reproduced always. ---------- BEGIN SOURCE ---------- import java.text.DecimalFormat; import java.util.Random; /** Test that show a dramatic slowdown for programs that attempt to use * floating point math after a JNI call. * * @author bjw Bruce Walter, Cornell Program of Computer Graphics 2004 */ public class JNIOpsTest { //target total number of repetitions of the operation public static final int opTargetRepetitions = 20000000; //size of arrays that are operated on public static final int arraySize = 10000; //number of times we need to process each array to reach total target count public static final int reps = opTargetRepetitions/arraySize; //pretty print the output numbers to make them easier to read public static final DecimalFormat decForm = new DecimalFormat("###0.0"); public static final DecimalFormat sciForm = new DecimalFormat("0.00E0"); //my processor is a 1.7GHz Xeon (actually it is a dual processor, but this test is single threaded) public static final double ghzProcSpeed = 1.7; //my processor is 1.7GHz //native function that simply always returns the value 1.0 public static native float funcStaticVoid(); static { System.load("C:/IntelCode/JNITest/JNITest/Release/JNITest.dll"); } public static void runTimingTest(TestOp op, double result[], double src[], boolean print) { long time = System.currentTimeMillis(); for(int i=0;i<reps;i++) { op.performOp(result,src); } time = System.currentTimeMillis() - time; double denom = 1000000.0/(reps*src.length); if (print) { String ps = decForm.format(time*denom); while (ps.length()<6) ps = " "+ps; ps = "avg "+ps+" ns total "+sciForm.format(time/1000.0)+" s"; while (ps.length()<32) ps += " "; ps = ps+" for "+op.toString(); while (ps.length()<50) ps += " "; System.out.println(ps+"\t(~ "+decForm.format(time*denom*ghzProcSpeed)+" cycles)"); } } public static void main(String[] args) throws InterruptedException { double src[] = new double[arraySize]; double result[] = new double[arraySize]; Random ran = new Random(5232482349538L); //set the src array to be random values between zero and one exclusive for(int i=0;i<src.length;i++) { do { src[i] = ran.nextDouble(); } while (src[i] == 0); } TestOp tests[] = { new AssignOp(), new MultOp(), new JNIOp(), new JNIMulOp()}; //warm up hotspot for(int i=0;i<tests.length;i++) { runTimingTest(tests[i],result,src,false); } //now run the real tests and print the timings for(int i=0;i<tests.length;i++) { runTimingTest(tests[i],result,src,true); } //do it again to show the timings are reasonably consistent for(int i=0;i<tests.length;i++) { runTimingTest(tests[i],result,src,true); } } public abstract static class TestOp { public abstract void performOp(double result[], double src[]); } public static class AssignOp extends TestOp { public String toString() { return "assign"; } public void performOp(double result[], double src[]) { for(int i=0;i<src.length;i++) { result[i] = src[i]; } } } public static class MultOp extends TestOp { public String toString() { return "mult"; } public void performOp(double result[], double src[]) { for(int i=0;i<src.length;i++) { result[i] = 0.143*src[i]; } } } public static class JNIOp extends TestOp { public String toString() { return "JNI"; } public void performOp(double result[], double src[]) { for(int i=0;i<src.length;i++) { JNIOpsTest.funcStaticVoid(); } } } public static class JNIMulOp extends TestOp { public String toString() { return "JNI and mult"; } public void performOp(double result[], double src[]) { for(int i=0;i<src.length;i++) { JNIOpsTest.funcStaticVoid(); result[i] = 0.143*src[i]; } } } } ---------- END SOURCE ---------- CUSTOMER SUBMITTED WORKAROUND : None that I know of other than not upgrading to 1.5.0rc and staying with 1.4.2 Release Regression From : 1.4.2 The above release value was the last known release where this bug was known to work. Since then there has been a regression. (Incident Review ID: 311434) ====================================================================== ###@###.### 10/12/04 19:40 GMT
|