J2SE Version (please include all output from java -version flag):
java version "1.5.0_01-ea"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_01-ea-b04)
Java HotSpot(TM) Server VM (build 1.5.0_01-ea-b04, mixed mode)

Does this problem occur on J2SE 1.3, 1.4.x or 1.5? Yes / No (pick one)
Occurs on 1.5.0 and all later releases

Operating System Configuration Information (be specific):
Microsoft Windows 2000 [Version 5.00.2195]

Hardware Configuration Information (be specific):
1.7GHz Xeon processor (dual processor but test is single threaded)

Bug Description:
I believe that bug 5105765 was closed prematurely for the following reasons:

1) Although the ABI is ambiguous on this point, I believe there are strong arguments for considering native code that changes the SSE control flags (in the mxcsr register) as buggy or erroneous rather than as exhibiting acceptable behavior.

2) Even if changing the SSE control flags is defined as allowed behavior, there is a cheaper way to guard against it than what is currently implemented in Java 1.5.0 -server on x86 processors.

Let me start with the second point. Setting the SSE control flags in the mxcsr register is a serializing and hence very expensive operation. Moreover, in most cases it is unnecessary, because most native code is well behaved and does not change the SSE control flags. Thus in most cases it is cheaper to first check whether the SSE control flags have been changed and to reset them only if actually necessary. This check still has some cost, since reading the mxcsr register is somewhat expensive, but it is much less expensive than actually setting the mxcsr register, as demonstrated in the code included at the end of this bug report. Thus, if you feel that native code should be allowed to change the SSE control flags, you can reduce the cost of correcting such changes by writing the mxcsr register only if its value was actually changed.

Now back to the first point.
The official documents are unfortunately somewhat ambiguous about whether a procedure or function should be allowed to change the SSE control flags (i.e., whether the mxcsr should be treated as volatile, or caller-saved). In the IA-32 Intel Architecture Software Developer's Manual Volume 1, section 11.6.10.2 describes how to save SSE state across a procedure call, including both the XMM and MXCSR registers, using the appropriate instructions if required. However, the next section, 11.6.10.3, titled "Caller-Save Requirement for Procedure and Function Calls", requires only saving the XMM registers (and does not mention the MXCSR register). It explicitly says that "The primary reason for using the caller-save convention [for the XMM registers] is to prevent performance degradation". On page 5-21 of the Intel Software Optimization manual, it states that "Frequent changes to the MXCSR register should be avoided since there is a penalty associated with writing this register", and on page 2-59 it makes clear that writing the mxcsr register is an expensive serializing instruction that is expected to be used infrequently. From this evidence, I think we can reasonably conclude that the caller-save requirement was meant to apply to the XMM registers only and not to the MXCSR register.

For further empirical evidence, we can see that this is precisely how other compilers treat the MXCSR register. Neither the Intel nor the Microsoft C++ compiler will automatically insert a save and restore of the mxcsr register around a procedure call, even when it is impossible for them to prove that the mxcsr register has not changed (for example, when calling through a function pointer). However, both will automatically save and restore the XMM registers before and after a procedure call. Also note that by convention the x87 floating point control word is not treated as volatile and is not saved and restored around a procedure call by any compiler I know of, including Java.
It seems very reasonable that the SSE control register, mxcsr, should be treated analogously to the x87 control register, fcw. In Agner Fog's survey of the calling conventions used by various C++ compilers and operating systems for x86 systems, he states that "The floating point control word and bit 6-15 of the MXCSR register must be saved and restored by any procedure that changes them, except for procedures that have the purpose of changing these".
http://www.agner.org/assem/calling_conventions.pdf
In other words, the mxcsr should be treated as callee-saved (or non-volatile) unless the programmer explicitly states otherwise.

Thus I hope that I have convinced you that if the native code invoked by a JNI call does change the SSE control register, mxcsr, this should be treated as a bug (like writing data to random memory locations). However, it is a bug that can be easily detected, either by checking the mxcsr register after each JNI call or, preferably, by checking only when a command-line flag such as -Xcheck:jni is set. The latter would impose no extra performance penalty on Java programs using non-buggy native code, while still allowing the error to be detected when desired.

Steps to Reproduce (be specific):
REPRODUCIBILITY :
This bug can be reproduced always.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
The following program demonstrates the problem: JNI calls have become more expensive, in some cases by a factor of 5x. I also included code to demonstrate that testing whether the mxcsr register has actually changed is cheaper than the current behavior of always setting it regardless of its current value.
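The attached test programs are not reproduced in this report. Purely as an illustration of the check-before-restore idea, here is a minimal C sketch. It is not the code from JNIOpsTestv2.c and not how the JVM implements its guard. It assumes an x86 compiler that provides the SSE intrinsics `_mm_getcsr` and `_mm_setcsr` in `<xmmintrin.h>`; the names `native_fn`, `call_and_always_restore`, `call_and_restore_if_changed`, `well_behaved` and `misbehaving` are hypothetical, and the mask 0xFFC0 simply encodes "bits 6-15" from the calling-convention quote above.

```c
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr (SSE intrinsics, x86 only) */

/* Bits 6-15 of MXCSR are the control bits; bits 0-5 are sticky status flags. */
#define MXCSR_CONTROL_MASK 0xFFC0u

/* Hypothetical stand-in for the native function reached through a JNI call. */
typedef void (*native_fn)(void);

/* Unconditional restore: pays for the expensive MXCSR write on every call. */
void call_and_always_restore(native_fn fn, unsigned int expected) {
    fn();
    _mm_setcsr(expected);
}

/* Conditional restore: read MXCSR, and write it back only if the callee
 * changed a control bit. Returns 1 if a change was detected, otherwise 0,
 * so the same check could also drive a diagnostic in a checked mode. */
int call_and_restore_if_changed(native_fn fn, unsigned int expected) {
    fn();
    unsigned int now = _mm_getcsr();
    if ((now ^ expected) & MXCSR_CONTROL_MASK) {
        /* Keep the callee's status flags, restore only the control bits. */
        _mm_setcsr((now & ~MXCSR_CONTROL_MASK) | (expected & MXCSR_CONTROL_MASK));
        return 1;
    }
    return 0;
}

/* Example callees (assumptions for illustration only). */
void well_behaved(void) {
    /* Leaves MXCSR untouched. */
}

void misbehaving(void) {
    /* Toggles flush-to-zero (bit 15) and does not put it back. */
    _mm_setcsr(_mm_getcsr() ^ 0x8000u);
}
```

A caller would capture the expected value once with `_mm_getcsr()` and pass it to `call_and_restore_if_changed` around each native call. For a well-behaved callee only the read and compare are executed; the write happens only in the misbehaving case, where the return value of 1 could also be reported as an error.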
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
java version "1.4.2_03"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02)
Java HotSpot(TM) Server VM (build 1.4.2_03-b02, mixed mode)
mxcsr 8064
avg 2.4 ns total 4.70E-2 s for assign (~ 4.0 cycles)
avg 2.4 ns total 4.70E-2 s for mult (~ 4.0 cycles)
avg 148.4 ns total 2.97E0 s for JNI (~ 252.3 cycles)
avg 162.5 ns total 3.25E0 s for JNI and mult (~ 276.2 cycles)
avg 76.6 ns total 1.53E0 s for Save&Restore MXCSR (~ 130.2 cycles)
avg 14.9 ns total 2.97E-1 s for Save&Test MXCSR (~ 25.2 cycles)
avg 2.3 ns total 4.60E-2 s for assign (~ 3.9 cycles)
avg 2.4 ns total 4.70E-2 s for mult (~ 4.0 cycles)
avg 148.5 ns total 2.97E0 s for JNI (~ 252.4 cycles)
avg 161.0 ns total 3.22E0 s for JNI and mult (~ 273.6 cycles)
avg 75.8 ns total 1.52E0 s for Save&Restore MXCSR (~ 128.9 cycles)
avg 14.9 ns total 2.97E-1 s for Save&Test MXCSR (~ 25.2 cycles)

ACTUAL -
java version "1.5.0_01-ea"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_01-ea-b04)
Java HotSpot(TM) Server VM (build 1.5.0_01-ea-b04, mixed mode)
mxcsr 8064
avg 2.3 ns total 4.60E-2 s for assign (~ 3.9 cycles)
avg 2.4 ns total 4.70E-2 s for mult (~ 4.0 cycles)
avg 193.8 ns total 3.88E0 s for JNI (~ 329.4 cycles)
avg 887.5 ns total 1.78E1 s for JNI and mult (~ 1508.8 cycles)
avg 75.8 ns total 1.52E0 s for Save&Restore MXCSR (~ 128.9 cycles)
avg 14.9 ns total 2.97E-1 s for Save&Test MXCSR (~ 25.2 cycles)
avg 2.4 ns total 4.70E-2 s for assign (~ 4.0 cycles)
avg 2.3 ns total 4.60E-2 s for mult (~ 3.9 cycles)
avg 194.6 ns total 3.89E0 s for JNI (~ 330.7 cycles)
avg 889.8 ns total 1.78E1 s for JNI and mult (~ 1512.7 cycles)
avg 76.6 ns total 1.53E0 s for Save&Restore MXCSR (~ 130.1 cycles)
avg 14.9 ns total 2.97E-1 s for Save&Test MXCSR (~ 25.2 cycles)

CUSTOMER SUBMITTED WORKAROUND :
None that are good.
This bug causes at least a 10% slowdown in our large real-world rendering application and will affect any Java program that uses the server JVM and makes many JNI calls (for example, programs that use JOGL to access OpenGL frequently). One can use the client JVM or disable the use of SSE, but these cause even larger slowdowns in our application than this bug does, and thus are not attractive alternatives. We will stay with Java 1.4.2 until these issues are resolved.

Included test programs: JNIOpsTestv2.java (Java code) and JNIOpsTestv2.c (C code).

###@###.### 11/4/04 20:24 GMT