JDK-6190938 : JNI calls become more expensive by a factor of 5x and causes application at least a 10% slowdown
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 5.0u1
  • Priority: P3
  • Status: Closed
  • Resolution: Duplicate
  • OS: windows_2000
  • CPU: x86
  • Submitted: 2004-11-04
  • Updated: 2010-10-07
  • Resolved: 2004-11-05
Related Reports
Duplicate :  
Description
J2SE Version (please include all output from java -version flag):
  java version "1.5.0_01-ea"
  Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_01-ea-b04)
  Java HotSpot(TM) Server VM (build 1.5.0_01-ea-b04, mixed mode)

Does this problem occur on J2SE 1.3, 1.4.x or 1.5?  Yes / No (pick one)
  Occurs on 1.5.0 and all later releases

Operating System Configuration Information (be specific):
  Microsoft Windows 2000 [Version 5.00.2195]

Hardware Configuration Information (be specific):
  1.7GHz Xeon processor (dual processor but test is single threaded)

Bug Description:

I believe that bug 5105765 was closed prematurely for the following reasons:
   1) Although the ABI is ambiguous on this point, I believe there are
      strong arguments for considering native code that changes the SSE
      control flags (in the mxcsr register) as buggy or erroneous rather
      than as exhibiting acceptable behavior.

   2) Even if changing the SSE control flags is defined as allowed behavior,
      there is a cheaper solution to guard against it than what is currently
      implemented in Java 1.5.0 -server on x86 processors.

Let me start with the second point.  Setting the SSE control flags in
the mxcsr register is a serializing and hence very expensive operation.
Moreover, in most cases it is unnecessary, as most native code is well
behaved and does not change the SSE control flags.  Thus in most cases
it is cheaper to first check whether the SSE control flags have been
changed and to reset them only if actually necessary.  There is still
some cost for this check, since reading the mxcsr register is somewhat
expensive, but it is much less expensive than actually setting the mxcsr
register, as demonstrated by the timings included at the end of this bug
report.  Thus if you feel that native code should be allowed to change
the SSE control flags, you can reduce the cost of correcting such changes
by writing the mxcsr register only if its value was actually changed.
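
As a rough illustration of the check-before-set idea, here is a minimal
sketch in C using the SSE intrinsics _mm_getcsr and _mm_setcsr from
<xmmintrin.h>.  The helper names and the MXCSR_CONTROL_MASK constant are
hypothetical and are not taken from HotSpot or from the attached test
programs.  The sketch compares only bits 6-15 (the control bits), on the
assumption that the status flags in bits 0-5 may legitimately change
during ordinary floating point work.

```c
#include <xmmintrin.h>  /* _mm_getcsr / _mm_setcsr: SSE intrinsics, x86 only */

/* Assumption: bits 6-15 of MXCSR hold the control flags (DAZ, exception
   masks, rounding control, FTZ); bits 0-5 are sticky status flags. */
#define MXCSR_CONTROL_MASK 0xFFC0u

/* Hypothetical helper: always rewrite MXCSR, whether or not it changed. */
static void restore_mxcsr_always(unsigned int expected) {
    _mm_setcsr(expected);
}

/* Hypothetical helper: read MXCSR first and write it only when the
   control bits differ from the expected value.
   Returns 1 if a write was needed, 0 otherwise. */
static int restore_mxcsr_if_changed(unsigned int expected) {
    unsigned int current = _mm_getcsr();
    if (((current ^ expected) & MXCSR_CONTROL_MASK) == 0)
        return 0;  /* control bits intact: skip the expensive write */
    /* Restore the control bits while keeping the current status flags. */
    _mm_setcsr((current & ~MXCSR_CONTROL_MASK) |
               (expected & MXCSR_CONTROL_MASK));
    return 1;
}
```

With well-behaved native code the second helper never reaches the write,
so only the cost of the read and compare is paid on each call.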

Now back to the first point.  The official documents are unfortunately
somewhat ambiguous about whether a procedure or function should be
allowed to change the SSE control flags (i.e., whether the mxcsr should
be treated as volatile or caller-saved).  In the IA-32 Intel Architecture
Software Developer's Manual Volume 1, section 11.6.10.2 describes how to
save SSE state across a procedure call including both the XMM and MXCSR
registers
state across a procedure call including both the XMM and MXCSR registers
using the appropriate instructions if required.  However the next section,
11.6.10.3, titled "Caller-Save Requirement for Procedure and Function 
Calls" requires only saving the XMM registers (and does not mention the 
MXCSR register).  It explicitly says that "The primary reason for using the
caller-save convention [for the XMM registers] is to prevent performance
degradation". On page 5-21 of the Intel Software Optimization manual, it
states that "Frequent changes to the MXCSR register should be avoided 
since there is a penalty associated with writing this register" and on 
page 2-59 it makes clear that writing the mxcsr register is an expensive 
serializing instruction that is expected to be used infrequently.  From 
this evidence, I think we can reasonably conclude that the caller-save
requirement was meant to apply to the XMM registers only and not the MXCSR
register.

For further empirical evidence, we can see that this is precisely how 
other compilers treat the MXCSR register.  Neither the Intel nor the 
Microsoft C++ compilers will automatically insert a save and restore of 
the mxcsr register around a procedure call even when it is impossible for 
them to prove that the mxcsr register has not changed (for example when
calling through a function pointer).  However they will both automatically
save and
restore the XMM registers before and after a procedure call.  Also note that 
by convention the x87 floating point control word is not treated as volatile
and is not saved and restored around a procedure call by any compiler I know 
of including Java.  It seems very reasonable that the SSE control register,
mxcsr, should be treated analogously to the x87 control register, fcw.

In Agner Fog's survey of the calling conventions used by various C++
compilers and operating systems for x86 systems, he states that "The
floating point control word and bit 6-15 of the MXCSR register must be
saved and restored by any procedure that changes them, except for
procedures that have the purpose of changing these".
http://www.agner.org/assem/calling_conventions.pdf
In other words, the mxcsr should be treated as callee-saved (or
non-volatile) unless the programmer explicitly states otherwise.
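
To make the callee-saved convention concrete, here is a minimal sketch
of a native routine that needs a different rounding mode internally but
leaves the caller's mxcsr as it found it.  The function name
truncate_with_sse is hypothetical; the intrinsics and the
_MM_SET_ROUNDING_MODE macro come from <xmmintrin.h>.

```c
#include <xmmintrin.h>  /* SSE intrinsics, x86 only */

/* Hypothetical well-behaved native routine: converts x to int with
   round-toward-zero, then restores the MXCSR value it was entered with. */
static int truncate_with_sse(float x) {
    /* volatile keeps the compiler from moving the conversion across the
       MXCSR writes, since compilers do not track that dependency. */
    volatile float in = x;
    volatile int out;

    unsigned int saved = _mm_getcsr();             /* save caller's MXCSR */
    _MM_SET_ROUNDING_MODE(_MM_ROUND_TOWARD_ZERO);  /* change rounding control */
    out = _mm_cvtss_si32(_mm_set_ss(in));          /* CVTSS2SI honours MXCSR rounding */
    _mm_setcsr(saved);                             /* restore before returning */
    return out;
}
```

A routine written this way pays for the mxcsr write itself, and only
when it actually needs a non-default mode, rather than pushing that cost
onto every caller.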

Thus I hope that I have convinced you that native code invoked by a JNI
call that changes the SSE control register, mxcsr, should be treated as
buggy (like code that writes data to random memory locations).  However,
it is a bug that can be easily detected, either by checking the mxcsr
register after each JNI call or, preferably, by checking only when a
command-line flag such as the -Xcheck:jni flag is set.  The latter would
impose no extra performance penalty on Java programs using non-buggy
native code, while still allowing the error to be detected when desired.
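
The flag-controlled detection proposed above could look roughly like the
following sketch.  Everything here is hypothetical: check_native_calls
merely stands in for a debugging option such as -Xcheck:jni, and
call_native stands in for whatever stub the VM uses to invoke native
code.  It is not how HotSpot implements JNI transitions.

```c
#include <stdio.h>
#include <xmmintrin.h>  /* SSE intrinsics, x86 only */

#define MXCSR_CONTROL_MASK 0xFFC0u  /* assumption: bits 6-15 are control bits */

typedef void (*native_fn)(void);

/* Hypothetical stand-in for a debugging flag; off by default. */
static int check_native_calls = 0;

/* Calls fn and, only when checking is enabled, reports a change to the
   MXCSR control bits.  Returns 1 if a violation was detected, else 0. */
static int call_native(native_fn fn) {
    unsigned int before, after;

    if (!check_native_calls) {
        fn();  /* fast path: no MXCSR read or write at all */
        return 0;
    }
    before = _mm_getcsr() & MXCSR_CONTROL_MASK;
    fn();
    after = _mm_getcsr() & MXCSR_CONTROL_MASK;
    if (after != before) {
        fprintf(stderr,
                "warning: native code changed MXCSR control bits: "
                "0x%04x -> 0x%04x\n", before, after);
        return 1;
    }
    return 0;
}

/* Example native functions for illustration only. */
static void well_behaved(void) { }
static void misbehaving(void) { _mm_setcsr(_mm_getcsr() ^ 0x2000u); }
```

With the flag off, the call path touches mxcsr not at all, which is the
point of making the check optional.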


Steps to Reproduce (be specific):


REPRODUCIBILITY :
   This bug can be reproduced always.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :

The following program demonstrates the problem: JNI calls have become
more expensive, in some cases by a factor of 5x.  I also included code
to demonstrate that testing whether the mxcsr register has actually
changed is cheaper than the current behavior of always setting it
regardless of its current value.


EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -

java version "1.4.2_03"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02)
Java HotSpot(TM) Server VM (build 1.4.2_03-b02, mixed mode)
mxcsr 8064
avg    2.4 ns   total 4.70E-2 s  for assign             (~ 4.0 cycles)
avg    2.4 ns   total 4.70E-2 s  for mult               (~ 4.0 cycles)
avg  148.4 ns   total 2.97E0 s   for JNI                (~ 252.3 cycles)
avg  162.5 ns   total 3.25E0 s   for JNI and mult       (~ 276.2 cycles)
avg   76.6 ns   total 1.53E0 s   for Save&Restore MXCSR (~ 130.2 cycles)
avg   14.9 ns   total 2.97E-1 s  for Save&Test MXCSR    (~ 25.2 cycles)
avg    2.3 ns   total 4.60E-2 s  for assign             (~ 3.9 cycles)
avg    2.4 ns   total 4.70E-2 s  for mult               (~ 4.0 cycles)
avg  148.5 ns   total 2.97E0 s   for JNI                (~ 252.4 cycles)
avg  161.0 ns   total 3.22E0 s   for JNI and mult       (~ 273.6 cycles)
avg   75.8 ns   total 1.52E0 s   for Save&Restore MXCSR (~ 128.9 cycles)
avg   14.9 ns   total 2.97E-1 s  for Save&Test MXCSR    (~ 25.2 cycles)

ACTUAL -

java version "1.5.0_01-ea"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_01-ea-b04)
Java HotSpot(TM) Server VM (build 1.5.0_01-ea-b04, mixed mode)
mxcsr 8064
avg    2.3 ns   total 4.60E-2 s  for assign             (~ 3.9 cycles)
avg    2.4 ns   total 4.70E-2 s  for mult               (~ 4.0 cycles)
avg  193.8 ns   total 3.88E0 s   for JNI                (~ 329.4 cycles)
avg  887.5 ns   total 1.78E1 s   for JNI and mult       (~ 1508.8 cycles)
avg   75.8 ns   total 1.52E0 s   for Save&Restore MXCSR (~ 128.9 cycles)
avg   14.9 ns   total 2.97E-1 s  for Save&Test MXCSR    (~ 25.2 cycles)
avg    2.4 ns   total 4.70E-2 s  for assign             (~ 4.0 cycles)
avg    2.3 ns   total 4.60E-2 s  for mult               (~ 3.9 cycles)
avg  194.6 ns   total 3.89E0 s   for JNI                (~ 330.7 cycles)
avg  889.8 ns   total 1.78E1 s   for JNI and mult       (~ 1512.7 cycles)
avg   76.6 ns   total 1.53E0 s   for Save&Restore MXCSR (~ 130.1 cycles)
avg   14.9 ns   total 2.97E-1 s  for Save&Test MXCSR    (~ 25.2 cycles)

CUSTOMER SUBMITTED WORKAROUND :
None that are good.  This bug causes at least a 10% slowdown in our 
real-world large rendering application and will affect any Java program 
that uses the server JVM and makes lots of JNI calls (for example, programs
using JOGL to access OpenGL frequently).

One can use the client JVM or disable the use of SSE, but these cause 
even larger slowdowns in our application than this bug does, and thus are 
not attractive alternatives.  We will stick with Java 1.4.2 until these 
issues are resolved.


Included test programs: JNIOpsTestv2.java for the Java code and
JNIOpsTestv2.c for the C code.
###@###.### 11/4/04 20:24 GMT