Bug ID: JDK-7013347 allow crypto functions to be called inline to enhance performance

Details
Type: Enhancement
Submit Date: 2011-01-19
Status: Resolved
Updated Date: 2012-03-22
Project Name: JDK
Resolved Date: 2012-02-07
Component: hotspot
OS: generic
Sub-Component: compiler
CPU: generic
Priority: P2
Resolution: Fixed
Affected Versions: solaris_11
Fixed Versions: hs23 (b13)

Description
Using hardware acceleration and Solaris crypto from Java has been shown to be less than optimal because of Java memory allocation issues and the number of layers a call must pass through JNI to reach the OS library for a simple operation.  A prototype T3 JCE provider, which optimized the Solaris side, showed no performance gain because of memory management.

While those Java memory management issues may have improved, the reality is that we need a quicker and more direct way to get to the OS libraries.  The hope is that more direct access to the library lets us eliminate some of the memory and other Java calls that are needed to use JNI and leave that work to the OS library, where it's faster.  The quicker access also allows us to get better performance from crypto/message digest operations on smaller byte sizes.

This is of particular importance for the new crypto hardware acceleration in T4 and the availability of the libsoftcrypto/libmd libraries that allow quick, lightweight access.  This combination could be a great performance improvement for Solaris/SPARC, potentially record numbers (?).

                                    

Comments
EVALUATION

7013347: allow crypto functions to be called inline to enhance performance
Reviewed-by: kvn

This is a long one.

The synopsis of this is slightly misleading.  This doesn't allow
direct calls to native routines from Java but it does attempt to
reduce the overhead of using JNI for specific use cases while still
maintaining the safety invariants that JNI provides.  For native code
that runs in a bounded time JNI provides a function called
GetPrimitiveArrayCritical which may provide direct access to the body
of Java arrays of primitives.  In Hotspot this is accomplished by
suppressing garbage collection while these pointers are exposed to
native code.  This is accomplished with the GC_locker class which is
basically a readers/writers lock.  Note that the GC_locker doesn't
suppress safepointing, just garbage collections.  There are many
operations which require a safepoint to make forward progress, so
suppressing them indefinitely isn't acceptable.
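
As a rough illustration of the readers/writers behaviour described
above, here is a minimal single-threaded model in C.  It is only a
sketch: the struct and function names (gc_locker_model, enter_critical,
exit_critical, request_gc) are hypothetical and are not taken from the
Hotspot sources, and the real GC_locker is thread-safe and considerably
more involved.  Critical sections play the role of readers and a
garbage collection plays the role of the writer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model only; not the Hotspot GC_locker implementation. */
typedef struct {
    int  active_critical; /* critical sections currently holding array pointers */
    bool needs_gc;        /* a GC was requested while a critical section was active */
    int  gcs_run;         /* number of collections performed, for illustration */
} gc_locker_model;

/* A thread starts exposing array bodies to native code. */
void enter_critical(gc_locker_model *l)
{
    l->active_critical++;
}

/* A GC is requested: it runs only if no critical section is active,
 * otherwise it is recorded as pending.  Returns true if the GC ran. */
bool request_gc(gc_locker_model *l)
{
    if (l->active_critical > 0) {
        l->needs_gc = true;
        return false;
    }
    l->gcs_run++;
    return true;
}

/* A thread stops exposing array bodies.  The last one out performs
 * the GC that was deferred while the locker was held. */
void exit_critical(gc_locker_model *l)
{
    assert(l->active_critical > 0);
    l->active_critical--;
    if (l->active_critical == 0 && l->needs_gc) {
        l->needs_gc = false;
        l->gcs_run++;
    }
}
```

In this model a GC requested while a critical section is open is
deferred rather than lost, and it runs when the last critical section
exits.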

This RFE provides a shorthand for the use of
GetPrimitiveArrayCritical by defining an alternate native calling
convention that only allows the use of primitives or arrays of
primitives.  The native method must also be static since non-static
methods are passed the receiver as an argument and Java objects aren't
allowed.  Synchronization and exceptions aren't allowed either.  The
Java code calling these natives is free to use all of those features so
it's not that onerous of a restriction.

The benefits of this approach are that the JVM can more quickly do the
work inline that would normally be done by the
GetPrimitiveArrayCritical/ReleasePrimitiveArrayCritical function calls.
Calling back into the JVM through JNI requires synchronization with
the JVM and each upcall adds a minimum overhead to the native routine.
This helps to reduce the overhead to a more fixed cost per call.  It
also simplifies the work that the caller must do since synchronization
and exceptions aren't allowed.  For now this work is being done in the
existing native wrapper generation but with some more simplification
this could be more easily inlined directly into the caller.

The name of the native routine follows the same name mangling as
normal JNI methods but starts with JavaCritical_ instead of Java_.
Any array arguments are unpacked into a pair of arguments, the length
followed by a pointer to the body of the array.  If the incoming array
is NULL then the body pointer is NULL and the length is 0.
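
A sketch of what such an entry point might look like in C, assuming a
hypothetical Java declaration "static native int sum(byte[] data)" in a
class com.example.Checksum (neither name comes from this report).  The
jint and jbyte typedefs are stand-ins for the ones in jni.h so that the
fragment compiles without a JDK.  The absence of JNIEnv* and jclass
parameters reflects my understanding of the Hotspot convention and is
not stated in the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the jni.h typedefs; a real library would include jni.h. */
typedef int32_t jint;
typedef int8_t  jbyte;

/*
 * Critical entry point for the hypothetical Java method
 *     package com.example;
 *     class Checksum { static native int sum(byte[] data); }
 * The byte[] argument arrives as a pair: its length, then a pointer
 * to the body of the array.
 */
jint JavaCritical_com_example_Checksum_sum(jint data_len, jbyte *data)
{
    jint total = 0;

    assert(data_len >= 0);
    /* A null Java array arrives as length 0 with a NULL body, so the
     * loop body never dereferences the pointer in that case. */
    for (jint i = 0; i < data_len; i++) {
        total += data[i];
    }
    return total;
}
```

A real library would also export the normal
Java_com_example_Checksum_sum entry point, which takes a jbyteArray
and obtains the same length and pointer itself through
GetArrayLength and GetPrimitiveArrayCritical.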

Currently this is a JDK private interface while we gain some
experience with it but it will likely become a more standard
extension.  It's also an optional extension so a native library is
required to provide the normal entry point in addition to the
alternate entry point.

The changes consist of three parts.  The first is the lookup logic
that finds the alternate native entry point.  JNI critical natives
currently can only be found through dynamic lookup.  JNI
RegisterNatives doesn't know about these functions so there's no way
to provide the alternate entry point.

The second part is the lazy critical entry logic.  The fix for 7129164
introduced code that computed the JNI active count during
safepointing.  Now as part of that computation, if a thread is seen to
be in thread_in_native state and the nmethod on the top of stack is a
critical native wrapper, then the critical count for that thread is
incremented and the suspend flags are set so that when the native
code returns, the nmethod will call back into the runtime and do the
unlock of the critical native.

The last part is the native wrappers themselves.  When a critical
native wrapper is compiled, a new check of GC_locker::_needs_gc is
emitted, and the wrapper calls into the runtime if it's true.  This keeps them from
starting new JNI critical sections if a GC has been requested.  The
arguments are unpacked following the alternate calling convention and
the method is called as it normally would be.

On return the wrapper checks the suspend flags as it normally would
and calls back into the runtime where it might have to block and force
a GC if it's the last thread exiting the GC_locker.  This required
some slightly different handling of the final transition back to
thread_in_Java since we have to allow blocking.

The wrappers are only generated differently if they are compiling a
critical native so it shouldn't have much effect on normal execution.
The only library currently taking advantage of this is the new ucrypto
provider on Solaris.  For some crypto operations it improves
throughput by 20% or more because the crypto routines are fast enough
that the JNI overhead is significant.  It's expected that other parts
of the JDK will take advantage of it going forward and hopefully it
can be tightened up further.

Tested with new crypto provider and microbenchmark test case.  Also
ran runthese.
                                     
2012-02-06
EVALUATION

http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/0382d2b469b2
                                     
2012-02-06
EVALUATION

http://hg.openjdk.java.net/lambda/lambda/hotspot/rev/0382d2b469b2
                                     
2012-03-22


