Bug ID: JDK-7013347 allow crypto functions to be called inline to enhance performance

Type: Enhancement
Component: hotspot
Sub-Component: compiler
Affected Version: solaris_11

Priority: P2
Status: Resolved
Resolution: Fixed
OS: generic
CPU: generic

Submitted: 2011-01-19
Updated: 2019-11-08
Resolved: 2012-02-07

JDK 7	JDK 8	Other
7u4 Fixed	8 Fixed	hs23 Fixed

Using hardware acceleration and Solaris crypto from Java has been shown to be less than optimal because of Java memory allocation issues and the number of layers a call must pass through in JNI to reach the OS library for a simple operation.  A prototype T3 JCE provider, which optimized the Solaris side, showed no performance gain because of memory management.

While those Java memory management issues may have improved, the reality is that we need a quicker and more direct way to get to the OS libraries.  The hope is that more direct access to the library lets us eliminate some of the memory and other Java calls that are needed to use JNI and leave that work to the OS library, where it's faster.  The quicker access also allows us to get better performance from crypto/message digest operations on smaller byte sizes.

This is of particular importance for the new crypto hardware acceleration in T4 and the availability of the libsoftcrypto/libmd libraries that allow quick, lightweight access.  This combination could be a great performance improvement for Solaris/SPARC, potentially record numbers (?).

EVALUATION http://hg.openjdk.java.net/lambda/lambda/hotspot/rev/0382d2b469b2

22-03-2012

EVALUATION http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/0382d2b469b2

06-02-2012

EVALUATION 7013347: allow crypto functions to be called inline to enhance performance
Reviewed-by: kvn

This is a long one. The synopsis of this is slightly misleading. This doesn't allow direct calls to native routines from Java, but it does attempt to reduce the overhead of using JNI for specific use cases while still maintaining the safety invariants that JNI provides.

For native code that runs in a bounded time, JNI provides a function called GetPrimitiveArrayCritical which may provide direct access to the body of Java arrays of primitives. In Hotspot this is accomplished by suppressing garbage collection while these pointers are exposed to native code, using the GC_locker class, which is basically a readers/writers lock. Note that the GC_locker doesn't suppress safepointing, just garbage collections. There are many operations which require a safepoint to make forward progress, so suppressing them indefinitely isn't acceptable.

This RFE provides a shorthand for the use of GetPrimitiveArrayCritical by defining an alternate native calling convention that only allows the use of primitives or arrays of primitives. The native method must also be static, since non-static methods are passed the receiver as an argument and Java objects aren't allowed. Synchronization and exceptions aren't allowed either. The Java code calling these natives is free to use all of those features, so it's not that onerous of a restriction.

The benefit of this approach is that the JVM can more quickly do the work inline that would normally be done by the GetPrimitiveArrayCritical/ReleasePrimitiveArrayCritical function calls. Calling back into the JVM through JNI requires synchronization with the JVM, and each upcall adds a minimum overhead to the native routine. This helps to reduce the overhead to a more fixed cost per call. It also simplifies the work that the caller must do, since synchronization and exceptions aren't allowed. For now this work is being done in the existing native wrapper generation, but with some more simplification this could be more easily inlined directly into the caller.

The signature of the native routine follows the same name mangling as normal JNI methods, but the name starts with JavaCritical_ instead of Java_. Any array arguments are unpacked into a pair of arguments: the length followed by a pointer to the body of the array. If the incoming array is NULL then the body pointer is NULL and the length is 0. Currently this is a JDK private interface while we gain some experience with it, but it will likely become a more standard extension. It's also an optional extension, so a native library is required to provide the normal entry point in addition to the alternate entry point.

The changes consist of three parts. The first is the lookup logic that finds the alternate native entry point. JNI critical natives currently can only be found through dynamic lookup. JNI RegisterNatives doesn't know about these functions, so there's no way to provide the alternate entry point through it.

The second part is the lazy critical entry logic. The fix for 7129164 introduced code that computed the JNI active count during safepointing. Now, as part of that computation, if a thread is seen to be in thread_in_native state and the nmethod on the top of the stack is a critical native wrapper, then the critical count for that thread is incremented and the suspend flags are set, so that when the nmethod returns from the native code it calls back into the runtime and does the unlock of the critical native.

The last part is the native wrappers themselves. A critical native wrapper is compiled with a new check of GC_locker::_needs_gc and calls into the runtime if it's true. This keeps the wrappers from starting new JNI critical sections if a GC has been requested. The arguments are unpacked following the alternate calling convention and the method is called as it normally would be. On return the wrapper checks the suspend flags as it normally would and calls back into the runtime, where it might have to block and force a GC if it's the last thread exiting the GC_locker. This required some slightly different handling of the final transition back to thread_in_Java, since we have to allow blocking. The wrappers are only generated differently when compiling a critical native, so this shouldn't have much effect on normal execution.

The only library currently taking advantage of this is the new ucrypto provider on Solaris. For some crypto operations it improves throughput by 20% or more, because the crypto routines are fast enough that the JNI overhead is significant. It's expected that other parts of the JDK will take advantage of it going forward, and hopefully it can be tightened up further.

Tested with new crypto provider and microbenchmark test case. Also ran runthese.
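
To make the alternate calling convention described above concrete, here is a minimal C sketch of what a critical entry point might look like. Everything named in it is illustrative: the class com.example.Checksum and its method sum are hypothetical, and the jint/jbyte typedefs are stand-ins for the ones in jni.h so the fragment compiles without a JDK. The omission of the JNIEnv and jclass parameters is an assumption about the HotSpot convention that this report does not spell out.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the jni.h typedefs (assumption: a real library would
   include <jni.h> instead of declaring these itself). */
typedef int32_t jint;
typedef int8_t  jbyte;

/*
 * Hypothetical Java declaration this entry point would belong to:
 *
 *     package com.example;
 *     class Checksum { static native int sum(byte[] data); }
 *
 * The method is static and takes only a primitive array, as the
 * restrictions above require. The library would still have to export
 * the normal Java_com_example_Checksum_sum entry point as well, since
 * the critical variant is optional.
 */

/* Critical entry point: same name mangling, JavaCritical_ prefix.
   The byte[] argument arrives as a (length, body pointer) pair.
   Assumed here: no JNIEnv* or jclass parameter is passed. */
jint JavaCritical_com_example_Checksum_sum(jint data_len, jbyte *data)
{
    jint total = 0;
    /* A NULL Java array is delivered as length 0 and a NULL pointer,
       so the loop body never runs in that case. */
    for (jint i = 0; i < data_len; i++) {
        total += data[i];
    }
    return total;
}
```

Because the function body never calls back into the JVM, it stays within the restrictions listed above: no Java objects, no synchronization, no exceptions.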

06-02-2012

Relates :	JDK-8191360 - Lookup of critical JNI method causes duplicate library loading with leaking handler
Relates :	JDK-8167408 - Invalid critical JNI function lookup
Relates :	JDK-7145024 - Crashes in ucrypto related to C2
Relates :	JDK-8233343 - Deprecate -XX:+CriticalJNINatives flag which implements JavaCritical native functions
Relates :	JDK-7150051 - incorrect oopmap in critical native
Relates :	JDK-7088989 - Improve the performance for T4 by utilizing the newly provided crypto APIs
Relates :	JDK-7144405 - JumbleGC002 assert(m->offset() == pc_offset) failed: oopmap not found