JDK-8046943 : JEP 246: Leverage CPU Instructions for GHASH and RSA
  • Type: JEP
  • Component: security-libs
  • Sub-Component: javax.crypto
  • Priority: P2
  • Status: Closed
  • Resolution: Delivered
  • Fix Versions: 9
  • Submitted: 2014-06-16
  • Updated: 2017-03-06
  • Resolved: 2016-06-08
Sub Tasks
JDK-8069112 :  
JDK-8069538 :  
JDK-8069539 :  
JDK-8076358 :  
JDK-8076359 :  
Description
Summary
-------

Improve the performance of GHASH and RSA cryptographic operations by
leveraging recently-introduced SPARC and Intel x64 CPU instructions.


Success Metrics
---------------

The support for AES-CBC included in JDK 8 (see [JEP 164](http://openjdk.java.net/jeps/164))
shows about an 8x improvement over the pure software-based
implementation.  Different algorithms will vary, but we should see
similar significant performance gains.


Motivation
----------

The less we use native libraries, such as PKCS#11, the fewer complicated
code and memory issues are caused by interacting with complex native
APIs.  The fewer JNI calls to native libraries, the faster the crypto.
By implementing crypto operations directly in the JVM we can control
their implementation and management through a built-in provider, thereby
providing out-of-the-box support.


Description
-----------

No existing APIs will be modified or extended.

### Algorithms

The existing implementation invokes AES instructions in HotSpot when
those instructions are supported.  In addition to CBC mode there are
optimizations that help AES and CBC to work fast together.  The
instructions and optimizations replace the current SunJCE byte-code
methods.  The plan is to implement similar optimizations for GCM and RSA
which can greatly benefit from hardware assistance.  Both AES-GCM and RSA
are part of the TLS cipher suites.

GHASH, which is part of GCM, will be accelerated using `pclmulqdq` on
Intel x64 and `xmul`/`xmulhi` on SPARC.
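
For readers unfamiliar with carry-less multiplication, the operation that `pclmulqdq` performs on a pair of 64-bit operands can be sketched in plain Java as below.  This is an illustration only, not the HotSpot intrinsic: the class and method names are hypothetical, and a full GHASH additionally needs the reduction modulo the GCM field polynomial and GCM's bit ordering, both of which are omitted here.

```java
public final class ClmulSketch {
    /**
     * Carry-less (polynomial over GF(2)) multiplication of two 64-bit values.
     * Returns the 128-bit product as {high 64 bits, low 64 bits}.
     * Hypothetical helper for illustration; a single CLMUL-style instruction
     * replaces this whole loop in hardware.
     */
    public static long[] clmul(long a, long b) {
        long hi = 0L;
        long lo = 0L;
        for (int i = 0; i < 64; i++) {
            if (((b >>> i) & 1L) != 0L) {
                // XOR in a shifted copy of a; XOR replaces addition, so no carries.
                lo ^= a << i;
                if (i != 0) {
                    // Bits of a that were shifted out of the low word.
                    hi ^= a >>> (64 - i);
                }
            }
        }
        return new long[] { hi, lo };
    }

    public static void main(String[] args) {
        // (x + 1) * (x + 1) = x^2 + 1 over GF(2), so the low word is 0b101.
        long[] p = clmul(0b11L, 0b11L);
        System.out.printf("hi=%x lo=%x%n", p[0], p[1]);
    }
}
```

The bit-by-bit loop makes it visible why a software GHASH is slow: each 64-bit multiply costs up to 64 shift-and-XOR steps, whereas the hardware instruction produces the same 128-bit product directly.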

RSA will be accelerated by using Bit Manipulation Instruction Set 2 (BMI2).  It is
likely that other asymmetric algorithms will benefit from these changes, but
the gains will be measured using RSA.  SPARC instructions were not added, given
the complexity and limitations of the `montmul` and `montsqr` instructions.
On SPARC, using the native library provides complete RSA functionality without
the downside.  Additionally, because RSA is a slow operation, the JNI and native
API layers most likely cost little in the overall performance picture.
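
Since the gains are to be measured using RSA, a rough way to observe RSA signing cost through the standard JCA API is sketched below.  The class name, the 2048-bit key size, and the iteration counts are assumptions chosen for illustration; this is a crude timing loop, not one of the benchmarks referred to in this JEP, and a proper harness would be needed for trustworthy numbers.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;

public final class RsaSignTiming {
    /** Generates an RSA key pair of the given size using the default provider. */
    public static KeyPair newKeyPair(int bits) {
        try {
            KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
            kpg.initialize(bits);
            return kpg.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Signs msg with SHA256withRSA; the private-key operation dominates the cost. */
    public static byte[] sign(KeyPair kp, byte[] msg) {
        try {
            Signature s = Signature.getInstance("SHA256withRSA");
            s.initSign(kp.getPrivate());
            s.update(msg);
            return s.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Verifies a signature produced by sign(). */
    public static boolean verify(KeyPair kp, byte[] msg, byte[] sig) {
        try {
            Signature s = Signature.getInstance("SHA256withRSA");
            s.initVerify(kp.getPublic());
            s.update(msg);
            return s.verify(sig);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair kp = newKeyPair(2048);          // assumed key size
        byte[] msg = new byte[] { 1, 2, 3 };
        for (int i = 0; i < 200; i++) {         // warm-up so the JIT can compile hot paths
            sign(kp, msg);
        }
        int iterations = 200;                   // assumed iteration count
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sign(kp, msg);
        }
        long avgMicros = (System.nanoTime() - start) / iterations / 1_000;
        System.out.println("average sign time (us): " + avgMicros);
    }
}
```

Running the same program on JVMs with and without the accelerated code paths gives a first impression of the difference, subject to the usual caveats about micro-measurements.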

### Providers

The management of algorithms is an important issue which has become more
complicated over time.  An extreme case is the default provider
configuration for Solaris.  The SunPKCS11 provider is ahead of SunJCE in
the provider list.  The SunPKCS11 provider supports all the hardware
accelerated and optimized algorithms for Solaris.  To use the JDK 8
AES-CBC support, SunJCE must be moved ahead of SunPKCS11.  For an
application that only needs AES-CBC, such as a performance test, the
other algorithms are not needed, so this works.  However, for
applications that use multiple algorithms, other algorithms will run
using unaccelerated software-based implementations instead of hardware
accelerated implementations from SunPKCS11.  Other OSes
can also have this problem when NSS (configured via the SunPKCS11
provider) is used.  

As a result, a new security property has been added to the `java.security` file, `jdk.security.provider.preferred`, to allow certain algorithms and algorithm groups to be directed to a particular provider before the ordered provider list is checked.  This property is intended for advanced users and is not set by default.  With many different versions of x86 and SPARC CPUs in current use, setting a default would likely lead to performance regressions on older systems and require continuous maintenance as new CPUs provide more support.  Additionally, existing JDK configurations, such as FIPS 140 or other specialized providers, could unknowingly be directed toward a different provider.  Thus, it is best to leave the `jdk.security.provider.preferred` property unset by default and let vendors and advanced users set it to match what their CPUs support.
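
To see how provider ordering plays out on a given installation, a small diagnostic along the following lines may help.  It is a sketch using only standard `java.security` and `javax.crypto` calls; the class and method names are hypothetical, and the exact value syntax of `jdk.security.provider.preferred` should be taken from the comments in the JDK's own `java.security` file rather than from this example.

```java
import java.security.GeneralSecurityException;
import java.security.Provider;
import java.security.Security;
import javax.crypto.Cipher;

public final class ProviderOrderSketch {
    /** Returns the name of the provider that serves the given Cipher transformation. */
    public static String providerFor(String transformation) {
        try {
            return Cipher.getInstance(transformation).getProvider().getName();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The ordered provider list that is consulted when no preference applies.
        Provider[] providers = Security.getProviders();
        for (int i = 0; i < providers.length; i++) {
            System.out.println((i + 1) + ": " + providers[i].getName());
        }

        // Prints null when the property is unset, which is the default.
        System.out.println("jdk.security.provider.preferred = "
                + Security.getProperty("jdk.security.provider.preferred"));

        // Which provider actually serves these transformations on this system?
        System.out.println("AES/CBC/PKCS5Padding -> " + providerFor("AES/CBC/PKCS5Padding"));
        System.out.println("AES/GCM/NoPadding    -> " + providerFor("AES/GCM/NoPadding"));
    }
}
```

Comparing the output before and after editing the property shows whether a given algorithm is being redirected as intended.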

Testing
-------

Existing Known Answer Tests (KAT) should suffice for functional testing.
There will be a significant amount of performance testing using existing
benchmarks and internal tests.
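
As an illustration of what a Known Answer Test checks, the sketch below encrypts with AES-128-GCM under an all-zero key and IV and compares the result with the published vector from the original GCM specification's test cases.  The class and method names are hypothetical, and this is not one of the existing JDK tests; it only shows the shape of such a test.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public final class GcmKatSketch {
    /** Lowercase hex encoding, written by hand to avoid newer-JDK helpers. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** Encrypts with AES-GCM (128-bit tag) and returns ciphertext followed by tag, as hex. */
    public static String encryptHex(byte[] key, byte[] iv, byte[] plaintext) {
        try {
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(128, iv));
            return hex(c.doFinal(plaintext));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] key = new byte[16]; // all-zero 128-bit key
        byte[] iv = new byte[12];  // all-zero 96-bit IV

        // Published vector: one all-zero plaintext block.
        String expected = "0388dace60b6a392f328c2b971b2fe78"   // ciphertext
                        + "ab6e47d42cec13bdf53a67b21257bddf";  // tag
        String actual = encryptHex(key, iv, new byte[16]);
        System.out.println(actual.equals(expected) ? "KAT passed" : "KAT FAILED: " + actual);
    }
}
```

Because the expected bytes are fixed, the same test gives the same verdict whether GHASH runs in the pure software path or through hardware instructions, which is why such tests suit functional checking of an accelerated implementation.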

Comments
TOI no longer needed as confirmed by Sustaining
06-03-2017

Moved out due date waiting for Perf Plan to be resolved.
23-11-2015

Moved the due date out so that the test can be sure to be integrated.
06-11-2015

Development complete on 10/19/2015
20-10-2015

Final code analysis. As of 2/12/2015
12-02-2015

Initial findings of GCM performance on x86 after CLMUL was used. It shows a 34x performance increase over the software implementation.
05-08-2014