Please review the following webrev which adds intrinsic support to
allow some of the com/sun/crypto/provider methods to use AES
instructions when a processor supports such instructions.
Modern x86 processors have AES instructions to accelerate AES
encryption and decryption but Hotspot does not have a way to
generate such instructions. There is a way to hook in a native
crypto library using PKCS11 and there are a few native libraries
that support hardware AES instructions. However, these native libraries:
* do not scale well with multiple threads
* are not supported on all platforms; for instance, Hotspot does
not have PKCS11 support on 64-bit Windows.
* can be confusing to configure.
Since this webrev adds intrinsic support to the default
com/sun/crypto/provider classes, those classes remain available on
all platforms and no additional configuration is required.
Measurements have shown that the intrinsics scale very well with
multiple threads.
The rest of this mail describes the scope of the intrinsics and
summarizes the source file changes.
-- Tom Deneau
Scope of the Intrinsics
When creating a cipher, the application specifies a "transformation"
consisting of "algorithm/mode/padding". For more details see
* These intrinsics kick in only when the algorithm part is "AES". A
single block in AES is always 16 bytes and there are intrinsics
for encrypting or decrypting a single block. These single-block
intrinsics can work with any mode that uses AES and with any of
the three AES key sizes (128, 192 or 256 bit).
* A more optimized multi-block intrinsic can kick in if the
algorithm/mode is "AES/CBC" (Cipher Block Chaining). Again, all
three AES key sizes are supported. There is no technical reason
why we couldn't do multi-block intrinsics for the other modes
(e.g., ECB), but I want to get some feedback from the reviewers on
the implementation before charging off on this path.
* The padding part is handled by Java routines outside of these
intrinsics.
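As a rough illustration of the transformation string, the sketch below asks for "AES/CBC/PKCS5Padding" through the standard javax.crypto API. On a typical JDK configuration this resolves to the SunJCE classes in com/sun/crypto/provider, which is where the single-block and CBC intrinsics would apply; the class and method names here are only for illustration.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

final class TransformationDemo {
    // algorithm / mode / padding
    static final String TRANSFORMATION = "AES/CBC/PKCS5Padding";

    static byte[] encrypt(byte[] key, byte[] iv, byte[] input) {
        return run(Cipher.ENCRYPT_MODE, key, iv, input);
    }

    static byte[] decrypt(byte[] key, byte[] iv, byte[] input) {
        return run(Cipher.DECRYPT_MODE, key, iv, input);
    }

    private static byte[] run(int mode, byte[] key, byte[] iv, byte[] input) {
        try {
            Cipher c = Cipher.getInstance(TRANSFORMATION);
            // A 16-, 24- or 32-byte key selects AES-128, AES-192 or AES-256.
            c.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            return c.doFinal(input);
        } catch (GeneralSecurityException e) {
            // Checked exceptions are rethrown unchecked to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }
}
```

With PKCS5Padding a 20-byte message is padded to 32 bytes, i.e. two 16-byte AES blocks; the padding itself is done in Java before the block operations run.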
Summary of Changes
Defined the AES instructions that are used by the stub routines.
Actual stub code for the AES intrinsics. As described earlier, there
are both single-block and multi-block intrinsic stubs.
Note that the stubs use the "expanded key", which is created each
time the key changes. The expanded key is used by both the Java
code and the intrinsic AES instructions.
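For reference, FIPS-197 sizes the expanded key as Nb * (Nr + 1) 32-bit words with Nb = 4, so the three key sizes give expanded keys of 44, 52 and 60 words. A stub could therefore tell the key size from the length of the expanded-key array; that this is how the webrev's stubs do it is an assumption here, and the class name is illustrative.

```java
final class AesKeySchedule {
    /** Number of rounds Nr for an AES key of the given size (FIPS-197). */
    static int rounds(int keyBits) {
        switch (keyBits) {
            case 128: return 10;
            case 192: return 12;
            case 256: return 14;
            default:
                throw new IllegalArgumentException("AES key must be 128, 192 or 256 bits: " + keyBits);
        }
    }

    /** Expanded key length in 32-bit words: Nb * (Nr + 1), with Nb = 4. */
    static int expandedKeyWords(int keyBits) {
        return 4 * (rounds(keyBits) + 1);
    }
}
```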
The Java code stores the expanded key as big-endian 32-bit
integers, while the x86 AES instructions require the expanded key
in little-endian 128-bit words. Hence the stubs use pshufb
instructions to get the key into the little-endian format.
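The byte-order issue can be modelled in plain Java. An int holding round-key bytes 00 01 02 03 in big-endian order sits in x86 memory as 03 02 01 00, so a 128-bit load yields each 4-byte group reversed; a pshufb with a mask that reverses the bytes inside every 32-bit lane restores AES byte order. This is a sketch of the idea, not the stub code, and the exact mask the stubs use is not shown in this mail.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class KeyByteOrderDemo {
    /** Bytes of an int[] as they lie in x86 memory (each int stored little-endian). */
    static byte[] inMemoryLayout(int[] words) {
        ByteBuffer bb = ByteBuffer.allocate(words.length * 4).order(ByteOrder.LITTLE_ENDIAN);
        for (int w : words) {
            bb.putInt(w);
        }
        return bb.array();
    }

    /** Emulates pshufb with a mask that reverses the 4 bytes of every 32-bit lane. */
    static byte[] swapWithinWords(byte[] in) {
        byte[] out = new byte[in.length];
        for (int i = 0; i < in.length; i += 4) {
            for (int j = 0; j < 4; j++) {
                out[i + j] = in[i + 3 - j];
            }
        }
        return out;
    }
}
```

For round-key words {0x00010203, 0x04050607, 0x08090a0b, 0x0c0d0e0f}, the memory image starts 03 02 01 00, and after the lane swap the bytes read 00 01 02 ... 0f.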
Detect the AES capability bit reported by cpuid and store it. A
global boolean command-line flag, UseAES, can be used to turn off
AES even if the hardware supports it.
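CPUID leaf 1 reports AES-NI support in bit 25 of ECX. The decision described above can be sketched as below; the VM does this in C++, so this Java model and its names are illustrative, and it assumes UseAES can only disable AES, never enable it on hardware that lacks it.

```java
final class AesCpuFeature {
    // CPUID.01H:ECX bit 25 = AES-NI.
    static final int CPUID_1_ECX_AES = 1 << 25;

    static boolean hardwareHasAes(int cpuid1Ecx) {
        return (cpuid1Ecx & CPUID_1_ECX_AES) != 0;
    }

    /** Models the UseAES gate: the flag can switch AES off but cannot force it on. */
    static boolean useAes(int cpuid1Ecx, boolean useAesFlag) {
        return useAesFlag && hardwareHasAes(cpuid1Ecx);
    }
}
```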
The usual definitions of class names, method names and signatures
for the Java methods that are being intrinsified, and the signatures
for the stubs.
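The signatures in these definitions are JVM method descriptors. Assuming a single-block method of the shape (byte[] in, int inOffset, byte[] out, int outOffset) returning void (an assumption about the AESCrypt methods, not quoted from the webrev), the descriptor can be derived with java.lang.invoke.MethodType:

```java
import java.lang.invoke.MethodType;

final class IntrinsicSignatures {
    /** Descriptor for an assumed (byte[], int, byte[], int) -> void single-block method. */
    static String singleBlockDescriptor() {
        return MethodType.methodType(void.class, byte[].class, int.class, byte[].class, int.class)
                .toMethodDescriptorString();
    }
}
```

This yields "([BI[BI)V", where [B is byte[], I is int and V is void.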
Up until now, every intrinsic replaced a routine that was loaded by
the "default" (NULL) class loader. com/sun/crypto/provider is not
loaded by the default class loader, so we had to add a check here.
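A simplified model of such a check is sketched below: intrinsics normally require the holder class to come from the boot (NULL) loader, with an exception carved out for com/sun/crypto/provider. The method name and the package-prefix test are hypothetical; the real check is in the VM and may match individual classes rather than a package.

```java
final class IntrinsicLoaderCheck {
    static final String CRYPTO_PACKAGE = "com/sun/crypto/provider/";

    /**
     * Hypothetical model: holderName is in internal form (e.g. "java/lang/Math");
     * loadedByBoot is true when the defining loader is the boot (NULL) loader.
     */
    static boolean mayIntrinsify(String holderName, boolean loadedByBoot) {
        return loadedByBoot || holderName.startsWith(CRYPTO_PACKAGE);
    }
}
```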
Escape analysis knows about certain stubs, but when it sees a leaf
stub call it also checks the stub name against a predefined list,
so the new intrinsic names were added to that list.
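The list check amounts to a name lookup. The sketch below models it in Java; the stub names are assumptions chosen for illustration, not quoted from the webrev.

```java
import java.util.Set;

final class LeafStubList {
    // Assumed stub names for the new AES intrinsics (illustrative only).
    static final Set<String> KNOWN_LEAF_STUBS = Set.of(
            "aescrypt_encryptBlock",
            "aescrypt_decryptBlock",
            "cipherBlockChaining_encryptAESCrypt",
            "cipherBlockChaining_decryptAESCrypt");

    /** True if escape analysis would recognise this leaf stub by name. */
    static boolean isKnownLeafStub(String name) {
        return KNOWN_LEAF_STUBS.contains(name);
    }
}
```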
The main logic for building up the calls to the stubs at compile
time, assuming the platform has a stub and the global flags have
not turned these intrinsics off.
A new helper routine to load a field from an object was added since
we ended up loading fields in a few places.
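At the Java level, the effect of such a helper is comparable to reading a named field from an object, as in the reflection sketch below. This is only an analogy (the compiler helper emits a load node at compile time rather than using reflection), and the stand-in class and its field name K are hypothetical.

```java
import java.lang.reflect.Field;

final class FieldLoadDemo {
    /** Hypothetical stand-in for a cipher object holding an expanded key. */
    static final class FakeAesCrypt {
        private final int[] K = new int[44];
    }

    /** Reads the named declared field of an object. */
    static Object loadField(Object from, String fieldName) {
        try {
            Field f = from.getClass().getDeclaredField(fieldName);
            f.setAccessible(true);
            return f.get(from);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("no readable field " + fieldName, e);
        }
    }
}
```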
For best performance, we wanted to hook into the multi-block
encrypt and decrypt methods, such as those in CipherBlockChaining.java.
This code is not AES-specific but handles CBC mode for any
algorithm. (The algorithm part is handled by the enclosed
embeddedCipher object.)
Thus, at runtime we want to do the equivalent of an instanceof check
on embeddedCipher and either call the stub (if it is AESCrypt) or
call the original Java code (if it is some other algorithm
type). For CipherBlockChaining.decrypt there is a further
runtime check that the source and destination are not the same
array, which, because of the way CBC works, would require cloning the
source array.
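The aliasing hazard follows from the CBC decryption rule P_i = D(C_i) XOR C_{i-1}: an implementation that reads C_{i-1} back from the source array breaks when the source is also the destination, because P_{i-1} has already overwritten it. The sketch uses a toy XOR "block cipher" (not AES) so the chaining is easy to follow; all names are illustrative.

```java
import java.util.Arrays;

final class CbcAliasingDemo {
    static final int BLOCK = 16;

    // Toy block "cipher": XOR with the key. NOT AES; it only makes the chaining visible.
    static void toyCipher(byte[] buf, int off, byte[] key) {
        for (int j = 0; j < BLOCK; j++) buf[off + j] ^= key[j];
    }

    static byte[] encrypt(byte[] plain, byte[] key, byte[] iv) {
        byte[] out = plain.clone();
        byte[] prev = iv.clone();
        for (int i = 0; i < out.length; i += BLOCK) {
            for (int j = 0; j < BLOCK; j++) out[i + j] ^= prev[j]; // P_i ^ C_{i-1}
            toyCipher(out, i, key);                                 // C_i = E(P_i ^ C_{i-1})
            prev = Arrays.copyOfRange(out, i, i + BLOCK);
        }
        return out;
    }

    // Reads C_{i-1} back from src: only correct when src and dst are different arrays.
    static void naiveDecrypt(byte[] src, byte[] dst, byte[] key, byte[] iv) {
        for (int i = 0; i < src.length; i += BLOCK) {
            for (int j = 0; j < BLOCK; j++) {
                byte prev = (i == 0) ? iv[j] : src[i - BLOCK + j];
                dst[i + j] = (byte) (src[i + j] ^ key[j] ^ prev);
            }
        }
    }

    // Copies each ciphertext block before overwriting it, so src == dst is safe.
    static void safeDecrypt(byte[] src, byte[] dst, byte[] key, byte[] iv) {
        byte[] prev = iv.clone();
        for (int i = 0; i < src.length; i += BLOCK) {
            byte[] cur = Arrays.copyOfRange(src, i, i + BLOCK);
            for (int j = 0; j < BLOCK; j++) dst[i + j] = (byte) (cur[j] ^ key[j] ^ prev[j]);
            prev = cur;
        }
    }
}
```

Decrypting into a separate array works with either version; decrypting in place only works when the previous ciphertext block is saved first.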
Vladimir added some infrastructure to generate predicated
intrinsics to solve the above problem. A particular intrinsic need
only specify that it is predicated and generate the particular
guard node, which, if false, takes the Java path. This
infrastructure can be used for future intrinsics that have to make
such a runtime choice. These changes from Vladimir are in
callGenerator.cpp, doCall.cpp, and a small bit in library_call.cpp.
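The shape of a predicated intrinsic can be modelled as "evaluate a guard, then take either the stub path or the original Java path". The sketch below is a Java model of that control flow, not of the C2 graph code; the cipher types and method names are hypothetical stand-ins, and the guard mirrors the CBC decrypt condition described above (AES cipher, distinct source and destination arrays).

```java
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

final class PredicatedCall {
    /** Guard true: intrinsic (stub) path; guard false: original Java path. */
    static <T> T call(BooleanSupplier guard, Supplier<T> intrinsicPath, Supplier<T> javaPath) {
        return guard.getAsBoolean() ? intrinsicPath.get() : javaPath.get();
    }

    // Hypothetical stand-ins for the embedded cipher types.
    interface SymmetricCipher {}
    static final class AesLike implements SymmetricCipher {}
    static final class OtherCipher implements SymmetricCipher {}

    /** Guard for a CBC decrypt: the cipher is AES and src/dst are different arrays. */
    static boolean cbcDecryptGuard(SymmetricCipher embedded, byte[] src, byte[] dst) {
        return embedded instanceof AesLike && src != dst;
    }
}
```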
Global flags were added to:
* turn off either AES encryption or AES decryption intrinsics separately
* turn off the multi-block CBC/AES intrinsics.
By default, all of the above are on. These are really there for
testing; for example, one could encrypt using Java and decrypt using
the intrinsics. There is also the UseAES flag, which ignores the
hardware capability as described above.
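How these switches might combine is sketched below. Only UseAES is named in this mail; the other three fields stand for the unnamed encrypt, decrypt and CBC flags, and the assumption that the CBC stub also honours the per-direction flag is part of the model, not taken from the webrev.

```java
final class AesIntrinsicPolicy {
    final boolean useAes;            // models UseAES
    final boolean encryptIntrinsics; // hypothetical: encryption intrinsics on/off
    final boolean decryptIntrinsics; // hypothetical: decryption intrinsics on/off
    final boolean cbcIntrinsics;     // hypothetical: multi-block CBC/AES intrinsics on/off

    AesIntrinsicPolicy(boolean useAes, boolean enc, boolean dec, boolean cbc) {
        this.useAes = useAes;
        this.encryptIntrinsics = enc;
        this.decryptIntrinsics = dec;
        this.cbcIntrinsics = cbc;
    }

    /** All switches on, matching the stated defaults. */
    static AesIntrinsicPolicy defaults() {
        return new AesIntrinsicPolicy(true, true, true, true);
    }

    boolean singleBlock(boolean hwAes, boolean encrypt) {
        return hwAes && useAes && (encrypt ? encryptIntrinsics : decryptIntrinsics);
    }

    boolean cbcMultiBlock(boolean hwAes, boolean encrypt) {
        return singleBlock(hwAes, encrypt) && cbcIntrinsics;
    }
}
```

Turning off only the encryption switch gives the "encrypt in Java, decrypt with the intrinsics" test setup.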