Bug ID: JDK-8209862 CipherCore performance improvement

Type: Bug
Component: security-libs
Sub-Component: javax.crypto
Affected Version: 12

Priority: P3
Status: Resolved
Resolution: Fixed
OS: generic
CPU: generic

Submitted: 2018-08-22
Updated: 2019-03-19
Resolved: 2018-10-15

JDK 11	JDK 12	JDK 8	Other
11.0.2 Fixed	12 b16 Fixed	8u201 Fixed	openjdk7u Fixed

Please consider a performance improvement for CipherCore.
http://cr.openjdk.java.net/~skuksenko/crypto/8209862/

Preface.
JDK-8207775 (https://bugs.openjdk.java.net/browse/JDK-8207775) added required data zeroing. That caused a massive performance regression.

Regressions caused by JDK-8207775
(Legend: <algorithm> <keyLength>/<dataSize> <regression Lin64>/<regression Win64>)
AESBench.decrypt
AES/CBC/NoPadding        128/1024     -17.4% / -3.9%
AES/CBC/NoPadding        128/16384     -3.8% / -4.3%
AES/CBC/PKCS5Padding     128/16384     -8.2% / -6.0%
AES/ECB/NoPadding        128/1024      -7.3% / -7.6%
AES/ECB/PKCS5Padding     128/16384         0 / -8.6%

AESGSMBench.decrypt
AES/GCM/NoPadding        128/1024      -4.4% / -3.9%

AESBench.encrypt
AES/CBC/PKCS5Padding     128/16384         0 / -2.60%

DESedeBench.decrypt
DESede/CBC/NoPadding     168/16384         0 / -7.20%
DESede/CBC/PKCS5Padding  168/16384         0 / -3.70%

DESedeBench.encrypt
DESede/ECB/NoPadding     168/16384         0 / -7.30%

In general, the negative performance effect caused by zeroing can't be avoided. But in some cases CipherCore can be optimized.
Here is the list of speedups achieved by the suggested patch:
Performance improvements by suggested modification
(Legend: <algorithm> <keyLength>/<dataSize> <speedup Lin64>/<speedup Win64>)
AESBench.decrypt
AES/CBC/NoPadding        128/1024     68.10% / 40.20%
AES/CBC/NoPadding        128/16384    52.20% / 79.10%
AES/CBC/PKCS5Padding     128/16384    38.70% / 72.60%
AES/ECB/NoPadding        128/1024     29.40% / 23.90%
AES/ECB/NoPadding        128/16384    11.60% / 33.50%
AES/ECB/PKCS5Padding     128/16384    15.30% / 38.30%

AESGSMBench.decrypt
AES/GCM/NoPadding        128/1024      7.10% / 7.10%
AES/GCM/NoPadding        128/16384     9.20% / 2.10%
AES/GCM/PKCS5Padding     128/16384     9.00% / 0

AESBench.encrypt
AES/CBC/PKCS5Padding     128/16384     2.50% / 0
AES/ECB/NoPadding        128/1024          0 / 10.50%

DESedeBench.decrypt
DESede/CBC/PKCS5Padding  168/16384         0 / 3.40%
DESede/ECB/NoPadding     168/16384     4.00% / 4.40%
DESede/ECB/PKCS5Padding  168/16384         0 / 5.00%

DESedeBench.encrypt
DESede/ECB/NoPadding     168/16384     6.50% / 0
DESede/CBC/PKCS5Padding  168/16384     3.90% / 4.10%

That not only covers almost all of the regressions caused by the additional zeroing, but also gives additional performance benefits.

The idea of the modification:
- CipherCore contains 2 doFinal methods:
  doFinal(byte[], int, int)
  doFinal(byte[], int, int, byte[], int)
  The first method allocates the output array internally and invokes the second doFinal.
- At the same time, the second doFinal method contains a lot of checks and additional actions needed to work properly with a user-provided output array. All these actions can be avoided if the output array was allocated internally.
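
For reference, the two CipherCore methods sit behind the two public javax.crypto.Cipher.doFinal overloads with the same parameter shapes. The following is a minimal sketch of both call forms, not code from the patch; the class name DoFinalOverloads, its helper methods, and the all-zero demo key and IV are made up for illustration.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper class, for illustration only.
final class DoFinalOverloads {

    /** Builds an AES/CBC/NoPadding cipher with an all-zero demo key and IV (never do this in real code). */
    static Cipher newCipher(int mode) throws Exception {
        byte[] key = new byte[16];
        byte[] iv = new byte[16];
        Cipher c = Cipher.getInstance("AES/CBC/NoPadding");
        c.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return c;
    }

    /** First form: the provider allocates and returns the output array. */
    static byte[] decryptAllocating(Cipher c, byte[] data) throws Exception {
        return c.doFinal(data, 0, data.length);
    }

    /** Second form: the caller supplies the output array; returns the number of bytes written. */
    static int decryptInto(Cipher c, byte[] data, byte[] out) throws Exception {
        return c.doFinal(data, 0, data.length, out, 0);
    }
}
```

With the second form the caller is expected to size the array itself, for example with Cipher.getOutputSize(int), which is why the implementation has to validate it.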

What was done:
- Some parts of the code (those which can't be eliminated by knowing output array details) from method doFinal(byte[], int, int, byte[], int) were extracted to other methods (checkReinit(), prepareInputBuffer(), checkOutputCapacity()).
- doFinal(byte[], int, int, byte[], int) was manually inlined into doFinal(byte[], int, int).
- Massive manual constant propagation and dead code elimination were applied (I have to note that the HotSpot JIT is unable to perform all such optimizations, because it doesn't have enough information).

The key performance factor here is not the elimination of some checks, but the fact that we can avoid unnecessary data copying and the corresponding zeroing.
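
The effect can be sketched with a toy model. This is not the CipherCore code: XOR stands in for the real cipher, the scratch buffer is an assumed example of the defensive work a user-provided output array can require, and every name (DoFinalSketch, process, doFinalDelegating, doFinalDirect) is hypothetical.

```java
import java.util.Arrays;

// Toy model only; XOR stands in for the real cipher and all names are hypothetical.
final class DoFinalSketch {
    private static final byte KEY = 0x5A;

    /** Stand-in for the cipher: XORs len bytes from in into out. */
    private static void process(byte[] in, int inOff, int len, byte[] out, int outOff) {
        for (int i = 0; i < len; i++) {
            out[outOff + i] = (byte) (in[inOff + i] ^ KEY);
        }
    }

    /** General path: the caller owns the output array, so it is handled defensively. */
    static int doFinal(byte[] in, int inOff, int len, byte[] out, int outOff) {
        if (out.length - outOff < len) {
            throw new IllegalArgumentException("output buffer too short");
        }
        // Assumed defensive step: work in a scratch buffer, copy the result out,
        // then wipe the scratch copy so no sensitive data lingers.
        byte[] scratch = new byte[len];
        try {
            process(in, inOff, len, scratch, 0);
            System.arraycopy(scratch, 0, out, outOff, len);
        } finally {
            Arrays.fill(scratch, (byte) 0);
        }
        return len;
    }

    /** Before: allocate the result, then delegate to the general path. */
    static byte[] doFinalDelegating(byte[] in, int inOff, int len) {
        byte[] out = new byte[len];
        doFinal(in, inOff, len, out, 0);
        return out;
    }

    /** After: the output array is private and exactly sized, so write into it directly. */
    static byte[] doFinalDirect(byte[] in, int inOff, int len) {
        byte[] out = new byte[len];
        process(in, inOff, len, out, 0);
        return out;
    }
}
```

Both single-array variants return the same bytes; the direct one skips the capacity check, the extra copy, and the zeroing pass, which is the kind of saving described above.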

Changing to type 'bug'. Given the performance regression, I think it better suits the matter.

19-10-2018

Fix Request: Performance edits to address regressions that came in from JDK-8207775 (a fix pending 11.0.x integration). Security jtreg and TCK testing performed.

15-10-2018

Thanks for the suggested patch. I'll need to take a closer look.

04-09-2018

Relates :	JDK-8209538 - Micros-Crypto performance regressions in 12+6 need investigation
Relates :	JDK-8207775 - Better management of CipherCore buffers