JDK-8177784 : Use CounterMode intrinsic for AES/GCM
  • Type: Bug
  • Component: security-libs
  • Sub-Component: javax.crypto
  • Affected Version: 9,10
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • Submitted: 2017-03-29
  • Updated: 2020-07-30
  • Resolved: 2017-04-12
  • Fixed in: JDK 10 (10), JDK 8 (8u261), JDK 9 (9 b166)
Description
The GCM mode of operation is essentially CTR mode with an additional operation that provides authenticity. That means we can reuse parts of the CTR mode code in GCM. The benefit of this arrangement is that there is an intrinsic for AES/CTR that significantly improves the performance of this operation on systems with AES instructions. The current GCM code has a loop that invokes the AES intrinsic one block at a time. It would be much more efficient to invoke the AES/CTR intrinsic on the entire buffer.
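The relationship between GCM and CTR described above can be checked from the public javax.crypto API. The following is a minimal sketch, not the JDK's internal code: the class and method names are hypothetical, and it assumes a 96-bit IV, for which GCM reserves the counter block IV || 0x00000001 for the tag and encrypts data starting at IV || 0x00000002. GCM increments only the low 32 bits of the counter, so the comparison is only meant for short messages like the one used here.

```java
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Illustrative only: class and method names are hypothetical, not JDK internals.
final class GcmIsCtrDemo {

    // AES/GCM encryption; the result is the ciphertext followed by a 16-byte tag.
    static byte[] gcmEncrypt(byte[] key, byte[] iv12, byte[] plaintext) throws Exception {
        Cipher gcm = Cipher.getInstance("AES/GCM/NoPadding");
        gcm.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                 new GCMParameterSpec(128, iv12));
        return gcm.doFinal(plaintext);
    }

    // Plain AES/CTR, started at the counter block GCM uses for its first data block.
    static byte[] ctrEncrypt(byte[] key, byte[] iv12, byte[] plaintext) throws Exception {
        byte[] counter = new byte[16];
        System.arraycopy(iv12, 0, counter, 0, 12);
        counter[15] = 2; // IV || 0x00000001 is reserved for the tag; data starts one block later
        Cipher ctr = Cipher.getInstance("AES/CTR/NoPadding");
        ctr.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                 new IvParameterSpec(counter));
        return ctr.doFinal(plaintext);
    }

    public static void main(String[] args) throws Exception {
        byte[] key = new byte[16];
        byte[] iv = new byte[12];
        byte[] plaintext = new byte[100];
        Arrays.fill(key, (byte) 0x11);
        Arrays.fill(iv, (byte) 0x22);
        Arrays.fill(plaintext, (byte) 0x33);

        byte[] gcmOut = gcmEncrypt(key, iv, plaintext);
        byte[] ctrOut = ctrEncrypt(key, iv, plaintext);

        // Drop the trailing tag and compare the ciphertext portions.
        byte[] gcmCiphertext = Arrays.copyOf(gcmOut, plaintext.length);
        System.out.println("ciphertext matches CTR: " + Arrays.equals(gcmCiphertext, ctrOut));
    }
}
```

Because the encryption half of GCM is byte-for-byte a CTR keystream, handing the whole buffer to the CTR path changes only how the work is batched, not the output.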
Comments
Working on the 8u backport. The patch applies after adjusting paths. A straight backport appears to cause a regression on AESGCMBench: https://cr.openjdk.java.net/~shade/8177784/perf-8u.txt. Looking closer tomorrow. I suspect it needs the actual AES CTR intrinsic in 8u first (JDK-8143925).
30-07-2020

Fix request approved.
11-04-2017

Fix Request: This fix should go into jdk9 because it yields a performance gain of 2-2.5x from a minimal change that takes advantage of the intrinsic already in jdk9. The risk is very low, as the CounterMode code has been in the JDK for a while and the GCTR and CounterMode code were very similar. Existing security tests, including the GCM KAT (Known Answer Tests), as well as the hotspot tests for the intrinsics, passed. No new tests are needed because this is purely a performance change. http://cr.openjdk.java.net/~ascarpino/8177784/webrev/
10-04-2017

Using the existing counter intrinsic certainly helps speed up the first half of the operation. The second half, assembling the tag in parallel, still needs to be investigated. The two can be split, as the counter part is an easy change for a huge gain.
30-03-2017

I've attached some benchmark results (gcm_baseline.txt and gcm_improved.txt) that show the improved performance we get from applying gcm_ctr_patch. At a 16K data size, throughput rises to roughly 246% of the baseline (about a 2.46x speedup).

Baseline:
Benchmark            (algorithm)           (dataSize)  (keyLength)  (provider)  Mode   Cnt  Score      Error       Units
AESGCMBench.encrypt  AES/GCM/NoPadding     16384       128                      thrpt  40   20596.881  ± 520.247   ops/s
AESGCMBench.encrypt  AES/GCM/PKCS5Padding  16384       128                      thrpt  40   20301.049  ± 1123.646  ops/s

Improved:
Benchmark            (algorithm)           (dataSize)  (keyLength)  (provider)  Mode   Cnt  Score      Error       Units
AESGCMBench.encrypt  AES/GCM/NoPadding     16384       128                      thrpt  40   50713.480  ± 1450.879  ops/s
AESGCMBench.encrypt  AES/GCM/PKCS5Padding  16384       128                      thrpt  40   50362.121  ± 1170.086  ops/s
29-03-2017

The file gcm_ctr_patch is a JDK 10 patch that I used as a proof of concept. This patch is a terrible hack, but it demonstrates the sort of performance improvements we can expect from using the CTR intrinsic in GCM.
29-03-2017