JDK-8295698 : AArch64: test/jdk/sun/security/ec/ed/EdDSATest.java failed with -XX:+UseSHA3Intrinsics
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 20
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • CPU: aarch64
  • Submitted: 2022-10-19
  • Updated: 2025-06-09
  • Resolved: 2022-11-17
JDK 20
20 b25 Fixed
Description
On hardware that supports the SHA3 feature, the test case test/jdk/sun/security/ec/ed/EdDSATest.java fails with -XX:+UseSHA3Intrinsics.

Here is a snippet of the error log.

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000ffffa01ae000, pid=125589, tid=125592
#
# JRE version: OpenJDK Runtime Environment (20.0) (fastdebug build 20-internal-adhoc..jdk-src)
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 20-internal-adhoc..jdk-src, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64)
# Problematic frame:
# v  ~StubRoutines::sha3_implCompressMB 0x0000ffffa01ae000
#
# Core dump will be written. Default location: /tmp/core.125589
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#


The full error log file is attached.
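For readers without the jtreg setup, the following is a minimal stand-alone sketch that exercises an Ed448 sign/verify round trip, the code path that fails in EdDSATest.java according to the comments below. It is not the actual test: the class name `Ed448RoundTrip` and the iteration count are hypothetical, and whether it reproduces the crash on a given machine is an assumption, not something confirmed in this report. It needs JDK 15 or later for the `Ed448` algorithm name.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;

// Hypothetical stand-alone sketch; NOT the real EdDSATest.java.
class Ed448RoundTrip {

    // Signs the message with a fresh Ed448 key pair and verifies the signature.
    static boolean signAndVerify(byte[] message) {
        try {
            KeyPair kp = KeyPairGenerator.getInstance("Ed448").generateKeyPair();

            Signature signer = Signature.getInstance("Ed448");
            signer.initSign(kp.getPrivate());
            signer.update(message);
            byte[] sig = signer.sign();

            Signature verifier = Signature.getInstance("Ed448");
            verifier.initVerify(kp.getPublic());
            verifier.update(message);
            return verifier.verify(sig);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] msg = "hello".getBytes(StandardCharsets.UTF_8);
        // Repeat so that compiled code, and any enabled intrinsics, get exercised.
        // The count of 100 is an arbitrary choice for illustration.
        for (int i = 0; i < 100; i++) {
            if (!signAndVerify(msg)) {
                throw new AssertionError("signature did not verify at iteration " + i);
            }
        }
        System.out.println("all round trips verified");
    }
}
```

To mirror the flags used elsewhere in this report, it could be launched with `java -XX:+UnlockDiagnosticVMOptions -XX:+UseSHA3Intrinsics Ed448RoundTrip.java` on an AArch64 machine with the SHA3 extension.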
Comments
Concerning the performance issue on Graviton 3 (not the original bug itself): there is now a GPR implementation of SHA3 from JDK-8337666, which is faster than the C2 (and SIMD) variants on Graviton and other machines. I'd like to proceed with making it the default for hardware that doesn't support the extensions and for Neoverse. Another alternative is to enable the SIMD version only for Apple Silicon.
09-06-2025

Changeset: 2f728d0c Author: Dong Bo <dongbo@openjdk.org> Committer: Tobias Hartmann <thartmann@openjdk.org> Date: 2022-11-17 09:05:43 +0000 URL: https://git.openjdk.org/jdk/commit/2f728d0cbb366b98158ca8b2acf4b6f58df2fd52
17-11-2022

[~dongbo] Sure. Let me take a look.
02-11-2022

[~haosun] Hi, I have raised a PR, could you please help to review? Thanks.
02-11-2022

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/10939 Date: 2022-11-02 03:06:21 +0000
02-11-2022

Performance improvements on our pre-silicon simulated platform stay the same. The latency and throughput of the crypto SHA3 ops are designed to be 1 CPU cycle and 2 pipes respectively. We also tested the SHA3 intrinsics on M1 and observed ~50% performance improvement. The JMH results are shown below. Performance with the attached patch shows negligible difference from the mainline code.

[Default, -XX:-UseSHA3Intrinsics]
Benchmark                    (digesterName)  (length)  (provider)   Mode  Cnt     Score    Error   Units
MessageDigests.digest        SHA3-224        64        DEFAULT      thrpt  15  3559.274 ±  7.416  ops/ms
MessageDigests.digest        SHA3-224        16384     DEFAULT      thrpt  15    30.594 ±  0.132  ops/ms
MessageDigests.digest        SHA3-256        64        DEFAULT      thrpt  15  3557.043 ±  5.414  ops/ms
MessageDigests.digest        SHA3-256        16384     DEFAULT      thrpt  15    29.022 ±  0.193  ops/ms
MessageDigests.digest        SHA3-384        64        DEFAULT      thrpt  15   3591.42 ±  5.311  ops/ms
MessageDigests.digest        SHA3-384        16384     DEFAULT      thrpt  15    22.918 ±  0.186  ops/ms
MessageDigests.digest        SHA3-512        64        DEFAULT      thrpt  15  3613.872 ±  6.279  ops/ms
MessageDigests.digest        SHA3-512        16384     DEFAULT      thrpt  15    16.395 ±   0.17  ops/ms
MessageDigests.getAndDigest  SHA3-224        64        DEFAULT      thrpt  15  3211.923 ±  5.847  ops/ms
MessageDigests.getAndDigest  SHA3-224        16384     DEFAULT      thrpt  15    30.066 ±  0.158  ops/ms
MessageDigests.getAndDigest  SHA3-256        64        DEFAULT      thrpt  15  3128.264 ± 93.023  ops/ms
MessageDigests.getAndDigest  SHA3-256        16384     DEFAULT      thrpt  15    28.475 ±  0.141  ops/ms
MessageDigests.getAndDigest  SHA3-384        64        DEFAULT      thrpt  15  3202.693 ± 31.153  ops/ms
MessageDigests.getAndDigest  SHA3-384        16384     DEFAULT      thrpt  15    22.454 ±  0.158  ops/ms
MessageDigests.getAndDigest  SHA3-512        64        DEFAULT      thrpt  15  3287.311 ±  3.882  ops/ms
MessageDigests.getAndDigest  SHA3-512        16384     DEFAULT      thrpt  15    16.385 ±  0.148  ops/ms

[-XX:+UnlockDiagnosticVMOptions -XX:+UseSHA3Intrinsics]
Benchmark                    (digesterName)  (length)  (provider)   Mode  Cnt     Score    Error   Units
MessageDigests.digest        SHA3-224        64        DEFAULT      thrpt  15   5497.51 ±  6.228  ops/ms
MessageDigests.digest        SHA3-224        16384     DEFAULT      thrpt  15    55.065 ±   0.03  ops/ms
MessageDigests.digest        SHA3-256        64        DEFAULT      thrpt  15  5501.651 ± 20.973  ops/ms
MessageDigests.digest        SHA3-256        16384     DEFAULT      thrpt  15     51.88 ±  0.024  ops/ms
MessageDigests.digest        SHA3-384        64        DEFAULT      thrpt  15  5486.111 ±      7  ops/ms
MessageDigests.digest        SHA3-384        16384     DEFAULT      thrpt  15    39.779 ±  0.022  ops/ms
MessageDigests.digest        SHA3-512        64        DEFAULT      thrpt  15  5444.476 ± 44.857  ops/ms
MessageDigests.digest        SHA3-512        16384     DEFAULT      thrpt  15    27.621 ±  0.026  ops/ms
MessageDigests.getAndDigest  SHA3-224        64        DEFAULT      thrpt  15   4680.77 ±  6.453  ops/ms
MessageDigests.getAndDigest  SHA3-224        16384     DEFAULT      thrpt  15    54.693 ±   0.05  ops/ms
MessageDigests.getAndDigest  SHA3-256        64        DEFAULT      thrpt  15  4680.802 ±  6.011  ops/ms
MessageDigests.getAndDigest  SHA3-256        16384     DEFAULT      thrpt  15     51.52 ±  0.042  ops/ms
MessageDigests.getAndDigest  SHA3-384        64        DEFAULT      thrpt  15  4635.309 ± 18.437  ops/ms
MessageDigests.getAndDigest  SHA3-384        16384     DEFAULT      thrpt  15    39.547 ±  0.036  ops/ms
MessageDigests.getAndDigest  SHA3-512        64        DEFAULT      thrpt  15  4727.525 ±  9.322  ops/ms
MessageDigests.getAndDigest  SHA3-512        16384     DEFAULT      thrpt  15    27.477 ±  0.014  ops/ms

IMHO, the performance benefit of the SHA3 intrinsics depends on the microarchitecture, so it should be switched on/off based on the running platform. For those who are interested in running the JMH on M1/macOS, the modification below is needed to enable the SHA3 intrinsics by default. Other features, e.g. UseSHA, cannot be detected automatically either; it seems the current hardware feature detection logic does not work on macOS.

--- a/src/hotspot/cpu/aarch64/vm_version_aarch64.cpp
+++ b/src/hotspot/cpu/aarch64/vm_version_aarch64.cpp
@@ -334,15 +334,15 @@ void VM_Version::initialize() {
     FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
   }
-  if (UseSHA && VM_Version::supports_sha3()) {
+  // if (UseSHA && VM_Version::supports_sha3()) {
     // Do not auto-enable UseSHA3Intrinsics until it has been fully tested on hardware
-    // if (FLAG_IS_DEFAULT(UseSHA3Intrinsics)) {
-    // FLAG_SET_DEFAULT(UseSHA3Intrinsics, true);
-    // }
-  } else if (UseSHA3Intrinsics) {
-    warning("Intrinsics for SHA3-224, SHA3-256, SHA3-384 and SHA3-512 crypto hash functions not available on this CPU.");
-    FLAG_SET_DEFAULT(UseSHA3Intrinsics, false);
-  }
+    if (FLAG_IS_DEFAULT(UseSHA3Intrinsics)) {
+      FLAG_SET_DEFAULT(UseSHA3Intrinsics, true);
+    }
+  //} else if (UseSHA3Intrinsics) {
+  //  warning("Intrinsics for SHA3-224, SHA3-256, SHA3-384 and SHA3-512 crypto hash functions not available on this CPU.");
+  //  FLAG_SET_DEFAULT(UseSHA3Intrinsics, false);
+  //}
01-11-2022

[~haosun] Thanks for the testing. So now we can say that the modifications in this patch do not lead to the performance regression. I'm not too surprised by the regression on Graviton3. As I mentioned in my last comment, all crypto SHA3 ops in Neoverse V1 take 2 CPU cycles and have only one execution pipe. I believe the critical part of the SHA3 intrinsics is the `keccak()` loop, i.e. `rounds24_loop`. It is almost the same as the code sequence shown in the ARM Architecture Reference Manual, section `K10.2.2 Use of the SHA3 instructions`. The code snippet seems quite straightforward, so I'm afraid there is little we can do from the software side. Because we do not have real hardware yet, I'll re-check the performance benefits on the simulation platform. If performance still looks fine, I will raise a PR to fix this crash issue. Thanks.
27-10-2022

[~dongbo] Thanks for your prompt reply. The following shows the JMH results without the patch, i.e. using the latest JDK mainline code base. I'm afraid we still see a similar performance regression. I hope the data will be useful to you. Thanks~
```
# Disable sha3 intrinsics (Before)
# run_numactl make test TEST=micro:MessageDigests
Benchmark                    (digesterName)  (length)  (provider)   Mode  Cnt     Score  Error   Units
MessageDigests.digest        SHA3-224        64        DEFAULT      thrpt   2  2192.973         ops/ms
MessageDigests.digest        SHA3-224        16384     DEFAULT      thrpt   2    19.920         ops/ms
MessageDigests.digest        SHA3-256        64        DEFAULT      thrpt   2  2239.828         ops/ms
MessageDigests.digest        SHA3-256        16384     DEFAULT      thrpt   2    18.896         ops/ms
MessageDigests.digest        SHA3-384        64        DEFAULT      thrpt   2  2207.647         ops/ms
MessageDigests.digest        SHA3-384        16384     DEFAULT      thrpt   2    14.873         ops/ms
MessageDigests.digest        SHA3-512        64        DEFAULT      thrpt   2  2239.472         ops/ms
MessageDigests.digest        SHA3-512        16384     DEFAULT      thrpt   2    10.633         ops/ms
MessageDigests.getAndDigest  SHA3-224        64        DEFAULT      thrpt   2  1901.341         ops/ms
MessageDigests.getAndDigest  SHA3-224        16384     DEFAULT      thrpt   2    19.814         ops/ms
MessageDigests.getAndDigest  SHA3-256        64        DEFAULT      thrpt   2  1920.579         ops/ms
MessageDigests.getAndDigest  SHA3-256        16384     DEFAULT      thrpt   2    18.793         ops/ms
MessageDigests.getAndDigest  SHA3-384        64        DEFAULT      thrpt   2  1942.584         ops/ms
MessageDigests.getAndDigest  SHA3-384        16384     DEFAULT      thrpt   2    14.870         ops/ms
MessageDigests.getAndDigest  SHA3-512        64        DEFAULT      thrpt   2  1951.758         ops/ms
MessageDigests.getAndDigest  SHA3-512        16384     DEFAULT      thrpt   2    10.649         ops/ms

# Enable sha3 intrinsics (After)
# run_numactl make test TEST=micro:MessageDigests MICRO="VM_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:+UseSHA3Intrinsics"
Benchmark                    (digesterName)  (length)  (provider)   Mode  Cnt     Score  Error   Units
MessageDigests.digest        SHA3-224        64        DEFAULT      thrpt   2  1524.555         ops/ms
MessageDigests.digest        SHA3-224        16384     DEFAULT      thrpt   2    14.789         ops/ms
MessageDigests.digest        SHA3-256        64        DEFAULT      thrpt   2  1534.675         ops/ms
MessageDigests.digest        SHA3-256        16384     DEFAULT      thrpt   2    13.935         ops/ms
MessageDigests.digest        SHA3-384        64        DEFAULT      thrpt   2  1518.855         ops/ms
MessageDigests.digest        SHA3-384        16384     DEFAULT      thrpt   2    10.674         ops/ms
MessageDigests.digest        SHA3-512        64        DEFAULT      thrpt   2  1515.757         ops/ms
MessageDigests.digest        SHA3-512        16384     DEFAULT      thrpt   2     7.399         ops/ms
MessageDigests.getAndDigest  SHA3-224        64        DEFAULT      thrpt   2  1398.380         ops/ms
MessageDigests.getAndDigest  SHA3-224        16384     DEFAULT      thrpt   2    14.716         ops/ms
MessageDigests.getAndDigest  SHA3-256        64        DEFAULT      thrpt   2  1415.706         ops/ms
MessageDigests.getAndDigest  SHA3-256        16384     DEFAULT      thrpt   2    13.869         ops/ms
MessageDigests.getAndDigest  SHA3-384        64        DEFAULT      thrpt   2  1416.226         ops/ms
MessageDigests.getAndDigest  SHA3-384        16384     DEFAULT      thrpt   2    10.638         ops/ms
MessageDigests.getAndDigest  SHA3-512        64        DEFAULT      thrpt   2  1425.434         ops/ms
MessageDigests.getAndDigest  SHA3-512        16384     DEFAULT      thrpt   2     7.382         ops/ms
```
27-10-2022

[~haosun] Thanks for the testing. You're right, SHAKE256 has been supported since JDK-8166597, which is earlier than JDK-8252204. I made the mistake by looking at the wrong commit number.
```
STDERR:
java.lang.RuntimeException: Actual array: 0c8b70e543f25783999a7f9c4765b2f6104c5900650a0c4ff571d666fb0986aa73ef862b92d9cee98b3010ae8ea478ddbae2a421da83243f0056a96159ac37e83c751f88b9e7b9ed33878bb8a9130343e773c95bff7b3e4839145b81434b74e216a2069d646db517967a9b042ceb2d6f3100, Expected array:0c8b70e543f25783999a7f9c4765b2f6104c5900650a0c4ff571d666fb0986aa73ef862b92d9cee98b3010ae8ea478ddbae2a421da83243f0009d347606b8916e2de717623532bfcf6ecbb5ea83acd9701914afda7cdc13217402b288e33e759c89c30a2cc6d8926db756623d763bf150a00
        at EdDSATest.equals(EdDSATest.java:351)
        at EdDSATest.signAndVerify(EdDSATest.java:226)
        ...
```
For the array mismatch error, I think it is another issue caused by the old `digest_length` logic. I ran the test several times, and both the segmentation fault crash and the mismatch error can be observed.

The implementation in the Linux kernel was referenced for the SHA3 intrinsics: https://github.com/torvalds/linux/blob/b229b6ca5abbd63ff40c1396095b1b36b18139c3/arch/arm64/crypto/sha3-ce-core.S#L43. We witnessed ~30% improvement on our pre-silicon simulation platform with the code in the latest JDK mainline.

The CPU core used by Graviton3 is Neoverse V1. All its crypto SHA3 ops, e.g. `eor3 v25.16B, v12.16B, v7.16B, v2.16B`, take 2 CPU cycles and have only one execution pipe. Instructions that manipulate general-purpose registers, like `eor x1, x2, x3`, take only 1 cycle and have 4 execution pipes. Perhaps that's why Graviton3 has the performance regression.

Besides that, with the patch I uploaded in this issue, the SHA3 intrinsics execute 1 or 2 more branch instructions than the mainline code. Would you mind running the performance test without the patch, so that we can narrow down the scope of the performance issue? Thanks.
27-10-2022

Hi [~dongbo], here is the data for the JMH case MessageDigests.java on Graviton3.

Note-1: the following update is made to cover more SHA3 algorithms.
```
--- a/test/micro/org/openjdk/bench/java/security/MessageDigests.java
+++ b/test/micro/org/openjdk/bench/java/security/MessageDigests.java
@@ -53,7 +53,7 @@ public class MessageDigests {
     @Param({"64", "16384"})
     private int length;
 
-    @Param({"md5", "SHA-1", "SHA-224", "SHA-256", "SHA-384", "SHA-512", "SHA3-256", "SHA3-512"})
+    @Param({"md5", "SHA-1", "SHA-224", "SHA-256", "SHA-384", "SHA-512", "SHA3-224", "SHA3-256", "SHA3-384", "SHA3-512"})
     private String digesterName;
 
     @Param({"DEFAULT"})
```
Note-2: the performance testing is run with your patch.
```
# Disable sha3 intrinsics (Before)
# make test TEST=micro:MessageDigests
Benchmark                    (digesterName)  (length)  (provider)   Mode  Cnt      Score    Error   Units
MessageDigests.digest        md5             64        DEFAULT      thrpt   5   3311.755 ±  0.831  ops/ms
MessageDigests.digest        md5             16384     DEFAULT      thrpt   5     25.948 ±  0.002  ops/ms
MessageDigests.digest        SHA-1           64        DEFAULT      thrpt   5  10024.275 ±  3.415  ops/ms
MessageDigests.digest        SHA-1           16384     DEFAULT      thrpt   5     94.973 ±  0.003  ops/ms
MessageDigests.digest        SHA-224         64        DEFAULT      thrpt   5   9851.082 ±  5.359  ops/ms
MessageDigests.digest        SHA-224         16384     DEFAULT      thrpt   5     98.633 ±  0.006  ops/ms
MessageDigests.digest        SHA-256         64        DEFAULT      thrpt   5   9827.080 ±  7.214  ops/ms
MessageDigests.digest        SHA-256         16384     DEFAULT      thrpt   5     98.623 ±  0.014  ops/ms
MessageDigests.digest        SHA-384         64        DEFAULT      thrpt   5   2406.742 ±  1.053  ops/ms
MessageDigests.digest        SHA-384         16384     DEFAULT      thrpt   5     20.387 ±  0.005  ops/ms
MessageDigests.digest        SHA-512         64        DEFAULT      thrpt   5   2386.029 ±  1.177  ops/ms
MessageDigests.digest        SHA-512         16384     DEFAULT      thrpt   5     20.387 ±  0.007  ops/ms
MessageDigests.digest        SHA3-224        64        DEFAULT      thrpt   5   2306.868 ±  0.965  ops/ms
MessageDigests.digest        SHA3-224        16384     DEFAULT      thrpt   5     19.983 ±  0.005  ops/ms
MessageDigests.digest        SHA3-256        64        DEFAULT      thrpt   5   2328.226 ±  1.473  ops/ms
MessageDigests.digest        SHA3-256        16384     DEFAULT      thrpt   5     18.987 ±  0.005  ops/ms
MessageDigests.digest        SHA3-384        64        DEFAULT      thrpt   5   2321.988 ±  0.146  ops/ms
MessageDigests.digest        SHA3-384        16384     DEFAULT      thrpt   5     14.960 ±  0.002  ops/ms
MessageDigests.digest        SHA3-512        64        DEFAULT      thrpt   5   2347.715 ±  0.924  ops/ms
MessageDigests.digest        SHA3-512        16384     DEFAULT      thrpt   5     10.691 ±  0.001  ops/ms
MessageDigests.getAndDigest  md5             64        DEFAULT      thrpt   5   2742.342 ±  0.529  ops/ms
MessageDigests.getAndDigest  md5             16384     DEFAULT      thrpt   5     25.855 ±  0.002  ops/ms
MessageDigests.getAndDigest  SHA-1           64        DEFAULT      thrpt   5   6722.189 ±  2.467  ops/ms
MessageDigests.getAndDigest  SHA-1           16384     DEFAULT      thrpt   5     94.617 ±  0.072  ops/ms
MessageDigests.getAndDigest  SHA-224         64        DEFAULT      thrpt   5   6420.922 ± 15.793  ops/ms
MessageDigests.getAndDigest  SHA-224         16384     DEFAULT      thrpt   5     98.231 ±  0.033  ops/ms
MessageDigests.getAndDigest  SHA-256         64        DEFAULT      thrpt   5   6456.705 ± 19.116  ops/ms
MessageDigests.getAndDigest  SHA-256         16384     DEFAULT      thrpt   5     98.259 ±  0.036  ops/ms
MessageDigests.getAndDigest  SHA-384         64        DEFAULT      thrpt   5   1935.874 ± 19.817  ops/ms
MessageDigests.getAndDigest  SHA-384         16384     DEFAULT      thrpt   5     20.274 ±  0.035  ops/ms
MessageDigests.getAndDigest  SHA-512         64        DEFAULT      thrpt   5   1951.265 ±  6.875  ops/ms
MessageDigests.getAndDigest  SHA-512         16384     DEFAULT      thrpt   5     20.281 ±  0.030  ops/ms
MessageDigests.getAndDigest  SHA3-224        64        DEFAULT      thrpt   5   1954.348 ± 16.831  ops/ms
MessageDigests.getAndDigest  SHA3-224        16384     DEFAULT      thrpt   5     19.579 ±  0.004  ops/ms
MessageDigests.getAndDigest  SHA3-256        64        DEFAULT      thrpt   5   1991.660 ±  0.981  ops/ms
MessageDigests.getAndDigest  SHA3-256        16384     DEFAULT      thrpt   5     18.688 ±  0.011  ops/ms
MessageDigests.getAndDigest  SHA3-384        64        DEFAULT      thrpt   5   1992.789 ±  0.324  ops/ms
MessageDigests.getAndDigest  SHA3-384        16384     DEFAULT      thrpt   5     14.704 ±  0.098  ops/ms
MessageDigests.getAndDigest  SHA3-512        64        DEFAULT      thrpt   5   2005.937 ±  1.201  ops/ms
MessageDigests.getAndDigest  SHA3-512        16384     DEFAULT      thrpt   5     10.507 ±  0.004  ops/ms

# Enable sha3 intrinsics (After)
# make test TEST=micro:MessageDigests MICRO="VM_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:+UseSHA3Intrinsics"
Benchmark                    (digesterName)  (length)  (provider)   Mode  Cnt      Score    Error   Units
MessageDigests.digest        md5             64        DEFAULT      thrpt   5   3312.531 ±  0.923  ops/ms
MessageDigests.digest        md5             16384     DEFAULT      thrpt   5     25.948 ±  0.003  ops/ms
MessageDigests.digest        SHA-1           64        DEFAULT      thrpt   5  10120.345 ±  2.913  ops/ms
MessageDigests.digest        SHA-1           16384     DEFAULT      thrpt   5     94.971 ±  0.011  ops/ms
MessageDigests.digest        SHA-224         64        DEFAULT      thrpt   5   9850.342 ±  5.056  ops/ms
MessageDigests.digest        SHA-224         16384     DEFAULT      thrpt   5     98.630 ±  0.005  ops/ms
MessageDigests.digest        SHA-256         64        DEFAULT      thrpt   5   9932.454 ±  8.612  ops/ms
MessageDigests.digest        SHA-256         16384     DEFAULT      thrpt   5     98.620 ±  0.007  ops/ms
MessageDigests.digest        SHA-384         64        DEFAULT      thrpt   5   2404.487 ±  1.394  ops/ms
MessageDigests.digest        SHA-384         16384     DEFAULT      thrpt   5     20.375 ±  0.004  ops/ms
MessageDigests.digest        SHA-512         64        DEFAULT      thrpt   5   2348.372 ±  2.656  ops/ms
MessageDigests.digest        SHA-512         16384     DEFAULT      thrpt   5     20.382 ±  0.004  ops/ms
MessageDigests.digest        SHA3-224        64        DEFAULT      thrpt   5   1572.587 ± 41.909  ops/ms
MessageDigests.digest        SHA3-224        16384     DEFAULT      thrpt   5     14.798 ±  0.002  ops/ms
MessageDigests.digest        SHA3-256        64        DEFAULT      thrpt   5   1568.188 ± 44.675  ops/ms
MessageDigests.digest        SHA3-256        16384     DEFAULT      thrpt   5     13.943 ±  0.007  ops/ms
MessageDigests.digest        SHA3-384        64        DEFAULT      thrpt   5   1567.505 ±  0.812  ops/ms
MessageDigests.digest        SHA3-384        16384     DEFAULT      thrpt   5     10.680 ±  0.001  ops/ms
MessageDigests.digest        SHA3-512        64        DEFAULT      thrpt   5   1574.162 ±  1.321  ops/ms
MessageDigests.digest        SHA3-512        16384     DEFAULT      thrpt   5      7.402 ±  0.001  ops/ms
MessageDigests.getAndDigest  md5             64        DEFAULT      thrpt   5   2728.874 ±  0.406  ops/ms
MessageDigests.getAndDigest  md5             16384     DEFAULT      thrpt   5     25.851 ±  0.003  ops/ms
MessageDigests.getAndDigest  SHA-1           64        DEFAULT      thrpt   5   6698.834 ±  2.710  ops/ms
MessageDigests.getAndDigest  SHA-1           16384     DEFAULT      thrpt   5     94.612 ±  0.072  ops/ms
MessageDigests.getAndDigest  SHA-224         64        DEFAULT      thrpt   5   6347.114 ± 26.600  ops/ms
MessageDigests.getAndDigest  SHA-224         16384     DEFAULT      thrpt   5     98.221 ±  0.048  ops/ms
MessageDigests.getAndDigest  SHA-256         64        DEFAULT      thrpt   5   6389.373 ± 29.867  ops/ms
MessageDigests.getAndDigest  SHA-256         16384     DEFAULT      thrpt   5     98.255 ±  0.055  ops/ms
MessageDigests.getAndDigest  SHA-384         64        DEFAULT      thrpt   5   1946.505 ± 13.586  ops/ms
MessageDigests.getAndDigest  SHA-384         16384     DEFAULT      thrpt   5     20.270 ±  0.036  ops/ms
MessageDigests.getAndDigest  SHA-512         64        DEFAULT      thrpt   5   1948.869 ± 10.119  ops/ms
MessageDigests.getAndDigest  SHA-512         16384     DEFAULT      thrpt   5     20.276 ±  0.034  ops/ms
MessageDigests.getAndDigest  SHA3-224        64        DEFAULT      thrpt   5   1403.128 ± 11.181  ops/ms
MessageDigests.getAndDigest  SHA3-224        16384     DEFAULT      thrpt   5     14.772 ±  0.067  ops/ms
MessageDigests.getAndDigest  SHA3-256        64        DEFAULT      thrpt   5   1415.765 ±  0.259  ops/ms
MessageDigests.getAndDigest  SHA3-256        16384     DEFAULT      thrpt   5     13.912 ±  0.086  ops/ms
MessageDigests.getAndDigest  SHA3-384        64        DEFAULT      thrpt   5   1414.935 ±  0.184  ops/ms
MessageDigests.getAndDigest  SHA3-384        16384     DEFAULT      thrpt   5     10.643 ±  0.011  ops/ms
MessageDigests.getAndDigest  SHA3-512        64        DEFAULT      thrpt   5   1426.190 ±  0.346  ops/ms
MessageDigests.getAndDigest  SHA3-512        16384     DEFAULT      thrpt   5      7.386 ±  0.001  ops/ms
```
Unfortunately, the data shows a performance regression when the SHA3 intrinsics are used. Do you have any clue about this?
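To put a number on the regression, the relative throughput change can be computed from the before/after scores in this comment. The following is a small sketch; the class and method names are hypothetical, and the two sample rows (`MessageDigests.digest` for SHA3-256 at length 16384 and SHA3-512 at length 64) are taken from the Graviton3 tables in this comment.

```java
// Hypothetical helper for reading the JMH tables; not part of the JDK or JMH.
class ThroughputChange {

    // Relative change in percent; negative values mean lower throughput (a regression).
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        // Scores in ops/ms, copied from the Graviton3 results in this comment.
        double sha3_256Before = 18.987;   // digest, SHA3-256, 16384 bytes, intrinsics off
        double sha3_256After  = 13.943;   // same benchmark, intrinsics on
        double sha3_512Before = 2347.715; // digest, SHA3-512, 64 bytes, intrinsics off
        double sha3_512After  = 1574.162; // same benchmark, intrinsics on

        System.out.printf("SHA3-256/16384: %.1f%%%n", percentChange(sha3_256Before, sha3_256After));
        System.out.printf("SHA3-512/64:    %.1f%%%n", percentChange(sha3_512Before, sha3_512After));
    }
}
```

Both sample rows come out negative, i.e. enabling the intrinsics lowers throughput on this machine for these cases.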
27-10-2022

Hi [~dongbo], thanks a lot for your quick fix! I built the latest code from upstream with your patch, and tier1~3 passed on SHA3-capable hardware. I also launched the JMH performance testing and will post the results when the test finishes.

Besides, you mentioned that "SHAKE256 is supported by JDK after JDK-8252204, so we missed this before", but I doubt that. I think SHAKE256 was already there when JDK-8252204 was implemented. In my local test, I reverted the code back to the JDK-8252204 commit, and EdDSATest.java failed as well. Here is a snippet of the error log.
```
STDERR:
java.lang.RuntimeException: Actual array: 0c8b70e543f25783999a7f9c4765b2f6104c5900650a0c4ff571d666fb0986aa73ef862b92d9cee98b3010ae8ea478ddbae2a421da83243f0056a96159ac37e83c751f88b9e7b9ed33878bb8a9130343e773c95bff7b3e4839145b81434b74e216a2069d646db517967a9b042ceb2d6f3100, Expected array:0c8b70e543f25783999a7f9c4765b2f6104c5900650a0c4ff571d666fb0986aa73ef862b92d9cee98b3010ae8ea478ddbae2a421da83243f0009d347606b8916e2de717623532bfcf6ecbb5ea83acd9701914afda7cdc13217402b288e33e759c89c30a2cc6d8926db756623d763bf150a00
        at EdDSATest.equals(EdDSATest.java:351)
        at EdDSATest.signAndVerify(EdDSATest.java:226)
        at EdDSATest.testSignature(EdDSATest.java:206)
        at EdDSATest.test(EdDSATest.java:123)
        at EdDSATest.main(EdDSATest.java:82)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:564)
        at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:312)
        at java.base/java.lang.Thread.run(Thread.java:832)

JavaTest Message: Test threw exception: java.lang.RuntimeException
JavaTest Message: shutting down test

STDOUT:
Case Algo:EdDSA, Param:null, Intitiate with random:true Passed.
Case Algo:Ed25519, Param:null, Intitiate with random:true Passed.
Case Algo:Ed448, Param:null, Intitiate with random:true

TEST RESULT: Failed. Execution failed: `main' threw exception: java.lang.RuntimeException: Actual array: 0c8b70e543f25783999a7f9c4765b2f6104c5900650a0c4ff571d666fb0986aa73ef862b92d9cee98b3010ae8ea478ddbae2a421da83243f0056a96159ac37e83c751f88b9e7b9ed33878bb8a9130343e773c95bff7b3e4839145b81434b74e216a2069d646db517967a9b042ceb2d6f3100, Expected array:0c8b70e543f25783999a7f9c4765b2f6104c5900650a0c4ff571d666fb0986aa73ef862b92d9cee98b3010ae8ea478ddbae2a421da83243f0009d347606b8916e2de717623532bfcf6ecbb5ea83acd9701914afda7cdc13217402b288e33e759c89c30a2cc6d8926db756623d763bf150a00
```
Perhaps EdDSATest.java behaved slightly differently on QEMU (the test environment used for JDK-8252204) than on the real hardware I used. Or the test failure was somehow overlooked when implementing JDK-8252204.
26-10-2022

Hi, I have uploaded a fix for this issue; see the attachment. As mentioned by [~haosun], the cause of the crash is that `block_size == 200 - 2 * digest_length` does not hold for SHAKE128 and SHAKE256. The `digest_length` is variable (supplied by the user) for SHAKE128 and SHAKE256:

            digest_length   block_size
SHA3-224    28              144
SHA3-256    32              136
SHA3-384    48              104
SHA3-512    64              72
SHAKE128    variable        168
SHAKE256    variable        136

SHAKE256 is supported by JDK after JDK-8252204, so we missed this before. The main idea of the fix is to pass the `block_size` and use it to distinguish these SHA3 functions.

Tests `test/jdk/sun/security/ec/ed/EdDSATest.java` and `./test/jdk/sun/security/provider/MessageDigest/SHA3.java` both passed with this fix. We don't have hardware with SHA3 yet, so the tests are executed with QEMU. More tests are still running.

[~haosun] could you please also help to test this on real hardware? I think we also need performance results; the JMH benchmark is available at `test/micro/org/openjdk/bench/java/security/MessageDigests.java`. Thanks a lot.
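The idea of keying on `block_size` rather than `digest_length` can be sketched in Java as below. This is only an illustration under stated assumptions, not the code of the attached patch or of the stub generator: the class `Sha3BlockSizes` and the method `lanesPerBlock` are hypothetical names, and the sketch only shows that each block size in the table above maps to a fixed number of 64-bit lanes to absorb per block (block size in bytes divided by 8).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; NOT the patch attached to this issue.
class Sha3BlockSizes {

    // Number of 64-bit lanes absorbed per block for a given block size in bytes.
    // Only the block sizes listed in the table of this comment are accepted.
    static int lanesPerBlock(int blockSize) {
        switch (blockSize) {
            case 72:   // SHA3-512
            case 104:  // SHA3-384
            case 136:  // SHA3-256 and SHAKE256
            case 144:  // SHA3-224
            case 168:  // SHAKE128
                return blockSize / 8;
            default:
                throw new IllegalArgumentException("unexpected block size: " + blockSize);
        }
    }

    public static void main(String[] args) {
        // Block sizes copied from the table in this comment.
        Map<String, Integer> blockSizes = new LinkedHashMap<>();
        blockSizes.put("SHA3-224", 144);
        blockSizes.put("SHA3-256", 136);
        blockSizes.put("SHA3-384", 104);
        blockSizes.put("SHA3-512", 72);
        blockSizes.put("SHAKE128", 168);
        blockSizes.put("SHAKE256", 136);

        for (Map.Entry<String, Integer> e : blockSizes.entrySet()) {
            System.out.printf("%-8s block_size=%3d lanes=%2d%n",
                    e.getKey(), e.getValue(), lanesPerBlock(e.getValue()));
        }
    }
}
```

Because the amount of input absorbed depends only on the block size, SHA3-256 and SHAKE256 sharing 136 bytes is not a problem for this dispatch, whereas a value derived from a user-supplied digest length can fall outside the table entirely.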
25-10-2022

Thanks, assigning this to [~dongbo]!
25-10-2022

Hi, I've reproduced the crash and will also look into it and try to fix it. Thanks.
25-10-2022

[~thartmann] Thanks for letting us know. I will check with the original author of the work. [~dongbo] Could you please take a look if we missed anything before? Thanks.
25-10-2022

I'd like to share my observations. `200 - 2 * digestLength` is used to compute `block_size`. See https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L4015 It should be correct for SHA3-224/256/384/512, since `WIDTH - c` is used as the blockSize in the SHA3 constructor. See https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/sun/security/provider/SHA3.java#L74. Note that WIDTH is 200, and `c` is twice the digestLength for SHA3-224/256/384/512. See https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/sun/security/provider/SHA3.java#L299~L332 However, for SHAKE-256, `c` is NOT always twice the digestLength. See https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/sun/security/provider/SHAKE256.java#L32. Here, `d` can be 64 or 114 for Ed448. As a result, integer underflow can occur in the generated SHA3 intrinsic, i.e. `ofs` becomes negative. The loop then never terminates and an out-of-bounds memory access on `buf` is triggered. Hence, in my opinion, we may also want to pass another argument to the SHA3 intrinsic, i.e. blockSize. The above is my conjecture, and I will do further testing.
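The arithmetic behind this conjecture can be checked with a few lines of Java. This is a sketch that only evaluates the formula `200 - 2 * digestLength` for the two Ed448 values of `d` mentioned above; it does not model the stub itself, and the class and method names are hypothetical. The real SHAKE256 block size of 136 is taken from the table posted in a later comment on this issue.

```java
// Hypothetical illustration of the derived block size; NOT the stub generator code.
class DerivedBlockSize {

    static final int WIDTH = 200;             // Keccak state size in bytes
    static final int SHAKE256_BLOCK_SIZE = 136; // actual SHAKE256 block size in bytes

    // Block size as derived from the digest length; only valid for SHA3-224/256/384/512.
    static int derived(int digestLength) {
        return WIDTH - 2 * digestLength;
    }

    public static void main(String[] args) {
        for (int d : new int[] {64, 114}) {
            System.out.println("SHAKE256 d=" + d
                    + ": derived=" + derived(d)
                    + ", actual=" + SHAKE256_BLOCK_SIZE);
        }
    }
}
```

For `d = 114` the derived value is negative, and for `d = 64` it is 72, which is the SHA3-512 block size rather than the SHAKE256 one. Both cases disagree with the actual block size of 136.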
24-10-2022

[~fyang] since you authored/sponsored JDK-8252204, any plans to look into this?
24-10-2022

Updated ILW = Crash, single test with diagnostic and non-default flag, -XX:-UseSHA3Intrinsics = HLM = P3
24-10-2022

ILW = crash; one test with -XX:+UseSHA3Intrinsics; no workaround if -XX:+UseSHA3Intrinsics is needed = HLH = P2
20-10-2022