Currently there are two options: C2-generated code and the ARMv8.2-SHA intrinsic implemented in JDK-8252204.
The C2 code uses GPRs and was recently improved by JDK-8333867, but it is still not optimal, in particular because of extra register spills.
The current intrinsic is slower than the C2 variant on some platforms that support the required extensions, such as Graviton 3 (see https://bugs.openjdk.org/browse/JDK-8295698?focusedId=14532815&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14532815), but provides a good speed-up on others, such as Apple Silicon.
It is possible to implement a GPR-based intrinsic that is available on any platform and runs faster than the C2 version.