JDK-8327036 : [macosx-aarch64] SIGBUS in MarkActivationClosure::do_code_blob reached from Unsafe_CopySwapMemory0
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 17,21
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: os_x
  • CPU: aarch64
  • Submitted: 2024-02-29
  • Updated: 2024-04-08
  • Resolved: 2024-03-06
JDK 17: 17.0.11 (Fixed)
JDK 21: 21.0.4-oracle (Fixed)
Description
Unsafe_CopySwapMemory0 uses JVM_ENTRY_FROM_LEAF, which transitions from native into the VM using ThreadInVMfromNative.
It is an invariant on Mac OS AARCH64 that a thread must have WXWrite before doing the transition into the vm [1].

In Unsafe_CopySwapMemory0 we don't switch to WXWrite. Because of this, we can get a SIGBUS if a handshake is pending in which we modify an nmethod in MarkActivationClosure::do_code_blob:

Stack: [0x0000000171f24000,0x0000000172127000],  sp=0x0000000172124cd0,  free space=2051k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.dylib+0xfc6c60]  MarkActivationClosure::do_code_blob(CodeBlob*)+0x74
V  [libjvm.dylib+0x1022a84]  JavaThread::nmethods_do(CodeBlobClosure*)+0x114
V  [libjvm.dylib+0x75c4f8]  HandshakeOperation::do_handshake(JavaThread*)+0x70
V  [libjvm.dylib+0x75e054]  HandshakeState::process_by_self(bool)+0x3a8
V  [libjvm.dylib+0xe0f130]  SafepointMechanism::process(JavaThread*, bool)+0x5c
V  [libjvm.dylib+0x535758]  ThreadStateTransition::transition_from_native(JavaThread*, JavaThreadState)+0x1f8
V  [libjvm.dylib+0x4334f0]  ThreadInVMfromNative::ThreadInVMfromNative(JavaThread*)+0xb0
V  [libjvm.dylib+0x106832c]  Unsafe_CopySwapMemory0(JNIEnv_*, _jobject*, _jobject*, long, _jobject*, long, long, long)+0xdc
J 915  jdk.internal.misc.Unsafe.copySwapMemory0(Ljava/lang/Object;JLjava/lang/Object;JJJ)V java.base@17.0.11-internal (0 bytes) @ 0x0000000115f636dc [0x0000000115f63640+0x000000000000009c]
[...]
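
The failure mode can be illustrated with a small standalone model. This is a sketch only, not HotSpot code: WXMode, ScopedWXMode, patch_code_cache, transition_from_native, buggy_entry and fixed_entry are hypothetical names invented for the illustration, the SIGBUS is simulated with an exception, and ScopedWXMode is only loosely modeled on the ThreadWXEnable guard mentioned in the comments.

```cpp
#include <stdexcept>

// Toy model (assumption: NOT the HotSpot implementation).
// Each thread sees JIT memory as either writable or executable.
enum class WXMode { Write, Exec };

// Per-thread W^X state; a thread returning from native code is in Exec mode.
thread_local WXMode current_wx_mode = WXMode::Exec;

// Hypothetical RAII guard: switches the mode and restores it on scope exit.
class ScopedWXMode {
  WXMode _old;
 public:
  explicit ScopedWXMode(WXMode mode) : _old(current_wx_mode) {
    current_wx_mode = mode;
  }
  ~ScopedWXMode() { current_wx_mode = _old; }
  ScopedWXMode(const ScopedWXMode&) = delete;
  ScopedWXMode& operator=(const ScopedWXMode&) = delete;
};

// Stand-in for a handshake closure that modifies an nmethod.
// Writing while in Exec mode is what raises SIGBUS on the real platform;
// here it is simulated with an exception.
void patch_code_cache() {
  if (current_wx_mode != WXMode::Write) {
    throw std::runtime_error("simulated SIGBUS: code cache not writable");
  }
}

// Stand-in for the native-to-VM transition, which processes a pending
// handshake on the transitioning thread itself.
void transition_from_native(bool handshake_pending) {
  if (handshake_pending) {
    patch_code_cache();
  }
}

// Entry point that transitions without switching to Write first.
// Returns true if the transition survived, false on the simulated crash.
bool buggy_entry(bool handshake_pending) {
  try {
    transition_from_native(handshake_pending);
    return true;
  } catch (const std::runtime_error&) {
    return false;
  }
}

// Entry point that establishes Write mode before the transition.
bool fixed_entry(bool handshake_pending) {
  ScopedWXMode wx(WXMode::Write);
  try {
    transition_from_native(handshake_pending);
    return true;
  } catch (const std::runtime_error&) {
    return false;
  }
}
```

In this model buggy_entry only fails when a handshake happens to be pending during the transition, which matches why the crash is hard to reproduce directly, while fixed_entry succeeds either way and restores the previous mode when it returns.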

While we cannot get the very same crash in JDK 21 (the nmethod sweeper was removed in JDK 20 with JDK-8290025), other handshakes that modify the code cache (e.g. DeoptimizeMarkedClosure) will crash too.

Reproduce:
It's not easy to reproduce this directly, but when running test/jdk/sun/nio/cs/FindDecoderBugs.java with -XX:+AssertWXAtThreadSync on Mac OS AARCH64, the corresponding assertion fails because of the issue.

Note that the issue is fixed with JDK-8310644 in JDK 22 and later, as mdoerr pointed out in his comment below.
JDK 11 is not affected because JDK-8302736, which removes the switch to WXWrite from VM_LEAF_BASE (used by UNSAFE_LEAF), has not been backported to JDK 11.

[1] https://github.com/openjdk/jdk/blob/0583f7357480c0500daa82f490b2fcc05f2fb65a/src/hotspot/share/runtime/interfaceSupport.inline.hpp#L253-L259
Comments
A pull request was submitted for review. URL: https://git.openjdk.org/jdk17u/pull/391 Date: 2024-03-12 10:12:11 +0000
12-03-2024

Critical request [17u]
I think we should bring this to 17.0.11. Why?
* we now figured that this is a regression in 17.0.9
* the change is simple and of low risk, similar coding is used in other places
* the effect is a crash which is a severe error
* the problem can occur in many scenarios
12-03-2024

Fix request (17u)
17u is affected because test/jdk/sun/nio/cs/FindDecoderBugs.java hits the assertion with -XX:+AssertWXAtThreadSync. I would like to do the backport to avoid the crashes with SIGBUS. The backport applies cleanly. Because of this, the testing, and the small size of the fix I consider the risk low.
Testing: I've verified that test/jdk/sun/nio/cs/FindDecoderBugs.java succeeds with -XX:+AssertWXAtThreadSync.
GHA: riscv64 failure is caused by https://bugs.openjdk.org/browse/JDK-8326960
The fix passed our CI testing: JTReg tests: tier1-4 of hotspot and jdk. All of Langtools and jaxp. JCK, SPECjvm2008, SPECjbb2015, Renaissance Suite, and SAP specific tests (also with ParallelGC). Testing was done with fastdebug builds on the main platforms and also on Linux/PPC64le.
08-03-2024

A pull request was submitted for review. URL: https://git.openjdk.org/jdk17u-dev/pull/2269 Date: 2024-03-06 09:17:26 +0000
08-03-2024

Changeset: ad1d3248 Author: Richard Reingruber <rrich@openjdk.org> Date: 2024-03-06 09:05:25 +0000 URL: https://git.openjdk.org/jdk21u-dev/commit/ad1d32484a8130c9b641cff38c07e8544b3fd271
06-03-2024

Fix request (21u)
I would like to fix the issue to avoid the crashes with SIGBUS. JDK-8310644 fixed this issue as a side effect in JDK 22. I don't want to backport JDK-8310644 though because it is comparatively large.
Testing: I've verified that test/jdk/sun/nio/cs/FindDecoderBugs.java succeeds with -XX:+AssertWXAtThreadSync.
The fix passed our CI testing: JTReg tests: tier1-4 of hotspot and jdk. All of Langtools and jaxp. JCK, SPECjvm2008, SPECjbb2015, Renaissance Suite, and SAP specific tests (also with ParallelGC). Testing was done with fastdebug builds on the main platforms and also on Linux/PPC64le.
Because of the testing and the small size of the fix I consider the risk low.
05-03-2024

Issue is fixed with JDK-8310644 in JDK 22 and later. JDK 21 and older need this fix.
01-03-2024

A pull request was submitted for review. URL: https://git.openjdk.org/jdk21u-dev/pull/305 Date: 2024-02-29 14:49:21 +0000
01-03-2024

It seems https://bugs.openjdk.org/browse/JDK-8284072 dealt with similar issues. These ThreadWXEnable-related issues on macOS aarch64 show up and cause crashes here and there every year ...
29-02-2024