JDK-8327990 : [macosx-aarch64] Various tests fail with -XX:+AssertWXAtThreadSync
  • Type: Bug
  • Component: hotspot
  • Sub-Component: jfr
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: os_x
  • CPU: aarch64
  • Submitted: 2024-03-12
  • Updated: 2024-06-10
  • Resolved: 2024-03-21
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 21 JDK 22 JDK 23
21.0.5Fixed 22.0.2Fixed 23 b16Fixed
Related Reports
Relates :  
Relates :  
Description
On macOS/aarch64 a thread must switch to WXWrite mode before entering the VM [1]
Entering the VM without WXWrite can cause sporadic crashes as described in JDK-8327036.

Below are locations that violate the invariant. For each location a test is given that fails if executed with with -XX:+AssertWXAtThreadSync.

JfrJvmtiAgent::retransform_classes()
https://github.com/openjdk/jdk/blob/cfd9209e03176bd8e02acd74b51a16f3113fbd21/src/hotspot/share/jfr/instrumentation/jfrJvmtiAgent.cpp#L149
jfr_set_enabled()
https://github.com/openjdk/jdk/blob/0776fff0e321c3f541354404b3ec0aa1992923a0/src/hotspot/share/jfr/jni/jfrJniMethod.cpp#L104
Tests:
jdk/jfr/event/runtime/TestClassLoadEvent.java
compiler/intrinsics/klass/CastNullCheckDroppingsTest.java

JvmtiExport::get_jvmti_interface
https://github.com/openjdk/jdk/blob/c342188fd978dd94e7788fb0fb0345fd8c0eaa9a/src/hotspot/share/prims/jvmtiExport.cpp#L385
Test:serviceability/jvmti/vthread/GetThreadState/GetThreadStateTest.java

GetCarrierThread
https://github.com/openjdk/jdk/blob/c342188fd978dd94e7788fb0fb0345fd8c0eaa9a/src/hotspot/share/prims/jvmtiExtensions.cpp#L133
Test:serviceability/jvmti/vthread/GetThreadStateMountedTest/GetThreadStateMountedTest.java

JfrRecorderService::emit_leakprofiler_events
https://github.com/openjdk/jdk/blob/c342188fd978dd94e7788fb0fb0345fd8c0eaa9a/src/hotspot/share/jfr/recorder/service/jfrRecorderService.cpp#L702
Test:jdk/jfr/event/oldobject/TestObjectAge.java

JfrJavaEventWriter::flush
https://github.com/openjdk/jdk/blob/c342188fd978dd94e7788fb0fb0345fd8c0eaa9a/src/hotspot/share/jfr/writers/jfrJavaEventWriter.cpp#L126
JfrStorage::register_full
https://github.com/openjdk/jdk/blob/c342188fd978dd94e7788fb0fb0345fd8c0eaa9a/src/hotspot/share/jfr/recorder/storage/jfrStorage.cpp#L294
Test:jdk/jfr/api/consumer/TestRecordingFileWrite.java

JfrIntrinsicSupport::write_checkpoint
https://github.com/openjdk/jdk/blob/d3f3011d56267360d65841da3550eca79cf1575b/src/hotspot/share/jfr/support/jfrIntrinsics.cpp#L54
Test:jdk/jfr/threading/TestManyVirtualThreads.java

[1] https://github.com/openjdk/jdk/blob/0583f7357480c0500daa82f490b2fcc05f2fb65a/src/hotspot/share/runtime/interfaceSupport.inline.hpp#L253-L259
Comments
A pull request was submitted for review. URL: https://git.openjdk.org/jdk21u-dev/pull/580 Date: 2024-05-16 11:54:19 +0000
06-06-2024

same here, please prepare the PR and get it reviewed in case it's not clean.
06-06-2024

Fix request (21u) The backport from 22u to 21u applies cleanly. Risk is low. Testing: I've verified every added WXState change if it is actually required. The fix passed our CI testing with AssertWXAtThreadSync enabled: JTReg tests: tier1-4 of hotspot and jdk. All of Langtools and jaxp. JCK, SPECjvm2008, SPECjbb2015, Renaissance Suite, and SAP specific tests.
05-06-2024

Fix request (22u) I would like to backport this to jdk 22 to get clean testing with -XX:+AssertWXAtThreadSync and avoid crashes because the VM is entered with the wrong WXState. The latter is very unlikely nevertheless aiming for clean tests with -XX:+AssertWXAtThreadSync will help to reduce the probability that users are affected by regressions in this area. The original change did not apply cleanly because JfrRecorderService::emit_leakprofiler_events() does not exist in jdk 22 so I left out the hunk in jfrRecorderService.cpp. I consider the risk of the backport low. It could happen that dynamically generated code is called after changing to WXWrite and before the transition to _thread_in_vm (this would crash) but during the review of the original change it has been found that this does not happen. Testing: I've verified every added WXState change if it is actually required. The fix passed our CI testing with AssertWXAtThreadSync enabled: JTReg tests: tier1-4 of hotspot and jdk. All of Langtools and jaxp. JCK, SPECjvm2008, SPECjbb2015, Renaissance Suite, and SAP specific tests.
09-04-2024

A pull request was submitted for review. URL: https://git.openjdk.org/jdk22u/pull/133 Date: 2024-04-08 14:48:14 +0000
09-04-2024

> Any place where we find that we need to switch to 'W' the first question I would want us to ask is, why are we in 'X', is that expected, or is some code holding onto 'X' for too long? In general it is because we were outside the VM and code outside the VM defaults to X. E.g the JavaCallWrapper sets X mode before invoking the Java method, so if that Java method then does something to call back into the VM we have to switch to W mode. > If we do then I thought a JavaThread should only be in "W" for as short as possible. My understanding from the original macOS port that had to implement this, is that the transition can be expensive so trades-off fine-grained, narrow scoped, control, for performance. Hence the general rule of W inside the VM, X outside the VM, and switch to X when needed in the VM. Someone would have to do a detailed analysis to find every location that specifically needs X or specifically needs Y to determine which default would be more performant overall. Also note that W is needed in more cases than just the obvious generation of JIT'd code, there is also stub generation and updating of oops within stubs (apparently).
26-03-2024

It might be good for stability to do that. But is "W" even a good default from a security point of view for a JavaThread? I thought "W" would be a good default for JIT threads. I'm not an expert in this area and I don't know if we actually want to benefit from W^X neither (do we?). If we do then I thought a JavaThread should only be in "W" for as short as possible.
22-03-2024

All the jvmti calls follow similar pattern: MACOS_AARCH64_ONLY(ThreadWXEnable __wx(WXWrite, current_thread)); ThreadInVMfromNative __tiv(current_thread); VM_ENTRY_BASE(jvmtiError, jvmti_Allocate , current_thread)
22-03-2024

I agree that we need to make sure that all the place identified here need the WX state to be in 'W', which is the "default", however, I wish that we could find a way to make sure that we turn OFF non-default state as soon as possible, not sprinkle out 'W' state transitions. 'W' is the default, we should make an effort to make sure that 'X' is only turned on for min duration possible. Any place where we find that we need to switch to 'W' the first question I would want us to ask is, why are we in 'X', is that expected, or is some code holding onto 'X' for too long?
22-03-2024

If 'W' is our default state, why not just enforce and set WX state to 'W' here and if it doesn't match simply print out a warning to a log?
22-03-2024

A related question - should we change the default of AssertWXAtThreadSync globals-flag from false to true (so that the checking/assert is always done on macOS aarch64) ? Benefit would be that such issues like the ones fixed in this PR are seen earlier. (however it may cost a bit of performance)
22-03-2024

Changeset: e41bc42d Author: Richard Reingruber <rrich@openjdk.org> Date: 2024-03-21 14:09:42 +0000 URL: https://git.openjdk.org/jdk/commit/e41bc42deb22615c9b93ee639d04e9ed2bd57f64
21-03-2024

> Shouldn't the ThreadInVMfromNative transition be responsible for setting WXWrite mode? This would be reasonable in my opinion. In my PR I've hoisted setting WXWrite mode in JfrJvmtiAgent::retransform_classes() because multiple instance of ThreadInVMfromNative are reached. This is likely not even necessary. Still exceptions could be made if there are usages of ThreadInVMfromNative where it is needed.
13-03-2024

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/18238 Date: 2024-03-12 15:05:00 +0000
13-03-2024

Shouldn't the ThreadInVMfromNative transition be responsible for setting WXWrite mode?
13-03-2024