JDK-8321371 : SpinPause() not implemented for bsd_aarch64/macOS
Type:Enhancement
Component:hotspot
Sub-Component:runtime
Affected Version:22
Priority:P4
Status:Resolved
Resolution:Fixed
OS:os_x
CPU:aarch64
Submitted:2023-12-05
Updated:2024-01-19
Resolved:2024-01-08
The SpinPause() function only returns 0 on bsd_aarch64.
Copy the implementation from linux_aarch64.
Comments
Changeset: fc047508
Author: Fredrik Bredberg <fbredberg@openjdk.org>
Committer: Erik Österlund <eosterlund@openjdk.org>
Date: 2024-01-08 13:30:23 +0000
URL: https://git.openjdk.org/jdk/commit/fc047508170ab666857d740ccf541c2c3b612277
08-01-2024
The PR implements SpinPause() for macOS on AArch64 and makes it possible to choose between none, nop, isb and yield by using the OnSpinWaitInst option. The same functionality is found on AArch64-based Linux platforms.
In order to avoid a costly call to os::current_thread_enable_wx() on MacOS the implementation doesn't call the StubRoutines::aarch64::spin_wait stub (as it's done on Linux). The implementation is instead hard coded into SpinPause().
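A minimal sketch of what "hard coded into SpinPause()" could look like, selecting one of the four instructions inline rather than calling a generated stub. This is not the code from the changeset: the names SpinWaitInst, parse_spin_wait_inst and spin_pause_with are hypothetical, and returning 1 for "implemented" versus 0 for "not implemented" is an assumption based on the description above. On non-AArch64 targets the sketch falls back to doing nothing.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical mirror of the four OnSpinWaitInst choices named above.
enum class SpinWaitInst { None, Nop, Isb, Yield };

// Parse an option value such as "isb". Returns false for unknown names
// and leaves *out untouched in that case.
inline bool parse_spin_wait_inst(const char* name, SpinWaitInst* out) {
  assert(name != nullptr && out != nullptr);
  if (std::strcmp(name, "none") == 0)  { *out = SpinWaitInst::None;  return true; }
  if (std::strcmp(name, "nop") == 0)   { *out = SpinWaitInst::Nop;   return true; }
  if (std::strcmp(name, "isb") == 0)   { *out = SpinWaitInst::Isb;   return true; }
  if (std::strcmp(name, "yield") == 0) { *out = SpinWaitInst::Yield; return true; }
  return false;
}

// Emit the chosen hint inline, with no stub call and therefore no
// W^X transition. Assumed convention: 1 = implemented, 0 = not implemented.
inline int spin_pause_with(SpinWaitInst inst) {
#if defined(__aarch64__)
  switch (inst) {
    case SpinWaitInst::None:  break;
    case SpinWaitInst::Nop:   __asm__ volatile("nop");              break;
    case SpinWaitInst::Isb:   __asm__ volatile("isb" ::: "memory");   break;
    case SpinWaitInst::Yield: __asm__ volatile("yield" ::: "memory"); break;
  }
  return 1;
#else
  (void)inst;  // Sketch only: no hint emitted on other architectures.
  return 0;
#endif
}
```

Because the instruction is emitted directly in compiled VM code, nothing here needs to execute dynamically generated code, which is the step that would otherwise require the WX switch.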
21-12-2023
A pull request was submitted for review.
URL: https://git.openjdk.org/jdk/pull/16994
Date: 2023-12-06 14:01:49 +0000
11-12-2023
After having some internal discussions, it seems like the most reasonable thing to do is to implement SpinPause() using a single inline yield instruction. This way we get rid of both the call and the WX stuff.
This solution also showed better performance figures than the OnSpinWaitInst options did.
The reason for using the yield instruction instead of the isb instruction (which showed slightly better performance figures) is that the yield instruction is meant for this kind of use case. So even if isb is slightly better on today's silicon, yield is likely to be better in the long run.
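As an illustration of the single-inline-yield idea, here is a small sketch of a spin-pause function and a bounded spin loop that uses it. The names spin_pause_sketch and spin_until_set are hypothetical and are not HotSpot functions; the 0/1 return convention is assumed, and non-AArch64 builds simply emit no hint.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for SpinPause(): a single inline yield on AArch64.
// Assumed convention: 1 = a hint was emitted, 0 = not implemented.
inline int spin_pause_sketch() {
#if defined(__aarch64__)
  __asm__ volatile("yield" ::: "memory");
  return 1;
#else
  return 0;
#endif
}

// Spin for at most max_spins iterations waiting for flag to become true.
// Returns true if the flag was observed set, false if the budget ran out.
inline bool spin_until_set(const std::atomic<bool>& flag, int max_spins) {
  assert(max_spins >= 0);
  for (int i = 0; i < max_spins; ++i) {
    if (flag.load(std::memory_order_acquire)) {
      return true;
    }
    spin_pause_sketch();  // Hint to the core that we are busy-waiting.
  }
  return flag.load(std::memory_order_acquire);
}
```

With this shape the pause costs one instruction plus the loop overhead, with no call through a function pointer and no WX transition.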
08-12-2023
I was running some performance tests after removing ObjectMonitor::NotRunnable() (see JDK-8320317).
The performance went up on Linux x86 and Windows x86 by approximately 12%, but went down by roughly the same amount on macOS AArch64. The performance decreased only slightly on Linux AArch64. So I started to focus on the differences between macOS and Linux on AArch64 and found out that SpinPause() is implemented on Linux but not on macOS.
So I copied the source from Linux to macOS (or bsd_aarch64 if you'd like) and re-ran the tests. This seemed to help bring macOS back to the Linux level on AArch64.
I do agree that the overhead of doing the stub call is already similar to whatever hint we finally emit in the stub, and that the WX transition back and forth is likely to be quite bad.
My measurements showed that among the different OnSpinWaitInst options, "isb" generated the best result. If we could get rid of the OnSpinWaitInst options and just hard code an isb instruction (or any other instruction that people can agree upon) like it's done on x86, that would probably be best. For now I just wanted macOS AArch64 to be on par with Linux after the removal of NotRunnable().
About measuring the actual cost: some of the performance tests show notoriously unstable values when run multiple times. I've focused on the ones that I feel are stable.
07-12-2023
I remember trying the same thing along with JDK-8318986, but I realized SpinPause() is only used from quite hot native VM code, so it would probably affect GC and runtime performance. The actual Thread.onSpinWait from Java code should already be handled by intrinsics. I suspect the overhead of doing the stub call is already similar to whatever hint we finally emit in the stub, but the WX transition back and forth is likely to be quite expensive if done often. SpinPause() is quite likely used in busy loops, so this would add up.
So if we are doing this, we need to check how much it actually costs.
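One rough way to put a number on that cost is to time a tight loop around the pause and divide by the iteration count. The sketch below does only that; pause_under_test and ns_per_call are hypothetical names, the inline yield stands in for whichever variant is being measured, and it does not model the stub call or the WX transition, which would have to be substituted into pause_under_test inside the VM.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical pause under test. On AArch64 this is an inline yield; on
// other targets it is only a compiler barrier so the loop is not removed.
inline void pause_under_test() {
#if defined(__aarch64__)
  __asm__ volatile("yield" ::: "memory");
#else
  __asm__ volatile("" ::: "memory");
#endif
}

// Average wall-clock nanoseconds per call over the given iteration count.
// Returns 0.0 when iterations is zero.
inline double ns_per_call(std::uint64_t iterations) {
  if (iterations == 0) {
    return 0.0;
  }
  const auto start = std::chrono::steady_clock::now();
  for (std::uint64_t i = 0; i < iterations; ++i) {
    pause_under_test();
  }
  const auto end = std::chrono::steady_clock::now();
  const double elapsed_ns =
      std::chrono::duration<double, std::nano>(end - start).count();
  assert(elapsed_ns >= 0.0);  // steady_clock never goes backwards.
  return elapsed_ns / static_cast<double>(iterations);
}
```

A single run of such a loop is noisy, so in practice the figure would be taken over several repetitions and compared between variants rather than read as an absolute cost.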
06-12-2023
I see what you mean, but I wasn't really thinking about different ports to BSD, but rather the macOS port, which uses the code in src/hotspot/os_cpu/bsd_aarch64/.
My thought was to implement SpinPause() for macOS by copying the implementation from src/hotspot/os_cpu/linux_aarch64/os_linux_aarch64.cpp to src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp.
05-12-2023
Ports to BSD are maintained downstream, there may be an implementation in one of the downstream repos.