Bug ID: JDK-8318776 Require supports

JDK-8318776 : Require supports_cx8 to always be true

Type: Enhancement
Component: hotspot
Sub-Component: runtime
Affected Version: 22

Priority: P3
Status: Resolved
Resolution: Fixed

Submitted: 2023-10-25
Updated: 2023-11-27
Resolved: 2023-11-23

Versions (Unresolved/Resolved/Fixed)

The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.

JDK 22
22 masterFixed

Related Reports

Blocks :	JDK-8320052 - Zero: Use __atomic built-ins for atomic RMW operations
Blocks :	JDK-8319777 - Zero: Support 8-byte cmpxchg
Blocks :	JDK-8320582 - Zero: Misplaced CX8 enablement flag
Relates :	JDK-8315149 - Add hsperf counters for CPU time of internal GC threads
Relates :	JDK-8180620 - Clarify VarHandle mixed-access subtleties
Relates :	JDK-8316961 - Fallback implementations for 64-bit Atomic::{add,xchg} on 32-bit platforms

Description

Regardless of platform size (32-bit or 64-bit) the Java language has always required that the underlying platform (or the VM) provides a means of performing atomic load and store of 64-bit values, for volatile long and double support.

Since Java 5 the java.util.concurrent.atomic package introduced APIs that provide a range of atomic operations, the most fundamental being a compare-and-swap (CAS), also known as a compare-exchange, out of which other atomic operations can be constructed if there is no direct platform support. This capability was later extended to the VarHandle API as well.

While all platforms needed a mechanism for 64-bit load and store, not all platforms support a 64-bit CAS, internally known a cmpxchg8. To address that the supports_cx8 flag was introduced so that on platforms without cmpxchg8 native support, it could be emulated via other techniques e.g. locking. (Note this is not without its own issues as all accesses to the field must be done in a way that is consistent with the use of locking by cmpxchg8 - word-tearing is a real risk).

Internal to the VM we also have use of lock-free algorithms and atomic operations, with the latter defined via atomic.hpp. Originally in that code we needed to check supports_cx8 for platforms without 64-bit support, but in practice we tended to avoid using 64-bit fields in such cases so we could avoid the complexity of introducing lock-based emulation.

Unfortunately, when the atomic interface in the VM was templatized and redesigned, it appears that the fact cmpxchg8 may not be available was overlooked and supports_cx8 is not consulted. Consequently if someone introduced an atomic operation on a 64-bit field they would get a linkage error on platforms without cmpxchg8 - so again if this happened we tended to back away from using a 64-bit field.

Along the way the access API in the VM was introduced, which also provided atomic ops on oops and did consult supports_cx8 with a lock-based fallback.

We have now reached a point where there are cases where we do want 64-bit atomic operations but we don't want the complexity of dealing with platforms that don't support it. So we want to require that supports_cx8 always be assumed true (the VM could abort at runtime if run on a platform where it is not true) and we can then proceed with 64-bit atomics in the VM and also remove all the lock-based fallbacks in the access API and in the Java APIs.

The OpenJDK has limited support for 32-bit platforms these days: PPC32 was dropped a long time ago; Windows 32-bit is now a deprecated port (but supports cmpxchg8 anyway); leaving only ARM32 as a platform of potential concern. But even then we support cmpxchg8 in all known modern implementations, as described in os_cpu/linux_arm/atomic_linux_arm.hpp:

/*
* Atomic long operations on 32-bit ARM
* ARM v7 supports LDREXD/STREXD synchronization instructions so no problem.
* ARM < v7 does not have explicit 64 atomic load/store capability.
* However, gcc emits LDRD/STRD instructions on v5te and LDM/STM on v5t
* when loading/storing 64 bits.
* For non-MP machines (which is all we support for ARM < v7)
* under current Linux distros these instructions appear atomic.
* See section A3.5.3 of ARM Architecture Reference Manual for ARM v7.
* Also, for cmpxchg64, if ARM < v7 we check for cmpxchg64 support in the
* Linux kernel using _kuser_helper_version. See entry-armv.S in the Linux
* kernel source or kernel_user_helpers.txt in Linux Doc.
*/

So the practical reality is that we do not expect to encounter any mainstream OpenJDK platform where we don't in fact have support for cmpxchg8.

Comments

Changeset: c75c3887 Author: David Holmes <dholmes@openjdk.org> Date: 2023-11-23 22:23:42 +0000 URL: https://git.openjdk.org/jdk/commit/c75c38871ee7b5c9f7f0c195d649c16967f786bb
23-11-2023
A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/16625 Date: 2023-11-13 04:38:35 +0000
13-11-2023
I think so, once I have a few cycles. See JDK-8319777; I'll leave that one unassigned in case others would like to jump on it.
09-11-2023
[~shade] are you able to take care of that? I have no way to build and/or test a 32-bit zero configuration.
08-11-2023
I believe just defining supports_cx8() for Zero would be enough, as we are supposed to have all the fallbacks from libatomic already in place. And I think they already work, as there is plenty of code that does not check for supports_cx8().
08-11-2023
> for the user code that actually checks supports_cx8(), Zero does not go to 64-bit atomics Okay so this needs to be fixed in Zero first. Is it as simple as just defining supports_cx8 as true always, or do you think actual code changes may be needed?
08-11-2023
I am looking into Zero, because if we look at compatibility questions, Zero is the last resort for all platforms. That's the thing, I don't see where Zero sets VM_Version::supports_cx8 = true. I don't see any mention of in in vm_version_zero.cpp, for example. So for 32-bit Zero supports_cx8() = false; 64-bit Zero would get supports_cx8() = true via SUPPORTS_NATIVE_CX8 macro. So nominally we want to figure out what happens for 32-bit Zero if we require supports_cx8() = true. And in that regard, I think we are mostly fine: Atomic::PlatformCmpxchg<8> in atomic_linux_zero.hpp is defined with __atomic_compare_exchange, which should work. So I guess that means two things: a) for the user code that actually checks supports_cx8(), Zero does not go to 64-bit atomics; b) for the VM code that just goes for 64-bit atomics, everything still works. Which makes the argument for removing supports_cx8() even stronger, plus gives us another case -- in addition to ARMv6 -- we want to check. The caveat is with e.g. Atomic::PlatformXchg<8> in atomic_linux_zero.hpp that is defined with __sync_lock_test_and_set built-in, which, AFAICS from GCC docs is not guaranteed to work out of the box for 64-bit values, and would require libatomic as the fallback. So what I was trying to figure out if we either need to switch __sync* built-ins to __atomic* built-ins, or make sure we link libatomic for Zero consistently. And now I remembered that we actually _do_ link libatomic for this: https://github.com/openjdk/jdk/blob/abad0408e8317b43c2cd5bc3d324ff199aa289f5/make/autoconf/libraries.m4#L97-L110. So this is a non-issue currently.
27-10-2023
Zero defines SUPPORTS_NATIVE_CX8 for 64-bit platforms (cpu/zero/globalDefinitions_zero.hpp) but will otherwise presumably rely on the vm_version initialization to make that decision. I don't fully understand what you are saying about Zero, if it needs changes, then it is already broken for any platforms where supports_cx8 would not be true. ???
27-10-2023
I agree it is time to assume that architectures support 64-bit CAS out of the box. Another datapoint: the GCC _atomic_* built-ins support 1,2,4,8-bytes ops out of box. The only optional thing there seems to be 16-bytes ops. https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html. Zero VM uses the legacy GCC _sync_* built-ins. They are specified to generate a warning and generate an external call to _n method. AFAIU, those methods would be handled by libatomic then. We used to have libatomic dependency for RISC-V, but removed it after implementing more `Atomic` fallbacks. So it would probably be prudent to migrate _sync_ -> _atomic_* built-ins for Zero before we switch. Related, now that I am looking at it, I don't see where Zero sets supports_cx8 = true, should probably be done with _atomic_* migration. https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.html
26-10-2023
JDK-8315149 is an example where the 64-bit atomic operation is desired.
26-10-2023