Bug ID: JDK-8186787 clang-4.0 SIGSEGV in Unsafe

JDK 10	Other	Other
10 b36Fixed	openjdk8u462Fixed	openjdk8u462,shenandoah8u462Fixed

[jdk8u-fix-request] Approval Request from zzambers Fix to unsafe to avoid tier1 failures (crashes) on Mac, small change, low risk, GHA tests OK
26-02-2025
A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk8u-dev/pull/553 Date: 2024-07-25 15:00:49 +0000
25-07-2024
URL: http://hg.openjdk.java.net/jdk/jdk/rev/9289fcb41aae User: jwilhelm Date: 2017-12-07 09:00:22 +0000
07-12-2017
URL: http://hg.openjdk.java.net/jdk/hs/rev/9289fcb41aae User: eosterlund Date: 2017-11-30 20:08:02 +0000
30-11-2017
OK, TIL a little about std::atomic_signal_fence. All programs are always "racing" with any defined signal handlers, but here any reorderings are due to the compiler, not the hardware. I see that std::atomic_signal_fence has a memory_order argument to fine-tune what reorderings are permitted, but in practice we're probably happy with just the strongest form of compiler barrier since no actual CPU instructions are emitted.
28-11-2017
[~kbarrett] I consider the compiler_barrier a platform-dependent mechanism that I introduced that may or may not be used by OrderAccess to implement intended semantics on a given platform. The semantics of the various different compiler intrinsics are substandard (as opposed to std::atomic_signal_fence()). Therefore, I would not like to expose it if volatile has standardized semantics that fit the bill in this case. [~martin] In this case I do not see that the problem is even that there is a race necessarily. If the compiler re-orders the involved accesses, it will crash with a single thread if you use unsafe to access unmapped memory. And we definitely should not crash then with or without a race due to incorrect compiler reordering.
28-11-2017
I'm not following the hotspot engineering details, but it looks like you're on your way to fixing it right. I still think that Java-level relaxed atomics, whether via VarHandle or Unsafe, should not result in a native crash in case of a "data race". It's OK if the hotspot implementation ends us using non-portable (perhaps compiler-specific) code to avert the native crash - do what you need to do (although I continue to like migration to C++11 atomics).
28-11-2017
We already have atomic_signal_fence (or could trivially have it) on every supported platform, just under a different name. It's called "compiler_barrier" in many of our platform-specific orderAccess files. (Note that Visual Studio has had atomic_signal_fence since VS2012.)
27-11-2017
Needless to say, we do not have std::atomic_signal_fence(). And for a single thread, the volatile reordering rules are perfectly predicatable. The three accesses (start crash protection, perform the access, stop crash protection) are all sequence points when made volatile, meaning that the accesses will not be reordered by the compiler, and hence the access is safely guarded w.r.t. the signal handler.
27-11-2017
While I think using MO_VOLATILE here will avoid the problem, I don't think that's the best solution. To me, that looks no better than the original volatile cast approach; the only difference is we now have a better understanding of what's going wrong, so know why adding strategic volatile keywords can dodge the problem. I think better is to have the guarded scope object ensure supposedly guarded operations within its scope can't escape. Note that this is exactly the case for which C++11 std::atomic_signal_fence() exists.
21-11-2017
There is a decorator for this. By making the accesses in MemoryAccess MO_VOLATILE, we ensure that they are not reordered w.r.t. _thread->set_doing_unsafe_access(true/false) (which is volatile). Will fix.
21-11-2017
I was thinking these were the quick-access "leaf" functions, but they aren't. The intrinsic forms are used for performance. And of course the intrinsic forms must also be correct wrt. barriers and ordering.
21-11-2017
UNSAFE_ENTRY is a JVM_ENTRY and can safepoint. How performance critical are these functions?
21-11-2017
I believe Martin's reasoning is as follows: - Java allows safe data races - Java data races can be implemented by Unsafe using C++ statements - Use of C++ with data races is undefined ----------------------------------------------------- -> Unsafe must use C++ atomic (as they are the only operations with defined semantics when involved in data races), or at a minimum "sprinkling of volatile" But this is not reasonable. We use C++ with data races and hardware level atomic accesses and memory barriers all through the VM. If this is enacting "C++ undefined behaviour" then we just have to live with that. Unsafe is not special in that regard - though as Kim notes ensuring we maintain ordering with the guard is important. Use of a compiler barrier seems reasonable - communication about the access is within the same thread. Hardware barriers would completely kill the performance that these direct unsafe accesses are providing.
20-11-2017
I think the problem here is in GuardUnsafeAccess, or perhaps its calls to _thread->set_doing_unsafe_access. JavaThread's underlying _doing_unsafe_access member is declared volatile, but that doesn't help at all if the intended to be guarded code is not itself in some way ordered with respect to those volatile assignments. The proposed sprinkling of volatile works around this problem by providing such an ordering. But that would need to be done in every context where GuardUnsafeAccess is used. Better, and clearer, would be to add some kind of memory barriers to GuardUnsafeAccess; since it is guarding a scoped region, ensure operations in that region can't leak out of the region due to optimizations. I suspect a compiler barrier at the beginning of GuardUnsafeAccess() and at the beginning of ~GuardUnsafeAccess() would suffice. Unfortunately, OrderAccess doesn't provide just a compiler barrier, even though nearly every platform (except maybe zero?) has such a thing. The reason I think a compiler barrier would suffice is that either the signal handler will be run on the same stack as the bad access, or the transfer of the signal to some other stack will involve enough implicit memory ordering that stronger explicit barriers are not needed. I don't think the recent changes to use the Access class here affect this problem. I don't think Access introduces volatile where it wasn't already present, at least not in general. If a compiler barrier can't be had, or the signal transfer requires stronger explicit barriers, I think what's then needed is OrderAccess::fence() at the end of the guard constructor, and a release_store for the member assignment in the guard destructor.
20-11-2017
I disagree about the requirements for java uses of unsafe; incorrect uses of unsafe (and a racy use of the non-volatile operations is incorrect) can cause arbitrary havoc. Is unsafe (still?) accessible to untrusted code? But that's beside the point, since the problem here isn't racy java code at all, as explained in my earlier comment.
20-11-2017
I see: +// MO_UNORDERED is equivalent to JMM plain. +// * MO_UNORDERED (Default): No guarantees. +// - The compiler and hardware are free to reorder aggressively. And they will. BUT: you can't just allow the native C++ compiler to do absolutely anything with these, since racy java must be translated to non-racy (or at least non-crashy) C++. In unsafe.cpp, translate either to atomic with memory_order_relaxed or to use of C++ volatile.
20-11-2017
[~eosterlund] I'm assigning this back to you to look at to see if your changes for the Access API removed the undefined behaviour that clang optimized away (although I'm really unclear why this would crash). You replaced the put* code with: template <typename T> void put(T x) { if (oopDesc::is_null(_obj)) { GuardUnsafeAccess guard(_thread); RawAccess<>::store((T*)addr(), normalize_for_write(x)); } else { HeapAccess<>::store_at(_obj, _offset, normalize_for_write(x)); } } But it's not volatile_put_at().
20-11-2017
Kim: In my comment of 2017-08-25 15:47 I try to explain that C++ is allowed to crash in case of a data race, but Java code is not, so plain memory accesses in java code should be translated into non-plain C++ accesses. C++ 11 relaxed atomic would fit, but without that the only translation choice involves C++ volatile. So sprinkling volatile in unsafe.cpp seems justified.
14-11-2017
If the failure has something to do with race conditions (which isn't obvious from any of the discussion so far), then I submit the fault is with the caller, who should be calling Unsafe_PutByteVolatile. I'm not seeing anything in that code that looks any more suspicious than lots of other code elsewhere in Hotspot. Casting from oop to address (i.e. unsigned char ), add a byte offset, cast to T (in this case it's a byte) and dereference all seems like pretty run of the mill Hotspot code. If the compiler is going off into the weeds with that, and is inhibited from doing so by volatile littering, it seems we should have much wider problems than this report indicates. I would be disinclined to accept such a volatile littering patch without a much more thorough understanding of what is going wrong. Unfortunately, Oracle only tests a fairly narrow set of configurations. I could create a setup where I can look into this, but that won't happen quickly.
14-11-2017
Our internal testing is limited to official build tools. We rely on bleeding-edge users (some internal, most external) to encounter, report, sometimes fix, and test, issues relating to other tools.
14-11-2017
I think It's not the best use of Erik or my engineering effort to test only this patch with clang. (If the code is rewritten, it's probably fine.) Instead, I keep hoping Oracle's regular testing, especially for hotspot, includes various clang releases, especially recent ones, as a target, since gcc and clang are both important competing compilers and openjdk should work well with both.
14-11-2017
Erik's code completely replaces the code that is involved here. Whether he can test with clang-4.0 is a different matter. ~martin: can you take Erik's patch and test with clang-4.0? Assuming there is no issue I would close this as a dup of JDK-8189871.
14-11-2017
Reassiging to Erik because there are changes to unsafe in the webrev. Please check this and let us know if your code does fix this. Thanks.
14-11-2017
If this is being fixed soonish with another change, it is reasonable to close this as a dup of that or simply "will not fix". I currently have scripts that apply the easy workaround BUILD_LIBJVM_unsafe.cpp_OPTIMIZATION=NONE
14-11-2017
I think this issue will disappear with the new Access API being put in under JDK-8189871. See http://cr.openjdk.java.net/~eosterlund/8189871/webrev.06/
14-11-2017
My hotspot clang fixes are not my own highest priority at the moment. I was hoping hotspot team would take ownership.
14-11-2017
[~martin], will you be able to complete this change for jdk10 before the deadline (approx. 01-Dec-2017)? Otherwise, please move this issue to jdk11. Thanks.
13-11-2017
We do not support, as yet, the various "modes" of the C++ atomic APIs - and if/when we do then they would be in our Atomic API. To date only the "cmpxchg" functions support taking a "memory order" parameter, but the default is the existing "conservative" mode that is even stronger than C++ seq_cst. The initial attempt at introducing use of a relaxed cmpxchg into the code demonstrated very well how hard it is to do so correctly (the attempt failed and was withdrawn). When you refer to C++ "undefined behaviour" I'm looking for some specific C++ rule that is being violated. If you're simply saying that race conditions lead to "undefined behaviour" then ... okay. We're never going to fully eliminate race conditions from our C++ code - lock-free algorithms are race conditions. Luckily most plain Java read/writes are implemented with direct machine code not C++ reads/writes. Not that I understand what you are trying to get at with this - how should an API like Unsafe perform a read/write without using a C++ read/write ??? Overly aggressive compiler optimizations have often caused crashes - we have a whole bunch of special optimization level directives in the build to address that very issue. It may be worthwhile trying to determine why clang is causing a problem with this code, and then whether that is actually a legitimate thing for clang to be doing. Regardless if the volatile fixes it then I'd go with that fix rather than tweaking optimization levels. Beyond that though I'm not at all sure what you are looking for here ??
30-08-2017
>> What private "relaxed" orderAccess load/store methods are you referring to? All of load, store, ordered_load, ordered_store are private. I keep looking for equivalents to all of the C++11 atomic<> modes (except "consume"!). >> I'm not clear what the reference to undefined behaviour in C++ is? These Unsafe calls are as racy as regular Java field accesses. Java is supposed to maintain integrity (no SEGV) even in the face of racy user code, so a plain java read or write should perhaps not be translated into a plain C++ read or write, because that could cause a java race to trigger C++ undefined behavior and crash, as seems to be happening here with clang.
30-08-2017
I'm not really following all the points being made here. If clang-4 over optimises the referenced code and volatile fixes it then fix it. I'm not clear what the reference to undefined behaviour in C++ is? These Unsafe calls are as racy as regular Java field accesses. ?? What private "relaxed" orderAccess load/store methods are you referring to? The ordered_store and ordered_load template helpers? We don't have an OrderAccess API at present for arbitrary load/store memory semantics - and arguably they would be included in ther Atomic API to ensure both atomic and ordered access. FWIW we only guarantee the integrity of Unsafe calls when called via trusted JDK APis.
29-08-2017
We agree that relaxed atomics are very hard for humans to program, although some cases like String.hash are easy. Nevertheless, Java guarantees integrity of the virtual machine itself even in the presence of racy user code, and that includes Unsafe accesses. So we would like to translate racy Unsafe calls into code that does not trigger undefined behavior in C++. Ideal seems C++ "relaxed" atomic<>, but since we don't have that, adding volatile seems a reasonable thing to do. Especially given that the body of these C++ functions is not performance critical, as you say. I'd prefer a call to an OrderAccess method, but load and store are private.
28-08-2017
We can't move to C++11 atomics - they simply are not a suitable drop-in replacement. We can try to make things more compatible with C++11 atomics and there is ongoing work with orderAccess to do this. That said there are are very few people who really understand when "relaxed" accesses can be used. Also in regard to unsafe implementation the intrinsic forms are supposed to do the "real" operation while the C++ version is typically a simple, conservative form.
28-08-2017
I don't know what clang's optimizer is doing to cause these crashes, BUT it seems justified because we are translating java Unsafe calls, which are surely used in a racy way by java code, into naked loads and stores, giving undefined behavior in C++. It seems these calls should get translated into C++11 atomic<> relaxed operations, aka. "opaque" access, but we don't allow C++11 constructs in hotspot (yet), so the ancient volatile is the closest thing we have; and it works. I expected to find relaxed orderAccess::load and orderAccess::store, and they exist, but they're private! I'd really like to have C++11 atomics or a complete equivalent interface. Why can't we Just Fix orderAccess?
25-08-2017
this patch: --- a/src/share/vm/prims/unsafe.cpp +++ b/src/share/vm/prims/unsafe.cpp @@ -216,7 +216,7 @@ void put(T x) { GuardUnsafeAccess guard(_thread, _obj); - T* p = (T)addr(); + volatile T p = (volatile T)addr(); p = normalize_for_write(x); } makes the crash in Unsafe_PutByte go away, but then I have an Evil Twin crash in Unsafe_GetLong. # V [libjvm.so+0xb2afbd] Unsafe_GetLong(JNIEnv_, _jobject, _jobject, long)+0x10d Alright, we can have an Evil Twin fix for that: --- a/src/share/vm/prims/unsafe.cpp +++ b/src/share/vm/prims/unsafe.cpp @@ -205,7 +205,7 @@ T get() { GuardUnsafeAccess guard(_thread, _obj); - T p = (T)addr(); + volatile T p = (volatile T)addr(); T x = normalize_for_read(p); @@ -216,7 +216,7 @@ void put(T x) { GuardUnsafeAccess guard(_thread, _obj); - T* p = (T)addr(); + volatile T p = (volatile T)addr(); p = normalize_for_write(x); }
25-08-2017
Workaround is to add to make flags: BUILD_LIBJVM_unsafe.cpp_OPTIMIZATION=NONE
25-08-2017

Relates :	JDK-8174050 - Compilation errors with clang-4.0
Relates :	JDK-8189871 - Refactor GC barriers to use declarative semantics
Relates :	JDK-8186780 - clang fastdebug assertion failure in os_linux_x86:os::verify_stack_alignment()