JDK-8313248 : C2: setScopedValueCache intrinsic exposes nullptr pre-values to store barriers
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 21,22
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • Submitted: 2023-07-27
  • Updated: 2024-02-02
  • Resolved: 2023-08-04
JDK 21: 21.0.1 (Fixed)
JDK 22: 22 b10 (Fixed)
Description
Originally found by this test:

$ CONF=linux-x86_64-server-fastdebug make images test TEST=java/lang/ScopedValue/StressStackOverflow.java TEST_VM_OPTS="-XX:+UseShenandoahGC"

STDOUT:
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/home/shipilev/shipilev-jdk/src/hotspot/share/gc/shenandoah/shenandoahMarkBitMap.cpp:127), pid=36594, tid=36748
#  assert(ShenandoahHeap::heap()->is_in(addr)) failed: Trying to access bitmap 0x00007f0d180d17e0 for address 0x0000000000000000 not in the heap.
#
# JRE version: OpenJDK Runtime Environment (22.0) (fastdebug build 22-internal-adhoc.shipilev.shipilev-jdk)
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 22-internal-adhoc.shipilev.shipilev-jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, shenandoah gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x156c625]  ShenandoahMarkBitMap::check_mark(HeapWordImpl**) const+0x125
#

Stack: [0x00007f53342f4000,0x00007f53343f4000],  sp=0x00007f53343f2650,  free space=1017k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x156c625]  ShenandoahMarkBitMap::check_mark(HeapWordImpl**) const+0x125  (shenandoahMarkBitMap.cpp:127)
V  [libjvm.so+0x158a8e9]  ShenandoahSATBMarkQueueSet::filter(SATBMarkQueue&)+0x5b9  (shenandoahMarkBitMap.inline.hpp:94)
V  [libjvm.so+0x149ac13]  SATBMarkQueueSet::flush_queue(SATBMarkQueue&)+0x13  (satbMarkQueue.cpp:225)
V  [libjvm.so+0xceb146]  HandshakeOperation::do_handshake(JavaThread*)+0x46  (handshake.cpp:326)
V  [libjvm.so+0xcedde0]  VM_HandshakeAllThreads::doit()+0x5e0  (handshake.cpp:662)
V  [libjvm.so+0x17ef99e]  VM_Operation::evaluate()+0x21e  (vmOperations.cpp:71)
V  [libjvm.so+0x181171a]  VMThread::evaluate_operation(VM_Operation*)+0x15a  (vmThread.cpp:281)
V  [libjvm.so+0x18123ee]  VMThread::inner_execute(VM_Operation*)+0x22e  (vmThread.cpp:435)
V  [libjvm.so+0x181272c]  VMThread::loop()+0x6c  (vmThread.cpp:502)
V  [libjvm.so+0x1812862]  VMThread::run()+0x92  (vmThread.cpp:175)
V  [libjvm.so+0x171994f]  Thread::call_run()+0x12f  (thread.cpp:217)
V  [libjvm.so+0x135105e]  thread_native_entry(Thread*)+0x11e  (os_linux.cpp:783)
Comments
Deferral Request (JDK 21) Non-trivial fix that needs some more bake time before being released and will therefore be integrated into JDK 21.0.1.
09-08-2023

A pull request was submitted for review.
URL: https://git.openjdk.org/jdk21u/pull/50
Date: 2023-08-09 10:51:18 +0000
09-08-2023

Fix Request (21u) Fixes the important Loom bug, manifests reliably with Shenandoah. Applies cleanly. Original reproducer fails without the fix, passes with it.
09-08-2023

Yes, it should, I am going to do it soon.
09-08-2023

[~shade], should this go into JDK 21(u)?
09-08-2023

Changeset: e8a37b90
Author: Aleksey Shipilev <shade@openjdk.org>
Date: 2023-08-04 09:53:20 +0000
URL: https://git.openjdk.org/jdk/commit/e8a37b90db8dca4dc3653970b2d66d2faf8ef452
04-08-2023

ILW = Crash in GC due to unexpected NULL pre_val, reproducible with loom stress test and Shenandoah GC, disable _setScopedValueCache intrinsic = HMM = P2
02-08-2023

Seems to exist since the original integration of JDK-8284161. Started to reproduce with Shenandoah after JDK-8299324.
01-08-2023

G1 SATB barriers do the same thing, AFAICS, so this makes it a generic C2 ScopedValue intrinsic bug.
01-08-2023

The odd access comes from the C2 intrinsic for setScopedValueCache. That access stores the oop to an off-heap OopHandle in JavaThread, and enters the SATB path after JDK-8299324. There, it reloads the oop from that OopHandle, and somehow misses the null check. -XX:DisableIntrinsic=_setScopedValueCache makes the test pass.
01-08-2023

More investigation. It is useful to replace the SATB queueing code with the direct runtime (slowpath) code. This would catch the nullptrs coming to SATB barrier before they get in queues.

```
  __ if_then(pre_val, BoolTest::ne, kit->null()); {
    // logging buffer is full, call the runtime
    const TypeFunc *tf = ShenandoahBarrierSetC2::write_ref_field_pre_entry_Type();
    __ make_leaf_call(tf, CAST_FROM_FN_PTR(address, ShenandoahRuntime::write_ref_field_pre_entry), "shenandoah_wb_pre", pre_val, tls);
  } __ end_if();  // (pre_val != nullptr)
```

-XX:+PreserveFramePointer allows a proper stack walk and disassembly when the failure comes from C2 SATB barrier invocation. The call to the runtime below can be seen in hs_err like this:

```
# Internal Error (/home/shipilev/shipilev-jdk/src/hotspot/share/gc/shenandoah/shenandoahRuntime.cpp:47), pid=469, tid=496
# assert(orig != nullptr) failed: should be optimized out

Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x157a99d] ShenandoahRuntime::write_ref_field_pre_entry(oopDesc*, JavaThread*)+0x19d (shenandoahRuntime.cpp:47)
J 331 c2 java.lang.ScopedValue$Cache.put(Ljava/lang/ScopedValue;Ljava/lang/Object;)V java.base@22-internal (88 bytes) @ 0x00007f0c4c4edadc [0x00007f0c4c4ed640+0x000000000000049c]
J 321 c1 StressStackOverflow$DeepRecursion.run()V (326 bytes) @ 0x00007f0c4497846c [0x00007f0c44972d60+0x000000000000570c]
J 330 c2 StressStackOverflow.fibonacci_pad1(ILjava/lang/Runnable;)J (34 bytes) @ 0x00007f0c4c4ecdc0 [0x00007f0c4c4ecc00+0x00000000000001c0]
J 330 c2 StressStackOverflow.fibonacci_pad1(ILjava/lang/Runnable;)J (34 bytes) @ 0x00007f0c4c4eccbc [0x00007f0c4c4ecc00+0x00000000000000bc]
...

;; B36: # out( B8 ) <- in( B7 ) Freq: 0.000595774
0x00007f0c4c4edac9: mov (%rbx),%rdi ;*invokestatic scopedValueCache {reexecute=0 rethrow=0 return_oop=0}
                                    ; - java.lang.ScopedValue::scopedValueCache@0 (line 775)
                                    ; - java.lang.ScopedValue$Cache::put@0 (line 905)
0x00007f0c4c4edacc: mov %r15,%rsi
0x00007f0c4c4edacf: movabs $0x7f0c64a28800,%r10 ; <---- call to ShenandoahRuntime::write_ref_field_pre_entry
0x00007f0c4c4edad9: call *%r10
0x00007f0c4c4edadc: nopl 0x0(%rax,%rax,1) ;*invokestatic setScopedValueCache {reexecute=0 rethrow=0 return_oop=0}
                                          ; - java.lang.ScopedValue::setScopedValueCache@1 (line 779)
                                          ; - java.lang.ScopedValue$Cache::put@18 (line 908)
                                          ; {other}
0x00007f0c4c4edae4: jmp 0x00007f0c4c4ed778
```

`%rdi` is supposed to be the `pre_val` for SATB barrier, and while C2 code emits the null check for it, that null check is nowhere to be seen in the generated code. So that null-check is somehow optimized away.
01-08-2023

A pull request was submitted for review.
URL: https://git.openjdk.org/jdk/pull/15105
Date: 2023-08-01 12:18:22 +0000
01-08-2023

I think I understand. The setter (`setScopedValueCache`) intrinsic passes `val_type` of `_gvn.type(arr)`, which is `narrowoop: java/lang/Object *[int:32] (java/lang/Cloneable,java/io/Serializable):NotNull:exact *`, and thus implies non-nullity. So when Shenandoah's SATB barrier reloads, it folds the null-check, assuming the `pre_val` is not null, due to `val_type`. This passes `nullptr` to SATB queues or slowpath, and we crash in either queue filtering or barrier code that does not expect nullptrs on SATB paths. The getter (`scopedValueCache`) constructs the `objects_type` explicitly to imply the value can be null. I think we should do the same for setter, since it can hide the "getter" from SATB barrier inside of it.
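
The folding described above can be illustrated with a toy model. This is only a sketch and not HotSpot code: `ValType`, `SatbQueue`, `pre_barrier` and `store_with_barrier` are hypothetical names invented for the illustration, and the "folded" null check is simulated with a flag rather than performed by a real optimizer. It assumes the slot may still hold null when the store happens.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the compiler's static type of a value:
// it tracks only whether the value may be null.
struct ValType {
    bool maybe_null;
};

// Hypothetical stand-in for a SATB queue; it records what the barrier enqueues.
struct SatbQueue {
    std::vector<const void*> entries;

    // Number of null pointers that slipped into the queue.
    std::size_t null_entries() const {
        std::size_t n = 0;
        for (const void* e : entries) {
            if (e == nullptr) ++n;
        }
        return n;
    }
};

// Toy pre-write barrier: reload the previous value and enqueue it if non-null.
// When the declared type claims "never null", the null check is treated as
// folded away, mimicking what an optimizer is entitled to do with that type.
void pre_barrier(const void* const* slot, ValType declared, SatbQueue& queue) {
    const void* pre_val = *slot;
    const bool check_folded = !declared.maybe_null;
    if (check_folded || pre_val != nullptr) {
        queue.entries.push_back(pre_val);
    }
}

// Toy store: run the pre-barrier on the old value, then overwrite the slot.
void store_with_barrier(const void** slot, const void* new_val,
                        ValType declared, SatbQueue& queue) {
    assert(slot != nullptr);
    pre_barrier(slot, declared, queue);
    *slot = new_val;
}
```

In this model, storing into a slot that currently holds null with `ValType{false}` (the new value's non-null type applied to the pre-value) puts a null entry in the queue, while `ValType{true}` keeps the check and enqueues nothing. That mirrors the idea of giving the setter an explicitly nullable type, as the getter already does.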
01-08-2023

I am beginning to think that GC barriers wander into red/yellow zone and get wrecked there.
27-07-2023

On Mac, it fails a bit differently:

# SIGSEGV (0xb) at pc=0x00000001062e68cc, pid=76453, tid=24323
#
# JRE version: OpenJDK Runtime Environment (22.0) (fastdebug build 22-internal-adhoc.shipilev.shipilev-jdk)
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 22-internal-adhoc.shipilev.shipilev-jdk, mixed mode, sharing, compressed oops, compressed class ptrs, shenandoah gc, bsd-aarch64)
# Problematic frame:
# V [libjvm.dylib+0x2368cc] ShenandoahMarkingContext::is_marked_strong(oop) const+0xcc

siginfo: si_signo: 11 (SIGSEGV), si_code: 2 (SEGV_ACCERR), si_addr: 0x0000000104715e10

Stack: [0x000000016d8d0000,0x000000016dad3000], sp=0x000000016d8f7c20, free space=159k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.dylib+0x2368cc] ShenandoahMarkingContext::is_marked_strong(oop) const+0xcc
V [libjvm.dylib+0x101fea4] ShenandoahSATBMarkQueueFilterFn::operator()(void const*) const+0x94
V [libjvm.dylib+0x101fc84] void SATBMarkQueueSet::apply_filter<ShenandoahSATBMarkQueueFilterFn>(ShenandoahSATBMarkQueueFilterFn, SATBMarkQueue&)+0xac
V [libjvm.dylib+0xf361dc] SATBMarkQueueSet::flush_queue(SATBMarkQueue&)+0x20
V [libjvm.dylib+0x7cf10c] HandshakeOperation::do_handshake(JavaThread*)+0x5c
V [libjvm.dylib+0x7d0fe0] HandshakeState::process_by_self(bool, bool)+0x2b8
V [libjvm.dylib+0xf33894] SafepointMechanism::process(JavaThread*, bool, bool)+0x80
V [libjvm.dylib+0x3e4190] ThreadStateTransition::transition_from_vm(JavaThread*, JavaThreadState, bool)+0x70
V [libjvm.dylib+0x86da0c] InterpreterRuntime::throw_StackOverflowError(JavaThread*)+0x220
j jdk.internal.vm.ScopedValueContainer.doRun(Ljava/lang/Runnable;)V+0 java.base@22-internal
j jdk.internal.vm.ScopedValueContainer.run(Ljava/lang/Runnable;)V+21 java.base@22-internal
j java.lang.ScopedValue$Carrier.runWith(Ljava/lang/ScopedValue$Snapshot;Ljava/lang/Runnable;
27-07-2023