JDK-8288139 : JavaThread touches oop after GC barrier is detached
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 11,17,18,19
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • Submitted: 2022-06-09
  • Updated: 2023-09-12
  • Resolved: 2022-06-21
Fix Versions
  • JDK 19: 19 b28 (Fixed)
  • JDK 20: 20 (Fixed)
Sub Tasks
JDK-8288497 :  
Description
Similar to JDK-8286830, a JavaThread touches an oop after its GC barrier is detached. This time it happens in the SharedRuntime::get_java_tid() call, which tries to resolve JavaThread::_threadObj.

#21 OopHandle::resolve (this=0x7f2828ee4ee0) at /home/zhengyu/ws/jdk/src/hotspot/share/oops/oopHandle.inline.hpp:34
#22 JavaThread::threadObj (this=this@entry=0x7f27bcc75240) at /home/zhengyu/ws/jdk/src/hotspot/share/runtime/thread.cpp:796
#23 0x00007f2850b38e3c in SharedRuntime::get_java_tid (thread=0x7f27bcc75240)
    at /home/zhengyu/ws/jdk/src/hotspot/share/runtime/thread.hpp:1582
#24 0x00007f2850e207c5 in ThreadsSMRSupport::remove_thread (thread=thread@entry=0x7f27bcc75240)
    at /home/zhengyu/ws/jdk/src/hotspot/share/runtime/threadSMR.cpp:1005
#25 0x00007f2850e0ac1c in Threads::remove (p=p@entry=0x7f27bcc75240, is_daemon=is_daemon@entry=false)
    at /home/zhengyu/ws/jdk/src/hotspot/share/runtime/thread.cpp:3605
#26 0x00007f2850e13937 in JavaThread::exit (this=this@entry=0x7f27bcc75240, destroy_vm=destroy_vm@entry=false, 
    exit_type=exit_type@entry=JavaThread::normal_exit) at /home/zhengyu/ws/jdk/src/hotspot/share/runtime/thread.cpp:1540
#27 0x00007f2850e13f9b in JavaThread::post_run (this=0x7f27bcc75240) at /home/zhengyu/ws/jdk/src/hotspot/share/runtime/thread.cpp:1336
#28 0x00007f2850e132a1 in Thread::call_run (this=this@entry=0x7f27bcc75240)
    at /home/zhengyu/ws/jdk/src/hotspot/share/runtime/thread.cpp:370
#29 0x00007f28509baf3c in thread_native_entry (thread=0x7f27bcc75240) at /home/zhengyu/ws/jdk/src/hotspot/os/linux/os_linux.cpp:706
#30 0x00007f2851690609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#31 0x00007f28517ec133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Comments
A pull request was submitted for review. URL: https://git.openjdk.org/jdk17u-dev/pull/1515 Date: 2023-06-29 17:07:21 +0000
29-06-2023

Thank you Dan.
07-07-2022

[~mseledtsov] - No flags necessary. In addition to moving the code that was incorrectly accessing the oop after the GC barrier was detached, I also added a guarantee() that would fire if the bug still existed. So all you have to do is run the runtime/Thread/ThreadObjAccessAtExit.java test. If it doesn't fail, then the bug no longer exists.
07-07-2022
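
The idea of a check that fires on an oop access after detach can be illustrated with a small standalone model. This is only a sketch, not the code from the changeset: GuardedThread, thread_obj_checked() and check_fires_after_detach() are hypothetical names, an int stands in for the oop, and an exception stands in for guarantee() (which would abort the VM) so that the behavior can be observed in a test.

```cpp
#include <cassert>
#include <stdexcept>

// Toy model only: GuardedThread and its members are hypothetical names,
// not HotSpot APIs. An int stands in for the thread's oop.
struct GuardedThread {
    int thread_obj = 7;
    bool barrier_detached = false;

    // Models the point where the GC barrier is detached from the thread.
    void on_thread_detach() { barrier_detached = true; }

    // Checked accessor: fails loudly instead of silently touching the oop.
    int thread_obj_checked() const {
        if (barrier_detached) {
            throw std::logic_error("threadObj accessed after GC barrier detach");
        }
        return thread_obj;
    }
};

// Returns true if the check fires for an access made after detach.
inline bool check_fires_after_detach() {
    GuardedThread t;
    assert(t.thread_obj_checked() == 7);  // access before detach is allowed
    t.on_thread_detach();
    try {
        (void)t.thread_obj_checked();     // access after detach must be rejected
    } catch (const std::logic_error&) {
        return true;
    }
    return false;
}
```

With a check of this kind in place, any test that drives a thread through its exit path would trip the check if a late oop access were still present.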

Changeset: a1449886 Author: Daniel D. Daugherty <dcubed@openjdk.org> Date: 2022-06-21 16:21:03 +0000 URL: https://git.openjdk.org/jdk19/commit/a1449886004b2f0a70f1413bb19ce3ba5c914fdf
21-06-2022

A pull request was submitted for review. URL: https://git.openjdk.org/jdk19/pull/21 Date: 2022-06-15 16:06:08 +0000
15-06-2022

I'm playing around with adding code to catch this condition without Shenandoah and I've found that runtime/Thread/ThreadObjAccessAtExit.java is the perfect test for getting into this code.
13-06-2022

Okay, the SATB barrier cannot catch this problem: there is only a read, no overwrite, so the SATB barrier is not triggered. I discovered the problem with iu_barrier, which uses the SATB queue but behaves differently.
13-06-2022
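
Why a read alone does not trip an SATB barrier can be shown with a toy model of a pre-write barrier. This is a simplified sketch, not Shenandoah or G1 code: ToySatb, store(), load() and satb_demo() are hypothetical names, ints stand in for object references, and 0 stands in for null.

```cpp
#include <cassert>
#include <vector>

// Toy model only: ToySatb is a hypothetical name; ints stand in for object
// references and 0 stands in for null.
struct ToySatb {
    bool marking_active = false;
    std::vector<int> queue;  // previous values recorded during marking

    // Pre-write barrier: record the value that is about to be overwritten.
    void store(int& field, int new_value) {
        if (marking_active && field != 0) {
            queue.push_back(field);
        }
        field = new_value;
    }

    // Plain load: in this model, reads never enqueue anything.
    int load(const int& field) const { return field; }
};

inline void satb_demo() {
    ToySatb satb;
    satb.marking_active = true;
    int field = 5;
    assert(satb.load(field) == 5);
    assert(satb.queue.empty());      // a read alone leaves no trace in the queue
    satb.store(field, 9);
    assert(satb.queue.size() == 1);  // the overwrite recorded the old value
}
```

In this model only store() touches the queue, which matches the observation that a late read of _threadObj would go unnoticed by a barrier of this shape.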

The barrier code is very GC specific. Obviously, Shenandoah (and I believe G1 as well) can potentially catch this error, but only in the marking phase with the SATB barrier. I am also puzzled why it did not show up earlier. It is probably due to the narrow race window: the Java thread has to exit during the marking phase, with its SATB queue flushed and not yet inactivated.
13-06-2022

I thought the barrier code was already supposed to catch this kind of error - but obviously not! A fix will be awkward ... need to extract the tid while still safe and then pass it down to where we need it.
13-06-2022
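
The "extract the tid while still safe, then pass it down" approach can be sketched in a standalone model. This is one possible shape under stated assumptions, not the actual fix: ToyThread, ToyIdTable and remove_thread_safely() are hypothetical names, and a boolean flag stands in for the per-thread GC barrier state.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Toy model only, not HotSpot code: every name below is hypothetical.
struct ToyThread {
    std::int64_t tid_in_thread_obj = 0;  // stands in for the tid stored in the thread's oop
    bool barrier_attached = true;        // stands in for the per-thread GC barrier state

    // Stands in for resolving _threadObj and reading its tid; legal only while attached.
    std::int64_t java_tid() const {
        assert(barrier_attached && "oop access after GC barrier detach");
        return tid_in_thread_obj;
    }
};

struct ToyIdTable {
    std::unordered_set<std::int64_t> ids;
    void add(std::int64_t tid) { ids.insert(tid); }
    bool remove(std::int64_t tid) { return ids.erase(tid) == 1; }
};

// Read the tid while oop access is still safe, detach, then use only the cached value.
inline bool remove_thread_safely(ToyThread& t, ToyIdTable& table) {
    const std::int64_t tid = t.java_tid();  // safe: barrier still attached
    t.barrier_attached = false;             // models the barrier being detached
    return table.remove(tid);               // no oop access past this point
}
```

The cached tid is a plain integer, so using it after the detach point does not involve the GC at all.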

Ouch! How did we miss that for so long?
    void ThreadsSMRSupport::remove_thread(JavaThread *thread) {
      if (ThreadIdTable::is_initialized()) {
        jlong tid = SharedRuntime::get_java_tid(thread);
        ThreadIdTable::remove_thread(tid);
      }
We need the tid to remove the thread from the table, but we are already past the point where we can access threadObj.
13-06-2022

[~zgu] - No problem. I have an idea for how to add some sanity checking for this type of problem. I'll keep you posted...
10-06-2022

[~dcubed] No, not with the current code base. I was experimenting with a fix for JDK-8288129, which triggered an assertion failure similar to JDK-8286830.
10-06-2022

[~zgu] - Thanks for the info! That qualifies for my "learn something new every day" token! Do you have a particular test case that exposes this failure mode?
10-06-2022

[~dcubed] The jdk11u codebase does have GC barriers [1]; ZGC [2] and Shenandoah [3] both have implementations.
[1] https://github.com/openjdk/jdk11u-dev/blob/master/src/hotspot/share/gc/shared/barrierSet.hpp#L129
[2] https://github.com/openjdk/jdk11u-dev/blob/master/src/hotspot/share/gc/z/zBarrierSet.cpp#L83
[3] https://github.com/openjdk/jdk11u-dev/blob/master/src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.cpp#L145
10-06-2022

[~zgu] - Unless "GC Barriers" have been backported to '11', I don't think that this issue can affect JDK11u...
10-06-2022

src/hotspot/share/runtime/thread.cpp:
    void Threads::remove(JavaThread* p, bool is_daemon) {
    <snip>
      // BarrierSet state must be destroyed after the last thread transition
      // before the thread terminates. Thread transitions result in calls to
      // StackWatermarkSet::on_safepoint(), which performs GC processing,
      // requiring the GC state to be alive.
      BarrierSet::barrier_set()->on_thread_detach(p);
      // !! GC barrier detached here !!
      assert(ThreadsSMRSupport::get_java_thread_list()->includes(p), "p must be present");
      // Maintain fast thread list
      ThreadsSMRSupport::remove_thread(p);
      // !! This function calls SharedRuntime::get_java_tid(thread) !!
10-06-2022

The GC barrier code that we're running afoul of was integrated via:
    JDK-8253180 ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing
which was integrated in JDK16-B20. So it looks like our original testing for JDK-8253180 somehow missed this case of accessing an oop after the GC barrier was detached.
10-06-2022

The offending code is rather old:
    230b5768d786 (Daniel D. Daugherty 2017-12-06 15:19:30 -0500 1002) void ThreadsSMRSupport::remove_thread(JavaThread *thread) {
    17dd7dc38c60 (David Holmes        2020-05-13 22:29:54 -0400 1003)
    6ccf3351d7ee (Daniil Titov        2019-09-25 11:10:05 -0700 1004)   if (ThreadIdTable::is_initialized()) {
    6ccf3351d7ee (Daniil Titov        2019-09-25 11:10:05 -0700 1005)     jlong tid = SharedRuntime::get_java_tid(thread);
    6ccf3351d7ee (Daniil Titov        2019-09-25 11:10:05 -0700 1006)     ThreadIdTable::remove_thread(tid);
    6ccf3351d7ee (Daniil Titov        2019-09-25 11:10:05 -0700 1007)   }
and was added by this changeset:
    $ git log -r 6ccf3351d7ee^!
    commit 6ccf3351d7eef4b6a2ef8b33e4173416cfdcefd5
    Author: Daniil Titov <dtitov@openjdk.org>
    Date:   Wed Sep 25 11:10:05 2019 -0700
        8185005: Improve performance of ThreadMXBean.getThreadInfo(long ids[], int maxDepth)
        Reviewed-by: sspitsyn, dholmes, dcubed, rehn
The fix for JDK-8185005 was integrated in JDK14-B16 and was backported to JDK13u and JDK11u.
10-06-2022