JDK-8289091 : move oop safety check from SharedRuntime::get_java_tid() to JavaThread::threadObj()
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 17,19
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2022-06-23
  • Updated: 2023-08-30
  • Resolved: 2022-07-05
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 19 JDK 20
19 b30Fixed 20Fixed
Related Reports
Relates :  
Relates :  
Description
In the code review for:
    JDK-8288139 JavaThread touches oop after GC barrier is detached

this topic came up: 

https://github.com/openjdk/jdk19/pull/21#discussion_r898495754

Here's [~dholmes]' comment:

> I think the oop-touching-safety check should be done in
> threadObj() itself so that all callers are protected.

During the review, I agreed that threadObj() was a better placement,
but I didn't want to make that change yet because I thought it might
result in more issues that would need to be investigated. JDK-8288139
is about the failure due to the call to SharedRuntime::get_java_tid() and
I wanted to focus my fix on that exact failure mode.

This bug is for investigating moving the oop safety check from
SharedRuntime::get_java_tid() to JavaThread::threadObj().
Comments
Please note that much of this fix is being backed out by: JDK-8292302 Windows GetLastError value overwritten by ThreadLocalStorage::thread
09-09-2022

Changeset: 30e134e9 Author: Daniel D. Daugherty <dcubed@openjdk.org> Date: 2022-07-05 20:42:42 +0000 URL: https://git.openjdk.org/jdk19/commit/30e134e909c53423acd1ec20c106f4200bc10285
05-07-2022

A pull request was submitted for review. URL: https://git.openjdk.org/jdk19/pull/69 Date: 2022-06-24 19:53:31 +0000
24-06-2022

It's also useful to verify that the revised placement of the oop safety check still catches the original failure mode from "JDK-8288139 JavaThread touches oop after GC barrier is detached". Here's a temporary debugging patch: $ git diff diff --git a/src/hotspot/share/runtime/thread.cpp b/src/hotspot/share/runtime/thread.cpp index fdcfb7fe4b4..ec43fb473a2 100644 --- a/src/hotspot/share/runtime/thread.cpp +++ b/src/hotspot/share/runtime/thread.cpp @@ -3598,12 +3598,14 @@ void Threads::remove(JavaThread* p, bool is_daemon) { // that we do not remove thread without safepoint code notice { MonitorLocker ml(Threads_lock); +if (UseNewCode) { if (ThreadIdTable::is_initialized()) { // This cleanup must be done before the current thread's GC barrier // is detached since we need to touch the threadObj oop. jlong tid = SharedRuntime::get_java_tid(p); ThreadIdTable::remove_thread(tid); } +} // BarrierSet state must be destroyed after the last thread transition // before the thread terminates. Thread transitions result in calls to diff --git a/src/hotspot/share/runtime/threadSMR.cpp b/src/hotspot/share/runtime/threadSMR.cpp index 377fc05c42c..f0fbe53b2a2 100644 --- a/src/hotspot/share/runtime/threadSMR.cpp +++ b/src/hotspot/share/runtime/threadSMR.cpp @@ -1000,6 +1000,12 @@ void ThreadsSMRSupport::release_stable_list_wake_up(bool is_nested) { } void ThreadsSMRSupport::remove_thread(JavaThread *thread) { +if (!UseNewCode) { + if (ThreadIdTable::is_initialized()) { + jlong tid = SharedRuntime::get_java_tid(thread); + ThreadIdTable::remove_thread(tid); + } +} ThreadsList *new_list = ThreadsList::remove_thread(ThreadsSMRSupport::get_java_thread_list(), thread); if (EnableThreadSMRStatistics) { ThreadsSMRSupport::inc_java_thread_list_alloc_cnt(); that restores the errant code that was moved by JDK-8288139. Testing with runtime/Thread/ThreadObjAccessAtExit.java reveals that the errant threadObj() call is caught by release, fastdebug and slowdebug bits: $ do_java_test runtime/Thread/ThreadObjAccessAtExit.java INFO: GNUMAKE=make INFO: GNUMAKE version is: GNU Make 3.81 INFO: JTREG options: INFO: JOBS=4 INFO: TEST_MODE=agentvm INFO: EXTRA_PROBLEM_LISTS=ProblemList-extra.txt INFO: VM_OPTIONS= INFO: test_val=runtime/Thread/ThreadObjAccessAtExit.java Test Config: macosx-x86_64-normal-server-release INFO: TIMEOUT_FACTOR=4 Done testing Test Run macosx-x86_64-normal-server-release time: 1.01 minutes. TEST TOTAL PASS FAIL ERROR jtreg:open/test/hotspot/jtreg/runtime/Thread/ThreadObjAccessAtExit.java >> 1 0 1 0 << 1 failure(s) found in log=do_java_test.macosx-x86_64-normal-server-release.log TEST: runtime/Thread/ThreadObjAccessAtExit.java LOG: build/macosx-x86_64-normal-server-release/test-support/jtreg_open_test_hotspot_jtreg_runtime_Thread_ThreadObjAccessAtExit_java/runtime/Thread/ThreadObjAccessAtExit.jtr Saving build/macosx-x86_64-normal-server-release/test-support/jtreg_open_test_hotspot_jtreg_runtime_Thread_ThreadObjAccessAtExit_java/runtime/Thread/ThreadObjAccessAtExit.jtr as /work/shared/bug_hunt/8288139_threadObj_for_jdk19.git/test_failures.2022-06-23-153746/ThreadObjAccessAtExit.jtr.release WARNING: hs_err_pid file named in the log does not exist: '/System/Volumes/Data/work/shared/bug_hunt/8288139_threadObj_for_jdk19.git/build/macosx-x86_64-normal-server-release/test-support/jtreg_open_test_hotspot_jtreg_runtime_Thread_ThreadObjAccessAtExit_java/scratch/0/hs_err_pid45320.log' WARNING: using found hs_err_pid instead: build/macosx-x86_64-normal-server-release/test-support/jtreg_open_test_hotspot_jtreg_runtime_Thread_ThreadObjAccessAtExit_java/runtime/Thread/ThreadObjAccessAtExit/hs_err_pid45320.log Saving build/macosx-x86_64-normal-server-release/test-support/jtreg_open_test_hotspot_jtreg_runtime_Thread_ThreadObjAccessAtExit_java/runtime/Thread/ThreadObjAccessAtExit/hs_err_pid45320.log as /work/shared/bug_hunt/8288139_threadObj_for_jdk19.git/test_failures.2022-06-23-153746/hs_err_pid45320.log Test Config: macosx-x86_64-normal-server-fastdebug INFO: TIMEOUT_FACTOR=6 Done testing Test Run macosx-x86_64-normal-server-fastdebug time: 1.20 minutes. TEST TOTAL PASS FAIL ERROR jtreg:open/test/hotspot/jtreg/runtime/Thread/ThreadObjAccessAtExit.java >> 1 0 1 0 << 1 failure(s) found in log=do_java_test.macosx-x86_64-normal-server-fastdebug.log TEST: runtime/Thread/ThreadObjAccessAtExit.java LOG: build/macosx-x86_64-normal-server-fastdebug/test-support/jtreg_open_test_hotspot_jtreg_runtime_Thread_ThreadObjAccessAtExit_java/runtime/Thread/ThreadObjAccessAtExit.jtr Saving build/macosx-x86_64-normal-server-fastdebug/test-support/jtreg_open_test_hotspot_jtreg_runtime_Thread_ThreadObjAccessAtExit_java/runtime/Thread/ThreadObjAccessAtExit.jtr as /work/shared/bug_hunt/8288139_threadObj_for_jdk19.git/test_failures.2022-06-23-153746/ThreadObjAccessAtExit.jtr.fastdebug WARNING: hs_err_pid file named in the log does not exist: '/System/Volumes/Data/work/shared/bug_hunt/8288139_threadObj_for_jdk19.git/build/macosx-x86_64-normal-server-fastdebug/test-support/jtreg_open_test_hotspot_jtreg_runtime_Thread_ThreadObjAccessAtExit_java/scratch/0/hs_err_pid45526.log' WARNING: using found hs_err_pid instead: build/macosx-x86_64-normal-server-fastdebug/test-support/jtreg_open_test_hotspot_jtreg_runtime_Thread_ThreadObjAccessAtExit_java/runtime/Thread/ThreadObjAccessAtExit/hs_err_pid45526.log Saving build/macosx-x86_64-normal-server-fastdebug/test-support/jtreg_open_test_hotspot_jtreg_runtime_Thread_ThreadObjAccessAtExit_java/runtime/Thread/ThreadObjAccessAtExit/hs_err_pid45526.log as /work/shared/bug_hunt/8288139_threadObj_for_jdk19.git/test_failures.2022-06-23-153746/hs_err_pid45526.log Test Config: macosx-x86_64-normal-server-slowdebug INFO: TIMEOUT_FACTOR=12 Done testing Test Run macosx-x86_64-normal-server-slowdebug time: 1.91 minutes. TEST TOTAL PASS FAIL ERROR jtreg:open/test/hotspot/jtreg/runtime/Thread/ThreadObjAccessAtExit.java >> 1 0 1 0 << 1 failure(s) found in log=do_java_test.macosx-x86_64-normal-server-slowdebug.log TEST: runtime/Thread/ThreadObjAccessAtExit.java LOG: build/macosx-x86_64-normal-server-slowdebug/test-support/jtreg_open_test_hotspot_jtreg_runtime_Thread_ThreadObjAccessAtExit_java/runtime/Thread/ThreadObjAccessAtExit.jtr Saving build/macosx-x86_64-normal-server-slowdebug/test-support/jtreg_open_test_hotspot_jtreg_runtime_Thread_ThreadObjAccessAtExit_java/runtime/Thread/ThreadObjAccessAtExit.jtr as /work/shared/bug_hunt/8288139_threadObj_for_jdk19.git/test_failures.2022-06-23-153746/ThreadObjAccessAtExit.jtr.slowdebug WARNING: hs_err_pid file named in the log does not exist: '/System/Volumes/Data/work/shared/bug_hunt/8288139_threadObj_for_jdk19.git/build/macosx-x86_64-normal-server-slowdebug/test-support/jtreg_open_test_hotspot_jtreg_runtime_Thread_ThreadObjAccessAtExit_java/scratch/0/hs_err_pid45768.log' WARNING: using found hs_err_pid instead: build/macosx-x86_64-normal-server-slowdebug/test-support/jtreg_open_test_hotspot_jtreg_runtime_Thread_ThreadObjAccessAtExit_java/runtime/Thread/ThreadObjAccessAtExit/hs_err_pid45768.log Saving build/macosx-x86_64-normal-server-slowdebug/test-support/jtreg_open_test_hotspot_jtreg_runtime_Thread_ThreadObjAccessAtExit_java/runtime/Thread/ThreadObjAccessAtExit/hs_err_pid45768.log as /work/shared/bug_hunt/8288139_threadObj_for_jdk19.git/test_failures.2022-06-23-153746/hs_err_pid45768.log Total test time: 4.12 minutes. Here are hs_err_pid snippets from the fastdebug config: # Internal Error (/System/Volumes/Data/work/shared/bug_hunt/8288139_threadObj_for_jdk19.git/open/src/hotspot/share/runtime/thread.cpp:799), pid=45526, tid=25859 # guarantee(current != this || JavaThread::cast(current)->is_oop_safe()) failed: current cannot touch oops after its GC barrier is detached. --------------- T H R E A D --------------- Current thread (0x00007fc2cb032e10): [error occurred during error reporting (printing current thread), id 0xe0000000, Internal Error (/System/Volumes/Data/work/shared/bug_hunt/8288139_threadObj_for_jdk19.git/open/src/hotspot/share/runtime/thread.cpp:799)] Stack: [0x0000700010f77000,0x0000700011077000], sp=0x0000700011075180, free space=1016k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.dylib+0x12e2749] VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x6e9 V [libjvm.dylib+0x12e2dcb] VMError::report_and_die(Thread*, void*, char const*, int, char const*, char const*, __va_list_tag*)+0x3b V [libjvm.dylib+0x6c047d] report_vm_error(char const*, int, char const*, char const*, ...)+0xdd V [libjvm.dylib+0x122c76d] JavaThread::threadObj() const+0xdd V [libjvm.dylib+0x1233054] JavaThread::get_thread_name_string(char*, int) const+0x24 V [libjvm.dylib+0x12331e7] JavaThread::print_on_error(outputStream*, char*, int) const+0x37 V [libjvm.dylib+0x12df5ef] VMError::report(outputStream*, bool)+0xf2f V [libjvm.dylib+0x12e2749] VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x6e9 V [libjvm.dylib+0x12e2dcb] VMError::report_and_die(Thread*, void*, char const*, int, char const*, char const*, __va_list_tag*)+0x3b V [libjvm.dylib+0x6c047d] report_vm_error(char const*, int, char const*, char const*, ...)+0xdd V [libjvm.dylib+0x122c76d] JavaThread::threadObj() const+0xdd V [libjvm.dylib+0x10d5ad4] SharedRuntime::get_java_tid(Thread*)+0x64 V [libjvm.dylib+0x12422f9] ThreadsSMRSupport::remove_thread(JavaThread*)+0x29 V [libjvm.dylib+0x123016f] Threads::remove(JavaThread*, bool)+0xff V [libjvm.dylib+0x122ff4a] JavaThread::exit(bool, JavaThread::ExitType)+0xbda V [libjvm.dylib+0x122f2a7] JavaThread::post_run()+0x17 V [libjvm.dylib+0x122be48] Thread::call_run()+0x1f8 Please note that we not only catch the errant use in SharedRuntime::get_java_tid(), but we also catch the errant use in JavaThread::get_thread_name_string() which is called to report the crashing thread's name as part of the hs_err_pid report. I think that this means that if a JavaThread crashes after it has detached the GC barrier, then we'll get a secondary crash when we try to report the primary crash. Update: The errant use of threadObj() in JavaThread::get_thread_name_string() occurs when we're trying to print this part of the hs_err_pid: Java Threads: ( => current thread ) <snip> 0x00007ff8dc024210 JavaThread "GCThread" [_thread_blocked, id=24835, stack(0x000070000a97c000,0x000070000aa7c000)] =>0x00007ff8dc00e210 [error occurred during error reporting (printing all threads), id 0xe0000000, Internal Error (/System/Volumes/Data/work/shared/bug_hunt/8288139_threadObj_for_jdk19.git/open/src/hotspot/share/runtime/thread.cpp:799)] So we crash when we reach the point in the "Java Threads:" section where we find the crashing thread and we try to print info about that thread. The code is using threadObj() to access the crashing thread's name, but that's not safe when the crashing thread is after the point where the GC barrier has been detached.
23-06-2022

Moving the oop safety check logic from SharedRuntime::get_java_tid() to JavaThread::threadObj() is the easy part. The hard part is running all the testing to verify that threadObj() doesn't reveal any more unexpected oop accesses after the thread's GC barrier has been detached.
23-06-2022