JDK-8321069 : JvmtiThreadState::state_for_while_locked() returns nullptr for an attached JNI thread with a java.lang.Thread object after JDK-8319935
  • Type: Bug
  • Component: hotspot
  • Sub-Component: jvmti
  • Affected Version: 22
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: os_x
  • CPU: x86_64,aarch64
  • Submitted: 2023-11-30
  • Updated: 2023-12-11
  • Resolved: 2023-12-04
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 22
22 b27Fixed
Related Reports
Relates :  
Description
Failure happens with an internal stress test.

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/System/Volumes/Data/mesos/work_dir/slaves/0db9c48f-6638-40d0-9a4b-bd9cc7533eb8-S29378/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/a638f2ca-785f-4750-b275-1ab5ebbcc980/runs/8423732b-6dd3-4839-bcbd-4913acb4d193/workspace/open/src/hotspot/share/prims/jvmtiExport.cpp:2966), pid=6067, tid=259
#  guarantee(state != nullptr) failed: exiting thread called setup_jvmti_thread_state
#
# JRE version: Java(TM) SE Runtime Environment (22.0+26) (fastdebug build 22-ea+26-2112)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 22-ea+26-2112, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, bsd-aarch64)
# Core dump will be written. Default location: core.6067
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#


---------------  T H R E A D  ---------------

Current thread (0x00000001269d3010):  JavaThread "AppKit Thread"        [_thread_in_vm, id=259, stack(0x000000016f450000,0x000000016fc4c000) (8176K)]

Stack: [0x000000016f450000,0x000000016fc4c000],  sp=0x000000016fc47140,  free space=8156k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.dylib+0x1121d3c]  VMError::report_and_die(int, char const*, char const*, char*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x564  (jvmtiExport.cpp:2966)
V  [libjvm.dylib+0x112255c]  VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*)+0x0
V  [libjvm.dylib+0x559588]  print_error_for_unit_test(char const*, char const*, char*)+0x0
V  [libjvm.dylib+0xbb6248]  JvmtiEventCollector::setup_jvmti_thread_state()+0x130
V  [libjvm.dylib+0xbb6ba4]  JvmtiVMObjectAllocEventCollector::JvmtiVMObjectAllocEventCollector()+0x48
V  [libjvm.dylib+0x9e5564]  JVM_GetStackAccessControlContext+0x170
j  java.security.AccessController.getStackAccessControlContext()Ljava/security/AccessControlContext;+0 java.base@22-ea
j  java.security.AccessController.getContext()Ljava/security/AccessControlContext;+0 java.base@22-ea
j  java.lang.Thread.<init>(Ljava/lang/ThreadGroup;Ljava/lang/String;ILjava/lang/Runnable;JLjava/security/AccessControlContext;)V+226 java.base@22-ea
j  java.lang.Thread.<init>(Ljava/lang/ThreadGroup;Ljava/lang/String;)V+10 java.base@22-ea
v  ~StubRoutines::call_stub 0x00000001133c4190
V  [libjvm.dylib+0x87bbac]  JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*)+0x420
V  [libjvm.dylib+0x87ad78]  JavaCalls::call_special(JavaValue*, Klass*, Symbol*, Symbol*, JavaCallArguments*, JavaThread*)+0x118
V  [libjvm.dylib+0x87b05c]  JavaCalls::call_special(JavaValue*, Handle, Klass*, Symbol*, Symbol*, Handle, Handle, JavaThread*)+0x7c
V  [libjvm.dylib+0x8afa80]  JavaThread::allocate_threadObj(Handle, char const*, bool, JavaThread*)+0x39c
V  [libjvm.dylib+0x98cad0]  attach_current_thread(JavaVM_*, void**, void*, bool)+0x43c
C  [libosxapp.dylib+0x4ce4]  +[ThreadUtilities getJNIEnv]+0xc4
C  [libawt_lwawt.dylib+0x63fe0]  setUpAWTAppKit+0xe4
C  [libawt_lwawt.dylib+0x63d2c]  +[AWTStarter starter:headless:]+0x54
C  [libosxapp.dylib+0x4dc8]  +[ThreadUtilities invokeBlockCopy:]+0x1c
C  [Foundation+0x84914]  __NSThreadPerformPerform+0xd4
C  [CoreFoundation+0x84c5c]  __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__+0x1c
C  [CoreFoundation+0x84ba8]  __CFRunLoopDoSource0+0xd0
C  [CoreFoundation+0x84894]  __CFRunLoopDoSources0+0x10c
C  [CoreFoundation+0x83208]  __CFRunLoopRun+0x334
C  [CoreFoundation+0x82734]  CFRunLoopRunSpecific+0x258
C  [libjli.dylib+0xcdb0]  CreateExecutionEnvironment+0x194
C  [libjli.dylib+0x8dfc]  JLI_Launch+0x498
C  [java+0x3bbc]  main+0x184

The failing logic seems to think the thread is exiting when in fact it is attaching.
Comments
The fix for this bug is integrated in jdk-22+27-2189.
05-12-2023

Changeset: 30b5d427 Author: Daniel D. Daugherty <dcubed@openjdk.org> Date: 2023-12-04 20:53:46 +0000 URL: https://git.openjdk.org/jdk/commit/30b5d427350d03ec8b9eb39fbf06fbd1b1f66cd8
04-12-2023

Mach5 testing: Mach5 Tier1: - no failures Mach5 Tier2: - no failures Mach5 Tier3: - 1 unknown, unrelated test failure - com/sun/jdi/EATests.java#id0 failed an assert - Filed JDK-8321216 SerialGC fails "assert(_whole_heap.contains(p)) failed: Attempt to access p = 0x0000000100000000 out of bounds of card marking array's _whole_heap = [0x00000000f0000000,0x0000000100000000)" Mach5 Tier4: - no failures Mach5 Tier5: - 1 unknown, unrelated test failure: - javax/management/remote/mandatory/connection/DefaultAgentFilterTest.java failed with "ConnectException: Connection refused" Mach5 Tier6: - 2 known, unrelated test failures: - 2 X com/sun/jdi/EATests.java#id0 failed an assert - See JDK-8321216 SerialGC fails "assert(_whole_heap.contains(p)) failed: Attempt to access p = 0x0000000100000000 out of bounds of card marking array's _whole_heap = [0x00000000f0000000,0x0000000100000000)" Mach5 Tier7: - no failures Mach5 Tier8: - no failures.
04-12-2023

deleted erroneous comment
04-12-2023

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/16934 Date: 2023-12-02 17:15:57 +0000
02-12-2023

In the fix for the following bug: JDK-8319935 Ensure only one JvmtiThreadState is created for one JavaThread associated with attached native thread JvmtiThreadState::state_for_while_locked() was changed to return nullptr for attaching JNI threads regardless of whether that JNI thread/JavaThread had a java.lang.Thread object.
02-12-2023

We have 25 sightings so far in Tier5 and Tier8. 23 on macosx-aarch64 and 2 on macosx-x64.
01-12-2023

I've been able to reproduce this failure mode on my M1 MacMini using release and fastdebug bits. My slowdebug bits run is still running. Update: I can reproduce a slowdebug bits failure also. I'm also going to try reproducing this on my X64 MBP13 since my M1 MacMini is currently rather swamped doing stress testing on the jdk-22+26 snapshot... Update: I've also been able to reproduce this failure mode on my X64 MBP13.
01-12-2023

I think the proper check might be this: $ diff src/hotspot/share/prims/jvmtiThreadState.inline.hpp.{orig,new} 85c85,87 < (thread->is_exiting() || thread->is_attaching_via_jni())) { --- > (thread->is_exiting() || > (thread->threadObj() == nullptr && thread->is_attaching_via_jni()) > )) { Only filter out a JavaThread that's attaching via JNI if it has no java.lang.Thread object.
01-12-2023

[~dholmes] also wrote this above: > I think that now we filter out attaching threads elsewhere, we can reach here > without a state object and so we assume we are exiting. And I think that this bug: JDK-8319935 Ensure only one JvmtiThreadState is created for one JavaThread associated with attached native thread modified the code that David is alluding to: src/hotspot/share/prims/jvmtiThreadState.inline.hpp: inline JvmtiThreadState* JvmtiThreadState::state_for_while_locked(JavaThread *thread, oop thread_oop) { <snip> JvmtiThreadState *state = thread == nullptr ? nullptr : thread->jvmti_thread_state(); if (state == nullptr && thread != nullptr && (thread->is_exiting() || thread->is_attaching_via_jni())) { // Don't add a JvmtiThreadState to a thread that is exiting or is attaching. // When a thread is attaching, it may not have a Java level thread object // created yet. return nullptr; } Here's the JavaThread from my slowdebug repro of the crash: (lldb) print *(JavaThread*)0x000000013686a210 (JavaThread) $1 = { Thread = { ThreadShadow = { _pending_exception = nullptr _exception_file = 0x0000000000000000 _exception_line = 0 } _nmethod_disarmed_guard_value = 1 _gc_data = { [0] = 0 [1] = 0 [2] = 17433981653976477952 [3] = 0 [4] = 0 [5] = 105553127731408 [6] = 17433981653976478193 [7] = 17433981653976478193 [8] = 17433981653976478193 [9] = 17433981653976478193 [10] = 17433981653976478193 [11] = 17433981653976478193 [12] = 17433981653976478193 [13] = 17433981653976478193 [14] = 17433981653976478193 [15] = 17433981653976478193 [16] = 17433981653976478193 [17] = 17433981653976478193 [18] = 17433981653976478193 [19] = 17433981653976478193 [20] = 17433981653976478193 [21] = 17433981653976478193 [22] = 17433981653976478193 [23] = 17433981653976478193 [24] = 17433981653976478193 [25] = 17433981653976478193 [26] = 17433981653976478193 [27] = 17433981653976478193 [28] = 17433981653976478193 [29] = 17433981653976478193 [30] = 17433981653976478193 [31] = 17433981653976478193 [32] = 17433981653976478193 [33] = 17433981653976478193 [34] = 17433981653976478193 [35] = 17433981653976478193 [36] = 17433981653976478193 [37] = 17433981653976478193 [38] = 17433981653976478193 [39] = 17433981653976478193 [40] = 17433981653976478193 [41] = 17433981653976478193 [42] = 17433981653976478193 } _threads_hazard_ptr = nullptr _threads_list_ptr = nullptr _nested_threads_hazard_ptr_cnt = 0 _suspendible_thread = false _indirectly_suspendible_thread = false _last_handle_mark = 0x000000016b1b8448 _threads_do_token = 0 _rcu_counter = 0 _missed_ic_stub_refill_verifier = nullptr _skip_gcalot = false _tlab = { _start = 0x0000000620619a50 _top = 0x0000000620619b38 _pf_top = 0x0000000620619ac8 _end = 0x0000000620620400 _allocation_end = 0x0000000620620400 _desired_size = 3495 _refill_waste_limit = 54 _allocated_before_last_gc = 0 _bytes_since_last_sample_point = 26936 _number_of_refills = 1 _refill_waste = 0 _gc_waste = 0 _slow_allocations = 0 _allocated_size = 3510 _allocation_fraction = (_average = 0.0666618347, _sample_count = 1, _weight = 35, _is_old = false, _last_sample = 0.0666618347) } _allocated_bytes = 0 _heap_sampler = (_bytes_until_sample = 27774) _statistical_info = (_start_time_stamp = 849267553345083, _define_class_count = 0) _jfr_thread_local = { _java_event_writer = nullptr _java_buffer = nullptr _native_buffer = nullptr _shelved_buffer = nullptr _load_barrier_buffer_epoch_0 = nullptr _load_barrier_buffer_epoch_1 = nullptr _checkpoint_buffer_epoch_0 = nullptr _checkpoint_buffer_epoch_1 = nullptr _stackframes = nullptr _dcmd_arena = nullptr _thread = { _ptr = nullptr } _vthread_id = 0 _jvm_thread_id = 0 _thread_id_alias = 18446744073709551615 _data_lost = 0 _stack_trace_id = 18446744073709551615 _stack_trace_hash = 0 _parent_trace_id = 0 _last_allocated_bytes = 0 _user_time = 0 _cpu_time = 0 _wallclock_time = 849267553345458 _stackdepth = 0 _entering_suspend_flag = 0 _critical_section = 0 _vthread_epoch = 0 _vthread_excluded = false _jvm_thread_excluded = false _vthread = false _notified = false _dead = false } _current_pending_raw_monitor = nullptr _run_state = PRE_CALL_RUN _unhandled_oops = 0xf1f1f1f1f1f1f1f1 _osthread = 0x00006000038e7340 _resource_area = 0x000060000336cc40 _current_resource_mark = 0x000000016b1b7fd0 _handle_area = 0x00006000036b9e10 _metadata_handles = 0x000060000336cc90 _stack_base = 0x000000016b1bc000 "" _stack_size = 67092480 _lgrp_id = -1 _owned_locks = nullptr _jvmti_env_iteration_count = 0 _ParkEvent = 0x0000000113f12e00 _hashStateW = 273326509 _hashStateX = 1323932221 _hashStateY = 842502087 _hashStateZ = 34663 _wx_init = true _wx_state = WXWrite _in_asgct = false _vm_error_callbacks = nullptr } _on_thread_list = true _threadObj = { _obj = 0x0000000133e14c20 } _vthread = { _obj = 0x0000000133e14c28 } _jvmti_vthread = { _obj = 0x0000000133e14c30 } _scopedValueCache = { _obj = 0x0000000133e14c38 } _java_call_counter = 1 _anchor = { _last_Java_sp = 0x000000016b1b80b0 _last_Java_pc = 0x000000014753fa8c "\xac\x83^\xf8\x88\x82" _last_Java_fp = 0x000000016b1b8100 } _entry_point = 0x0000000000000000 _jni_environment = { functions = 0x0000000109858bf8 } _deopt_mark = nullptr _deopt_nmethod = nullptr _vframe_array_head = nullptr _vframe_array_last = nullptr _jvmti_deferred_updates = nullptr _callee_target = 0x00007ff000488328 _vm_result = nullptr _vm_result_2 = nullptr _deferred_card_mark = { _start = nullptr _word_size = 0 } _current_pending_monitor = nullptr _current_pending_monitor_is_from_java = true _current_waiting_monitor = nullptr _active_handles = 0x0000000113f13510 _free_handle_block = nullptr _Stalled = 0 _monitor_chunks = nullptr _suspend_flags = 0 _thread_state = _thread_in_vm _poll_data = (_polling_word = 18446744073709551614, _polling_page = 4376641536) _safepoint_state = 0x0000600000aef110 _saved_exception_pc = 0x0000000000000000 _requires_cross_modify_fence = false _no_safepoint_count = 0 _visited_for_critical_count = 0 _terminated = _not_terminated _in_deopt_handler = 0 _doing_unsafe_access = false _do_not_unlock_if_synchronized = false _carrier_thread_suspended = false _is_in_VTMS_transition = false _is_in_tmp_VTMS_transition = false _is_VTMS_transition_disabler = false _jni_attach_state = _attaching_via_jni _pending_deoptimization = -1 _pending_monitorenter = false _pending_transfer_to_interpreter = false _in_retryable_allocation = false _pending_failed_speculation = 0 _jvmci = (_implicit_exception_pc = 0x0000000000000000, _alternate_call_target = 0x0000000000000000) _libjvmci_runtime = nullptr _jvmci_counters = 0x0000000000000000 _jvmci_reserved0 = 0 _jvmci_reserved1 = 0 _jvmci_reserved_oop0 = nullptr _stack_overflow_state = { _stack_guard_state = stack_guard_enabled _stack_overflow_limit = 0x00000001671dc000 "" _reserved_stack_activation = 0x000000016b1bc000 "" _shadow_zone_safe_limit = 0x00000001671e8000 "" _shadow_zone_growth_watermark = 0x000000016b1b7f80 "" _stack_base = 0x000000016b1bc000 "" _stack_end = 0x00000001671c0000 "" } _exception_oop = nullptr _exception_pc = 0x0000000000000000 _exception_handler_pc = 0x0000000000000000 _is_method_handle_return = 0 _jni_active_critical = 0 _pending_jni_exception_check_fn = 0x0000000000000000 _depth_first_number = 0 _popframe_condition = 0 _frames_to_pop_failed_realloc = 0 _cont_entry = nullptr _cont_fastpath = 0x000000016b1b8130 _cont_fastpath_thread_state = 1 _held_monitor_count = 0 _jni_monitor_count = 0 _stack_watermarks = { _head = nullptr } _handshake = { _handshakee = 0x000000013686a210 _queue = { _first = nullptr } _lock = { Mutex = { _owner = nullptr _lock = { PlatformMutex = { _impl = 0x00006000036b9e70{...} } _impl = 0x000060000336cd30 } _name = 0x0000600001f27c40 "HandshakeState_lock" _allow_vm_block = true _rank = nosafepoint _next = 0xf1f1f1f1f1f1f1f1 _last_owner = 0xf1f1f1f1f1f1f1f1 _skip_rank_check = false } } _active_handshaker = nullptr _async_exceptions_blocked = false _suspended = false _async_suspend_handshake = false } _popframe_preserved_args = 0x0000000000000000 _popframe_preserved_args_size = 0 _jvmti_thread_state = nullptr _interp_only_mode = 0 _should_post_on_exceptions_flag = 0 _thread_stat = 0x0000000113f12f70 _parker = { PlatformParker = { _counter = 0 _cur_index = -1 _mutex = { [0] = (__sig = 1297437786, __opaque = char[56] @ 0x0000000107317860) } _cond = { [0] = (__sig = 1129270852, __opaque = char[40] @ 0x00000001073178a0) [1] = (__sig = 1129270852, __opaque = char[40] @ 0x00000001073178d0) } } } _class_to_be_initialized = nullptr _SleepEvent = 0x0000000113f13100 _lock_stack = { _top = 1808 _base = { [0] = nullptr [1] = nullptr [2] = nullptr [3] = nullptr [4] = nullptr [5] = nullptr [6] = nullptr [7] = nullptr } } } The interesting fields are: _on_thread_list = true _threadObj = { _obj = 0x0000000133e14c20 } _thread_state = _thread_in_vm _terminated = _not_terminated _jni_attach_state = _attaching_via_jni _jvmti_thread_state = nullptr The specific change is: - if (state == nullptr && thread != nullptr && thread->is_exiting()) { - // Don't add a JvmtiThreadState to a thread that is exiting. + if (state == nullptr && thread != nullptr && + (thread->is_exiting() || thread->is_attaching_via_jni())) { + // Don't add a JvmtiThreadState to a thread that is exiting or is attaching. + // When a thread is attaching, it may not have a Java level thread object + // created yet. return nullptr; } Based on the JavaThread dump above, our JavaThread "AppKit Thread" is caught by this new check: "|| thread->is_attaching_via_jni()" However, our JavaThread "AppKit Thread" does have a Java level thread object: _threadObj = { _obj = 0x0000000133e14c20 } So I'm thinking that original check is incomplete...
01-12-2023

I tracked back the guarantee() call to some very, very old code that I wrote: $ git blame -r be4245e814cc29701cc425d8e66854e36eb3aef0 src/hotspot/share/prims/jvmtiExport.cpp | grep "exiting thread called setup_jvmti_thread_state" Blaming lines: 100% (3277/3277), done. 56fe18e3756a hotspot/src/share/vm/prims/jvmtiExport.cpp (Daniel D. Daugherty 2009-03-02 14:00:23 -0700 3043) guarantee(state != NULL, "exiting thread called setup_jvmti_thread_state"); $ git log 56fe18e3756a^! commit 56fe18e3756a08f3067025024d3e4df1dac75bb7 Author: Daniel D. Daugherty <dcubed@openjdk.org> Date: Mon Mar 2 14:00:23 2009 -0700 6800721: 3/4 JavaThread::jvmti_thread_state() and JvmtiThreadState::state_for() robustness Check for NULL return values from jvmti_thread_state() and state_for() and return a JVM TI error code as appropriate. Reviewed-by: coleenp, swamyv
01-12-2023

This is the interesting part of the stack trace: V [libjvm.dylib+0x622080] print_error_for_unit_test(char const*, char const*, char*)+0x0 V [libjvm.dylib+0xcff860] JvmtiEventCollector::setup_jvmti_thread_state()+0x70 V [libjvm.dylib+0xd001dc] JvmtiVMObjectAllocEventCollector::JvmtiVMObjectAllocEventCollector()+0x5c V [libjvm.dylib+0xd0021c] JvmtiVMObjectAllocEventCollector::JvmtiVMObjectAllocEventCollector()+0x1c V [libjvm.dylib+0xb5977c] JVM_GetStackAccessControlContext+0xb4 Here's the code snippets in call order: src/hotspot/share/prims/jvm.cpp: JVM_ENTRY(jobject, JVM_GetStackAccessControlContext(JNIEnv *env, jclass cls)) if (!UsePrivilegedStack) return nullptr; ResourceMark rm(THREAD); GrowableArray<Handle>* local_array = new GrowableArray<Handle>(12); JvmtiVMObjectAllocEventCollector oam; src/hotspot/share/prims/jvmtiExport.cpp: // Setup current thread to record vm allocated objects. JvmtiVMObjectAllocEventCollector::JvmtiVMObjectAllocEventCollector() { if (JvmtiExport::should_post_vm_object_alloc()) { _enable = true; setup_jvmti_thread_state(); src/hotspot/share/prims/jvmtiExport.cpp: // Setup current current thread for event collection. void JvmtiEventCollector::setup_jvmti_thread_state() { // set this event collector to be the current one. JvmtiThreadState* state = JvmtiThreadState::state_for(JavaThread::current()); // state can only be null if the current thread is exiting which // should not happen since we're trying to configure for event collection guarantee(state != nullptr, "exiting thread called setup_jvmti_thread_state"); So in my M1 MacMini slowdebug repro, the JavaThread "AppKit Thread" made a call to JVM_GetStackAccessControlContext() which creates an JvmtiVMObjectAllocEventCollector object which tries to call setup_jvmti_thread_state() but that doesn't work because the JavaThread "AppKit Thread" doesn't have a JvmtiThreadState yet. So my repro matches up with the crash endpoint that [~dholmes] documented above.
01-12-2023

Here is the failing code: // Setup current current thread for event collection. void JvmtiEventCollector::setup_jvmti_thread_state() { // set this event collector to be the current one. JvmtiThreadState* state = JvmtiThreadState::state_for(JavaThread::current()); // state can only be null if the current thread is exiting which // should not happen since we're trying to configure for event collection guarantee(state != nullptr, "exiting thread called setup_jvmti_thread_state"); I think that now we filter out attaching threads elsewhere, we can reach here without a state object and so we assume we are exiting.
30-11-2023