JDK-8239895 : assert(_stack_base != 0LL) failed: Sanity check
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 15
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: windows
  • Submitted: 2020-02-25
  • Updated: 2020-04-15
  • Resolved: 2020-04-02
JDK 15
15 b18 Fixed
Related Reports
Duplicate :  
Relates :  
Sub Tasks
JDK-8241043 :  
JDK-8241337 :  
Description
We are seeing this assert fire after the changes from JDK-8238988

---------------  T H R E A D  ---------------

Current thread (0x000000d56ca0c800):  JavaThread "MainThread" [_thread_in_vm, id=14104, stack(0x000000d56ec70000,0x000000d56ed70000)] _threads_hazard_ptr=0x000000d56966a5a0, _nested_threads_hazard_ptr_cnt=0

Stack: [0x000000d56ec70000,0x000000d56ed70000]
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [jvm.dll+0x9ebd61]  os::platform_print_native_stack+0xf1  (os_windows_x86.cpp:369)
V  [jvm.dll+0xbebe0b]  VMError::report+0xf0b  (vmerror.cpp:725)
V  [jvm.dll+0xbed6be]  VMError::report_and_die+0x8ae  (vmerror.cpp:1533)
V  [jvm.dll+0xbeddb4]  VMError::report_and_die+0x64  (vmerror.cpp:1317)
V  [jvm.dll+0x414632]  report_vm_error+0x102  (debug.cpp:264)
V  [jvm.dll+0xb8b38e]  JavaThread::is_lock_owned+0x4e  (thread.cpp:2243)
V  [jvm.dll+0xb8d194]  Threads::owning_thread_from_monitor_owner+0x74  (thread.cpp:4703)
V  [jvm.dll+0x7e7863]  JvmtiEnvBase::get_object_monitor_usage+0x1e3  (jvmtienvbase.cpp:996)
V  [jvm.dll+0x7dea7c]  JvmtiEnv::GetObjectMonitorUsage+0x4c  (jvmtienv.cpp:2847)
V  [jvm.dll+0x79de96]  jvmti_GetObjectMonitorUsage+0x136  (jvmtienter.cpp:4105)
C  [objmonusage003.dll+0x118a]

Test: vmTestbase/nsk/jvmti/GetObjectMonitorUsage/objmonusage003/TestDescription.java
Comments
URL: https://hg.openjdk.java.net/jdk/jdk/rev/6f7477dfb965 User: dholmes Date: 2020-04-02 23:13:58 +0000
02-04-2020

Updated and corrected. Okay I was wrong. I have a very old set of crib notes for the thread startup protocol and the scope of holding the Threads_lock is shown incorrectly - it covers the Thread::start(), which is not the case. So while before ThreadSMR the code used to start the ServiceThread (and now the NotificationThread) would have ensured they were not visible before they actually executed their initialization code to set the stack-base etc, regular Java threads that get started would not behave this way. Hence ThreadSMR just makes those threads behave the same as Java threads now.

Prior to my changes a call to on_local_stack(0, size) would simply return false - as stacks grow down we immediately reject any non-zero address. So not having _stack_base initialized was perfectly fine and handled correctly. It was wrong to add the assertion.

To explain further, prior to JDK-8238988 there were uses of stack_base() which checked it was initialized, and there was a raw use of _stack_base in on_local_stack() that did not need it to be initialized (because it may not be). After JDK-8238988 all uses call is_in_stack_range(), which uses stack_base() and so asserts that the stack base is initialized in all cases. We need to restore two versions of the underlying range check so that previous callers of on_local_stack() do not hit the assertion.
23-03-2020
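
A minimal sketch of what "two versions of the underlying range check" could look like, written as a standalone model rather than as the actual HotSpot change. The class name `ThreadStack` and the method names `is_in_stack_range_checked` and `is_in_stack_range_unchecked` are hypothetical; only `_stack_base`, `stack_base()`, and `record_stack_base_and_size` echo names mentioned in the comments. Addresses are modelled as integers to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Standalone model of a thread's stack bounds; NOT the real HotSpot Thread class.
class ThreadStack {
    std::uintptr_t _stack_base = 0;  // highest stack address; 0 until recorded
    std::size_t    _stack_size = 0;

    // Shared range test on raw values. Stacks grow down, so the valid range
    // is [base - size, base). With base == 0 the first comparison is false
    // for every address, so an uninitialized stack rejects everything.
    static bool in_range(std::uintptr_t base, std::size_t size, std::uintptr_t adr) {
        return base > adr && adr >= base - size;
    }

public:
    void record_stack_base_and_size(std::uintptr_t base, std::size_t size) {
        _stack_base = base;
        _stack_size = size;
    }

    // Accessor that insists the base has been set.
    std::uintptr_t stack_base() const {
        assert(_stack_base != 0 && "Sanity check");
        return _stack_base;
    }

    // Hypothetical checked variant: for callers that know the thread has
    // finished its startup (for example, a thread asking about itself).
    bool is_in_stack_range_checked(std::uintptr_t adr) const {
        return in_range(stack_base(), _stack_size, adr);
    }

    // Hypothetical unchecked variant: for callers asking about another
    // thread that may be listed but not yet running.
    bool is_in_stack_range_unchecked(std::uintptr_t adr) const {
        return in_range(_stack_base, _stack_size, adr);
    }
};
```

In this model a query such as the one made from `is_lock_owned` on behalf of another thread would use the unchecked variant, which returns false for a not-yet-initialized stack instead of asserting.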

> The real fix may be that acquiring a ThreadsListHandle must grab the Threads_lock.

Please don't do that. We went through a lot of trouble to make sure that the ThreadsListHandle doesn't do a lock grab.
20-03-2020

Interestingly when we start a java.lang.Thread, the call to Thread::start happens outside the Threads_lock region after we have called Threads::add. So perhaps this bug has been possible all along!
20-03-2020

The temptation may be to simply reorder: Threads::add(thread); Thread::start(thread); but I suspect that may introduce its own problems because now the new thread can start full execution before it is a known thread in the system. The real fix may be that acquiring a ThreadsListHandle must grab the Threads_lock. Or it may be that a thread has to add itself to the ThreadsList.
20-03-2020

I think this may be a consequence of ThreadSMR and the gradual reduction in the use of, and reliance on, the Threads_lock. The thread is added and started whilst the Threads_lock is held, which would once have ensured the new thread would not be visible until the handshake was complete, but that is no longer the case.
20-03-2020

Thanks Patricio! There's supposed to be a "handshake" between the starting thread and the started thread to ensure it is only visible after it has commenced running. The NotificationThread was copied from the ServiceThread and so should be good in that regard, but I will check the details.
20-03-2020

It looks like threads are being added to the Threads list (through Threads::add) before they're started (Thread::start), and only when the thread is started does it actually call record_stack_base_and_size to get its own _stack_base set?
20-03-2020
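
The ordering described in the comment above can be modelled in a few lines. This is a single-threaded sketch under simplifying assumptions, not HotSpot code: `ModelThread`, `ModelThreadsList`, `run_startup`, and `count_uninitialized` are hypothetical names, and the "new thread" is simulated by an explicit call rather than a real OS thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a JavaThread; only the stack base is modelled.
struct ModelThread {
    std::uintptr_t stack_base = 0;  // 0 means "not yet recorded"

    // Stand-in for the startup code that the new thread runs on itself,
    // which is where record_stack_base_and_size would be called.
    void run_startup(std::uintptr_t base) { stack_base = base; }
};

// Hypothetical stand-in for the list that other threads can walk.
struct ModelThreadsList {
    std::vector<const ModelThread*> threads;

    // Publishing step: after this, observers can see the thread.
    void add(const ModelThread* t) { threads.push_back(t); }

    // What an observer walking the list would find: the number of listed
    // threads whose stack base has not been recorded yet.
    std::size_t count_uninitialized() const {
        std::size_t n = 0;
        for (const ModelThread* t : threads) {
            if (t->stack_base == 0) {
                ++n;
            }
        }
        return n;
    }
};
```

If `add` runs before `run_startup`, an observer that walks the list in between sees a thread with a zero stack base, which is the window in which an unconditional "stack base is initialized" assertion can fire.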

Ran a further set of 10 tier6 runs, for only the ThreadMXBean tests, totalling 84 test executions in all - zero failures. I just cannot reproduce with an augmented assertion.
12-03-2020

Failures appear to all be on Windows. This is strange given the code involved should be shared code.
09-03-2020

Spotted in the jdk-15+13-455-tier6 CI job set:

vmTestbase/nsk/jvmti/GetObjectMonitorUsage/objmonusage004/TestDescription.java

https://mach5.us.oracle.com/mdash/jobs/mach5-one-jdk-15+13-455-tier6-20200229-1028-9084007/results?search=status%3Afailed%20AND%20-state%3Ainvalid
https://mach5.us.oracle.com:10060/api/v1/results/mach5-one-jdk-15+13-455-tier6-20200229-1028-9084007-tier6-rt-nmt-jfr-vmTestbase_nsk_jvmti_quick-windows-x64-debug-465-157/log
https://mach5.us.oracle.com:10060/api/v1/results/mach5-one-jdk-15+13-455-tier6-20200229-1028-9084007-tier6-rt-nmt-jfr-vmTestbase_nsk_jvmti_quick-windows-x64-debug-465-157/artifacts/hs_err_pid11872.log

windows-x64-debug: win2012amd-build-test-3626.s4.javaplatfo1iad.oraclevcn.com

Here's the crashing thread's stack:

---------------  T H R E A D  ---------------

Current thread (0x0000002d2143b800):  JavaThread "MainThread" [_thread_in_vm, id=11148, stack(0x0000002d23be0000,0x0000002d23ce0000)] _threads_hazard_ptr=0x0000002d21700c70, _nested_threads_hazard_ptr_cnt=0

Stack: [0x0000002d23be0000,0x0000002d23ce0000]
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [jvm.dll+0x9ecdd1]  os::platform_print_native_stack+0xf1  (os_windows_x86.cpp:369)
V  [jvm.dll+0xbed0cb]  VMError::report+0xf0b  (vmerror.cpp:725)
V  [jvm.dll+0xbee97e]  VMError::report_and_die+0x8ae  (vmerror.cpp:1533)
V  [jvm.dll+0xbef074]  VMError::report_and_die+0x64  (vmerror.cpp:1317)
V  [jvm.dll+0x415292]  report_vm_error+0x102  (debug.cpp:264)
V  [jvm.dll+0xb8c5ee]  JavaThread::is_lock_owned+0x4e  (thread.cpp:2243)
V  [jvm.dll+0xb8e3f4]  Threads::owning_thread_from_monitor_owner+0x74  (thread.cpp:4703)
V  [jvm.dll+0x7e86d3]  JvmtiEnvBase::get_object_monitor_usage+0x1e3  (jvmtienvbase.cpp:996)
V  [jvm.dll+0x7df8ec]  JvmtiEnv::GetObjectMonitorUsage+0x4c  (jvmtienv.cpp:2847)
V  [jvm.dll+0x79f0e6]  jvmti_GetObjectMonitorUsage+0x136  (jvmtienter.cpp:4105)
C  [objmonusage004.dll+0x118a]

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  nsk.jvmti.GetObjectMonitorUsage.objmonusage004.check(Ljava/lang/Object;Ljava/lang/Thread;II)V+0
j  nsk.jvmti.GetObjectMonitorUsage.objmonusage004.run([Ljava/lang/String;Ljava/io/PrintStream;)I+30
j  nsk.jvmti.GetObjectMonitorUsage.objmonusage004.main([Ljava/lang/String;)V+9
v  ~StubRoutines::call_stub
j  jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+0 java.base@15-ea
j  jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+100 java.base@15-ea
j  jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+6 java.base@15-ea
j  java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+59 java.base@15-ea
j  com.sun.javatest.regtest.agent.MainWrapper$MainThread.run()V+172
j  java.lang.Thread.run()V+11 java.base@15-ea
v  ~StubRoutines::call_stub
01-03-2020

ILW = HLM = P3
25-02-2020

Still can't reproduce.
25-02-2020

Ah! I'd missed what makes this special in tier5 - we are using JFR: -XX:StartFlightRecording=dumponexit=true
25-02-2020

Ran the test 5 times through mach5 with extra debugging added and got zero failures.
25-02-2020

Failure doesn't seem to reproduce locally, on Linux.
25-02-2020

Previously we had: bool on_local_stack(address adr) const { return (_stack_base > adr && adr >= stack_end()); } which checked the range but had no assertions. With the changes we are asserting _stack_base != NULL. Unclear how this can arise as all created or attached threads should have non-NULL stack-base. That said this is asking the question of another thread so we need to see what thread that is.
25-02-2020
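
To see why the pre-change check tolerated an unset stack base, here is a small standalone model of it. The struct name `StackBounds` is hypothetical and addresses are modelled as unsigned integers (an assumption made to keep the sketch portable); the body of `on_local_stack` has the same shape as the snippet quoted in the comment above.

```cpp
#include <cstddef>
#include <cstdint>

// Simplified stand-in for a thread's stack bounds; not the real Thread class.
struct StackBounds {
    std::uintptr_t stack_base = 0;  // highest address; stays 0 until recorded
    std::size_t    stack_size = 0;

    // Lowest valid address of the stack (stacks grow down from stack_base).
    std::uintptr_t stack_end() const { return stack_base - stack_size; }

    // Same shape as the quoted pre-change check: raw field, no assertion.
    // If stack_base is still 0, "stack_base > adr" is false for every
    // address, so the function answers false without further checks.
    bool on_local_stack(std::uintptr_t adr) const {
        return stack_base > adr && adr >= stack_end();
    }
};
```

With a default-constructed `StackBounds`, `on_local_stack` returns false for any address, so a caller asking about a thread that has not yet recorded its stack simply gets "not on that stack".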