Bug ID: JDK-8028126 nsk/jvmti/scenarios/hotswap/HS101/hs101t006 Crashed the vm on Solaris-sparc64 fastdebug builds: only current thread can flush its registers

JDK 8	Other
8Fixed	hs25Fixed

The fix is ready for integration, but an approval is required.
14-06-2019
Release team: Approved for fixing
03-12-2013
I'm requesting to add the 8-critical-request label to the bug. The suggested fix looks safe as the reviewers already confirmed: src/share/vm/prims/jvmtiEnvThreadState.cpp @@ -267,16 +267,25 @@ } VMOp_Type type() const { return VMOp_GetCurrentLocation; } void doit() { ResourceMark rmark; // _thread != Thread::current() RegisterMap rm(_thread, false); + // There can be a race condition between a VM_Operation reaching a safepoint + // and the target thread exiting from Java execution. + // We must recheck the last Java frame still exists. + if (_thread->has_last_Java_frame()) { javaVFrame* vf = _thread->last_java_vframe(&rm); assert(vf != NULL, "must have last java frame"); Method* method = vf->method(); _method_id = method->jmethod_id(); _bci = vf->bci(); + } else { + // Clear current location as the target thread has no Java frames anymore. + _method_id = (jmethodID)NULL; + _bci = 0; } + } void get_current_location(jmethodID method_id, int bci) { method_id = _method_id; bci = _bci; } }; It is better to fix in JDK 8 because of two reasons: - the bug has a high priority P2 - the issue impacts the stability of the Nightly as other tests may fail intermittently by the same reason This issue must be present on all platforms, not only on the Solaris-sparcv9. Even though this particular assert impacts the Solaris only as it relates to flushing the sparc register window, the race condition itself can be potentially reproduced on any other platform as well. I guess, duplicated issues with different manifestation can be present in the JBS. It is not easy to isolate them, however.
27-11-2013
8-critical-request-justification: This is a crash which can potentially occur on any platform and is regularly reproduced in nightly on solaris-sparcv9. The fix is limited in scope (see above) and has been reviewed.
27-11-2013
David, thanks for the help. There is race here (as in the JDK-6997300). The JvmtiEnvThreadState::reset_current_location() observed there is last Java frame on the target thread. But it is gone at the time when a safepoint is established. void JvmtiEnvThreadState::reset_current_location(jvmtiEvent event_type, bool enabled) { assert(event_type == JVMTI_EVENT_SINGLE_STEP \|\| event_type == JVMTI_EVENT_BREAKPOINT, "must be single-step or breakpoint event"); . . . . . . . if (enabled) { // If enabling breakpoint, no need to reset. // Can't do anything if empty stack. if (event_type == JVMTI_EVENT_SINGLE_STEP && _thread->has_last_Java_frame()) { <<< Here we saw last Java frame on the target thread >>> jmethodID method_id; int bci; // The java thread stack may not be walkable for a running thread // so get current location at safepoint. VM_GetCurrentLocation op(_thread); <<< A race here if the target thread is exiting >>> VMThread::execute(&op); op.get_current_location(&method_id, &bci); set_current_location(method_id, bci); } } else if (event_type == JVMTI_EVENT_SINGLE_STEP \|\| !is_enabled(JVMTI_EVENT_SINGLE_STEP)) { // If this is to disable breakpoint, also check if single-step is not enabled clear_current_location(); } } The VM_GetCurrentLocation::doit() must re-check the thread state before the call _thread->last_java_vframe(). I will check if there is a way to bail out gracefully in this case.
22-11-2013
I think you have hit the nail on the thread, I mean head :) - we have a JavaThread that has no Java frames due to it being in the process of exiting. I suspect threads in this state should not be being queried by the GetCurrentLocation operation - the thread has already set the "is terminating" flag and should be ignored I think.
21-11-2013
It seems, the target thread has the address 0x100638800 (the "this" argument of the JavaThread::last_java_frame(): ffffffff7dd89350 __1cKJavaThreadQlast_java_vframe6MpnLRegisterMap__pnKjavaVFrame__ (100638800, ffffffff06eff128, 100638a10, 8e9a1c, ffffffff7e672d28, 85) + 48 Which is the "Debuggee Thread" from the Java Threads table below: Java Threads: ( => current thread ) 0x0000000100638800 JavaThread "Debuggee Thread" [_thread_blocked, id=14, stack(0xffffffff05600000,0xffffffff05800000)] 0x0000000100627800 JavaThread "JVMTI agent thread" daemon [_thread_blocked, id=13, stack(0xffffffff05900000,0xffffffff05b00000)] 0x00000001004ac800 JavaThread "Service Thread" daemon [_thread_blocked, id=11, stack(0xffffffff05e00000,0xffffffff06000000)] 0x000000010049a000 JavaThread "C1 CompilerThread1" daemon [_thread_blocked, id=10, stack(0xffffffff06100000,0xffffffff06200000)] 0x000000010043e000 JavaThread "C2 CompilerThread0" daemon [_thread_blocked, id=9, stack(0xffffffff06300000,0xffffffff06400000)] 0x000000010043c000 JavaThread "Signal Dispatcher" daemon [_thread_blocked, id=8, stack(0xffffffff06500000,0xffffffff06700000)] 0x00000001003d5000 JavaThread "Finalizer" daemon [_thread_blocked, id=7, stack(0xffffffff06800000,0xffffffff06a00000)] 0x00000001003d2800 JavaThread "Reference Handler" daemon [_thread_blocked, id=6, stack(0xffffffff06b00000,0xffffffff06d00000)] 0x0000000100112000 JavaThread "main" [_thread_in_native, id=2, stack(0xffffffff7af00000,0xffffffff7b100000)] The "Debuggee Thread" stack trace from the pstack dump is: ----------------- lwp# 14 / thread# 14 -------------------- ffffffff7eeda178 ___lwp_cond_wait (10063a750, 10063a738, 1, ffffffff, ffffffff7eebfef0, ffffffffffffffff) + 8 ffffffff7db277a0 __1cCosNPlatformEventEpark6M_v_ (10063a700, 17cc00, 10063a738, 10063a750, ffffffff7e376829, ffffffff7e672d28) + 140 ffffffff7da950d4 __1cHMonitorFILock6MpnGThread__v_ (100110ba8, 100638800, ffffffff7e72d128, 126800, 0, 10043f801) + 184 ffffffff7da96478 __1cHMonitorElock6MpnGThread__v_ (100110ba8, 100638800, 1ffc, ffffffff7e7818f0, ffffffff7e672d28, ffffffff7e72d8a1) + 370 ffffffff7d886ff4 __1cLJvmtiExportOcleanup_thread6FpnKJavaThread__v_ (100638800, 0, 1082f0, ffffffff7e77b018, debd8c, 108000) + 5c ffffffff7dd82c38 __1cKJavaThreadEexit6Mbn0AIExitType__v_ (ffffffff057ff838, 20000000, ffffffff7e465623, 100638800, 1, ffffffff7e672d28) + fa8 ffffffff7dd81a58 __1cKJavaThreadRthread_main_inner6M_v_ (100638800, 10063aca8, 1, ffffffff7e672d28, ffffffff7e72d8a1, ffffffff60104480) + 3a8 ffffffff7dd8162c __1cKJavaThreadDrun6M_v_ (100638800, 0, 100639218, 100638a00, 1, ffffffff7e672d28) + 40c ffffffff7db183b0 java_start (100638800, bb000, 10063aca8, ffffffff7e371ebc, ffffffff7e672d28, 1) + 250 ffffffff7eed6020 _lwp_start (0, 0, 0, 0, 0, 0) As we can see above, the thread is at exit: JavaThread::exit() -> JvmtiExport::cleanup_thread(). This thread does not have Java frames any more at this moment. Not very sure my analysis is correct though. :)
20-11-2013
Right. I suspect, it is another manifestation of the JDK-6997300 or something similar. Note this comment in the make_walkable(): // We always flush in case the profiler wants it but we won't mark // the windows as flushed unless we have a last_Java_frame It can be the case when there is no last_Java_frame. Still looking at it...
20-11-2013
But my point was that the thread _should_ have been walkable if we are at a safepoint - so perhaps we needs to see which thread is causing the problem and what it is doing.
20-11-2013
I thought the same but apparently it is not the case. This is a fragment from the hs_err dump: # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/tmp/jprt/P1/150447.amurillo/s/src/cpu/sparc/vm/frame_sparc.cpp:718), pid=29913, tid=5 # guarantee(Thread::current() == (Thread)thread) failed: only current thread can flush its registers # # JRE version: Java(TM) SE Runtime Environment (8.0-b115) (build 1.8.0-ea-fastdebug-b115) # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b58-internal-201311081504.amurillo.hs25-b58-snapshot-fastdebug mixed mode solaris-sparc compressed oops) # Core dump written. Default location: /export/local/aurora/sandbox/results/ResultDir/hs101t006/core or core.29913 # # If you would like to submit a bug report, please visit: # http://bugreport.sun.com/bugreport/crash.jsp # --------------- T H R E A D --------------- Current thread (0x00000001003c5800): VMThread [stack: 0xffffffff06e00000,0xffffffff06f00000] [id=5] Stack: [0xffffffff06e00000,0xffffffff06f00000], sp=0xffffffff06efea60, free space=1018k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x12379ac] void VMError::report_and_die()+0x714;; __1cHVMErrorOreport_and_die6M_v_+0x714 V [libjvm.so+0x71daf8] void report_vm_error(const char,int,const char,const char)+0x78;; __1cPreport_vm_error6Fpkci11_v_+0x78 V [libjvm.so+0x7e76bc] void JavaFrameAnchor::make_walkable(JavaThread)+0x68;; __1cPJavaFrameAnchorNmake_walkable6MpnKJavaThread__v_+0x68 V [libjvm.so+0x1189358] javaVFrameJavaThread::last_java_vframe(RegisterMap)+0x50;; __1cKJavaThreadQlast_java_vframe6MpnLRegisterMap__pnKjavaVFrame__+0x50 V [libjvm.so+0xc6b3c8] void VM_GetCurrentLocation::doit()+0xd0;; __1cVVM_GetCurrentLocationEdoit6M_v_+0xd0 V [libjvm.so+0x1267a50] void VM_Operation::evaluate()+0xf8;; __1cMVM_OperationIevaluate6M_v_+0xf8 V [libjvm.so+0x1263994] void VMThread::evaluate_operation(VM_Operation)+0x254;; __1cIVMThreadSevaluate_operation6MpnMVM_Operation__v_+0x254 V [libjvm.so+0x1264454] void VMThread::loop()+0x6a4;; __1cIVMThreadEloop6M_v_+0x6a4 V [libjvm.so+0x12633fc] void VMThread::run()+0xe4;; __1cIVMThreadDrun6M_v_+0xe4 V [libjvm.so+0xf183b8] java_start+0x258;; java_start+0x258 VM_Operation (0xffffffff05aff228): GetCurrentLocation, mode: safepoint, requested by thread 0x0000000100627800 This is the the make_walkable() code: void JavaFrameAnchor::make_walkable(JavaThread* thread) { if (walkable()) return; // Eventually make an assert guarantee(Thread::current() == (Thread)thread, "only current thread can flush its registers"); <<< This assert was fired !!! >>> // We always flush in case the profiler wants it but we won't mark // the windows as flushed unless we have a last_Java_frame intptr_t sp = StubRoutines::Sparc::flush_callers_register_windows_func()(); if (last_Java_sp() != NULL ) { capture_last_Java_pc(sp); } } As we can see above, the walkable() returned false for the target thread at safepoint. The code of walkable() is: // Is stack walkable inline bool walkable( void) { return _flags & flushed; }
20-11-2013
I thought all threads had to be walkable when they were at a safepoint?
20-11-2013
This current issue is that we run a VM_Operation to collect the current location at safepoint for the target thread top frame. It is done in the VM_GetCurrentLocation::doit() on the VM thread. But the issue is that target thread can be non-walkable. To make it walkable its registers must be flushed. It can be done only if the target thread is current which is not true because the VM thread is current. I do not see a safe (non-risky) solution yet to fix this issue at this stage of development cycle. This area was always problematic (please, refer to the JDK-6997300 above that has not been fixed yet). I'd suggest to defer this to post JDK 8 release but it depends on how frequently this issue is reproducible. Any other opinions? Please, share your concerns if any. There is still some work to do: - check the repro rate of the bug (need to fix the rerun script for that) - look for possible safe temporary solutions to work around the issue
20-11-2013
The test was in the excluded list for half of year from 2013-04-18 to 2013-11-01 because of the test bug: https://bugs.openjdk.java.net/browse/INTJDK-7603327 Hotswap tests failed with JVMTI_ERROR_INVALID_CLASS_FORMAT I suspect, a regression sneaked into the Hotspot at the time the test was not functional. Please, note, the fix for INTJDK-7603327 was integrated into the SQE testbase build b18 which is not a default build yet. Even in the build b18 the test still has issues because it can not be re-run with both rerun.sh nor hs101t006.tlog: sspitsyn@hsdev-10 sh rerun.sh Error: Could not find or load main class package=nsk sspitsyn@hsdev-10 sh hs101t006.tlog Error: Could not find or load main class package=nsk The failure has intermittent nature, and so, the test must be run in the loop to reproduce the issue. But the re-run functionality is broken for this test and, I suspect, for other HotSwap scenarios tests as well.
20-11-2013
This issue is on a similar topic as the old bug which is still in unresolved state: https://bugs.openjdk.java.net/browse/JDK-6997300 assert(has_last_Java_frame()) failed: must have last_Java_sp() when suspended with JVMTI test
20-11-2013
Serguei can you please took into this? it has high prio. We need to 1) evaluate if this must be fixed for jdk 8 and then in either case motive our decision as justification for either a critical or defer request, 2) fix the issue Thanks, Tuva
15-11-2013
ILW=HLH=P2, Crash, has happened twice, no known workaround
13-11-2013

Duplicate :	JDK-6997300 - assert(has_last_Java_frame()) failed: must have last_Java_sp() when suspended with JVMTI test
Relates :	JDK-8030027 - nsk/jvmti/scenarios/hotswap/HS101/hs101t006 Crashed the vm on Linux-amd64: SIGSEGV in JavaThread::last_java_vframe(RegisterMap*)+0xfa