JDK-6440070 : NotifyFramePop throws unexpected error: JVMTI_ERROR_THREAD_NOT_SUSPENDED
  • Type: Bug
  • Component: hotspot
  • Sub-Component: jvmti
  • Affected Version: 5.0,5.0u16,6
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • OS: linux,linux_oracle_6.0
  • CPU: x86
  • Submitted: 2006-06-17
  • Updated: 2012-02-01
  • Resolved: 2006-07-20
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
Other JDK 6
5.0u29Fixed 6 b92Fixed
Related Reports
Duplicate :  
Relates :  
Relates :  
Description
We got 
  nsk/jvmti/unit/ForceEarlyReturn/earlyretbase

test failure in SPB nightly.
I was also able to reproduce it in SCA.

The test fails under:
 JDK:      JDK6b88 (it also fails at least starting from JDK6b77, didn't check with other)
 platform: linux-amd64
 flags:    -server -Xcomp
 hosts:    vm-v20z-2.sfbay  (2 x AMD Opteron 2.2 GHz, SLES 9.0)
           opteron003.sfbay (2 x AMD Opteron 1.4 GHz, RedHat Linux 4.0AS)

I wasn't able to reproduce it with '-Xmixed' and '-Xint' as well as with -client.
And I was not able to reproduce it on single CPU machine.
So, this makes me think that it could be 'compiler2' bug and not JVMTI, but I am not sure.

The bug looks like it is not related to EarlyReturn functionality, it seems related to
JVMTI functions SuspendThread and NotifyFramePop.

Here is the log:
[2006-06-16T23:16:20.19] earlyretThread (Thread[Tested Thread,5,main]): notifying main thread
[2006-06-16T23:16:23.66] earlyretThread (Thread[Tested Thread,5,main]): inside activeMethod()
[2006-06-16T23:16:23.66] Going to suspend the thread...
[2006-06-16T23:16:23.66] VM option '+UseUnsupportedDeprecatedJVMPI'
[2006-06-16T23:16:23.66] >>>>>>>> Invoke SuspendThread()
[2006-06-16T23:16:23.66] <<<<<<<< SuspendThread() is successfully done
[2006-06-16T23:16:23.66] Forcing early return...
[2006-06-16T23:16:23.66] (NotifyFramePop) unexpected error: JVMTI_ERROR_THREAD_NOT_SUSPENDED (13)
[2006-06-16T23:16:23.66] TEST: failed to force early return
[2006-06-16T23:16:23.66] >>>>>>>> Invoke ResumeThread()
[2006-06-16T23:16:23.66] <<<<<<<< ResumeThread() is successfully done
[2006-06-16T23:16:23.66] TEST FAILED: a tested frame wasn't returned
[2006-06-16T23:16:23.66] Expected result: 1995 - Java Birth Year
[2006-06-16T23:16:23.66] Returned result: 0
[2006-06-16T23:16:23.66] Wrong result returned
[2006-06-16T23:16:23.66] earlyretThread (Thread[Tested Thread,5,main]): exiting...
[2006-06-16T23:16:23.66] # Test level exit status: 97


As you can see NotifyFramePop throws JVMTI_ERROR_THREAD_NOT_SUSPENDED error but
at the same time (according to the log file) SuspendThread() was finished successfully.

The bug is not always reproduced. Steps to reproduce:
1. cp /home/ep155969/bugs/earlyretbase/ep155969.Linux.amd64/earlyretbase/earlyretbase.tlog YOUR_DIR
2. cp /home/ep155969/bugs/earlyretbase/ep155969.Linux.amd64/earlyretbase/reproduce_bug.sh YOUR_DIR
3. cd YOUR_DIR
4. bash reproduce_bug.sh earlyretbase.tlog
Also, a couple of new tests are affected by this bug:
	nsk/jvmti/ForceEarlyReturn/ForceEarlyReturn001
	nsk/jvmti/ForceEarlyReturn/ForceEarlyReturn002

Comments
SUGGESTED FIX Please see the attached 6440070-webrev-cr0.tgz for the proposed fix that was sent out for code review (round 0).
29-06-2006

EVALUATION debug_bits on a failing run is 0x210011. The bits are commented in src/share/vm/runtime/thread.cpp: // set in is_ext_suspend_completed(): // 0x00000001 - routine was entered // 0x00000010 - routine return false at end // 0x00000100 - thread exited (return false) // 0x00000200 - suspend request cancelled (return false) // 0x00000400 - thread suspended (return true) // 0x00001000 - thread is in a suspend equivalent state (return true) // 0x00002000 - thread is native and walkable (return true) // 0x00004000 - thread is native_trans and walkable (needed retry) // // set in wait_for_ext_suspend_completion(): // 0x00010000 - routine was entered // 0x00020000 - suspend request cancelled before loop (return false) // 0x00040000 - thread suspended before loop (return true) // 0x00080000 - suspend request cancelled in loop (return false) // 0x00100000 - thread suspended in loop (return true) // 0x00200000 - suspend not completed during retry loop (return false) 0x210011 decodes as: // 0x00000001 - routine was entered // 0x00000010 - routine return false at end // 0x00010000 - routine was entered // 0x00200000 - suspend not completed during retry loop (return false) This confirms that the original wait_for_ext_suspend_completion() algorithm does not scale well on fast Linux boxes.
28-06-2006

EVALUATION Still waiting for a Linux AMD64 build to complete... Here is the key part of wait_for_ext_suspend_completion(SUSPENDRETRYCOUNT, bits) from src/share/vm/runtime/thread.cpp: for (int i = 0; i < retries; i++) { *bits = reset_bits; // reinit to only track last retry // If the VMThread is trying to synchronize all the threads for a // safepoint, then we block ourself on the Threads_lock. Our grab // of the SR_lock will block us only if it is contended. if (SafepointSynchronize::is_synchronizing()) { Threads_lock->lock(); Threads_lock->unlock(); } os::yield_all(i); // increase yield with each retry { MutexLockerEx ml(SR_lock(), Mutex::_no_safepoint_check_flag); is_suspended = is_ext_suspend_completed(true /* called_by_wait */, bits); // It is possible for the external suspend request to be cancelled // (by a resume) before the actual suspend operation is completed. // Refresh our local copy to see if we still need to wait. pending = is_external_suspend(); } The intention of the 'os::yield_all(i)' call is to get an increasing yield to other threads for each retry attempt. However, for Linux, that is not the way that os::yield() works (src/os/linux/vm/os_linux.cpp): void os::yield_all(int attempts) { // Yields to all threads, including threads with lower priorities // Threads on Linux are all with same priority. The Solaris style // os::yield_all() with nanosleep(1ms) is not necessary. sched_yield(); } So it looks like the original wait_for_ext_suspend_completion() algorithm does not scale well on fast Linux boxes.
27-06-2006

EVALUATION Here is the relevant code from JVM/TI NotifyFramePop() (src/share/vm/prims/jvmtiEnv.cpp): // Threads_lock NOT held, java_thread not protected by lock // java_thread - pre-checked // java_thread - unchecked // depth - pre-checked as non-negative jvmtiError JvmtiEnv::NotifyFramePop(JavaThread* java_thread, jint depth) { ResourceMark rm; uint32_t debug_bits = 0; if (!JvmtiEnv::is_thread_fully_suspended(java_thread, true, &debug_bits)) { return JVMTI_ERROR_THREAD_NOT_SUSPENDED; } The 'true' parameter to is_thread_fully_suspended() means to wait for external suspension to complete (src/share/vm/prims/jvmtiEnvBase.cpp): bool JvmtiEnvBase::is_thread_fully_suspended(JavaThread* thr, bool wait_for_suspend, uint32_t *bits) { // "other" threads require special handling if (thr != JavaThread::current()) { if (wait_for_suspend) { // We are allowed to wait for the external suspend to complete // so give the other thread a chance to get suspended. if (!thr->wait_for_ext_suspend_completion(SUSPENDRETRYCOUNT, bits)) { // didn't make it so let the caller know return false; } } // We aren't allowed to wait for the external suspend to complete // so if the other thread isn't externally suspended we need to // let the caller know. else if (!thr->is_ext_suspend_completed_with_lock(bits)) { return false; } } return true; } The debug_bits for a failed is_thread_fully_suspended call will help figure this out.
27-06-2006