JDK-8004903 : VMThread::execute() calls Thread::check_for_valid_safepoint_state() on concurrent VM ops
  • Type: Bug
  • Status: Closed
  • Resolution: Fixed
  • Component: hotspot
  • Sub-Component: runtime
  • Priority: P4
  • Affected Version: hs25
  • OS: generic
  • CPU: generic
  • Submit Date: 2012-12-12
  • Updated Date: 2014-06-26
  • Resolved Date: 2013-01-22
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availabitlity Release.

To download the current JDK release, click here.
7u40Fixed 8Fixed
During Karen's testing of the changes for 6607129, she ran into a crash with the following stack trace snippet:

  [5] VMError::report_and_die(this = 0xe587e654), line 891 in "vmError.cpp"
  [6] report_vm_error(file = 0xfef8c84f "/java/east/u2/kk1971/hssyncperf/src/share/vm/runtime/thread.cpp", line = 761, error_msg = 0xfecef355 "fatal error", detail_msg = 0xfef8c88f "Possible safepoint reached by thread that does not allow it"), line 176 in "debug.cpp"
  [7] report_fatal(file = 0xfef8c84f "/java/east/u2/kk1971/hssyncperf/src/share/vm/runtime/thread.cpp", line = 761, message = 0xfef8c88f "Possible safepoint reached by thread that does not allow it"), line 181 in "debug.cpp"
=>[8] Thread::check_for_valid_safepoint_state(this = 0x832400, potential_vm_operation = true), line 761 in "thread.cpp"
  [9] VMThread::execute(op = 0x80b318), line 537 in "vmThread.cpp"
  [10] InduceScavenge(Self = 0x832400, Whence = 0xfef791b7 "omAlloc"), line 1146 in "synchronizer.cpp"
  [11] ObjectSynchronizer::omAlloc(Self = 0x832400), line 1234 in "synchronizer.cpp"
  [12] ObjectSynchronizer::inflate(Self = 0x832400, object = 0xf8612270), line 1492 in "synchronizer.cpp"
  [13] ObjectSynchronizer::inflate_helper(obj = 0xf8612270), line 1426 in "synchronizer.cpp"
  [14] BasicLock::move_to(this = 0xe587f720, obj = 0xf8612270, dest = 0x71d4e0), line 62 in "basicLock.cpp"
  [15] vframeArrayElement::fill_in(this = 0x71d32c, vf = 0x5e8d20), line 69 in "vframeArray.cpp"
  [16] vframeArray::fill_in(this = 0x71d030, thread = 0x832400, frame_size = 28, chunk = 0x5e8cd8, reg_map = 0xe587f378), line 442 in "vframeArray.cpp"
  [17] vframeArray::allocate(thread = 0x832400, frame_size = 28, chunk = 0x5e8cd8, reg_map = 0xe587f378, sender = CLASS, caller = CLASS, self = CLASS), line 430 in "vframeArray.cpp"
  [18] Deoptimization::create_vframeArray(thread = 0x832400, fr = CLASS, reg_map = 0xe587f378, chunk = 0x5e8cd8), line 938 in "deoptimization.cpp"
  [19] Deoptimization::fetch_unroll_info_helper(thread = 0x832400), line 225 in "deoptimization.cpp"
  [20] Deoptimization::uncommon_trap(thread = 0x832400, trap_request = -34), line 1595 in "deoptimization.cpp"

Analysis of the above stack trace shows that VMThread::execute() is
calling Thread::check_for_valid_safepoint_state() on concurrent VM
ops (Mode == _concurrent || Mode == _async_safepoint). That may
not be a correct check to make.

To reproduce this failure mode run CallTimerGrid with following options: /work/local/jdk/1.8.0/solaris-x64/bin/java \ -server-fast \ -ms256m -mx256m \ -showversion \ -XX:MonitorBound=1 \ -XX:+DeoptimizeALot \ EDU.oswego.cs.dl.util.concurrent.misc.SynchronizationTimer java version "1.8.0-ea" Java(TM) SE Runtime Environment (build 1.8.0-ea-b71) Java HotSpot(TM) Server VM (build 25.0-b14-fastdebug, mixed mode) 0.0 => NoSynchronization 1T 1/4S 8I 0Lm 100TO 0Pm 0Cm 0B 256C 2Xp 4096Ib : 1.1 1.0 => PublicSynchronization 1T 1/4S 8I 0Lm 100TO 0Pm 0Cm 0B 256C 2Xp 4096Ib : 2 .0 2.0 => NestedSynchronization 1T 1/4S 8I 0Lm 100TO 0Pm 0Cm 0B 256C 2Xp 4096Ib : # To suppress the following error report, specify this argument # after -XX: or in .hotspotrc: SuppressErrorAt=/thread.cpp:911 # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/tmp/workspace/jdk8-2-build-solaris-i586-product/jdk8/hotspot /src/share/vm/runtime/thread.cpp:911), pid=18789, tid=14 # fatal error: Possible safepoint reached by thread that does not allow it # # JRE version: Java(TM) SE Runtime Environment (8.0-b71) (build 1.8.0-ea-b71) # Java VM: Java HotSpot(TM) Server VM (25.0-b14-fastdebug mixed mode solaris-x86 ) # Core dump written. Default location: /work/shared/bugs/8004903/CallTimerGrid/c ore or core.18789 # # An error report file with more information is saved as: # /work/shared/bugs/8004903/CallTimerGrid/hs_err_pid18789.log Phoning home... Using server:, port 4711 # # If you would like to submit a bug report, please visit: # http://bugreport.sun.com/bugreport/crash.jsp # Current thread is 14 Dumping core ... run-calltimer-8004903.sh[7]: 18789 Abort(coredump) The "-server-fast" option is so that the fastdebug version of the Server VM can be used with the product bits JDK. The "-XX:MonitorBound=1" option is so that InduceScavenge() is call as soon as possible. The "-XX:+DeoptimizeALot" option gets a deoptimize happening which happened in Karen's original bug sighting. Here's a stack trace snippet from my hs_err_pid file: --------------- T H R E A D --------------- Current thread (0x083a8000): JavaThread "Thread-1" [_thread_in_Java, id=14, sta ck(0xe43af000,0xe43ff000)] Stack: [0xe43af000,0xe43ff000], sp=0xe43fc5e0, free space=309k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x1db4c39] void VMError::report(outputStream*)+0x92d V [libjvm.so+0x1db5ebe] void VMError::report_and_die()+0x57a V [libjvm.so+0xb5816f] void report_fatal(const char*,int,const char*)+0x56f V [libjvm.so+0x1c74266] void Thread::check_for_valid_safepoint_state(bool)+0x1 56 V [libjvm.so+0x1de2ea6] void VMThread::execute(VM_Operation*)+0x1f6 V [libjvm.so+0x1bf2333] ObjectMonitor*ObjectSynchronizer::inflate(Thread*,oop) +0x4af V [libjvm.so+0x1bf1bcf] ObjectMonitor*ObjectSynchronizer::inflate_helper(oop)+ 0xab V [libjvm.so+0x5e978f] void BasicLock::move_to(oop,BasicLock*)+0x53 V [libjvm.so+0x1da3539] void vframeArrayElement::fill_in(compiledVFrame*)+0x3a 9 V [libjvm.so+0x1da5497] vframeArray*vframeArray::allocate(JavaThread*,int,Grow ableArray<compiledVFrame*>*,RegisterMap*,frame,frame,frame)+0x1c7 V [libjvm.so+0xbc003c] vframeArray*Deoptimization::create_vframeArray(JavaThre ad*,frame,RegisterMap*,GrowableArray<compiledVFrame*>*)+0x4a8 V [libjvm.so+0xbb28f4] Deoptimization::UnrollBlock*Deoptimization::fetch_unrol l_info_helper(JavaThread*)+0x1300 V [libjvm.so+0xbb1533] Deoptimization::UnrollBlock*Deoptimization::fetch_unrol l_info(JavaThread*)+0x16b v ~DeoptimizationBlob J java.lang.StringBuffer.append(Ljava/lang/String;)Ljava/lang/StringBuffer; j EDU.oswego.cs.dl.util.concurrent.misc.SynchronizationTimer$OneTest.run()V+720 j EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run()V+15 j java.lang.Thread.run()V+11 j EDU.oswego.cs.dl.util.concurrent.misc.Threads$MyThread.run()V+8 v ~StubRoutines::call_stub V [libjvm.so+0x1024427] void JavaCalls::call_helper(JavaValue*,methodHandle*,J avaCallArguments*,Thread*)+0x124f V [libjvm.so+0x18be40c] void os::os_exception_wrapper(void(*)(JavaValue*,metho dHandle*,JavaCallArguments*,Thread*),JavaValue*,methodHandle*,JavaCallArguments* ,Thread*)+0x18 V [libjvm.so+0x102319b] void JavaCalls::call(JavaValue*,methodHandle,JavaCallA rguments*,Thread*)+0x43 V [libjvm.so+0x102112b] void JavaCalls::call_virtual(JavaValue*,KlassHandle,Sy mbol*,Symbol*,JavaCallArguments*,Thread*)+0x4d7 V [libjvm.so+0x102172a] void JavaCalls::call_virtual(JavaValue*,Handle,KlassHa ndle,Symbol*,Symbol*,Thread*)+0xa6 V [libjvm.so+0x129c0c8] void thread_entry(JavaThread*,Thread*)+0xb0 V [libjvm.so+0x1c7e539] void JavaThread::thread_main_inner()+0x4e1 V [libjvm.so+0x1c7dd51] void JavaThread::run()+0x845 V [libjvm.so+0x18ad485] java_start+0x1d1 C [libc.so.1+0xa71d0] _thr_setup+0x4e C [libc.so.1+0xa74c0] __moddi3+0x60

See https://jbs.oracle.com/bugs/browse/JDK-8004902 for the webrevs used during Code Review Round [012] for this fix.

Here's a snippet of an e-mail thread with some useful info: Backing upto see what Thread::check_for_valid_safepoint_state(bool potential_vm_operation) is intended to do: - checks to see if there is a No_Safepoint_Verifier object on the stack: (!(_allow_safepoint_count == 0)) - checks to see if the calling JavaThread is in the right state: (is_Java_thread() && ((JavaThread*)this)->thread_state() != _thread_in_vm) - if the caller is a JavaThread and potential_vm_operation is true, then we have to make sure the JavaThread isn't holding any locks that would deadlock the VMThread - handle GCALotAtAllSafepoints All of the above checks make sense for blocking VM operations where Mode == _safepoint. The requesting JavaThread is planning to wait around synchronously until the operation is done. For non-blocking VM operations where (Mode == _concurrent or Mode == _async_safepoint), let's examine/reason about each check: - No_Safepoint_Verifier is meant to guard code where we're doing something that a safepoint can mess up (oop manipulation, etc). Submitting a non-blocking VM operation isn't going to goto a safepoint. The JavaThread submitting the non-blocking VM op is going to return and the non-blocking VM op is going to get handled sometime later. - "calling JavaThread is in the right state (_thread_in_vm)" is meant to make sure that the JavaThread submitting the VM op is in a safepoint-able state. Submitting a non-blocking VM operation isn't going to goto a safepoint so we don't need to make sure that the JavaThread is in state _thread_in_vm. - "JavaThread isn't holding locks that can block the VMThread" is meant to make sure that the blocked JavaThread isn't going to deadlock the VMThread. Submitting a non-blocking VM operation isn't going to block so we don't need to check locks. - When we are on this branch in VMThread::execute(), we create a SkipGCALot helper object so GCALotAtAllSafepoints is already by-passed. OK. It took me a while, but I'm now convinced that non-blocking VM operations don't need to make that check_for_valid_safepoint_state() call. Of course, I still need to move the code from where Karen had to address the other code review comments that I had, but I think we're good to go on this one.

Yes, restricting when the check should be made is the direction that Karen's experimental fix from a couple of years ago went. I'm doing the due diligence necessary to bring the fix forward to HSX-25.

_allow_safepoint_count is manipulated by the gcLocker. So it seems that the thread is marked as not safepoint "safe" but then triggers a safepointing VM operation. It may be that this should only be checked for non-concurrent safepoint ops.