JDK-8007476 : assert(the_owner != NULL) failed: Did not find owning Java thread for lock word address
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: hs24
  • Priority: P3
  • Status: Closed
  • Resolution: Fixed
  • OS: linux
  • Submitted: 2013-02-04
  • Updated: 2014-06-26
  • Resolved: 2013-02-28
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 8 Other
8Fixed hs25Fixed
Related Reports
Relates :  
Relates :  
Description
Test cvm/jit/ExceptionHandling/NativeFromCompiled crashed on fastdebug VM with:
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/opt/jprt/T/P1/205951.amurillo/s/hotspot/src/share/vm/runtime/thread.cpp:4255), pid=19136, tid=139790698199376
#  assert(the_owner != NULL) failed: Did not find owning Java thread for lock word address
#
# JRE version: Java(TM) SE Runtime Environment (7.0) (build 1.7.0-internal-fastdebug-201302012059.amurillo.hs24-b32-jdk7u14-b-b00)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.0-b32-fastdebug mixed mode linux-amd64 compressed oops)
# Core dump written. Default location: /export/local/164276.JAVASE.PIT.VM.linux-amd64_vm__server_mixed_cvm.jit.testlist.runTests/results/ResultDir/NativeFromCompiled/core or core.19136
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.sun.com/bugreport/crash.jsp
#
Comments
I created a simple deadlocking program (attached as DoDeadlockjava). Here is the interesting part of the deadlock detection output: <begin sample_deadlock_output_regular snippet> Found one Java-level deadlock: ============================= "Worker-1": waiting to lock monitor 0x02f60e54 (object 0x1026cd50, a java.lang.Object), which is held by "Worker-0" "Worker-0": waiting to lock monitor 0x02f601bc (object 0x1026cd58, a java.lang.Object), which is held by "Worker-1" Java stack information for the threads listed above: =================================================== "Worker-1": at WorkerThread.run(DoDeadlock.java:60) - waiting to lock <0x1026cd50> (a java.lang.Object) - locked <0x1026cd58> (a java.lang.Object) "Worker-0": at WorkerThread.run(DoDeadlock.java:53) - waiting to lock <0x1026cd58> (a java.lang.Object) - locked <0x1026cd50> (a java.lang.Object) Found 1 deadlock. <end sample_deadlock_output_regular snippet> Complete "$JAVA_HOME/bin/jstack -J-Dsun.jvm.hotspot.runtime.VM.disableVersionCheck=1 -l " output is attached as sample_deadlock_output_regular. The "found deadlock" part is typical with two threads listed, "Worker-1" blocked by "Worker-0" and "Worker-0" blocked by "Worker-1". I simulated this failure mode by modifiying the VM to return NULL from Threads::owning_thread_from_monitor_owner() and reran the test program along with jstack. Here is the interesting part of the deadlock detection output: <start sample_deadlock_output_UNKNOWN snippet> Found one Java-level deadlock: ============================= "Worker-1": waiting to lock monitor 0x02f60e54 (object 0x1026ce00, a java.lang.Object), which is held by UNKNOWN_owner_addr=0x0352fc54 Java stack information for the threads listed above: =================================================== "Worker-1": at WorkerThread.run(DoDeadlock.java:60) - waiting to lock <0x1026ce00> (a java.lang.Object) - locked <0x1026ce08> (a java.lang.Object) Found one Java-level deadlock: ============================= "Worker-0": waiting to lock monitor 0x02f601bc (object 0x1026ce08, a java.lang.Object), which is held by UNKNOWN_owner_addr=0x0357fbd4 Java stack information for the threads listed above: =================================================== "Worker-0": at WorkerThread.run(DoDeadlock.java:53) - waiting to lock <0x1026ce08> (a java.lang.Object) - locked <0x1026ce00> (a java.lang.Object) Found 2 deadlocks. <end sample_deadlock_output_UNKNOWN snippet> Complete "$JAVA_HOME/bin/jstack -J-Dsun.jvm.hotspot.runtime.VM.disableVersionCheck=1 -l " output is attached as sample_deadlock_output_UNKNOWN. The output is different this time: "Worker-1" is blocked by "UNKNOWN_owner_addr=0x0352fc54" in one deadlock and in the other deadlock, "Worker-0" is blocked by "UNKNOWN_owner_addr=0x0357fbd4". In each "found deadlock" part, only one thread is listed because we can't find the JavaThread that has locked the object. Normally, I don't like to use uppercase, but tagging the info that is available with "UNKNOWN_owner_addr" seems appropriate.
26-02-2013

Pavel, thanks for the flexibility.
26-02-2013

SQE is ok with proposed test plan
26-02-2013

If you plan on playing with the deadlock detection logic then please also see: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2012-November/004855.html "Excess threads reporting on deadlock" and http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2012-November/004861.html "Deadlock detection mechanism incorrectly assumes that thread cannot block itself" Thanks.
26-02-2013

The test failure described by this bug is not relevant for JDK7u14/HSX-24 because the test (cvm/jit/ExceptionHandling/NativeFromCompiled) should not have been executed on JDK7u14/HSX-24 due to https://jbs.oracle.com/bugs/browse/INTJDK-7068347. However, the test failure is interesting from a robustness/diagnostics point of view and I'm planning to use this bug ID to make changes to the deadlock detection diagnostics code in order for the VM to be able to report such issues instead of failing an assert in non-product code or ignoring the issue in product code. To be clear, if the situation that caused the assertion failure happens in product code, then a deadlock can occur and the deadlock detection code will not report the issue as a deadlock. In the current code, if thread T1 is blocked trying to lock obj and the owner of the lock on obj cannot be found (T2), then T1 is not considered to be blocked by T2. However, T1 will never be able to acquire the lock on job since T2 isn't around to unlock it. In my opinion, it would be good if thread T1 was reported as deadlocked. And in this case, it is blocked by a missing/dead thread (T2) so deadlocked seems very appropriate. So my plan is: 1) change the "fix version" from hs24 -> hs25 2) lower the priority from P2 -> P3 3) finish developing my enhancement to the deadlock detection code I'll check with Pavel to see if he agrees with this plan. After the fix bakes for a while in HSX-25, we should consider backporting the fix to HSX-24 as a diagnostics improvement. Also note, the following bug failed in a similar way: 6786879 Race in jvmti GetObjectMonitorUsage could lead to crashes As part of this fix, the exact assertion that fired in 6786879 will be removed, but there is another in the JVM/TI code path that is just as ill advised. My suggested changes will do _nothing_ to address the meta-issue described in 6786879 where threads can exit while they are being examined. My suggested changes will only address the bad assertions.
26-02-2013

My point is that the reason for the assert condition not holding (the owner of a monitor apparently not being in the Threadslist) may not be due to any inherent bug in deadlock detection or monitor management but may due to some other problem induced by the testcase - eg a memory stomp due to native code failing to check for errors after invoking JNI functions.
22-02-2013

David, there's no argument that the test program has issues. In fact it was probably an infra glitch that allowed the test to run since it should have been excluded by INTJDK-7068347. However, in the process of trying to find and report information about potential deadlocks, the VM failed an assert and died. Not a good thing for the VM to do. Especially not in diagnostic code that's supposed to tell us when something might be wrong.
22-02-2013

Dan, The VM should never encounter a monitor with an owner thread that doesn't exist. Seems to me this may be a symptom rather than the actual problem. Looking at the NFC.c code it seems to me that the code is fragile because it has sequences of the form: clazz = (*env)->FindClass(env, "cvm/jit/share/TestException"); mid = (*env)->GetMethodID(env, clazz, "<init>", "()V"); throwObj = (jthrowable) (*env)->NewObject(env, clazz, mid); but there is no checking to see if any exceptions occurred. Not saying that is the case here but if for some reason there is a problem who knows what the failure mode might be.
22-02-2013

Pavel, this is very useful information. I have testbase version 7/r10 mirrored in my Colorado lab which is why I'm using it. My e-mail archive shows that is the latest promoted version of testbase 7 so I don't think testbase version 7/r11 has been promoted/released. I suspect that there was an exclude list glitch in the PIT testing that showed this failure mode. However, that doesn't change the fact that the VM was less than robust when running into an unmatched monitor. Not a good thing and I'll look at the relevant VM code for more info.
21-02-2013

Ran the same test on my Solaris X86 machine for 2.5 hours and it did not finish. Definitely not liking this test at all.
21-02-2013

I let the cvm/jit/ExceptionHandling/NativeFromCompiled test run on my WinXP VMware client using the following JDK8 bits: $JAVA_HOME/bin/java -version java version "1.8.0-ea" Java(TM) SE Runtime Environment (build 1.8.0-ea-b77) Java HotSpot(TM) Client VM (build 25.0-b18, mixed mode) and it did not finish in 2 hours. I'm not getting a good feeling about this test.
20-02-2013

Several questions come to mind: - has this failure mode been seen on JDK8/HSX-25?; I'm searching via the "history" link, but it hasn't come back yet - has this failure mode been seen on non-Linux systems? - I'm not familiar with the cvm/jit VM/NSK subsuite?; I'm hoping it is similar to the other VM/NSK subsuites that I've used in the past The two failures have these configs: Java VM: Java HotSpot(TM) 64-Bit Server VM (24.0-b32-fastdebug mixed mode linux-amd64 compressed oops) CPU:total 16 (4 cores per cpu, 2 threads per core) family 6 model 26 stepping 5, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, ht, tsc, tscinvbit, tscinv Java VM: Java HotSpot(TM) Client VM (24.0-b32-fastdebug mixed mode, sharing linux-x86 ) CPU:total 1 (2 cores per cpu, 1 threads per core) family 6 model 15 stepping 6, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, tsc so it looks like: - Client vs Server VM doesn't matter - 32-bit vs 64-bit doesn't matter - compressed oops doesn't matter - sharing doesn't matter - has been seen on a dual-core machine so a 16 CPU isn't necessary
13-02-2013

I=H, L=M,W=M
05-02-2013

Thanks for all that detailed info. So basically the test got itself into a state where it was timing out for some reason and when we hit it with jstack the deadlock detection crashed. The gdb file shows that the process was busy doing a GC prior to that that: VM_ParallelGCSystemGC::doit. This must have completed for the thread dump to have been initiated.
05-02-2013

Here's the crashing code: 4221 JavaThread *Threads::owning_thread_from_monitor_owner(address owner, bool doLock) { 4222 assert(doLock || 4223 Threads_lock->owned_by_self() || 4224 SafepointSynchronize::is_at_safepoint(), 4225 "must grab Threads_lock or be at safepoint"); 4226 4227 // NULL owner means not locked so we can skip the search 4228 if (owner == NULL) return NULL; 4229 4230 { 4231 MutexLockerEx ml(doLock ? Threads_lock : NULL); 4232 ALL_JAVA_THREADS(p) { 4233 // first, see if owner is the address of a Java thread 4234 if (owner == (address)p) return p; 4235 } 4236 } 4237 assert(UseHeavyMonitors == false, "Did not find owning Java thread with UseHeavyMonitors enabled"); 4238 if (UseHeavyMonitors) return NULL; 4239 4240 // 4241 // If we didn't find a matching Java thread and we didn't force use of 4242 // heavyweight monitors, then the owner is the stack address of the 4243 // Lock Word in the owning Java thread's stack. 4244 // 4245 JavaThread* the_owner = NULL; 4246 { 4247 MutexLockerEx ml(doLock ? Threads_lock : NULL); 4248 ALL_JAVA_THREADS(q) { 4249 if (q->is_lock_owned(owner)) { 4250 the_owner = q; 4251 break; 4252 } 4253 } 4254 } 4255 assert(the_owner != NULL, "Did not find owning Java thread for lock word address"); 4256 return the_owner; 4257 } The assert on line 4255 blows because the "owner" doesn't map to a live JavaThread, i.e., it is not on the threads list. Here is a snippet from the hs_err_pid file: ;; Using jvm: "/export/local/common/jdk/baseline/linux-amd64/jre/lib/amd64/server/libjvm.so" # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/opt/jprt/T/P1/205951.amurillo/s/hotspot/src/share/vm/runtime/thread.cpp:4255), pid=19136, tid=139790698199376 # assert(the_owner != NULL) failed: Did not find owning Java thread for lock word address # # JRE version: Java(TM) SE Runtime Environment (7.0) (build 1.7.0-internal-fastdebug-201302012059.amurillo.hs24-b32-jdk7u14-b-b00) # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.0-b32-fastdebug mixed mode linux-amd64 compressed oops) # Core dump written. Default location: /export/local/164276.JAVASE.PIT.VM.linux-amd64_vm__server_mixed_cvm.jit.testlist.runTests/results/ResultDir/NativeFromCompiled/core or core.19136 # # If you would like to submit a bug report, please visit: # http://bugreport.sun.com/bugreport/crash.jsp # --------------- T H R E A D --------------- Current thread (0x0000000001777000): VMThread [stack: 0x00007f238ed8b000,0x00007f238ee8c000] [id=19164] Stack: [0x00007f238ed8b000,0x00007f238ee8c000], sp=0x00007f238ee8a7b0, free space=1021k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0xd90b72] VMError::report_and_die()+0x302;; VMError::report_and_die()+0x302 V [libjvm.so+0x6906a4] report_vm_error(char const*, int, char const*, char const*)+0x84;; report_vm_error(char const*, int, char const*, char const*)+0x84 V [libjvm.so+0xd1f996] Threads::owning_thread_from_monitor_owner(unsigned char*, bool)+0x156;; Threads::owning_thread_from_monitor_owner(unsigned char*, bool)+0x156 V [libjvm.so+0xd338b6] ThreadService::find_deadlocks_at_safepoint(bool)+0x1e6;; ThreadService::find_deadlocks_at_safepoint(bool)+0x1e6 V [libjvm.so+0xdb9619] VM_FindDeadlocks::doit()+0x19;; VM_FindDeadlocks::doit()+0x19 V [libjvm.so+0xdba469] VM_Operation::evaluate()+0x89;; VM_Operation::evaluate()+0x89 V [libjvm.so+0xdb7b75] VMThread::evaluate_operation(VM_Operation*)+0x105;; VMThread::evaluate_operation(VM_Operation*)+0x105 V [libjvm.so+0xdb8610] VMThread::loop()+0x660;; VMThread::loop()+0x660 V [libjvm.so+0xdb8840] VMThread::run()+0xb0;; VMThread::run()+0xb0 V [libjvm.so+0xb6fd88] java_start(Thread*)+0x108;; java_start(Thread*)+0x108 VM_Operation (0x00007f238e34aa80): FindDeadlocks, mode: safepoint, requested by thread 0x000000000185c800 I don't know this test (cvm/jit/ExceptionHandling/NativeFromCompiled) so I don't know if it is intentionally trying to do dead lock detection or not.
04-02-2013