JDK-8213825 : assert(false) failed: Non-balanced monitor enter/exit! Likely JNI locking
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 6,7,8,9,10,11,12,13
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • Submitted: 2018-11-14
  • Updated: 2020-01-15
  • Resolved: 2019-01-24
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 11 Other
11.0.4Fixed openjdk8u222Fixed
Related Reports
Duplicate :  
Relates :  
Relates :  
Description
This crash seems to have happened only three times.  The most recent:

Stack: [0x00000041adef0000,0x00000041adff0000]
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [jvm.dll+0xb49bd1]  os::platform_print_native_stack+0xf1  (os_windows_x86.cpp:369)
V  [jvm.dll+0xd67860]  VMError::report+0xe80  (vmerror.cpp:698)
V  [jvm.dll+0xd690b4]  VMError::report_and_die+0x854  (vmerror.cpp:1463)
V  [jvm.dll+0xd69774]  VMError::report_and_die+0x64  (vmerror.cpp:1287)
V  [jvm.dll+0x57be7e]  report_vm_error+0x7e  (debug.cpp:232)
V  [jvm.dll+0xb21db4]  ObjectMonitor::exit+0xf4  (objectmonitor.cpp:1025)
V  [jvm.dll+0xc0f08a]  SharedRuntime::complete_monitor_unlocking_C+0x17a  (sharedruntime.cpp:2062)
C  0x0000004187bfcf5e
Comments
Fix Request This fixes the serious intermittent bug in compilers subtly breaking synchronization code. Patch applies cleanly to 11u and 8u (with reshuffling). Passes tier1 and ctw tests in 11u on Linux x86_64 fastdebug. Passes hotspot jtregs in 8u on Linux x86_64 fastdebug. The risk is low: the patch disables the broken optimization.
06-03-2019

Failed tests pass in mach5
12-02-2019

Thank you, Vladimir
24-01-2019

Fix request approved. Changes are safe - optimization is skipped for load from Java object's Mark word.
23-01-2019

Fix Request: if markword load has a sfpt as control i/p(i.e synchronizations near a safepoint), it skips sfpt assuming sfptOp wouldn't write to markword memory. fix is to not to skip safepoints for markword loads. although it is a very old bug, having fix in 12 saves users from unexpected synchronization bugs.
23-01-2019

Ok sure.
23-01-2019

[~jcm] don't touch anything else. Please, fix only current issue with loading mark word in this lock case.
22-01-2019

even anti-dependency check skips safepoint and runtimecall saying only raw memory get updated. another candidate for change?
22-01-2019

i couldn't reproduce the 8011646: SEGV in compiled code with loop predication issue with latest code , after reverting the change. may be it is fixed somewhere else. in that case can i pin the mark load in that path. ? i see castpp removal working fine with the test case.
22-01-2019

working on fix
18-01-2019

Yes, your addr_mark solution is more robust. In graph you have LoadL (marl word load) don't use AddP and directly point to Phi node.
18-01-2019

i will check other places where mark is being loaded and missing to set control edge. that was in todo list. but offset zero case for AddP , AddP identity returning non zero input. so i thought AddP case would never come for Mark load.
18-01-2019

An other thing. Skipping SFP is optimizatio. If we don't skip it for some loads - nothing bad should happen.
18-01-2019

No, AddP is used for Address in memory accesses. You can check for returned from Ideal_base_and_offset() base != NULL to make sure that you have normal oop+offset case. We have similar checks in other places. An other thing to try is to not set load's control edge in PhaseMacroExpand::expand_lock_node() and expand_unlock_node(). The was bug JDK-8011646 where we removed control for mark word load in LibraryCallKit::inline_native_hashcode(): http://hg.openjdk.java.net/jdk/jdk/rev/bcf892f54aba
18-01-2019

offset = 0 for addp is identity operation, isn't it?
18-01-2019

oops there was a mistake in assert. still i have the question, would there be a case where addp being used for mark load. as that is at offset zero.
18-01-2019

The only other possibility is Unsafe access and it is covered by AliasIdxRaw check. What is left is normal oops. Change look like next: diff -r 74dde8b66b7f src/hotspot/share/opto/memnode.cpp --- a/src/hotspot/share/opto/memnode.cpp +++ b/src/hotspot/share/opto/memnode.cpp @@ -1528,19 +1528,22 @@ Node* address = in(MemNode::Address); bool progress = false; + intptr_t offset = 0; + Node* base = AddPNode::Ideal_base_and_offset(address, phase, offest); + // Skip up past a SafePoint control. Cannot do this for Stores because // pointer stores & cardmarks must stay on the same side of a SafePoint. if( ctrl != NULL && ctrl->Opcode() == Op_SafePoint && + offset != oopDesc::mark_offset_in_bytes() && // Don't do it for mark word phase->C->get_alias_index(phase->type(address)->is_ptr()) != Compile::AliasIdxRaw ) { ctrl = ctrl->in(0); set_req(MemNode::Control,ctrl); progress = true; } - intptr_t ignore = 0; - Node* base = AddPNode::Ideal_base_and_offset(address, phase, ignore); - if (base != NULL - && phase->C->get_alias_index(phase->type(address)->is_ptr()) != Compile::AliasIdxRaw) { + if (base != NULL && + offset != oopDesc::mark_offset_in_bytes() && // Don't do it for mark word + phase->C->get_alias_index(phase->type(address)->is_ptr()) != Compile::AliasIdxRaw) { // Check for useless control edge in some common special cases if (in(MemNode::Control) != NULL && can_remove_control()
18-01-2019

[~kvn] mark is at offset zero, what is probability of addp coming as address to this load node.?
18-01-2019

Also check to avoid removal of control for such loads in following code in LoadNode::Ideal.
17-01-2019

[~jcm] Okay, load node from mark word is Long type. I thought it is RAW. It is still special load. I would suggest to compare load's offset with oopDesc::mark_offset_in_bytes() to avoid skipping SFP.
17-01-2019

we initially set ctrl to safepoint node, then skip it in LoadNode::Ideal transform. not to skip the ctrl for mark should be solution. // Skip up past a SafePoint control. Cannot do this for Stores because // pointer stores & cardmarks must stay on the same side of a SafePoint. if( ctrl != NULL && ctrl->Opcode() == Op_SafePoint && phase->C->get_alias_index(phase->type(address)->is_ptr()) != Compile::AliasIdxRaw ) { ctrl = ctrl->in(0); set_req(MemNode::Control,ctrl); progress = true; }
17-01-2019

with other test case i see problem with jdk6 and above.
17-01-2019

above code is with jdk12. <jdk12 i tried with a different test case. so, i will reconfirm failing behaviour with <jdk12.
16-01-2019

Simple test case of this sort. ############################################ class Main { public static void main(String args[]) { for (int i=0 ; i<15000 ; i++) { new Test(i).test(); } } } ############################################# class Test { public int max_count; Test(int count) { max_count = count; } void test() { for (int i=0; check(i); i++) { dummy(); } } synchronized boolean check(int i) { return i<max_count; } void dummy(){ } } ##################################### run with java -XX:CompileCommand=exclude,Main.main -XX:CompileCommand=print,Test.test Main ######################################## generates this code pattern 0c2 B8: # B14 B9 <- B6 B7 Loop: B8-B7 inner Freq: 1083.92 0c2 movq R10, [rsp + #0] # spill 0c6 movq RAX, [R10] # long 0c9 # TLS is in R15 0c9 movq R10, [R15 + #288 (32-bit)] # ptr 0d0 testl rax, [R10] # Safepoint: poll for GC # Test::test @ bci:17 L[0]=rsp + #0 L[1]=RBP # OopMap{[0]=Oop off=208} 0d3 movq R10, RAX # spill 0d6 andq R10, #7 # long 0da cmpq R10, #5 0de jne,s B14 P=0.000001 C=-1.000000
16-01-2019

This is a scary bug then! Anything from JDK8+ could start crashing (or misbehave in other ways?) due to some random timing change!
15-01-2019

Jamsheed said on our meeting that he reproduced the issue with jdk 8 and his small test case.
15-01-2019

> Is this a recently introduced problem? [~dholmes] David, based on your comment from 2018-11-13 "Intermittent issues also seen in 9 - JDK-8071813", this is a reincarnation of JDK-8071813, which was created 2015-01-28 and closed as CNR on 2016-01-26.
15-01-2019

i wrote a small test case, similar to ucp. and with that i find issue. issue is there from long time itseems. will confirm
15-01-2019

Great find [~jcm]! Is this a recently introduced problem?
14-01-2019

problem is in compiler generated code. the object mark load is abv the safepoint. overlooked the issue!
14-01-2019

checking whether mark can get corrupted by near by object
14-01-2019

checked for addUrl, closeLoaders, getLoader, it is not inlined anywhere else in c1/c2 code. only two c2 versions available. one getLoader, and other getLoader inlined in getResource.
14-01-2019

also in c2 inlined code, although prima facie it appeared to be not inlined anywhere else in c2 code.
12-01-2019

that was again false flag. epoch related code, are all perfect. things left out for verification, c1 inlined codes having ucp locks. that seems to be tedious task.
11-01-2019

//Sorry I can't follow all the asm in detail. And I don't know exactly what source code logic the asm corresponds to. // asm has both compiled(biased case,fast case , slow case) and masm fast_lock/unlock logic. OptoBiasInline (compiler generated) FastLock (masm) ( inlined) SlowCase call to runtime (compiler generated call) Similarly in unlock OptoBiasInling (compiler generated) FastUnlock(masm) (inlined) SlowCase call to runtime (compiler generated call) //Now that we have some CI failures, is this any more reproducible?// no more details - both the linux core files i have has a bulk revoke + another revoke before crash. it says something wrong with epoch handling in compiled code while windows aot crashes doesn't show a bulk revoke. so itseems multiple issues..
11-01-2019

Sorry I can't follow all the asm in detail. And I don't know exactly what source code logic the asm corresponds to. Now that we have some CI failures, is this any more reproducible?
11-01-2019

in another linux crash[1], i have analyzed pattern is same dhw points the rbp spill slot of previous frame, frame size is different for this compiled method the index at time of crash is 6. [1]https://bugs.openjdk.java.net/secure/attachment/80696/ucpgRLinuxCrash%5B2%5D.txt
10-01-2019

all the failures are in mixed mode, tiered, compressed oops, g1 gc, runs, m/c config: Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz [6/8 cores, 30-60G ram]
10-01-2019

in almost all hs_err file i see these classes loaded just before crash loading class ch/qos/logback/core/spi/AppenderAttachableImpl loading class ch/qos/logback/core/spi/AppenderAttachableImpl done or something from ch/qos/logback/ ..
09-01-2019

if crashing thread had taken ucp lock in fast path, or slow path, it is expected to have a valid dhw. but as it having invalid dhw. suggests locking didn't happen! neither bias is lost due to revoke bias as both spill codes and debuginfos are perfect for crashing code path. only leftout possibilities. 1)inflation happened in exit path due to unexpected neutral markword( it lost bias, somehow) 2)stackwalk itself goes wrong (unexpected)
07-01-2019

crashing thread: Pseudo code: Loop: 1) a safepoint without lock 2) lock(can safepoint with lock state, slowpath) 3) loaders.get(index) << fully inlined 4) unlock jump Loop asm: Locking code 0x7f993c81f759: mov 0x40(%rsp),%r10 // loading ucp from spill loc 0x7f993c81f75e: mov (%r10),%rax // mark in rax 0x7f993c81f761: mov 0x4(%rsp),%r10d // index count copy 0x7f993c81f766: inc %r10d // index is incremented 0x7f993c81f769: mov %r10d,0x38(%rsp) // index spill loc 0x7f993c81f76e: mov %rax,%r10 // mark in r10 0x7f993c81f771: and $0x7,%r10 // lock mask // safepoint check 0x7f993c81f775: mov 0x120(%r15),%r11 0x7f993c81f77c: test %eax,(%r11) 0x7f993c81f77f: cmp $0x5,%r10 0x7f993c81f783: jne 0x7f993c8257e4 // not biased try fast lock 0x7f993c81f789: mov 0x48(%rsp),%r10 // get instanceKlass 0x7f993c81f78e: mov 0xb8(%r10),%r10 // get prototype header 0x7f993c81f795: mov %r10,%r11 // prototype header in r11 0x7f993c81f798: or %r15,%r11 // current thread protoypeheader (ctp) 0x7f993c81f79b: mov %r11,%r8 // ctp in r8 0x7f993c81f79e: xor %rax,%r8 // xor with mark 0x7f993c81f7a1: test $0xffffffffffffff87,%r8 // check non age bit match 0x7f993c81f7a8: jne 0x7f993c825851 // if matched biased locked else try fast lock fast lock: 0x7f993c8257da: mov 0x40(%rsp),%r11 0x7f993c8257df: lock cmpxchg %r10,(%r11)ock 0x7f993c8257e4: mov 0x40(%rsp),%r11 0x7f993c8257e9: lea 0xc0(%rsp),%rbx // box lock 0x7f993c8257f1: mov (%r11),%rax 0x7f993c8257f4: test $0x2,%rax 0x7f993c8257fa: jne 0x7f993c825820 0x7f993c8257fc: or $0x1,%raxL 0x7f993c825800: mov %rax,(%rbx) 0x7f993c825803: lock cmpxchg %rbx,(%r11) 0x7f993c825808: je 0x7f993c825833 0x7f993c82580e: sub %rsp,%rax 0x7f993c825811: and $0xfffffffffffff007,%rax 0x7f993c825818: mov %rax,(%rbx) 0x7f993c82581b: jmpq 0x7f993c825833 0x7f993c825820: mov %rax,%r10 0x7f993c825823: xor %rax,%rax 0x7f993c825826: lock cmpxchg %r15,0x7e(%r10)71b 0x7f993c82582c: movq $0x3,(%rbx) 0x7f993c825833: je 0x7f993c81f7ae 0x7f993c825839: mov 0x40(%rsp),%rsi 0x7f993c82583e: lea 0xc0(%rsp),%rdx 0x7f993c825846: nop 0x7f993c825847: callq 0x7f9934d642a0 runtime call for locking 0x7f993c82584c: jmpq 0x7f993c81f7ae 0x7f993c825851: test $0x7,%r8 0x7f993c825858: jne 0x7f993c8257da BiasedOptoInline 0x7f993c82585e: test $0x300,%r8 0x7f993c825865: jne 0x7f993c825874 0x7f993c825867: and $0x37f,%rax 0x7f993c82586e: mov %rax,%r11 0x7f993c825871: or %r15,%r11 0x7f993c825874: mov 0x40(%rsp),%r10 0x7f993c825879: lock cmpxchg %r11,(%r10) 0x7f993c82587e: jne 0x7f993c825839 0x7f993c825880: jmpq 0x7f993c81f7ae [ 0x7f993c81f7ae critical section 0x7f993c81f848] UNlock code 0x7f993c81f84e: mov $0x7,%r10d 0x7f993c81f854: mov 0x40(%rsp),%r11 0x7f993c81f859: and (%r11),%r10 0x7f993c81f85c: cmp $0x5,%r10 0x7f993c81f860: je 0x7f993c81f71b 0x7f993c81f866: lea 0xc0(%rsp),%rax 0x7f993c81f86e: cmpq $0x0,(%rax) 0x7f993c81f875: je 0x7f993c81f8f3 0x7f993c81f87b: mov (%r11),%r10 0x7f993c81f87e: test $0x2,%r10 0x7f993c81f885: je 0x7f993c81f8eb 0x7f993c81f887: xor %rax,%rax 0x7f993c81f88a: or 0x8e(%r10),%rax 0x7f993c81f891: jne 0x7f993c81f8f3 0x7f993c81f893: mov 0x9e(%r10),%rax 0x7f993c81f89a: or 0x96(%r10),%rax 0x7f993c81f8a1: jne 0x7f993c81f8ad 0x7f993c81f8a3: movq $0x0,0x7e(%r10) 0x7f993c81f8ab: jmp 0x7f993c81f8f3 0x7f993c81f8ad: cmpq $0x0,0xa6(%r10) 0x7f993c81f8b8: je 0x7f993c81f8df 0x7f993c81f8ba: xor %rax,%rax 0x7f993c81f8bd: movq $0x0,0x7e(%r10) 0x7f993c81f8c5: lock addl $0x0,(%rsp) 0x7f993c81f8ca: cmpq $0x0,0xa6(%r10) 0x7f993c81f8d5: jne 0x7f993c81f8e4 0x7f993c81f8d7: lock cmpxchg %r15,0x7e(%r10) 0x7f993c81f8dd: jne 0x7f993c81f8e4 0x7f993c81f8df: or $0x1,%eax 0x7f993c81f8e2: jmp 0x7f993c81f8f3 0x7f993c81f8e4: test $0x0,%eax 0x7f993c81f8e9: jmp 0x7f993c81f8f3 0x7f993c81f8eb: mov (%rax),%r10 0x7f993c81f8ee: lock cmpxchg %r10,(%r11) 0x7f993c81f8f3: jne 0x7f993c81f6fe 0x7f993c81f8f9: jmpq 0x7f993c81f71b 0x7f993c81f6fe: mov 0x40(%rsp),%rdi 0x7f993c81f703: lea 0xc0(%rsp),%rsi 0x7f993c81f70b: mov %r15,%rdx 0x7f993c81f70e: movabs $0x7f9952ca0780,%r10 0x7f993c81f718: callq *%r10 //runtime call for unlocking [ 0x7f993c81f71b - outer loop. loader.getResoure call - 0x7f993c81f759)
07-01-2019

When you say the dhw is a junk pointer what exactly are you referring to? p *lock $8 = {_displaced_header = 0x7f983efd18b0} 0x7f983efd18b0 is rbp spill slot of previous frame.
07-01-2019

So everything about that objectMonitor instance indicates that it is unlocked and quiescent. The _header shows it is unlocked; _owner is NULL, _recursions is 0 etc etc. So it appears somehow we have "missed" the locking of the monitor (possibly the inflation of the monitor was not done correctly) Do we have the annotated asm code for the complete sequence of entering the synchronized method and leaving it again - I see asm above for the critical section but without any symbolic information or annotations it is impossible for me to recognize the monitor related code. When you say the dhw is a junk pointer what exactly are you referring to? (After a monitor is inflated it isn't valid to read a BasicLock's content - according to BasicLock::move_to in basicLock.cpp)
07-01-2019

- what is the markword for the object being locked? markword contain a monitor and it is trying to exit it. dhw is some junk pointer - if the monitor is inflated what is its state? it is exiting this monitor {_header = 0x9, _object = 0x5efdd9088, FreeNext = 0x0, _pad_buf0 = '\000' <repeats 103 times>, _owner = 0x0, _previous_owner_tid = 0, _recursions = 0, _EntryList = 0x0, _cxq = 0x0, _succ = 0x0, _Responsible = 0x0, _Spinner = 0, _SpinDuration = 5000, _count = 0, _WaitSet = 0x0, _waiters = 0, _WaitSetLock = 0, static _sync_ContendedLockAttempts = 0x7f994c01e3b0, static _sync_FutileWakeups = 0x7f994c01e490, static _sync_Parks = 0x7f994c01e570, static _sync_Notifications = 0x7f994c01e640, static _sync_Inflations = 0x7f994c01ddf0, static _sync_Deflations = 0x7f994c01e2e0, static _sync_MonExtant = 0x7f994c01e720, static Knob_SpinLimit = 5000} - if the monitor is inflated on which thread's in-use list does it appear? crashing thread
07-01-2019

checked debug info in the above code region! seems to be perfect!. checking how revoke bias operation deal with two safepoint nearby eachother, one unlocked and one locked tight loop.
07-01-2019

i doubt it being inflated during moniotr.exit kind of scenrio.
07-01-2019

I've been trying to follow the analysis but am unclear on the basic conditions: - what is the markword for the object being locked? - if the monitor is inflated what is its state? - if the monitor is inflated on which thread's in-use list does it appear? Thanks
06-01-2019

slight mistake on above assertion. the dhw is actually a junk value. says there no basic lock. so crashing thread actually had a bias or a monitor. Bias lost by crashing thread case need to be analysed first. Other info is that, there is no bulk revoke in all the crashes. so let us assume bulk revoke is not the cause. sequence of events in crashing thread Loop: 1) a safepoint without lock 2) lock(can safepoint with lock state, slowpath) 3) loaders.get(index) << fully inlined 4) unlock jump Loop
06-01-2019

Thread creation time stamp: 1542034348998 == 12/11/2018, 06:52:28:998 last safepoint operation: 1542034349015 == 12/11/2018, 06:52:29:015 crash time: Time: Mon Nov 12 06:52:29 2018 last bulk revoke 1542034349009 12/11/2018, 06:52:29:009
04-01-2019

verifying spill codes in other critical path.
02-01-2019

verifying compiler generated code for the other ucp critical sections.
31-12-2018

possible scenario : other thread lost lock(possibly due to bias revoke), and then exited the lock of crashing thread( which was trying to have a runtime exit as it couldnt find succ)
31-12-2018

code that being run in locked state 0x7f993c81f7ae: mov 0x40(%rsp),%r10 0x7f993c81f7b3: movzbl 0xc(%r10),%r10d 0x7f993c81f7b8: test %r10d,%r10d 0x7f993c81f7bb: jne 0x7f993c825a48 0x7f993c81f7c1: mov 0x4(%rsp),%r10d 0x7f993c81f7c6: add $0x2,%r10d 0x7f993c81f7ca: mov %r10d,0x4(%rsp) 0x7f993c81f7cf: mov 0x40(%rsp),%r10 0x7f993c81f7d4: mov 0x18(%r10),%r11d 0x7f993c81f7d8: mov 0x10(%r12,%r11,8),%r10d 0x7f993c81f7dd: cmp 0x4(%rsp),%r10d 0x7f993c81f7e2: jl 0x7f993c81fa18 0x7f993c81f7e8: mov 0x40(%rsp),%r10 0x7f993c81f7ed: mov 0x18(%r10),%r10d 0x7f993c81f7f1: mov 0x10(%r12,%r10,8),%r11d 0x7f993c81f7f6: test %r11d,%r11d 0x7f993c81f7f9: jl 0x7f993c825885 0x7f993c81f7ff: mov 0x38(%rsp),%r9d 0x7f993c81f804: cmp %r11d,%r9d 0x7f993c81f807: jae 0x7f993c825785 0x7f993c81f80d: mov 0x14(%r12,%r10,8),%r10d 0x7f993c81f812: mov 0x38(%rsp),%r8d 0x7f993c81f817: mov 0xc(%r12,%r10,8),%r9d 0x7f993c81f81c: cmp %r9d,%r8d 0x7f993c81f81f: jae 0x7f993c8257ae 0x7f993c81f825: shl $0x3,%r10 0x7f993c81f829: mov 0x10(%r10,%r8,4),%ebx 0x7f993c81f82e: mov 0x8(%r12,%rbx,8),%r10d 0x7f993c81f833: shl $0x3,%r10 0x7f993c81f837: mov 0x38(%r10),%r10 0x7f993c81f83b: movabs $0x7c00841b8,%r11 0x7f993c81f845: cmp %r11,%r10 0x7f993c81f848: jne 0x7f993c81f8fe it is a very small critical section.
31-12-2018

checking race condition at fast_unlock. setting owner null case.
31-12-2018

dhw(anonymously biased) in ucp basic lock of crashing thread asserts that there was no revoke bias at state of locked(runtime exit safepoint). so crashing thread either had a basic lock or an inflated lock.
31-12-2018

//Theoretical possibility: the lock was biased to the crashing thread at the time it was locked. Another thread tried to lock the same monitor and so should have revoked the bias and inflated it, and installed the crashing thread as the owner (or the crashing thread may do that itself later ...). But when the crashing thread gets to the unlock it doesn't see the owner set and so triggers the assertion. Need to track through the exact sequence of operations for trying to lock an uninflated but biased monitor that is already locked. // revoke_and_rebias require a safepointing. only safepoint in ucp getLoader is at unlocked state or contending for acquiring lock(for crashing case). see [2] [2]at time of crash the index is 5, loaders.size is 42. code generated takes this path [1]for index 0-41, the lock is lost in this time. [1] Safepoint check lock (slow case runtime call also can safepoint check and block) loaders.get(index); //fully inlined unlock public Resource getResource(String name, boolean check) { ... Loader loader; for (int i = 0; (loader = getLoader(i)) != null; i++) { ... } return null; } private synchronized Loader getLoader(int index) { if (closed) { return null; } while (loaders.size() < index + 1) { ..... } return loaders.get(index); } Thread creation time stamp: 1542034348998 == 12/11/2018, 06:52:28:998 last safepoint operation: 1542034349015 == 12/11/2018, 06:52:29:015 crash time: Time: Mon Nov 12 06:52:29 2018 last safepoint in ucp getLoader(inlined) method 0x7f993c81f775: mov 0x120(%r15),%r11 0x7f993c81f77c: test %eax,(%r11) just before locking, or at lock slow path runtime call.
27-12-2018

agreed. as x86 being TSO. all unlock should get automatically visible with thread exit.
27-12-2018

Compiler is still supposed to generate completely balanced locking. Whether compiler or interpreter doesn't change the fact a thread can't terminate with monitors held.
27-12-2018

//Still not exactly sure what you mean. Two threads with the same address can't exist at the same time. So the original thread would still have to had unlock all monitors as it exited even if it had somehow omitted unlocking one (which it can't).// cant unlock happen in compiler generated code, which doesn't follow same protocol as runtime. does cmpxchng must for setting owner field null in unlock path? //Does this reproduce or are you examining the only core file we have access to? // it doesn't reproduce. i have a few core files. last time it crashed on Dec 8, till then it was more frequent, like weekly once crash. started crashing from Nov 7 th
27-12-2018

Still not exactly sure what you mean. Two threads with the same address can't exist at the same time. So the original thread would still have to had unlock all monitors as it exited even if it had somehow omitted unlocking one (which it can't). Further, if the monitor appeared locked by the original thread that would be seen when the crashing thread locked the monitor, not when it unlocked it. Theoretical possibility: the lock was biased to the crashing thread at the time it was locked. Another thread tried to lock the same monitor and so should have revoked the bias and inflated it, and installed the crashing thread as the owner (or the crashing thread may do that itself later ...). But when the crashing thread gets to the unlock it doesn't see the owner set and so triggers the assertion. Need to track through the exact sequence of operations for trying to lock an uninflated but biased monitor that is already locked. Does this reproduce or are you examining the only core file we have access to? By the way there is another bug, JDK-8215355, that may be related, though it has the opposite problem: the lock appears locked to all threads trying to acquire it, but the reported owner has already moved passed the code where the unlock happened. Both could potentially be explained by a bug in the logic to inflate a biased monitor.
27-12-2018

i meant the thread previously shared the thread ptr of crashing thread exited monitor by unlocking ( some exception path close by). while this was not visible to newly created thread with same ptr later. something of this sort.?
27-12-2018

I don't understand that last comment. The "crashing" thread is executing a monitor-exit but the state of the monitor shows it is not owned by that thread. But that should be impossible given that we are returning from a synchronized method so we must have owned the monitor on entry. The only way a thread can terminate but still own a monitor is if it calls pthread_exit directly - something only a natively attaching thread could possibly do.
26-12-2018

crashing thread sharing thread ptr of an exited thread who previously owned monitor can create this scenario?
26-12-2018

Thread creation time stamp: 1542034348998 == 12/11/2018, 06:52:28:998 last safepoint operation: 1542034349015 == 12/11/2018, 06:52:29:015 crash time: Time: Mon Nov 12 06:52:29 2018 last safepoint in ucp method 0x7f993c81f775: mov 0x120(%r15),%r11 0x7f993c81f77c: test %eax,(%r11) just before locking
26-12-2018

at time of crash the index is 5, loaders.size is 42. and compiler generated code doesn't seems to have branches to wrong locations. code generated takes this path [1]for index 0-41, the lock is lost in this time. it seems like revoke bias is not visible to this thread at the time of locking, is the problem. ? [1] lock loaders.get(index); unlock public Resource getResource(String name, boolean check) { ... Loader loader; for (int i = 0; (loader = getLoader(i)) != null; i++) { ... } return null; } private synchronized Loader getLoader(int index) { if (closed) { return null; } while (loaders.size() < index + 1) { ..... } return loaders.get(index); }
25-12-2018

just verified from assembly code. it is at (synchronized)getLoader exit location entering outer loop.
22-12-2018

i can get more reliable getResource inlined information, if luck helps by greping nearby reliable safepoints.
22-12-2018

i created frame out of imprecise information!.. so frame inlined from UrlClassPath::getResource is incorrect. posted stack trace mainly for pointing call stack below getResouce
22-12-2018

it is inlined compiled method top frame exit/ or inside loop locations! as it is not safepoint location, it is imprecise ,and point wrong byte code/methods
22-12-2018

I don't understand the stack traces that have been posted - what monitor is being unlocked in the top frame? - public java.lang.String toString() @0x000000bc7d2f6510 @bci = 8, line = 448, pc = 0x000000bc5d9674e5 (Compiled; information may be imprecise) - public static java.lang.String urlNoFragString(java.net.URL) @0x000000bc7d759640 @bci = 110, line = 80, pc = 0x000000bc5d9674e5 (Compiled) - private synchronized jdk.internal.loader.URLClassPath$Loader getLoader(int) @0x000000bc7d57c160 @bci = 61, line = 437, pc = 0x000000bc5d9674e5 (Compiled) We will only unlock when we get to the getLoader frame and return from it.
22-12-2018

though crash always happen in same method. the thread that is crashing is very newly created thread.
20-12-2018

failed thread stack Java Stack Trace for logback-appender-logstash-java.se.oracle.com:10074-2 Thread state = IN_JAVA - void getBytes(byte[], int, byte) @0x00007f990f80b048 @bci = 22, line = 3366, pc = 0x00007f993c81f71b (Compiled; information may be imprecise) - private final void putStringAt(int, java.lang.String) @0x00007f990f8b24b8 @bci = 25, line = 1667, pc = 0x00007f993c81f71b (Compiled) - public java.lang.AbstractStringBuilder append(java.lang.String) @0x00007f990f8af460 @bci = 30, line = 539, pc = 0x00007f993c81f71b (Compiled) - public java.lang.StringBuilder append(java.lang.String) @0x00007f990f8b47d0 @bci = 2, line = 174, pc = 0x00007f993c81f71b (Compiled) - public static java.lang.String urlNoFragString(java.net.URL) @0x00007f990fd34870 @bci = 24, line = 54, pc = 0x00007f993c81f71b (Compiled) - private synchronized jdk.internal.loader.URLClassPath$Loader getLoader(int) @0x00007f990fb37020 @bci = 61, line = 437, pc = 0x00007f993c81f71b (Compiled) - public jdk.internal.loader.Resource getResource(java.lang.String, boolean) @0x00007f990fb36ab0 @bci = 42, line = 313, pc = 0x00007f993c81f71b (Compiled) - public java.lang.Class run() [signature ()Ljava.lang.Class<*>;] @0x00007f990e60ee40 @bci = 26, line = 420, pc = 0x00007f993c91425c (Compiled) - public [bridge] [synthetic] java.lang.Object run() @0x00007f990e60ef00 @bci = 1, line = 417, pc = 0x00007f993c91425c (Compiled) - private static java.lang.Object executePrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext, java.lang.Class) [signature <T:Ljava.lang.Object;>(Ljava.security.PrivilegedExceptionAction<TT;>;Ljava.security.AccessControlContext;Ljava.lang.Class<*>;)TT;] @0x00007f990f836800 @bci = 21, line = 765, pc = 0x00007f993c91425c (Compiled) - public static java.lang.Object doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext) [signature <T:Ljava.lang.Object;>(Ljava.security.PrivilegedExceptionAction<TT;>;Ljava.security.AccessControlContext;)TT;] @0x00007f990f8363a8 @bci = 13, line = 688, pc = 0x00007f993c91425c (Compiled) - protected java.lang.Class findClass(java.lang.String) [signature (Ljava.lang.String;)Ljava.lang.Class<*>;] @0x00007f990ff73420 @bci = 13, line = 416, pc = 0x00007f993c91425c (Compiled) - protected java.lang.Class loadClass(java.lang.String, boolean) [signature (Ljava.lang.String;Z)Ljava.lang.Class<*>;] @0x00007f990f81e280 @bci = 70, line = 588, pc = 0x00007f993c91425c, oop = 0x00000005efd2a928 (Compiled) - locked <0x00000005f26ad730> (a java.lang.Object) - public java.lang.Class loadClass(java.lang.String) [signature (Ljava.lang.String;)Ljava.lang.Class<*>;] @0x00007f990f81e0e0 @bci = 3, line = 521, pc = 0x00007f993c938000 (Compiled) - private synchronized void openSocket() @0x00007f983edcfd98 @bci = 305, line = 567, pc = 0x00007f9934c71cdb, oop = 0x00000005efd2b3f0 (Interpreted) - locked <0x00000005efd2b3f0> (a net.logstash.logback.appender.AbstractLogstashTcpSocketAppender$TcpSendingEventHandler) - public void onStart() @0x00007f983edcf840 @bci = 1, line = 488, pc = 0x00007f9934c3ef43, oop = 0x00000005efd2b3f0 (Interpreted) - public void onStart() @0x00007f983e90b778 @bci = 17, line = 332, pc = 0x00007f9934c3ef88, oop = 0x00000005f26797b8 (Interpreted) - private void notifyStart() @0x00007f983e90e738 @bci = 17, line = 185, pc = 0x00007f9934c3ef88, oop = 0x00000005f268a500 (Interpreted) - public void run() @0x00007f983e90e4c0 @bci = 32, line = 114, pc = 0x00007f9934c3ef43, oop = 0x00000005f268a500 (Interpreted) - public java.lang.Object call() [signature ()TT;] @0x00007f990ff615c8 @bci = 4, line = 515, pc = 0x00007f9936b6bcbc, oop = 0x00000005f26a0940 (Compiled) - public void run() @0x00007f990ff5f1c0 @bci = 39, line = 264, pc = 0x00007f9936b6bcbc, oop = 0x00000005f26a08f8 (Compiled) - public void run() @0x00007f990ff5d770 @bci = 28, line = 304, pc = 0x00007f9934c3ef43, oop = 0x00000005f26a08f8 (Interpreted) - final void runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @0x00007f990fe923c8 @bci = 92, line = 1128, pc = 0x00007f9934c3ef88, oop = 0x00000005f27bab70 (Interpreted) - public void run() @0x00007f990ff62758 @bci = 5, line = 628, pc = 0x00007f9934c3ef43, oop = 0x00000005f26a0958 (Interpreted) - public void run() @0x00007f990f8409b0 @bci = 11, line = 835, pc = 0x00007f99363624c4 (Compiled)
20-12-2018

oversight while doing analysis!, deleted the comments.
18-12-2018

Added [~pchilanomate] to this bug since there is mention of biased locking...
14-12-2018

i see revoke bias vm operation happened quite often nearing to crash! can it be because of incorrect debug info, feeding unlocked state to revoke bias. then this thread trying to exit the monitor which it never acquired ? next step analysing debug info
14-12-2018

i dont find lock elimination in the generated code, neither unbalanced monitor in control path. checking other possibilities.
12-12-2018

This could possibly be related to the EliminateLocks flag.
27-11-2018

ILW = assert failure crash, likely unbalanced ObjectMonitor locking after returning to C2 method from JNI code; intermittent, rare (yet to try reproduce); none = HLH = P2
14-11-2018

Intermittent issues also seen in 9 - JDK-8071813
14-11-2018

The stack is incomplete: V [jvm.dll+0xc0f08a] SharedRuntime::complete_monitor_unlocking_C+0x17a (sharedruntime.cpp:2062) C 0x0000004187bfcf5e The AgentVMThread doesn't use any direct JNI code - i.e. there's no native code in jtreg/javatest AFAICS.
14-11-2018

I'm filing under hotspot/compiler because of SharedRuntime::complete_monitor_unlocking_C on the stack.
14-11-2018