JDK-8075805 : Crash while trying to release CompiledICHolder
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 8u60,9
  • Priority: P2
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2015-03-24
  • Updated: 2019-01-23
  • Resolved: 2015-08-25
Fixed in: JDK 8 (8u72), JDK 9 (9 b81)
Description
 ----------System.out:(28/1204)----------
OK!
OK!
OK!
OK!
## nof_mallocs = 220145, nof_frees = 124989
## memory stomp:
GuardedMemory(0xffff80ff621e0090) base_addr=0x00000008000504f8 tag=0x0000000200000014 user_size=4294967308 user_data=0x0000000800050518
  Header guard @0x00000008000504f8 is BROKEN
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0xffff80dd1d0639ce, pid=5701, tid=26
#


The hs_err head is:
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0xffff80dd1d0639ce, pid=5701, tid=26
#
# JRE version: Java(TM) SE Runtime Environment (8.0_60) (build 1.8.0_60-internal-fastdebug-20150320161314.amurillo.hs25-60-b08-snap-b00)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.60-b08-fastdebug compiled mode solaris-amd64 compressed oops)
# Problematic frame:
# V  [libjvm.so+0x14739ce]  void GuardedMemory::print_on(outputStream*)const+0xfa
#
# Core dump written. Default location: /export/local/aurora/sandbox/results/workDir/java/util/concurrent/CyclicBarrier/Basic/core or core.5701
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#

---------------  T H R E A D  ---------------

Current thread (0x0000000000725800):  VMThread [stack: 0xffff80ff620e1000,0xffff80ff621e1000] [id=26]

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000900050524

Registers:
RAX=0x0000000900050534, RBX=0x00000000004161c0, RCX=0x0000000900050524, RDX=0x0000000900050524
RSP=0xffff80ff621e0030, RBP=0xffff80ff621e0060, RSI=0x0000000000000111, RDI=0x000000000041f9f0
R8 =0x0000000000000006, R9 =0x000000004b4f5242, R10=0x000000000000000d, R11=0x0000000000000246
R12=0xffff80ff621e0090, R13=0x0000000800050518, R14=0x0000000200000014, R15=0x0000000800050518
RIP=0xffff80dd1d0639ce, RFLAGS=0x0000000000010287

Top of Stack: (sp=0xffff80ff621e0030)
0xffff80ff621e0030:   0000000000000000 0000000000000000
0xffff80ff621e0040:   0000000800050518 ffff80dd1f06c364
0xffff80ff621e0050:   ffff80dd1f068298 0000000800050518
0xffff80ff621e0060:   ffff80ff621e00d0 ffff80dd1dfce1cc
0xffff80ff621e0070:   0000000000000000 ffff80dd1ed3af00
0xffff80ff621e0080:   0000000000000000 ffff80dd1ed3a500
0xffff80ff621e0090:   ffff80dd1efc2338 00000008000504f8
0xffff80ff621e00a0:   00000100bf1ee5b0 0000000000000000
0xffff80ff621e00b0:   0000000800050518 ffff80dd1f06c364
0xffff80ff621e00c0:   ffff80dd1efc162c ffff80dd1efc6178
0xffff80ff621e00d0:   ffff80ff621e0130 ffff80dd1d0c89de
0xffff80ff621e00e0:   ffffffffffffffff ffff80dd1ea4dafc
0xffff80ff621e00f0:   ffff80dd1ea4daf2 ffff80dd1ea4db1c
0xffff80ff621e0100:   0000000000000000 000000000000000a
0xffff80ff621e0110:   ffff80dd1ed079e8 ffff80dd1ed0a86e
0xffff80ff621e0120:   ffff80dd1ed0a8a4 ffff80dd1efc14d8
0xffff80ff621e0130:   ffff80ff621e02a0 ffff80dd1e256b09
0xffff80ff621e0140:   ffff80dd1f035558 0000000000000100
0xffff80ff621e0150:   0000000000000000 0000000000000000
0xffff80ff621e0160:   0000000000000000 0000000000000000
0xffff80ff621e0170:   ffff80dd1f035558 0000000000000100
0xffff80ff621e0180:   0000000000000000 0000000000000000
0xffff80ff621e0190:   0000000000000000 0000000000000000
0xffff80ff621e01a0:   0000000000000000 0000000000000000
0xffff80ff621e01b0:   0000000000000000 0000000000000000
0xffff80ff621e01c0:   0000000000000000 0000000000000000
0xffff80ff621e01d0:   0000000000000000 0000000000000000
0xffff80ff621e01e0:   0000000000000000 0000000000000000
0xffff80ff621e01f0:   0000000000000000 0000000000000000
0xffff80ff621e0200:   0000000000000000 00000007db9cf100
0xffff80ff621e0210:   000000009ec17df8 00819e14d540ee1b
0xffff80ff621e0220:   ffff80dd1ed0a8a4 ffff80dd1f021d10 

Instructions: (pc=0xffff80dd1d0639ce)
0xffff80dd1d0639ae:   48 85 c0 0f 84 9e 01 00 00 48 8b 50 10 49 03 d5
0xffff80dd1d0639be:   48 8d 42 10 48 3b d0 73 1a 48 8b ca 0f 18 49 40
0xffff80dd1d0639ce:   80 39 ab 75 0a 48 ff c1 48 3b c8 72 ef eb 04 33
0xffff80dd1d0639de:   c0 eb 05 b8 01 00 00 00 85 c0 75 09 48 8d 0d 1e 

Register to memory mapping:

RAX=0x0000000900050534 is an unknown value
RBX=0x00000000004161c0 is an unknown value
RCX=0x0000000900050524 is an unknown value
RDX=0x0000000900050524 is an unknown value
RSP=0xffff80ff621e0030 is an unknown value
RBP=0xffff80ff621e0060 is an unknown value
RSI=0x0000000000000111 is an unknown value
RDI=0x000000000041f9f0 is an unknown value
R8 =0x0000000000000006 is an unknown value
R9 =0x000000004b4f5242 is an unknown value
R10=0x000000000000000d is an unknown value
R11=0x0000000000000246 is an unknown value
R12=0xffff80ff621e0090 is an unknown value
R13=0x0000000800050518 is pointing into metadata
R14=0x0000000200000014 is an unallocated location in the heap
R15=0x0000000800050518 is pointing into metadata


Stack: [0xffff80ff620e1000,0xffff80ff621e1000],  sp=0xffff80ff621e0030,  free space=1020k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x14739ce]  void GuardedMemory::print_on(outputStream*)const+0xfa
V  [libjvm.so+0x23de1cc]  void os::free(void*,MemoryType)+0x238
V  [libjvm.so+0x14d89de]  void InlineCacheBuffer::update_inline_caches()+0x11a
V  [libjvm.so+0x2666b09]  void SafepointSynchronize::do_cleanup_tasks()+0x55
V  [libjvm.so+0x2666062]  void SafepointSynchronize::begin()+0x106e
V  [libjvm.so+0x2a7f120]  void VMThread::loop()+0x858
V  [libjvm.so+0x2a7df36]  void VMThread::run()+0xb6
V  [libjvm.so+0x23f114e]  java_start+0x1ce
C  [libc.so.1+0x121555]  _thrp_setup+0xa5
C  [libc.so.1+0x121800]  _lwp_start+0x0

Comments
Detailed evaluation: The VM crashes at a safepoint while trying to free a CompiledICHolder object that was enqueued for release after flushing an nmethod. The crash happens because the object is not a CompiledICHolder but Metadata, which should not be removed.

The problem is that at nmethod release, "CompiledIC::is_icholder_entry" is used to determine if the ICs of that nmethod still reference CompiledICHolder objects and, if so, those objects are enqueued for release at the next safepoint. The method returns true if the destination of the IC is a C2I adapter, assuming that in this case the IC is in the to-interpreter state and the cached value must be a CompiledICHolder object. However, there are very rare cases where the IC is actually in the to-compiled state but the destination nmethod was already flushed and replaced by another allocation. Since the IC is still pointing to the same address in the code cache, the state of the IC is confused.

Cleaning of inline caches that point to dead nmethods should prevent this. However, we do not clean ICs of nmethods that are converted to zombie themselves. Usually that's okay because a zombie nmethod will be flushed before any dead nmethod it references. This is guaranteed because each nmethod goes through the states alive -> non-entrant -> zombie -> marked-for-reclamation before being flushed.

Suppose we have two nmethods A and B, where A references B through an IC and B is always processed first by the sweeper. The following table shows the state transitions from top to bottom; lines marked with "S" show a transition in the corresponding iteration of the sweeper.

      state of A          state of B
   -----------------------------------------
      non-entrant         non-entrant
S     [not on stack]      [not on stack]
S     zombie              zombie
S     marked              marked
S     flushed             flushed/re-allocated

The IC of A will be cleaned in the first sweeper cycle because B is non-entrant, so we don't need to clean ICs again if A is converted to zombie. Now consider the following setting:

      state of A          state of B
   -----------------------------------------
      non-entrant
S     [not on stack]      non-entrant
S     zombie              [not on stack]
                          zombie
S     marked              marked
S     flushed             flushed/re-allocated

There are two problems here:
- the IC of A is not cleaned because B is not yet non-entrant in the first iteration of the sweeper, and afterwards A becomes a zombie itself,
- the transition of B to zombie happens outside the sweeper in 'CodeCache::make_marked_nmethods_zombies()' because the previous sweeper iteration already determined that the nmethod is not on the stack.

The VM now crashes while flushing A because it still references B. Since B was replaced by a C2I adapter, we assume that A's IC is in the to-interpreter state and try to free a CompiledICHolder object which is actually Klass-Metadata for B. The detailed logs are below [1].

A similar problem occurs with nmethod unloading because unloaded nmethods transition directly to zombie:

      state of A          state of B
   -----------------------------------------
      unloaded            unloaded
S     zombie              zombie
S     marked              marked
S     flushed             flushed/re-allocated

Again, we crash while flushing A.
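For illustration only, here is a small self-contained C++ sketch of the misclassification described above. It is not HotSpot code; the types, addresses and values are made up and merely mirror the roles of the code cache, an inline cache, and the CompiledIC::is_icholder_entry() heuristic (destination is a C2I adapter -> treat the cached value as a CompiledICHolder).

#include <cstdio>
#include <map>

// Hypothetical stand-ins for code-cache blobs: either an nmethod or a
// compiled-to-interpreter (c2i) adapter occupies a given address.
enum BlobKind { NMETHOD, C2I_ADAPTER };

// Code cache modelled as "address -> kind of blob currently at that address".
static std::map<long, BlobKind> code_cache;

// Mirrors the heuristic of CompiledIC::is_icholder_entry(): if the current
// destination is a c2i adapter, assume the IC is in the to-interpreter state
// and therefore caches a CompiledICHolder that must be freed.
static bool is_icholder_entry(long destination) {
  auto it = code_cache.find(destination);
  return it != code_cache.end() && it->second == C2I_ADAPTER;
}

struct InlineCache {
  long  destination;   // call destination inside the code cache
  void* cached_value;  // CompiledICHolder* (to-interpreter) or Metadata* (to-compiled)
};

int main() {
  // nmethod A calls nmethod B through an IC in the to-compiled state, so the
  // cached value is B's Klass metadata, not a CompiledICHolder.
  long b_addr = 0x1000;
  code_cache[b_addr] = NMETHOD;
  int  fake_klass_metadata = 0;                      // stands in for B's Klass*
  InlineCache ic_of_A = { b_addr, &fake_klass_metadata };

  // B is flushed and its code-cache slot is re-used by a c2i adapter, but the
  // IC of A was never cleaned because A is already a zombie itself.
  code_cache[b_addr] = C2I_ADAPTER;

  // When A is finally flushed, the heuristic now misclassifies the IC as
  // to-interpreter and the cached Metadata would be enqueued and freed as if
  // it were a CompiledICHolder -- the memory corruption seen in this bug.
  if (is_icholder_entry(ic_of_A.destination)) {
    std::printf("would free %p as a CompiledICHolder, but it is Metadata\n",
                ic_of_A.cached_value);
  }
  return 0;
}

The sketch highlights why the heuristic is fragile: it classifies the IC by whatever blob currently occupies the destination address, which is only safe if stale ICs pointing into flushed nmethods have been cleaned beforehand.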
21-08-2015

8u60-defer-request: I would like to request deferral to 9 because we still don't have enough information to fix this in 8u60. The bug is very rare and is not a regression introduced by 8u60 (it occurs with 8u40 as well). I will continue to run experiments and fix this in 9 as soon as I have enough information.
23-06-2015

8u60-critical-watch justification: The bug causes the VM to crash because we free memory multiple times. There is no workaround and it affects customers. If possible, we should fix this in 8u60.
16-06-2015

I was finally able to reproduce this bug after 3 weeks (30,000 runs). From the log files it looks as if we enqueue the same CompiledICHolder object (0x00000007c00332c0) multiple times:

enqueueing icholder 0x00000007c00332c0 to be freed
create transition stub for 0xffff80ffa45f2647 destination 0xffff80ffa4106100 cached value 0x0000000000000000
create transition stub for 0xffff80ffa45f841f destination 0xffff80ffa4106100 cached value 0x0000000000000000
enqueueing icholder 0x00000007c00332c0 to be freed
create transition stub for 0xffff80ffa4615457 destination 0xffff80ffa4106100 cached value 0x0000000000000000
create transition stub for 0xffff80ffa46657c7 destination 0xffff80ffa4106100 cached value 0x0000000000000000
create transition stub for 0xffff80ffa46b48a7 destination 0xffff80ffa4106100 cached value 0x0000000000000000
create transition stub for 0xffff80ffa46dc5d7 destination 0xffff80ffa4106100 cached value 0x0000000000000000
create transition stub for 0xffff80ffa470d21f destination 0xffff80ffa4106100 cached value 0x0000000000000000
create transition stub for 0xffff80ffa470f08f destination 0xffff80ffa4106100 cached value 0x0000000000000000
create transition stub for 0xffff80ffa472be07 destination 0xffff80ffa4106100 cached value 0x0000000000000000
enqueueing icholder 0x00000007c00332c0 to be freed
enqueueing icholder 0x00000007c00332c0 to be freed

I added additional logging to figure out why this happens.
16-06-2015

Updated ILW=Crash updating inline caches, extremely rare (not able to reproduce), none known=HLH=P2
05-06-2015

Even after running tests for almost a week (> 10,000 runs with the same build/settings/machines), I was still not able to reproduce the problem. We need a reliable reproducer to be able to fix this. Given that the bug also showed up with 8u40 (so it is not a regression introduced by 8u60) and how rarely it occurs, it should be fine to defer this until after 8u60.
01-06-2015

Kevin - do you have access to a better reproducer?
28-05-2015

I executed 5000 runs of the failing tests on the same machines with the same build but I was still not able to reproduce the problem. Without a reproducer it is hard to figure out what's going on. Because we sometimes hit the "assert(_live_count > 0) failed: underflow" in "CompiledICHolder::~CompiledICHolder" I suspect that we enqueue the CompiledICHolder object for release twice and therefore delete it twice in "InlineCacheBuffer::release_pending_icholders". I looked at the call sites of "InlineCacheBuffer::queue_for_release" but I'm not sure how this can happen. Running the tests with -XX:+TraceICBuffer should reveal the problem. I'll run more tests over the weekend.
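To make the suspected scenario concrete, here is a minimal self-contained C++ sketch (hypothetical types and helpers, not HotSpot code) of the queue_for_release()/release_pending_icholders() pattern described above. The sketch only detects a duplicate instead of deleting it twice; in the VM the second delete is what trips the _live_count underflow assert in a debug build, or corrupts the malloc heap in a product build.

#include <cstdio>
#include <set>
#include <vector>

// Hypothetical stand-in for CompiledICHolder with the live counter that the
// failing assert checks.
struct ICHolder {
  static int live_count;
  ICHolder()  { ++live_count; }
  ~ICHolder() { --live_count; }   // the VM asserts live_count > 0 here
};
int ICHolder::live_count = 0;

// Holders queued for release at the next safepoint, playing the role of
// InlineCacheBuffer::queue_for_release().
static std::vector<ICHolder*> pending_release;

static void queue_for_release(ICHolder* holder) {
  pending_release.push_back(holder);
}

// Counterpart of InlineCacheBuffer::release_pending_icholders(); the duplicate
// check exists only in this sketch to make the problem visible.
static void release_pending_icholders() {
  std::set<ICHolder*> unique;
  for (ICHolder* holder : pending_release) {
    if (!unique.insert(holder).second) {
      std::printf("holder %p was queued twice and would be deleted twice\n",
                  static_cast<void*>(holder));
    }
  }
  // Delete each holder once; the real VM deletes every queued entry, so a
  // duplicate enqueue becomes a double free there.
  for (ICHolder* holder : unique) delete holder;
  pending_release.clear();
}

int main() {
  ICHolder* holder = new ICHolder();
  queue_for_release(holder);
  queue_for_release(holder);     // the suspected duplicate enqueue
  release_pending_icholders();
  return 0;
}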
28-05-2015

Reproduced again:

http://aurora.ru.oracle.com/functional/faces/RunDetails.xhtml?names=883278.VMSQE.PIT-741
http://aurora.ru.oracle.com/slot-gw/883278.VMSQE.PIT-741/results/ResultDir/jmap_clstats_basic_forName_core/hs_err_pid28818.log
RULE tmtools/jmap/clstats/jmap_clstats_basic_forName_core Crash Internal Error ...compiledICHolder.hpp...assert(_live_count > ...) failed: underflow
RULE tmtools/jmap/histo/jmap_histo_forbidden_core Crash Internal Error ...compiledICHolder.hpp...assert(_live_count > ...) failed: underflow
RULE tmtools/jstack/general/java_version_sa_live Crash SIGSEGV
RULE tmtools/jstack/ownablesynchronizers-l-core/write_reentrantreadwritelock Crash SIGSEGV
RULE tmtools/jstack/stacktrace_complex_scenario-core/complex_scenario8 Crash Internal Error ...compiledICHolder.hpp...assert(_live_count > ...) failed: underflow

http://aurora.ru.oracle.com/functional/faces/RunDetails.xhtml?names=883278.VMSQE.PIT-731
http://aurora.ru.oracle.com/slot-gw/883278.VMSQE.PIT-731/results/ResultDir/Juggle20/hs_err_pid26731.log
RULE gc/ArrayJuggle/Juggle20 Crash SIGSEGV
19-05-2015

I'm not able to get more information from the core file. I executed 10k runs of the same build on the same machine with the same test and settings. Not able to reproduce. Closing this as long as we don't have a reproducer.
16-04-2015

Relevant stack trace:

V  [libjvm.so+0x14739ce]  void GuardedMemory::print_on(outputStream*)const+0xfa
V  [libjvm.so+0x23de1cc]  void os::free(void*,MemoryType)+0x238
V  [libjvm.so+0x14d89de]  void InlineCacheBuffer::update_inline_caches()+0x11a
V  [libjvm.so+0x2666b09]  void SafepointSynchronize::do_cleanup_tasks()+0x55

Evaluation: The VM invokes verify_memory() in os::free() because GuardedMemory::verify_guards() returns false. It prints

## nof_mallocs = 220145, nof_frees = 124989
## memory stomp:
GuardedMemory(0xffff80ff621e0090) base_addr=0x00000008000504f8 tag=0x0000000200000014 user_size=4294967308 user_data=0x0000000800050518
  Header guard @0x00000008000504f8 is BROKEN

and then crashes with a segmentation fault in GuardedMemory::print_on() because it tries to invoke verify() on the tail guard object, which has an invalid address. Since the header guard is already broken, the user_size used to compute the address of the tail guard object is probably wrong as well:

Header guard: 0x00000008000504f8 <- valid address
Tail guard:   0x00000008000504f8 + 4294967308 = 0x0000000900050504 <- invalid address!
Segfault happens at 0x0000000900050524

It is not clear from the stack trace where os::free() is called from, but I assume it is called from InlineCacheBuffer::release_pending_icholders() on a CompiledICHolder object. It could be that the address of this object is invalid for some reason. However, there is not enough information in the core file and I was not yet able to reproduce the problem.
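The tail-guard arithmetic above can be double-checked with a few lines of standalone C++. The constants are copied from the memory-stomp output; the assumption that the tail guard sits directly behind the user data (so its address is user_data + user_size) is an assumption of this sketch, not something stated in the hs_err file:

#include <cstdint>
#include <cstdio>

int main() {
  // Values printed by the broken GuardedMemory header above.
  const uint64_t base_addr = 0x00000008000504f8ULL;  // header guard address
  const uint64_t user_data = 0x0000000800050518ULL;  // start of user payload
  const uint64_t user_size = 4294967308ULL;          // bogus size, 0x10000000c

  // Tail guard computed from the header guard, as quoted in the comment:
  // prints 0x0000000900050504.
  std::printf("base_addr + user_size = 0x%016llx\n",
              (unsigned long long)(base_addr + user_size));

  // If the tail guard directly follows the user data, its address is
  // user_data + user_size: prints 0x0000000900050524, which is exactly the
  // faulting si_addr reported in the hs_err file.
  std::printf("user_data + user_size = 0x%016llx\n",
              (unsigned long long)(user_data + user_size));
  return 0;
}

Either way, the bogus user_size pushes the computed tail guard roughly 4 GB past the real allocation, so touching it faults.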
09-04-2015

ILW=Crash updating inline caches, happened in PIT, none known=HHH=P1
26-03-2015

V  [libjvm.so+0x14d89de]  void InlineCacheBuffer::update_inline_caches()+0x11a

The crash comes from a call to free() in compiler code, giving this to the compiler team.
24-03-2015