JDK-6484364 : JVM crash at oopDesc*DefNewGeneration::copy_to_survivor_space(oopDesc*,oopDesc**)
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 5.0u9
  • Priority: P3
  • Status: Closed
  • Resolution: Duplicate
  • OS: solaris_10
  • CPU: sparc
  • Submitted: 2006-10-20
  • Updated: 2010-07-29
  • Resolved: 2007-06-07
Versions (release in which this issue is addressed)
  • Other: 5.0u13, Resolved
  • JDK 6: 6, Resolved
Related Reports
Duplicate :  
Relates :  
Description
A customer's JVM crashed. According to the decoded and demangled hs_err_pid6778.log, the crash occurs in oopDesc*DefNewGeneration::copy_to_survivor_space(oopDesc*,oopDesc**). The stack trace shows only functions from libjvm.so. No test case is available.

#
# An unexpected error has been detected by HotSpot Virtual Machine:
#
#  SIGBUS (0xa) at pc=0xffffffff7ed5f16c, pid=6778, tid=3
#
# Java VM: Java HotSpot(TM) 64-Bit Server VM (1.5.0_09-b01 mixed mode)
# Problematic frame:
# V  [libjvm.so+0x55f16c]
#

---------------  T H R E A D  ---------------

Current thread (0x00000001002b2400):  VMThread [id=3]

siginfo:si_signo=10, si_errno=0, si_code=1, si_addr=0xfffffffed523c829
si_signo=10 SIGBUS
si_code=1 BUS_ADRALN /* Invalid address alignment.  */

[--snip--]
Stack: [0xffffffff75100000,0xffffffff75200000),  sp=0xffffffff751fe6f0,  free space=1017k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x55f16c] oopDesc*DefNewGeneration::copy_to_survivor_space(oopDesc*,oopDesc**)+0x24
V  [libjvm.so+0x560dd8] void FastScanClosure::do_oop(oopDesc**)+0x58
V  [libjvm.so+0x298a60] void OopMapSet::all_do(const frame*,CodeBlob*,const RegisterMap*,OopClosure*,void(*)(oopDesc**,oopDesc**),OopClosure*,OopClosure*)+0x2c8
V  [libjvm.so+0x298b3c] void OopMapSet::oops_do(const frame*,CodeBlob*,const RegisterMap*,OopClosure*)+0x4c
V  [libjvm.so+0x29871c] void frame::oops_code_blob_do(OopClosure*,const RegisterMap*)+0x38
V  [libjvm.so+0x2af1d0] void JavaThread::oops_do(OopClosure*)+0x130
V  [libjvm.so+0x3f5f3c] void Threads::oops_do(OopClosure*)+0x44
V  [libjvm.so+0x583530] void GenCollectedHeap::process_strong_roots(int,int,int,GenCollectedHeap::ClassScanningOption,OopsInGenClosure*,OopsInGenClosure*)+0xb8
V  [libjvm.so+0x55e5ec] void DefNewGeneration::collect(int,int,unsigned long,int,int)+0x41c
V  [libjvm.so+0x582fd4] void GenCollectedHeap::do_collection(int,int,unsigned long,int,int,int,int*)+0x5fc
V  [libjvm.so+0x52f0a4] HeapWord*TwoGenerationCollectorPolicy::satisfy_failed_allocation(unsigned long,int,int,int*)+0x1fc
V  [libjvm.so+0x84a3cc] void VM_GenCollectForAllocation::doit()+0xb4
V  [libjvm.so+0x3a4dc4] void VM_Operation::evaluate()+0x8c
V  [libjvm.so+0x47d8dc] void VMThread::run()+0x714
V  [libjvm.so+0x7b1108] void*_start(void*)+0x218

VM Arguments:
jvm_args: [--snip--] -Xmx2024m -Xms256m -XX:+UseConcMarkSweepGC -XX:-UseParNewGC -XX:ParallelGCThreads=4 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:-CMSParallelRemarkEnable

[--snip--]
---------------  S Y S T E M  ---------------
OS:                         Solaris 10 3/05 s10_74L2a SPARC
           Copyright 2005 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                            Assembled 22 January 2005

uname:SunOS 5.10 Generic_118833-24 sun4u  (T2 libthread)
rlimit: STACK 8192k, CORE infinity, NOFILE 8192, AS infinity
load average:13.49 11.96 10.94

CPU:total 8 has_v8, has_v9, has_vis1, has_vis2, is_ultra3

Memory: 8k page, physical 16777216k(9136768k free)

vm_info: Java HotSpot(TM) 64-Bit Server VM (1.5.0_09-b01) for solaris-sparc, built on Sep  7 2006 14:01:50 by unknown with unknown
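
The siginfo section of the log reports BUS_ADRALN for si_addr=0xfffffffed523c829. As a quick sanity check, the sketch below confirms that this address is odd and therefore not aligned for any 2-, 4- or 8-byte access. The class and method names are hypothetical and exist only for this illustration; the report does not say which access width actually faulted, so several widths are tried.

```java
// Illustration only: checks the alignment of the faulting address from the log.
class AlignmentCheck {
    // Remainder of addr modulo a power-of-two alignment.
    static long misalignment(long addr, long alignment) {
        return addr & (alignment - 1);
    }

    static boolean isAligned(long addr, long alignment) {
        return misalignment(addr, alignment) == 0;
    }

    public static void main(String[] args) {
        long siAddr = 0xfffffffed523c829L; // si_addr from the siginfo line
        for (long width : new long[] {2, 4, 8}) {
            System.out.printf("aligned to %d bytes: %b%n", width, isAligned(siAddr, width));
        }
    }
}
```

Every width prints false, which is consistent with the SIGBUS code in the log: the VM was dereferencing a value that was not a valid object address.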

Comments
EVALUATION

Doesn't apply to jdk6 and beyond.

From Tom Rodriguez:

So I figured out what's wrong. Here's the code of interest:

    0xffffffff75653f18: call %o7 + 0x23c
    0xffffffff75653f1c: mov %g2, %l7
      public native synchronized java.lang.Throwable fillInStackTrace() @0xffffffff6fc2e268 of public class java.lang.Throwable @0xffffffff6fc2e7e8 @ bci = 0
      monitors (owner = %i0, oop, lock = stack[384], normal)
      OopMap: at call = true
      Oop: %i0 [176]
    0xffffffff75653f20: mov %l7, %g2
    0xffffffff75653f24: mov 0x5, %l0
    0xffffffff75653f28: st %l0, [%l4 + 0x198]
    0xffffffff75653f2c: sethi %hi(0x836ffc00), %l0
    0xffffffff75653f30: btog 0xfffffc00, %l0
    0xffffffff75653f34: inc 0x0, %l0
    0xffffffff75653f38: sethi %hi(0x80d3d400), %l1
    0xffffffff75653f3c: btog 0xfffffc00, %l1
    0xffffffff75653f40: inc 0x1f0, %l1
    0xffffffff75653f44: sethi %hi(0x1c00), %l2
    0xffffffff75653f48: bset 0x3fc, %l2
    0xffffffff75653f4c: srlx %l4, 3, %l3
    0xffffffff75653f50: and %l3, %l2, %l2
    0xffffffff75653f54: clr [%l0 + %l2]
    0xffffffff75653f58: ld [%l4 + 0x3c], %l0
    0xffffffff75653f5c: ld [%l1 + %g0], %l1
    0xffffffff75653f60: bset %l0, %l1
    0xffffffff75653f64: mov %o0, %i0
    0xffffffff75653f68: cmp %l1, 0x0
    0xffffffff75653f6c: bne,pn %icc, 0xffffffff75654004
    0xffffffff75653f70: nop

    0xffffffff75654004: mov %l4, %o0
    0xffffffff75654008: sethi %hi(0x80ff2000), %o7
    0xffffffff7565400c: btog 0xfffffc00, %o7
    0xffffffff75654010: nop
    0xffffffff75654014: nop
    0xffffffff75654018: nop
    0xffffffff7565401c: nop
    0xffffffff75654020: nop
    0xffffffff75654024: call %o7 + 0x270
    0xffffffff75654028: mov %g2, %l7
      public native synchronized java.lang.Throwable fillInStackTrace() @0xffffffff6fc2e268 of public class java.lang.Throwable @0xffffffff6fc2e7e8 @ bci = 0
      OopMap: at call = true
      Oop: [176]

The first call is the actual native call and the second is the call to check_special_condition_for_native_trans. The first thing that's wrong is that they don't have the same oopmap.
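
The mismatch between the two oop maps shown above can be made explicit with a toy model. This is a minimal sketch, not HotSpot code: the class, field and method names are hypothetical, and each oop map is reduced to a set of location names copied from the two "Oop:" lines in the dump.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Illustration only: models each call site's oop map as a set of location names.
class OopMapDiff {
    // "Oop: %i0 [176]" at the native call site.
    static final Set<String> NATIVE_CALL = new TreeSet<>(Arrays.asList("%i0", "[176]"));
    // "Oop: [176]" at the check_special_condition_for_native_trans call site.
    static final Set<String> TRANSITION_CALL = new TreeSet<>(Arrays.asList("[176]"));

    // Locations recorded as oops in 'a' that are absent from 'b'.
    static Set<String> missing(Set<String> a, Set<String> b) {
        Set<String> diff = new TreeSet<>(a);
        diff.removeAll(b);
        return diff;
    }

    public static void main(String[] args) {
        System.out.println("only at native call:     " + missing(NATIVE_CALL, TRANSITION_CALL));
        System.out.println("only at transition call: " + missing(TRANSITION_CALL, NATIVE_CALL));
    }
}
```

The only difference is %i0, the register that holds the monitor owner at the native call site.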
While the thread in native is considered to be stopped it may execute the code between the native call and the check for SafepointSynchronize::_block being set while at the same time a GC thread may be inspecting the frame. If the oop maps don't agree clearly something isn't right. In particular %i0 is no longer considered to contain an oop but when the safepoint started it was.

In the crash you can see that we are here:

    oop DefNewGeneration::copy_to_survivor_space(oop old, oop* from) {

with from == %i2 == 0xfffffffef603bc00 and old == %i5 == 0xfffffffef603bc70. The reason it's in %i5 instead of %i1 is that %i1 was copied into %i5 and %i1 was destroyed. So all of this is ok. The problem is that the location this oopmap entry refers to is no longer an oop. %i2 is SP+64 which corresponds to the address of the register I0 in the register window save area. I0 gets overwritten with the handle returned from the native method at instruction 0xffffffff75653f64. That handle is SP+176, which is 0xfffffffef603bc70.

One thing I can't make sense of is that in the core the oopmap for the native call site has %i0 and [176] as oops but when I use +PrintNativeNMethods to dump it from the latest 1.5.0 it only contains [176]. I just realized that they have explicitly turned on UseBiasedLocking and once I add that I get code which looks the same from 1.5. So now I can see what's wrong. The fix for 6298299 added debug info for the monitor for use with biased locking, but it didn't add the monitor to the check_special_condition_for_native_trans call site. Once you do that the oop maps agree and the code seems to work.
So here's a fix:

    *** /tmp/geta9574 Wed Jun 6 17:28:45 2007
    --- generateOptoStub.cpp Wed Jun 6 17:25:20 2007
    ***************
    *** 454,459 ****
    --- 454,466 ----
        _gvn.transform_no_reclaim(block);
        set_predefined_output_for_runtime_call(block, NULL);

    +   if (UseBiasedLocking && method->is_synchronized()) {
    +     // The oopmap for the runtime call must match the one used for
    +     // the actual native call so make sure to add the monitor that
    +     // was added above.
    +     block->push_monitor(flock);
    +   }
    +
        // Merge control flow post call
        RegionNode *region = new RegionNode(3);
        region->set_req( 1, control() );

and here's a test case that reproduces the crash when run with -XX:+UseBiasedLocking.

    public class fis {
        public static void main(String[] args) {
            int max = Runtime.getRuntime().availableProcessors() * 2;
            for (int i = 0; i < max; i++) {
                new Thread(new Runnable() {
                    public void run() {
                        Throwable t = new Throwable();
                        while (true) {
                            t.fillInStackTrace();
                        }
                    }
                }).start();
            }
            while (true) {
                synchronized (args) {
                    try {
                        args.wait(100);
                    } catch (Exception e) {
                    }
                    System.gc();
                    System.out.print('.');
                }
            }
        }
    }

1.6 doesn't have the problem since it was more carefully coded, and client in 1.5 doesn't seem to be affected either. So obvious workarounds are to exclude java/lang/Throwable.fillInStackTrace from compilation or to disable biased locking. Even with this fix we're generally getting lucky that C2's generated stubs are emitting similar things at both call sites. It's not surprising that it does the right thing but it's not really guaranteed.

tom
07-06-2007
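
The address arithmetic in the evaluation (from == SP+64, handle == SP+176) can be checked against the two register values it quotes. The sketch below is only a worked calculation: the class and method names are hypothetical, and "SP" is used in the same sense as in the evaluation text.

```java
// Illustration only: re-derives the handle address quoted in the evaluation.
class HandleAddress {
    static final long FROM = 0xfffffffef603bc00L; // 'from' (%i2), stated to be SP+64
    static final long OLD  = 0xfffffffef603bc70L; // 'old' (%i5), the value read from that slot

    // SP recovered from the address of the I0 slot in the register window save area.
    static long stackPointer() {
        return FROM - 64;
    }

    // Address of the handle returned by the native method, stated to be SP+176.
    static long handleAddress() {
        return stackPointer() + 176;
    }

    public static void main(String[] args) {
        System.out.printf("SP       = 0x%x%n", stackPointer());
        System.out.printf("SP + 176 = 0x%x%n", handleAddress());
        System.out.println("matches 'old': " + (handleAddress() == OLD));
    }
}
```

The computed SP+176 equals the value of 'old', which agrees with the evaluation's conclusion that the I0 save slot held the returned handle rather than an oop when the collector scanned it.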