JDK-6565138 : 1.5.0_10 64-Bit SIGBUS with ParallelGC in MarkSweep::MarkAndPushClosure::do_oop
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 5.0u9,5.0u10
  • Priority: P2
  • Status: Resolved
  • Resolution: Fixed
  • OS: solaris_10
  • CPU: sparc
  • Submitted: 2007-06-04
  • Updated: 2023-09-04
  • Resolved: 2007-06-12
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
Other
5.0u14 b01Fixed
Related Reports
Duplicate :  
Duplicate :  
Relates :  
Relates :  
Description
Customer's application 'partition' crashed due to SIGBUS in MarkAndPushClosure::do_oop because the argument oopDesc** was a wrong pointer

% /net/cores.germany/cores/CA_37862788/hs_err_pid15823_moreinfo.log     

# Java VM: Java HotSpot(TM) 64-Bit Server VM (1.5.0_10-b03 mixed mode)

Stack: [0xffffffff70100000,0xffffffff70200000),  sp=0xffffffff701fea30,  free space=1018k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x1edf1c];;  void MarkSweep::MarkAndPushClosure::do_oop(oopDesc**)+0x14
V  [libjvm.so+0x2990b0];;  void OopMapSet::all_do(const frame*,CodeBlob*,const RegisterMap*,OopClosure*,void(*)(oopDesc**,oopDesc**),OopClosure*,OopClosure*)+0x2c8
V  [libjvm.so+0x29918c];;  void OopMapSet::oops_do(const frame*,CodeBlob*,const RegisterMap*,OopClosure*)+0x4c
V  [libjvm.so+0x298d6c];;  void frame::oops_code_blob_do(OopClosure*,const RegisterMap*)+0x38
V  [libjvm.so+0x2af888];;  void JavaThread::oops_do(OopClosure*)+0x130
V  [libjvm.so+0x3f680c];;  void Threads::oops_do(OopClosure*)+0x44
V  [libjvm.so+0x41175c];;  void PSMarkSweep::invoke_no_policy(int*,int)+0x4c4
V  [libjvm.so+0x3ad8ec];;  void PSScavenge::invoke(int*)+0x1ac
V  [libjvm.so+0x3adc0c];; HeapWord*ParallelScavengeHeap::failed_mem_allocate(int*,unsigned long,int,int)+0xd4
V  [libjvm.so+0x3ae050];;  void VM_ParallelGCFailedAllocation::doit()+0xc0
V  [libjvm.so+0x3a55cc];;  void VM_Operation::evaluate()+0x8c
V  [libjvm.so+0x47e294];;  void VMThread::run()+0x714
V  [libjvm.so+0x7b2880];;  void*_start(void*)+0x218

VM_Operation (0xfffffffed013d220): parallel gc failed allocation, mode: safepoint, requested by thread 0x00000001013a6000

Heap
 PSYoungGen      total 143232K, used 38108K [0xffffffff50000000, 0xffffffff5d000000, 0xffffffff70000000)
  eden space 79936K, 0% used [0xffffffff50000000,0xffffffff50000000,0xffffffff54e10000)
  from space 63296K, 60% used [0xffffffff58e30000,0xffffffff5b3673b8,0xffffffff5cc00000)
  to   space 64640K, 0% used [0xffffffff54f10000,0xffffffff54f10000,0xffffffff58e30000)
 PSOldGen        total 643072K, used 638477K [0xfffffffef0000000, 0xffffffff17400000, 0xffffffff50000000)
  object space 643072K, 99% used [0xfffffffef0000000,0xffffffff16f83500,0xffffffff17400000)
 PSPermGen       total 61440K, used 59926K [0xfffffffee8000000, 0xfffffffeebc00000, 0xfffffffef0000000)
  object space 61440K, 97% used [0xfffffffee8000000,0xfffffffeeba85980,0xfffffffeebc00000)

VM state:at safepoint (normal execution)

CPU:total 8 has_v8, has_v9, has_vis1, has_vis2, is_ultra3


% /net/cores.germany/cores/CA_37862788/cor_jj.dbx.info

  ---- called from signal handler with signal 10 (SIGBUS) ------
  [8] libjvm.so:MarkSweep::MarkAndPushClosure::do_oop(0xffffffff7f2b63e8, 0xfffffffec923cf60, 0xffffffff7f2b6560, 0x3, 0x0, 0xb0), at 0xffffffff7e9edf1c
  [9] libjvm.so:OopMapSet::all_do(0xffffffff701feec8, 0x1641, 0xffffffff701feee8, 0xffffffff7f2bb5f0, 0xffffffff7f2bb5f0, 0xffffffff7f2b63e8), at 0xffffffff7ea990a8
  [10] libjvm.so:OopMapSet::oops_do(0xffffffff701feec8, 0xffffffff704ea010, 0xffffffff701feee8, 0xffffffff7f2b63e8, 0xb1c0, 0x780eb4), at 0xffffffff7ea99184
  [11] libjvm.so:frame::oops_code_blob_do(0xffffffff701feec8, 0xffffffff7f2b63e8, 0xffffffff704ea010, 0xffffffff701feee8, 0xffffffff7ea5aa34, 0xffffffff7f21a000), at 0xffffffff7ea98d64
....
(dbx) frame 8
0xffffffff7e9edf1c: do_oop+0x0014:      ldx      [%i3], %o5
...
i0-i1    0xffffffff7f2b63e8 0xfffffffec923cf60
i2-i3    0xffffffff7f2b6560 0x0000000000000003
...
0xffffffff7e9edf08: do_oop       :      save     %sp, -176, %sp
0xffffffff7e9edf0c: do_oop+0x0004:      ldx      [%i1], %i3
0xffffffff7e9edf10: do_oop+0x0008:      cmp      %i3, 0
0xffffffff7e9edf14: do_oop+0x000c:      be,pn    %xcc,do_oop+0x224 ! 0xffffffff7e9ee12c
0xffffffff7e9edf18: do_oop+0x0010:      nop
0xffffffff7e9edf1c: do_oop+0x0014:      ldx      [%i3], %o5
...
(dbx) x 0xfffffffec923cf60/LX
0xfffffffec923cf60:      0x0000000000000003
...

% /net/cores.germany/cores/CA_37862788/cor_jj.dbx.args

-Xms512m -Xmx2g
-XX:NewSize=128m -XX:MaxNewSize=512m -XX:NewRatio=2
-XX:SurvivorRatio=6 -XX:MaxTenuringThreshold=100
-XX:MaxPermSize=128m -XX:PermSize=32m
-XX:ReservedCodeCacheSize=128m
-XX:+UseParallelGC
-XX:ParallelGCThreads=2
-XX:MaxGCPauseMillis=2000
-XX:ThreadStackSize=256
-XX:+UseMPSS
-XX:+UseBiasedLocking
-XX:+UseFastJNIAccessors
-XX:+UseFastEmptyMethods
-Xconcurrentio
-XX:+DisableExplicitGC

Comments
SUGGESTED FIX *** src/share/vm/opto/generateOptoStub.cpp Thu Jun 7 11:43:03 2007 --- ../ws_latest/hotspot/src/share/vm/opto/generateOptoStub.cpp Wed Jun 6 23:35:22 2007 *************** *** 1,8 **** #ifdef USE_PRAGMA_IDENT_SRC ! #pragma ident "@(#)generateOptoStub.cpp 1.90 05/08/16 16:04:33 JVM" #endif /* ! * @(#)generateOptoStub.cpp 1.90 05/08/16 * * Copyright 2005 Sun Microsystems, Inc. All rights reserved. * SUN PROPRIETARY/CONFIDENTIAL. Use is subject to license terms. --- 1,8 ---- #ifdef USE_PRAGMA_IDENT_SRC ! #pragma ident "@(#)generateOptoStub.cpp 1.91 07/06/06 23:35:22 JVM" #endif /* ! * @(#)generateOptoStub.cpp 1.91 07/06/06 * * Copyright 2005 Sun Microsystems, Inc. All rights reserved. * SUN PROPRIETARY/CONFIDENTIAL. Use is subject to license terms. *************** *** 454,459 **** --- 454,466 ---- _gvn.transform_no_reclaim(block); set_predefined_output_for_runtime_call(block, NULL); + if (UseBiasedLocking && method->is_synchronized()) { + // The oopmap for the runtime call must match the one used for + // the actual native call so make sure to add the monitor that + // was added above. + block->push_monitor(flock); + } + // Merge control flow post call RegionNode *region = new RegionNode(3); region->set_req( 1, control() );
07-06-2007

WORK AROUND -XX:-UseBiasedLocking or exclude java/lang/Throwable.fillInStackTrace
07-06-2007

EVALUATION Date: Wed, 06 Jun 2007 17:36:21 -0700 From: Tom Rodriguez <###@###.###> Subject: Re: [Fwd: Re: Fwd: Case 37706070 task 43784401: CR 6484364 ....] So I figured out what's wrong. Here's the code of interest: 0xffffffff75653f18: call %o7 + 0x23c 0xffffffff75653f1c: mov %g2, %l7 public native synchronized java.lang.Throwable fillInStackTrace() @0xffffffff6fc2e268 of public class java.lang.Thr\ owable @0xffffffff6fc2e7e8 @ bci = 0 monitors (owner = %i0, oop, lock = stack[384], normal) OopMap: at call = true Oop: %i0 [176] 0xffffffff75653f20: mov %l7, %g2 0xffffffff75653f24: mov 0x5, %l0 0xffffffff75653f28: st %l0, [%l4 + 0x198] 0xffffffff75653f2c: sethi %hi(0x836ffc00), %l0 0xffffffff75653f30: btog 0xfffffc00, %l0 0xffffffff75653f34: inc 0x0, %l0 0xffffffff75653f38: sethi %hi(0x80d3d400), %l1 0xffffffff75653f3c: btog 0xfffffc00, %l1 0xffffffff75653f40: inc 0x1f0, %l1 0xffffffff75653f44: sethi %hi(0x1c00), %l2 0xffffffff75653f48: bset 0x3fc, %l2 0xffffffff75653f4c: srlx %l4, 3, %l3 0xffffffff75653f50: and %l3, %l2, %l2 0xffffffff75653f54: clr [%l0 + %l2] 0xffffffff75653f58: ld [%l4 + 0x3c], %l0 0xffffffff75653f5c: ld [%l1 + %g0], %l1 0xffffffff75653f60: bset %l0, %l1 0xffffffff75653f64: mov %o0, %i0 0xffffffff75653f68: cmp %l1, 0x0 0xffffffff75653f6c: bne,pn %icc, 0xffffffff75654004 0xffffffff75653f70: nop 0xffffffff75654004: mov %l4, %o0 0xffffffff75654008: sethi %hi(0x80ff2000), %o7 0xffffffff7565400c: btog 0xfffffc00, %o7 0xffffffff75654010: nop 0xffffffff75654014: nop 0xffffffff75654018: nop 0xffffffff7565401c: nop 0xffffffff75654020: nop 0xffffffff75654024: call %o7 + 0x270 0xffffffff75654028: mov %g2, %l7 public native synchronized java.lang.Throwable fillInStackTrace() @0xffffffff6fc2e268 of public class java.lang.Thr\ owable @0xffffffff6fc2e7e8 @ bci = 0 OopMap: at call = true Oop: [176] The first call is the actual native call and the second is the call to check_special_condition_for_native_trans. The first thing that's wrong is that they don't have the same oopmap. While the thread in native is considered to be stopped it may execute the code between the native call and the check for SafepointSynchronize::_block being set while at the same time a GC thread may be inspecting the frame. If the oop maps don't agree clearly something isn't right. In particular %i0 is no longer considered to contain an oop but when the safepoint started it was. In the crash you can see that we are here: oop DefNewGeneration::copy_to_survivor_space(oop old, oop* from) { with from == %i2 == 0xfffffffef603bc00 and old == %i5 == 0xfffffffef603bc70. The reason it's in %i5 instead of %i1 is that %i1 was copied into %i5 and %i1 was destroyed. So all of this is ok. The problem is that the location this oopmap entry refers to is no longer an oop. %i2 is SP+64 which corresponds to the address of the register I0 in the register window save area. I0 gets overwritten with the handle returned from the native method at instruction 0xffffffff75653f64. That handle is SP+176, which is 0xfffffffef603bc70. One thing I can't make sense of is that in the core the oopmap for the native call site has %i0 and [176] as oops but when I use +PrintNativeNMethods to dump it from the latest 1.5.0 it only contains [176]. I just realized that they have explicitly turned on UseBiasedLocking and once I add that I get code which looks the same from 1.5. So now I can see what's wrong. The fix for 6298299 added debug info for the monitor for use with biased locking, but it didn't add the monitor to the check_special_condition_for_native_trans call site. Once you do that the oop maps agree and the code seems to work. So here's a fix: *** /tmp/geta9574 Wed Jun 6 17:28:45 2007 --- generateOptoStub.cpp Wed Jun 6 17:25:20 2007 *************** *** 454,459 **** --- 454,466 ---- _gvn.transform_no_reclaim(block); set_predefined_output_for_runtime_call(block, NULL); check_special_condition_for_native_trans call site. Once you do that the oop maps agree and the code seems to work. So here's a fix: *** /tmp/geta9574 Wed Jun 6 17:28:45 2007 --- generateOptoStub.cpp Wed Jun 6 17:25:20 2007 *************** *** 454,459 **** --- 454,466 ---- _gvn.transform_no_reclaim(block); set_predefined_output_for_runtime_call(block, NULL); + if (UseBiasedLocking && method->is_synchronized()) { + // The oopmap for the runtime call must match the one used for + // the actual native call so make sure to add the monitor that + // was added above. + block->push_monitor(flock); + } + // Merge control flow post call RegionNode *region = new RegionNode(3); region->set_req( 1, control() ); and here's a test case that reproduces the crash when run with -XX:+UseBiasedLocking. public class fis { public static void main(String[] args) { int max = Runtime.getRuntime().availableProcessors() * 2; for (int i = 0; i < max; i++) { new Thread(new Runnable() { public void run() { Throwable t = new Throwable(); while (true) { t.fillInStackTrace(); } } }).start(); } while (true) { synchronized (args) { try { args.wait(100); } catch (Exception e) { } System.gc(); System.out.print('.'); } } } } 1.6 doesn't have the problem since it was more carefully coded and client in 1.5 doesn't seem to be effected either. So obvious workarounds are to exclude java/lang/Throwable.fillInStackTrace from compilation or to disable biased locking. Even with this fix we're generally getting lucky that C2's generated stubs are emitting similar things at both call sites. It's not surprising that it does the right thing but it's not really guaranteed. tom
07-06-2007