JDK-5048441 : Intermittent crashes in Java MarkSweep garbage collection methods
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 6
  • Priority: P3
  • Status: Closed
  • Resolution: Cannot Reproduce
  • OS: solaris_8
  • CPU: sparc
  • Submitted: 2004-05-17
  • Updated: 2005-08-08
  • Resolved: 2005-08-08
Related Reports
Duplicate :  
Relates :  
Relates :  
Cingular is experiencing a crash in the JVM 3-4 times per week in production.  The runtime environment consists of JVM 1.4.2_03 with WebLogic Server 8.1 SP2.  The crashes are exhibiting multiple symptoms (some of which look like Bug ID 5008819), but appear to be C2 HotSpot compiler crashes:

 --- called from signal handler with signal -14217216 (SIG Unknown) ---
const Type*URShiftINode::Value(PhaseTransform*)const
void PhaseIterGVN::optimize()
void Compile::Optimize()
void C2Compiler::compile_method(ciEnv*,ciScope*,ciMethod*,int,int)
void CompileBroker::invoke_compiler_on_method(CompileTask*)
void CompileBroker::compiler_thread_loop()
void JavaThread::run()
_start (f0970, ff271000, 0, 0, 0, 0) + 134        
_lwp_start (0, 0, 0, 0, 0, 0)          

###@###.### 2004-05-17

EVALUATION ###@###.### 2004-05-17

The crash in URShiftINode::Value() must be bug 4951940, which was fixed in 1.4.2_05. Try 1.4.2_05.

###@###.### 2004-06-03

I investigated the second crash and I think it is a GC problem. I have a 2Gb core file in /net/jaberwocky/export/home2/work/bugs/5048441/ccbmw01. The crash happened in the following part of the MarkSweep::preserve_mark() method:

    _preserved_mark_stack->push(mark);

    0xfed1c110: preserve_mark+0x00e8:  mov    %l3, %o0
    0xfed1c114: preserve_mark+0x00ec:  ld     [%l3 + 0x4], %g2
    0xfed1c118: preserve_mark+0x00f0:  ld     [%l3], %g5
    0xfed1c11c: preserve_mark+0x00f4:  cmp    %g5, %g2
    0xfed1c120: preserve_mark+0x00f8:  bne,pt %icc, preserve_mark+0x110
    0xfed1c124: preserve_mark+0x00fc:  sethi  %hi(0x5400), %g2
    0xfed1c128: preserve_mark+0x0100:  call   grow
    0xfed1c12c: preserve_mark+0x0104:  mov    %g5, %o1
    0xfed1c130: preserve_mark+0x0108:  ld     [%l3], %g5
    0xfed1c134: preserve_mark+0x010c:  sethi  %hi(0x5400), %g2
    0xfed1c138: preserve_mark+0x0110:  ld     [%l3 + 0x8], %g3
    0xfed1c13c: preserve_mark+0x0114:  add    %g5, 0x1, %g4
    0xfed1c140: preserve_mark+0x0118:  add    %g2, 0x11c, %g2
    0xfed1c144: preserve_mark+0x011c:  st     %g4, [%l3]
    0xfed1c148: preserve_mark+0x0120:  sll    %g5, 0x2, %g4
    0xfed1c14c: preserve_mark+0x0124:  ld     [%l2 + %g2], %g2
    0xfed1c150: preserve_mark+0x0128:  st     %l1, [%g3 + %g4]   <<< SIGBUS here

Registers at the fault:

    g0-g3  0x00000000 0x00004000 0xff1c2514 0x00000002
    g4-g7  0x00000008 0x00000002 0x00000000 0xff270200
    o0-o3  0x02916f14 0x8212d588 0x00000028 0xff17e000
    o4-o7  0x00000000 0x0000e805 0xfc77f620 0xfed1c02c
    l0-l3  0xf23c5b10 0x47197381 0xff17e000 0x02916f14
    l4-l7  0x02916b18 0xff1d7998 0xffffffff 0x00000004
    i0-i3  0xf23c5b10 0x47197381 0x00e23120 0xff17e000
    i4-i7  0x00000000 0x029170ac 0xfc77f680 0xfecc72f0
    y      0x00000000
    ccr    0x00000009
    pc     0xfed1c150: preserve_mark+0x128  st %l1, [%g3 + %g4]

So it has nothing to do with oops. I think it has something to do with the size of the heap being 2^31 (and we use a 32-bit VM).
And with the fact that it is full:

    Heap at VM Abort:
    Heap
     def new generation   total  235968K, used  235968K [0x72000000, 0x82000000, 0x82000000)
      eden space 209792K, 100% used [0x72000000, 0x7ece0000, 0x7ece0000)
      from space  26176K, 100% used [0x80670000, 0x82000000, 0x82000000)
      to   space  26176K,   0% used [0x7ece0000, 0x7ece0000, 0x80670000)
     tenured generation   total 1835008K, used 1606773K [0x82000000, 0xf2000000, 0xf2000000)
       the space 1835008K,  87% used [0x82000000, 0xe411d558, 0xe411d600, 0xf2000000)
     compacting perm gen  total  131072K, used   47593K [0xf2000000, 0xfa000000, 0xfa000000)
       the space  131072K,  36% used [0xf2000000, 0xf4e7a718, 0xf4e7a800, 0xfa000000)

According to the core file, the following parameters were used:

    java -server -verbose:gc -Xms2048m -Xmx2048m -XX:MaxPermSize=128m -XX:PermSize=128m -XX:NewSize=256m -XX:MaxNewSize=256m -XX:SurvivorRatio=8 ...

It is Solaris 8: @(#)SunOS 5.8 Generic 111297-01 April 2001

Assuming that the first crash is fixed in 1.4.2_05, I will pass this bug to GC to investigate the second crash. It could be a runtime issue with memory allocations of this size, but I don't see how it could be a C2 problem (all compiler threads are in the "waiting for new task" state).

----------------------------------------------------------------

Let me see if I can disassemble this.
MarkSweep::preserve_mark gets to this point:

    _preserved_mark_stack->push(mark);

and _preserved_mark_stack is declared as

    static GrowableArray<markOop>* _preserved_mark_stack;

so that is really a call to GrowableArray<markOop>::push, which is defined as

    void push(const E elem) { append(elem); }

which inlines to

    void append(const E elem) {
      check_nesting();
      if (_len == _max) grow(_len);
      _data[_len++] = (GrET*) elem;
    }

and knowing that GenericGrowableArray defines the fields

    int    _len;   // current length
    int    _max;   // maximum length
    GrET** _data;  // data array

so _len is at offset 0, _max is at offset 4, and _data is at offset 8, which corresponds to the disassembly shown above:

    /* original_len = _len */
    0xfed1c130: preserve_mark+0x0108:  ld     [%l3], %g5
    /* ??? */
    0xfed1c134: preserve_mark+0x010c:  sethi  %hi(0x5400), %g2
    /* address_of_data = _data */
    0xfed1c138: preserve_mark+0x0110:  ld     [%l3 + 0x8], %g3
    /* new_len = original_len + 1 */
    0xfed1c13c: preserve_mark+0x0114:  add    %g5, 0x1, %g4
    /* ??? */
    0xfed1c140: preserve_mark+0x0118:  add    %g2, 0x11c, %g2
    /* _len = new_len */
    0xfed1c144: preserve_mark+0x011c:  st     %g4, [%l3]
    /* offset = original_len * sizeof(GrET*) */
    0xfed1c148: preserve_mark+0x0120:  sll    %g5, 0x2, %g4
    /* ??? */
    0xfed1c14c: preserve_mark+0x0124:  ld     [%l2 + %g2], %g2
    /* store %l1 at address_of_data + offset */
    0xfed1c150: preserve_mark+0x0128:  st     %l1, [%g3 + %g4]   <<< SIGBUS here

which would be okay, except that we have

    original_len:    g5: 0x00000002
    address_of_data: g3: 0x00000002
    offset:          g4: 0x00000008

so address_of_data + offset is going to be 0x0000000a, which is misaligned *and* on the zeroth page, so if we hadn't gotten a SIGBUS we would have gotten a SIGSEGV.

It is curious that original_len (g5) is 2. That is either because we are just starting to push things onto the GrowableArray<oop*>, or because we have wrapped around the *int* used for _len and _max.
I have trouble believing we've wrapped: given that we only have a 2GB heap, we shouldn't be able to have more than 2^28 minimal (8-byte) objects, so even if every one of them were locked or had a hashcode (and so had to be pushed), we wouldn't be anywhere near wrapping a signed 32-bit int. (This is a concern for the 64-bit VM, though, where we could have more than 2^31 objects that had to be pushed.)

If you run with -XX:+PrintGC -XX:+Verbose, your full collections should print "Restoring %d marks" lines saying how many marks were pushed and restored. It would be interesting to see whether this number is relatively modest (e.g., a few thousand or less) or ridiculously large (e.g., approaching 2^31).

###@###.### 2004-06-07

----------------------------------------------------------------

Since we haven't seen a failure of the GrowableArray code in quite a while, what's the chance this is a memory smash? The fact that both the _len and _data fields are 0x2 is suspicious. Is there any user JNI code running? How does the program behave when run with -Xcheck:jni? (Though in JDK 1.4.2, -Xcheck:jni wasn't nearly as good as it is in JDK 1.5.0.) Do we have any more core files to examine for similarities? Without more data I might have to mark this bug "incomplete".

###@###.### 2004-06-22

----------------------------------------------------------------

Sending to the runtime team for evaluation, as this appears to be a memory stomp.

###@###.### 2005-06-13 18:51:05 GMT

Will be closing this bug in one month (7/13/2005) unless we hear back from the customer with more details. Please let us know if this issue is present in the 1.4.2_09 release of the JDK. It has been almost a year since the last update; I guess this is no longer an issue, or the customer found a workaround. De-commit from mustang.

###@###.### 2005-06-13 20:52:06 GMT

CONVERTED DATA
BugTraq+ Release Management Values
COMMIT TO FIX: mustang