JDK-4746263 : JDK 1.4.1 dumps core during ECperf; fails debug assert top <= end
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: unknown
  • Priority: P2
  • Status: Resolved
  • Resolution: Fixed
  • OS: solaris_8,solaris_9
  • CPU: sparc
  • Submitted: 2002-09-12
  • Updated: 2002-09-25
  • Resolved: 2002-09-25
The Version table provides details related to the release in which this issue/RFE will be addressed.

Unresolved: Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed: Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

Release        Status
1.4.1_01 (01)  Fixed
1.4.2          Fixed
Description
Using various builds of JDK 1.4.1 (builds 15-21) on an 8x900MHz V880,
one or more JVM instances (we use 4 instances of BEA WebLogic 7.0)
will terminate intermittently with either SIGSEGV or SIGBUS at
elevated loads. The problem has been reproduced on 2 different
V880 machines running the benchmark.

JDK 1.4.0 does not exhibit this behavior.

The options used are:

-server -XX:+AggressiveHeap -Xmx2500m -Xms2500m -Xmn300m -XX:MaxTenuringThreshold=3 -XX:SurvivorRatio=20

Currently testing with "-XX:-InlineArrayCopy" option.


Sample reports:

=======================================================================
# more hs_err_pid14484.log

Unexpected Signal : 10 occurred at PC=0xFECC5284
Function=[Unknown. Nearest: SUNWprivate_1.1+0xC5284]
Library=/export/home/VMs/j2sdk1.4.1_MPSS/jre/lib/sparc/server/libjvm.so


Dynamic libraries:
0x10000 	/export/home/VMs/j2sdk1.4.1/bin/java
0xff360000 	/usr/lib/lwp/libthread.so.1
0xff390000 	/usr/lib/libdl.so.1
0xff280000 	/usr/lib/libc.so.1
0xff350000 	/usr/platform/SUNW,Sun-Fire-880/lib/libc_psr.so.1
0xfec00000 	/export/home/VMs/j2sdk1.4.1_MPSS/jre/lib/sparc/server/libjvm.so
0xff230000 	/usr/lib/libCrun.so.1
0xff200000 	/usr/lib/libsocket.so.1
0xfeb00000 	/usr/lib/libnsl.so.1
0xfebd0000 	/usr/lib/libm.so.1
0xff260000 	/usr/lib/libw.so.1
0xfeae0000 	/usr/lib/libmp.so.2
0xfeac0000 	/usr/lib/librt.so.1
0xfeaa0000 	/usr/lib/libaio.so.1
0xfea70000 	/export/home/VMs/j2sdk1.4.1_MPSS/jre/lib/sparc/native_threads/libhpi.so
0xfea30000 	/export/home/VMs/j2sdk1.4.1_MPSS/jre/lib/sparc/libverify.so
0xfe9f0000 	/export/home/VMs/j2sdk1.4.1_MPSS/jre/lib/sparc/libjava.so
0xfe9d0000 	/export/home/VMs/j2sdk1.4.1_MPSS/jre/lib/sparc/libzip.so
0xfab90000 	/export/home/VMs/j2sdk1.4.1_MPSS/jre/lib/sparc/libnet.so
0xfabe0000 	/export/home/VMs/j2sdk1.4.1_MPSS/jre/lib/sparc/libioser12.so
0x5a3b0000 	/export/home/ecperf/wls7_0.fcs/weblogic700/server/lib/solaris/libmuxer.so

Local Time = Wed Sep 11 16:30:11 2002
Elapsed Time = 2080
#
# HotSpot Virtual Machine Error : 10
# Error ID : 4F530E43505002E4 01
# Please report this error at
# http://java.sun.com/cgi-bin/bugreport.cgi
#
# Java VM: Java HotSpot(TM) Server VM (1.4.1-internal mixed mode)
#
=======================================================================
more hs_err_pid684.log

Unexpected Signal : 10 occurred at PC=0xFB001E78
Function=registerSynchronization (compiled Java code)
Library=(N/A)

Current Java thread:

Dynamic libraries:
0x10000 	/export/home/VMs/j2sdk1.4.1/bin/java
0xff350000 	/usr/lib/lwp/libthread.so.1
0xff380000 	/usr/lib/libdl.so.1
0xff200000 	/usr/lib/libc.so.1
0xff340000 	/usr/platform/SUNW,Sun-Fire-880/lib/libc_psr.so.1
0xfec00000 	/export2/VMs/j2sdk1.4.1-rc-b19/jre/lib/sparc/server/libjvm.so
0xff2e0000 	/usr/lib/libCrun.so.1
0xfebe0000 	/usr/lib/libsocket.so.1
0xfeb00000 	/usr/lib/libnsl.so.1
0xfead0000 	/usr/lib/libm.so.1
0xff310000 	/usr/lib/libw.so.1
0xfebc0000 	/usr/lib/libmp.so.2
0xfeaa0000 	/usr/lib/librt.so.1
0xfea80000 	/usr/lib/libaio.so.1
0xfea60000 	/usr/lib/libmd5.so.1
0xfea40000 	/usr/platform/SUNW,Sun-Fire-880/lib/libmd5_psr.so.1
0xfea10000 	/export2/VMs/j2sdk1.4.1-rc-b19/jre/lib/sparc/native_threads/libhpi.so
0xfe9d0000 	/export2/VMs/j2sdk1.4.1-rc-b19/jre/lib/sparc/libverify.so
0xfe990000 	/export2/VMs/j2sdk1.4.1-rc-b19/jre/lib/sparc/libjava.so
0xfe970000 	/export2/VMs/j2sdk1.4.1-rc-b19/jre/lib/sparc/libzip.so
0xfa9d0000 	/export2/VMs/j2sdk1.4.1-rc-b19/jre/lib/sparc/libnet.so
0x5a3d0000 	/export2/VMs/j2sdk1.4.1-rc-b19/jre/lib/sparc/libioser12.so
0xfa880000 	er.so

Local Time = Wed Sep  4 12:34:29 2002
Elapsed Time = 2102
#
# HotSpot Virtual Machine Error : 10
# Error ID : 4F530E43505002E6 01
# Please report this error at
# http://java.sun.com/cgi-bin/bugreport.cgi
#
# Java VM: Java HotSpot(TM) Server VM (1.4.1-rc-b19 mixed mode)
#




Comments
CONVERTED DATA
BugTraq+ Release Management Values
COMMIT TO FIX: 1.4.1_01, mantis
FIXED IN: 1.4.1_01, mantis
INTEGRATED IN: 1.4.1_01, mantis, mantis-b03
14-06-2004

EVALUATION
I've looked at one of the core files ( /export/home/jvm_cores/j2sdk1.4.1.b21/core.1 ) with SA, and the problem that caused the seg fault is a bad oop residing in a java/util/Hashtable. There wasn't any clue about where the bad oop came from. However, while I was doing this, a run was being made with a debug VM and it failed with the following assert:

#
# HotSpot Virtual Machine Error, assertion failure
# Please report this error at
# http://java.sun.com/cgi-bin/bugreport.cgi
#
# Java VM: Java HotSpot(TM) Server VM (1.4.1-b21-debug mixed mode)
#
# assert(top <= end, "pointers out of order")
#

I captured a core file from this run in /export/home/jvm_cores/j2sdk1.4.1.b21_debug/ecperf.24614 and looked at it with a combination of dbx and SA. This looks to me like compiled code has cached a TLAB top value across a GC point. At the GC point the TLAB was taken from the thread and reset. The value of the TLAB for the thread in question is:

  thread      top         end         size
  0x00000000  0x7bf5cd90  0x00000000  0x00000800

The size (and everything but top) is what you'd expect if ThreadLocalAllocBuffer::reset() was called for this TLAB. The value of top is a legal heap value (in eden). The stack trace for this thread shows that it was executing in weblogic/jdbc/jts/Connection.prepareStatement(Ljava/lang/String;). The thread in question was t@41. I've attached a disassembly of that nmethod and a dump of its registers at the point it did the call (bad.Z). A snippet of the stack trace is here:

=>[1] _libc_nanosleep(0x53d7dbe8, 0x53d7dbe0, 0x0, 0x0, 0x4, 0x0), at 0xff2995ac
  [2] _sleep(0x64, 0x400, 0xff2be164, 0x53d7e0dc, 0x2f, 0x53d7e8dc), at 0xff35e230
  [3] os::message_box(0xfec3ba15, 0x53d7e0dc, 0x53d7e12b, 0x53d7e0dc, 0xfeda5356, 0x8eae9d10), at 0xfe8fbd70
  [4] report_error(0x1, 0xfec6f496, 0x5e, 0xfec3ba15, 0xfec3ba27, 0xfec6f48b), at 0xfe4a7e04
  [5] report_assertion_failure(0xfec6f48b, 0xfec6f496, 0x5e, 0xfec6f4e6, 0x1, 0xf738db98), at 0xfe4a7170
  [6] CollectedHeap::allocate_from_tlab(0x3983488, 0x4, 0x0, 0x0, 0x0, 0x0), at 0xfe5e995c
  [7] CollectedHeap::common_mem_allocate_noinit(0x4, 0x0, 0x3983488, 0x0, 0x0, 0x0), at 0xfe5e8a50
  [8] CollectedHeap::common_mem_allocate_init(0x4, 0x0, 0x3983488, 0x0, 0x1, 0x0), at 0xfe5e47ac
  [9] CollectedHeap::obj_allocate(0x53d7ec7c, 0x4, 0x3983488, 0x0, 0x0, 0x0), at 0xfe5e0c84
  [10] instanceKlass::allocate_instance(0xf79ab458, 0x3983488, 0x0, 0x0, 0x0, 0x0), at 0xfe5d0a78
  [11] OptoRuntime::new_C(0xf79ab450, 0x3983488, 0x6f, 0x8eae9258, 0xb8e95a6e, 0x8eae9d10), at 0xfe9fcbc0
  [12] 0xfa4b0958(0xf79ab450, 0x7f, 0x8c777216, 0x1, 0x7f, 0x8c777212), at 0xfa4b0957
  [13] 0xfa952f34(0xf602a198, 0xf6039e40, 0x3983488, 0x8eaea3b8, 0x1, 0xf738db98), at 0xfa952f33
  [14] 0xfa959aac(0x8c777268, 0x7bf5cd58, 0xf79bdbb0, 0x8c777280, 0x4, 0x0), at 0xfa959aab
  [15] 0xfa8f82b4(0xb4d06500, 0xb4d06500, 0xf77f21f8, 0x7bf5cd58, 0xf7b4b400, 0xf79e1b38), at 0xfa8f82b3

This seems to me to be a compiler bug (or GC) and not a likely runtime bug.
###@###.### 2002-09-13

-----

###@###.### 2002-09-13
This week Cliff has been investigating the assert failure:

# assert(top <= end, "pointers out of order")

with end == 0 and top == pre-GC value of top plus 24 bytes. This reproduces in under 10 minutes with the JIntegra app from Intrinsyc, 4726812. The source of the problem has not been identified, but Cliff is instrumenting to record in each thread object its state when GC starts, and each safepoint's PC. One theory is that the thread is executing native code at the start of GC and escapes being stopped.
Another is that part of the generated code's allocation idiom, the update of 'top', has drifted across a safepoint. The JIntegra problem has been confirmed as a problem in generated code: the allocation idiom's load of 'top' precedes a safepoint and the store of (top + size) occurs after the safepoint.

-----

###@###.### 2002-09-20
Testing of the fix for split-if and depends_only_on_test() indicates that this initial problem has been fixed. The original 70% failure rate has been reduced to 15%, and investigation indicates that the remaining failures are a different bug. I am closing this BugId and opening one for the crash that was hidden by this problem.

-----
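[Editor's note] The following is a minimal, self-contained C++ sketch of the bump-pointer allocation idiom and the assert(top <= end) invariant discussed above. It is not HotSpot source: the Tlab struct, its members, and the allocate/reset helpers are invented for illustration of the failure mode in which a stale 'top' is written back after a safepoint has reset the TLAB.

// Hedged sketch, assuming a simplified TLAB with start/top/end pointers.
#include <cassert>
#include <cstddef>

struct Tlab {
  char*  start;
  char*  top;    // next free byte
  char*  end;    // one past the last usable byte
  size_t size;

  // Fast-path allocation: bump 'top' by 'bytes' if it fits.
  void* allocate(size_t bytes) {
    assert(top <= end && "pointers out of order");   // the failing assert
    if (top + bytes <= end) {
      void* obj = top;
      top += bytes;        // store of (top + size) -- must stay on this side of any safepoint
      return obj;
    }
    return nullptr;        // slow path: refill TLAB / allocate from shared eden
  }

  // What a GC safepoint does to a retired TLAB (loosely modeled on
  // ThreadLocalAllocBuffer::reset() as described in the evaluation).
  void reset() {
    start = top = end = nullptr;
  }
};

int main() {
  char buffer[2048];
  Tlab t{buffer, buffer, buffer + sizeof(buffer), sizeof(buffer)};

  // Normal operation: load top, check against end, store top + size.
  void* a = t.allocate(24);
  (void)a;

  // The bug scenario from the evaluation: compiled code loads 'top' before a
  // safepoint, the safepoint resets the TLAB (end becomes null), and the stale
  // 'top + 24' is stored back afterwards.  The next fast-path allocation then
  // trips assert(top <= end), because end is null while top is a stale eden address.
  char* stale_top = t.top;   // load of 'top' hoisted above the safepoint
  t.reset();                 // GC safepoint retires the TLAB
  t.top = stale_top + 24;    // delayed store: now top > end
  // t.allocate(8);          // would fail: assert(top <= end, "pointers out of order")
  return 0;
}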
11-06-2004

WORK AROUND
Use JDK 1.4.0_01.
11-06-2004

SUGGESTED FIX
###@###.### 2002-09-20
The following fix was provided by cliffc after investigation of the same failure running JIntegra, 4726812.

src/share/vm/opto/loopopts.cpp [1.174 vs. 1.178]
@@ -635,9 +635,17 @@ void PhaseIdealLoop::split_if_with_block
   // If trying to do a 'Split-If' at the loop head, it is only
   // profitable if the cmp folds up on BOTH paths.  Otherwise we
   // risk peeling a loop forever.
-  // CNC - Disabled for now.
-  if( n_ctrl->is_Loop() )
-    policy = 999;             // Policy requires BOTH paths to win
+
+  // CNC - Disabled for now.  Requires careful handling of loop
+  // body selection for the cloned code.  Also, make sure we check
+  // for any input path not being in the same loop as n_ctrl.  For
+  // irreducible loops we cannot check for 'n_ctrl->is_Loop()'
+  // because the alternative loop entry points won't be converted
+  // into LoopNodes.
+  IdealLoopTree *n_loop = get_loop(n_ctrl);
+  for( uint j = 1; j < n_ctrl->req(); j++ )
+    if( get_loop(n_ctrl->in(j)) != n_loop )
+      return;

   // Split compare 'n' through the merge point if it is profitable
   Node *phi = split_thru_phi( n, n_ctrl, policy );

src/share/vm/opto/memnode.hpp [1.93 vs. 1.95]
@@ -223,6 +223,16 @@ public:
   virtual int Opcode() const;
   virtual uint ideal_reg() const { return Op_RegP; }
   virtual int store_Opcode() const { return Op_StoreP; }
+  // depends_only_on_test is almost always true, and needs to be almost always
+  // true to enable key hoisting & commoning optimizations.  However, for the
+  // special case of RawPtr loads from TLS top & end, the control edge carries
+  // the dependence preventing hoisting past a Safepoint instead of the memory
+  // edge.  (An unfortunate consequence of having Safepoints not set Raw
+  // Memory; itself an unfortunate consequence of having Nodes which produce
+  // results (new raw memory state) inside of loops preventing all manner of
+  // other optimizations).  Basically, it's ugly but so is the alternative.
+  // See comment in graphkit.cpp, around line 1923 GraphKit::allocate_heap.
+  virtual bool depends_only_on_test() const { return adr_type() != TypeRawPtr::BOTTOM; }
 };

 //------------------------------LoadKlassNode----------------------------------
@@ -233,6 +243,7 @@ public:
     : LoadPNode(c,mem,adr,at,tk) {}
   virtual int Opcode() const;
   virtual const Type *Value( PhaseTransform *phase ) const;
+  virtual bool depends_only_on_test() const { return true; }
 };

 //------------------------------LoadSNode--------------------------------------
@@ -367,6 +378,7 @@ public:
     : LoadPNode(c,mem,adr,TypeRawPtr::BOTTOM, TypeRawPtr::BOTTOM) {}
   virtual int Opcode() const;
   virtual int store_Opcode() const { return Op_StorePConditional; }
+  virtual bool depends_only_on_test() const { return true; }
 };

 //------------------------------LoadLLockedNode---------------------------------
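[Editor's note] As a rough illustration of the intent of the memnode.hpp change, here is a small, self-contained C++ sketch. It is not HotSpot code: the Node/LoadP/LoadRawTls classes and the can_hoist_above_safepoint helper are invented. It shows how an optimizer could consult a depends_only_on_test() hook so that ordinary pointer loads may float above a safepoint while raw TLS top/end loads report false and stay pinned by their control edge.

// Hedged sketch, assuming a toy node hierarchy with a hoisting predicate.
#include <iostream>

struct Node {
  // True when the node's only dependence on its control input is the test
  // that guards it, so it may be hoisted/commoned past other control
  // (e.g. a safepoint).
  virtual bool depends_only_on_test() const { return true; }
  virtual const char* name() const = 0;
  virtual ~Node() = default;
};

// A load of an ordinary (oop) pointer: safe to hoist or common.
struct LoadP : Node {
  const char* name() const override { return "LoadP (oop field)"; }
};

// A raw-pointer load of TLS 'top'/'end': the control edge is what keeps it
// on the correct side of a safepoint, so it must not be hoisted.
struct LoadRawTls : Node {
  bool depends_only_on_test() const override { return false; }
  const char* name() const override { return "LoadP (raw TLS top/end)"; }
};

// Simplified hoisting decision used by a hypothetical loop optimizer.
bool can_hoist_above_safepoint(const Node& n) {
  return n.depends_only_on_test();
}

int main() {
  LoadP oop_load;
  LoadRawTls tls_load;
  std::cout << oop_load.name() << " hoistable: "
            << can_hoist_above_safepoint(oop_load) << "\n";   // prints 1
  std::cout << tls_load.name() << " hoistable: "
            << can_hoist_above_safepoint(tls_load) << "\n";   // prints 0
  return 0;
}

In the actual patch the distinction is made on the load's address type (adr_type() != TypeRawPtr::BOTTOM) rather than through a separate class, but the effect is the same: raw-pointer loads of TLS top and end are not treated as depending only on their guarding test, so they cannot drift past a safepoint.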
11-06-2004