JDK-4833582 : HotSpot Virtual Machine Error : 11 when running MQ stress test...
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 1.4.1_02
  • Priority: P1
  • Status: Closed
  • Resolution: Fixed
  • OS: solaris_10
  • CPU: x86
  • Submitted: 2003-03-17
  • Updated: 2012-10-08
  • Resolved: 2003-04-28
Other
1.4.1_03    03    Fixed
Description
HotSpot Virtual Machine Error : 11 when running MQ stress test...

Full error log.
Unexpected Signal : 11 occurred at PC=0xDD4C0F49
Function=[Unknown. Nearest: JVM_ArrayCopy+0x2CC7D]
Library=/usr/j2se/jre/lib/i386/server/libjvm.so


Dynamic libraries:
0x8050000       /usr/j2se/jre/bin/java
0xddb90000      /usr/bin/../../usr/lib/libthread.so.1
0xddbb0000      /usr/bin/../../usr/lib/libdl.so.1
0xddac0000      /usr/bin/../../usr/lib/libc.so.1
0xdd400000      /usr/j2se/jre/lib/i386/server/libjvm.so
0xdda80000      /usr/bin/../../usr/lib/libCrun.so.1
0xdda60000      /usr/bin/../../usr/lib/libsocket.so.1
0xdd9c0000      /usr/bin/../../usr/lib/libnsl.so.1
0xdd9a0000      /usr/bin/../../usr/lib/libm.so.1
0xddb80000      /usr/bin/../../usr/lib/libw.so.1
0xdd970000      /usr/bin/../../usr/lib/libmp.so.2
0xdd940000      /usr/j2se/jre/lib/i386/native_threads/libhpi.so
0xdd3d0000      /usr/j2se/jre/lib/i386/libverify.so
0xdd390000      /usr/j2se/jre/lib/i386/libjava.so
0xdd360000      /usr/j2se/jre/lib/i386/libzip.so
0xb4d40000      /usr/lib/libimqutil.so.1
0xb4d20000      /usr/j2se/jre/lib/i386/libnet.so
0xb4d00000      /usr/bin/../../usr/lib/nss_files.so.1
0xb4c90000      /usr/j2se/jre/lib/i386/libnio.so
0xb4c70000      /usr/bin/../../usr/lib/librt.so.1
0xb4c50000      /usr/bin/../../usr/lib/libaio.so.1
0xb4c20000      /usr/bin/../../usr/lib/libmd5.so.1

Local Time = Fri Mar 14 21:44:42 2003
Elapsed Time = 8576
#
# HotSpot Virtual Machine Error : 11
# Error ID : 4F530E43505002E6
# Please report this error at
# http://java.sun.com/cgi-bin/bugreport.cgi
#
# Java VM: Java HotSpot(TM) Server VM (1.4.1_02-b06 mixed mode)
#
# An error report file has been saved as hs_err_pid1060.log.
# Please refer to the file for further information.
#
Abort - core dumped
# 
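As a quick cross-check of the log above, the faulting PC can be mapped back to a library by comparing it with the load addresses in the "Dynamic libraries" list. The sketch below is illustrative only: `locate_pc` is a hypothetical helper, only a subset of the list is copied in, and it assumes that a library covers the range from its base up to the next higher base, which the log itself does not state.

```python
# Base addresses copied from the "Dynamic libraries" section of the log
# (a subset; the remaining entries do not change the result).
LIBRARIES = {
    0x08050000: "/usr/j2se/jre/bin/java",
    0xDD3D0000: "/usr/j2se/jre/lib/i386/libverify.so",
    0xDD400000: "/usr/j2se/jre/lib/i386/server/libjvm.so",
    0xDD940000: "/usr/j2se/jre/lib/i386/native_threads/libhpi.so",
    0xDDAC0000: "/usr/bin/../../usr/lib/libc.so.1",
}


def locate_pc(pc, libraries):
    """Return (path, offset) for the library with the highest base <= pc.

    Assumption: the mapping extends from its base to the next base.
    """
    candidates = [base for base in libraries if base <= pc]
    if not candidates:
        raise ValueError(f"no library loaded at or below {pc:#x}")
    base = max(candidates)
    return libraries[base], pc - base


# PC taken from "Unexpected Signal : 11 occurred at PC=0xDD4C0F49".
path, offset = locate_pc(0xDD4C0F49, LIBRARIES)
print(path, hex(offset))
```

The result agrees with the `Library=` line of the log, which also names the server `libjvm.so`.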

How to reproduce the bug?
==========================
       
1) Install MQ3.0.1sp1 on your test machine. Download the bits from
http://jpgserv.red.iplanet.com/JMQ.builds/releases/FCS-3.0.1SP1/bundles/

The installation docs are located at
http://docs.sun.com/source/816-6453-10/index.html
http://docs.sun.com/source/817-0355-10/index.html
http://docs.sun.com/source/817-0354-10/index.html

2) Start two MQ brokers in cluster mode as follows.

imqbrokerd -tty -name broker1 -port 7676 -cluster :7777 (on terminal #1)
imqbrokerd -tty -name broker2 -port 7777 -cluster :7676 (on terminal #2)

3) Download Longevity testsuite (Longevity2.jar) from http://jpgserv.red.iplanet.com/imq_qa
to your test location.
    
4) Add Longevity2.jar, "$IMQ_ROOT/lib/imq.jar" and "$IMQ_ROOT/lib/jms.jar" to CLASSPATH

5) Start MQ Consumer from broker2
       java -cp $CLASSPATH -DimqBrokerHostName=localhost -DimqBrokerHostPort=7777 Longevity.LongevityConsumer

6) Start MQ Producer from broker1
       java -cp $CLASSPATH -DimqBrokerHostName=localhost -DimqBrokerHostPort=7676 Longevity.LongevityProducer 2 2 25

For more information on the test suite, please refer to
http://jpgserv.red.iplanet.com/imq_qa/testcases/Longevity_Testsuite.html
http://jpgserv.red.iplanet.com/imq_qa/testcases/Longevity-Howto.html

The bug shows up in the Broker (server) VM after a couple of hours.
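The reproduction steps can be collected into a small launcher sketch. This is only an illustration under stated assumptions: `broker_command` and `client_command` are hypothetical helper names, the functions only build argument lists and do not start anything, and the optional heap and store-reset settings are the ones mentioned in the evaluation comments of this report (`-vmargs -Xmx512m` and `-reset store`). The position of `-reset store` on the command line is a guess.

```python
import shlex


def broker_command(name, port, peer_port, heap=None, reset_store=False):
    """Build an imqbrokerd argument list from options quoted in this report."""
    argv = ["imqbrokerd", "-tty", "-name", name,
            "-port", str(port), "-cluster", f":{peer_port}"]
    if reset_store:
        argv += ["-reset", "store"]        # placement is an assumption
    if heap:
        argv += ["-vmargs", f"-Xmx{heap}"]  # appended last, as the comments suggest
    return argv


def client_command(classpath, port, main_class, *args):
    """Build the java argument list for a Longevity client."""
    return ["java", "-cp", classpath,
            "-DimqBrokerHostName=localhost",
            f"-DimqBrokerHostPort={port}",
            main_class, *args]


commands = [
    broker_command("broker1", 7676, 7777, heap="512m", reset_store=True),
    broker_command("broker2", 7777, 7676, heap="512m", reset_store=True),
    client_command("$CLASSPATH", 7777, "Longevity.LongevityConsumer"),
    client_command("$CLASSPATH", 7676, "Longevity.LongevityProducer", "2", "2", "25"),
]
for argv in commands:
    print(shlex.join(argv))
```

Each list could be handed to `subprocess.Popen` once `$CLASSPATH` is replaced by a real value; the brokers need to be running before the clients are started.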




Comments
CONVERTED DATA

BugTraq+ Release Management Values

COMMIT TO FIX: 1.4.1_03
FIXED IN: 1.4.1_03
INTEGRATED IN: 1.4.1_03
14-06-2004

EVALUATION

I think I found the binaries another way. I have installed the MQ app and copied the Longevity2.jar file, and now I am running out of memory. What args do I need to set, and where, to get the same setup as you?

###@###.### 2003-03-18

What do you mean by broker server? There appear to be two brokers: broker1 on port 7777 and another on port 7676? I also see these messages in the producer terminal window.

    JMS Exception Occured while SyncUpwithSubscriber Up at Tue Mar 18 12:34:14 EST 2003
    javax.jms.JMSException: [C4073]: Consumer limit exceeded on destination LongevityQSyncUp
        at com.sun.messaging.jmq.jmsclient.ProtocolHandler.addInterest(ProtocolHandler.java:1430)
        at com.sun.messaging.jmq.jmsclient.WriteChannel.addInterest(WriteChannel.java:55)
        at com.sun.messaging.jmq.jmsclient.ConnectionImpl.addInterest(ConnectionImpl.java:631)
        at com.sun.messaging.jmq.jmsclient.Consumer.registerInterest(Consumer.java:101)
        at com.sun.messaging.jmq.jmsclient.MessageConsumerImpl.addInterest(MessageConsumerImpl.java:127)
        at com.sun.messaging.jmq.jmsclient.MessageConsumerImpl.init(MessageConsumerImpl.java:122)
        at com.sun.messaging.jmq.jmsclient.QueueReceiverImpl.<init>(QueueReceiverImpl.java:40)
        at com.sun.messaging.jmq.jmsclient.UnifiedSessionImpl.createReceiver(UnifiedSessionImpl.java:83)
        at Longevity.SyncUp.SyncUpwithSubscriber(SyncUp.java:30)
        at Longevity.LongevityProducer.UpdateCount(LongevityProducer.java:174)
        at Longevity.SessionsThread.doProduceMessage(SessionsThread.java:524)
        at Longevity.SessionsThread.run(SessionsThread.java:83)
    Exception Occured on Sessions Thread at Tue Mar 18 12:34:14 EST 2003
    java.lang.NullPointerException
        at Longevity.SyncUp.SyncUpwithSubscriber(SyncUp.java:51)
        at Longevity.LongevityProducer.UpdateCount(LongevityProducer.java:174)
        at Longevity.SessionsThread.doProduceMessage(SessionsThread.java:524)
        at Longevity.SessionsThread.run(SessionsThread.java:83)

Is this normal? It looks like a connection failed and thus returned with an exception.

###@###.### 2003-03-18

And another message:

        at Longevity.SessionsThread.run(SessionsThread.java:83)
    [*0] [*1] 500 ___ [#500.0|Tue Mar 18 12:37:38 EST 2003] [##500.0] [*0] [*1] 600 ... [*0] [*1] [*0] [*1] 600 ___ 700 ... [*0] [*1] [#750.0|Tue Mar 18 12:47:35 EST 2003]
    Message No. got 500.0 Message Out of Order... Expected Message No. 750.0

Then it appears to do nothing. The test returns to the command line and the producer exits.

###@###.### 2003-03-18

1. To increase the memory, we use -Xmx512m (I assigned each process 512 MB of RAM).
2. To increase the broker process memory, append "-vmargs -Xmx512m" to both broker startup commands.
3. There are only two brokers: one running on port 7676 and the other running on port 7777.
4. Whenever the producer/consumer is restarted, the leftover or pending messages in the MQ broker have to be cleaned up. Otherwise, the test will receive messages out of order. So it's a good idea to start the brokers fresh with the -reset store option when the clients are restarted.

Please let me know if you need more info.

###@###.### 2003-03-18

This is not a mantis issue. Please update the bug with mantis-na.

###@###.### 2003-03-19

Re-assigning to the JPSE team for further evaluation. An internal customer wants to raise an escalation on this bug.

###@###.### 2003-03-20

I have not been able to reproduce the crash so far. I have tried on an S10 machine for 3 runs of 3-4 hours each and on one of Gary's machines for a run of more than 20 hours. I will keep trying. In the meantime, Mathi, do you have a core dump from the crash you have seen?

###@###.### 2003-03-21

Gary has given the location of his core file: /net/mellow-yellow.east/files/collins
The location of our core file is: /net/jpgserv/mirror0/jmq/test/mathim/core

###@###.### 2003-03-21

----

I looked at the core file at /net/mellow-yellow.east/files/collins. From the relevant portion of the stack trace of the thread that received the SEGV, we have:

    [13] __sighndlr(), at 0xdfb67a4f
    [14] sigacthandler(), at 0xdfb75b21
    ---- called from signal handler with signal 11 (SIGSEGV) ------
    [15] ObjectSynchronizer::inflate(), at 0xde9a067b
    [16] ObjectSynchronizer::slow_enter(), at 0xde901e1f
    [17] InterpreterRuntime::monitorenter(), at 0xde91e73e
    [18] 0xdac10562(), at 0xdac10561

The crash happened while trying to inflate a lock on an object. 'regs' on frame 15 shows:

    (dbx) regs
    current thread: t@0
    current frame: [15]
    gs 0x000001b7
    fs 0x00000000
    es 0x0000001f
    ds 0x0000001f
    edi 0xb7a10188
    esi 0x0811783c
    ebp 0xb5dd089c
    ....

The input argument, the object that is being inflated, is at %ebp + 8.

    (dbx) x 0xb5dd089c+8/X    ! first argument to ObjectSynchronizer::inflate
    0xb5dd08a4: 0xb7a10188

So the object being inflated is 0xb7a10188.

    (dbx) x 0xb7a10188/2X
    0xb7a10188: 0x00000000 0xd6804578

Let us do some sanity checks on 0xb7a10188. The class of the object being inflated is 0xd6804578. The name of this class is:

    (dbx) x 0xd6804578 + 72/X    ! _name is offset 72 from klassoop start
    0xd68045c0: 0xd6800f20

From the above symbol oop, the actual name of the class is:

    (dbx) print (char*)(0xd6800f20 + 14)
    (char *) (3598716704U+14) = 0xd6800f2e "java/lang/Object"

It looks like an object of class java.lang.Object is being inflated. Again, from

    (dbx) x 0xb7a10188/2X
    0xb7a10188: 0x00000000 0xd6804578

we know that the mark oop of the object being inflated is 0x00000000 (NULL). From the source code of

    ObjectMonitor* ObjectSynchronizer::inflate(oop object)

we know that the mark oop of the object is set to NULL at the line

    markOop test = (markOop) atomic::exchange_ptr(0, (intptr_t*)object->mark_addr());

Either the mark oop of the input object was already NULL before entering, or it was set to NULL by executing the above atomic::exchange_ptr call, or both.

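The pointer chasing in the dbx session above can be replayed as plain arithmetic. The following is a minimal sketch, not a debugger: the `memory` dictionary only contains the words that dbx printed, and the address of the second header word (object address plus 4) is inferred from the `/2X` format on this 32-bit VM rather than printed explicitly.

```python
WORD = 4  # bytes per word on this 32-bit x86 VM

EBP = 0xB5DD089C  # %ebp of frame [15], from the 'regs' output

# address -> 32-bit word, as printed by the dbx 'x' commands
memory = {
    0xB5DD08A4: 0xB7A10188,  # x 0xb5dd089c+8/X
    0xB7A10188: 0x00000000,  # x 0xb7a10188/2X, first word
    0xB7A1018C: 0xD6804578,  # x 0xb7a10188/2X, second word (address inferred)
    0xD68045C0: 0xD6800F20,  # x 0xd6804578 + 72/X
}

obj = memory[EBP + 8]          # first argument to ObjectSynchronizer::inflate
mark = memory[obj]             # header word 0: the mark oop
klass = memory[obj + WORD]     # header word 1: the class of the object
name_sym = memory[klass + 72]  # _name, offset 72 as stated in the dbx comment
name_chars = name_sym + 14     # start of the name characters, per the print command

print(f"object     {obj:#010x}")
print(f"mark oop   {mark:#010x}")
print(f"klass      {klass:#010x}")
print(f"name chars {name_chars:#010x}")
```

The computed addresses match the ones dbx reported, including the decimal form 3598716704 that dbx prints for 0xd6800f20.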
In the disassembly of ObjectSynchronizer::inflate, looking for the call instruction nearest to the instruction that caused the SEGV, at

    0xde9a067b: inflate+0x012a: movl (%ecx),%ecx

we have:

    0xde9a0610: inflate+0x00bf: movl 8(%ebp),%edi
    0xde9a0613: inflate+0x00c2: movl 0x980(%ebx),%eax
    0xde9a0619: inflate+0x00c8: movl (%eax),%eax
    0xde9a061b: inflate+0x00ca: pushl %edi
    0xde9a061c: inflate+0x00cb: pushl $0
    0xde9a061e: inflate+0x00cd: call *%eax

The above call must be atomic::exchange_ptr. Why?

1) It accepts two parameters.
2) The first argument is 0 and the second one is the object header (8(%ebp), which holds the mark oop address, is moved to %edi and is pushed first).
3) It is an indirect call, and we know atomic::exchange_ptr is an indirect call.

This means that we have passed through the atomic::exchange_ptr call, and this call set the object header to NULL. We still don't know what the original header value was, that is, the value stored in the local variable 'test' in the inflate method. A debug build appears to check this with

    assert(test != NULL, "Unexpected object header as 0!");

We may want to try using java_g to check whether we hit the above assert.

###@###.### 2003-03-25

I have analyzed the core file Mathi provided, and it looks very close to the problem described in 4701482. Please see the details in the comment section.

###@###.### 2003-03-26

So far I don't have an environment that reliably reproduces the problem. Some systems have reproduced the bug after a 2-3 hour run but then did not crash for a day on the next run. So I have produced a VM based on 1.4.1_03 with the fix for 4701482.
If you have a machine that reproduced this crash once, please install this fix and let me know if it still crashes (if it still crashes, please save the core file):

    /net/jpsesvr/export5/binaries/4833582/libjvm.so

Use it to replace <JAVA_HOME>/jre/lib/i386/server/libjvm.so. After replacing it, check with "java -server -version"; you should see

    Java HotSpot(TM) Server VM (build 1.4.1-internal, mixed mode)

Checksum:

    <99 greyfish:>sum libjvm.so
    50560 12555 libjvm.so

###@###.### 2003-04-01

Thanks to the help from the QA group, I am now able to reproduce the crash, and I have verified that the binary fixes the problem.

###@###.### 2003-04-08
01-04-2003
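The disassembly addresses quoted in the evaluation can be cross-checked for internal consistency. This is a small sketch under the assumption that every `inflate+0x...` label is an offset from the same function start; the names below are illustrative and the values are copied from the listing.

```python
FAULT_PC = 0xDE9A067B  # frame [15] PC, listed as inflate+0x012a
CALL_PC = 0xDE9A061E   # call *%eax, listed as inflate+0x00cd

# address -> offset within inflate, copied from the disassembly
listing = {
    0xDE9A0610: 0x00BF,
    0xDE9A0613: 0x00C2,
    0xDE9A0619: 0x00C8,
    0xDE9A061B: 0x00CA,
    0xDE9A061C: 0x00CB,
    CALL_PC: 0x00CD,
    FAULT_PC: 0x012A,
}

# Every line should imply the same start address for inflate.
starts = {addr - off for addr, off in listing.items()}
inflate_start = starts.pop() if len(starts) == 1 else None

# Distance in bytes from the indirect call to the faulting instruction.
call_to_fault = FAULT_PC - CALL_PC

print(f"inflate starts at {inflate_start:#010x}")
print(f"fault is {call_to_fault:#x} bytes after the call")
```

All seven lines agree on one start address, and the faulting instruction lies after the indirect call, which is consistent with the conclusion that execution had already passed through atomic::exchange_ptr.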