JDK-6401751 : JVM hangs with threads in "waiting on condition" state
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 1.4.2_09
  • Priority: P2
  • Status: Closed
  • Resolution: Cannot Reproduce
  • OS: linux_redhat_7.3
  • CPU: generic
  • Submitted: 2006-03-21
  • Updated: 2014-02-27
  • Resolved: 2006-05-11
Description
We have attached a thread dump that shows the above problem. The customer's application uses multiple threads, and for some unknown reason these begin to get stuck in a "waiting on condition" state. This freezes the application to the point where the JVM eventually appears to be hung.

Many of the threads seem to be stuck at the following frame, despite the lack of "waiting on <>", etc. messages that are typically displayed in a thread dump when threads are blocked on an object monitor.

com.nortelnetworks.ims.cap.prtcl.sip.SipLSC.processIncomingSignal(SipLSC.java:320)

This particular line of code simply calls a data accessor method. There is no "synchronized" block, nor are there any calls to Object.wait(). Nor do we make use of the Condition class in any of our software.

                try {
                    sipMsg = ((SipSignal) inSignal).getSipMessage();    // This is line 320 in SipLSC.java
                }
                catch (ClassCastException e) {
                    // Potential class cast exception if this was not a SipSignal
                }

                // SipSignal -
                // Accessor for the sipMsg attribute
                public SipMessage getSipMessage() {return sipMsg;}

The attached thread dump was taken as soon as symptoms of the problem appeared. The "waiting on condition" state seems to affect the busiest threads first, and it eventually affects enough threads that the application becomes completely unresponsive.
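
For reference, the difference between a thread blocked on an object monitor and a thread that is merely sleeping or parked can be observed with a small standalone program. The sketch below is illustrative only and is not taken from the customer's application: the class name ThreadStateDemo and its helper methods are hypothetical, and it relies on Thread.getState() and lambdas, so it assumes Java 8 or later and will not compile on the affected 1.4.2 release. On HotSpot, a thread sleeping in Thread.sleep() is normally listed in a thread dump as "waiting on condition", while a thread contending for a synchronized block is listed as "waiting for monitor entry".

```java
class ThreadStateDemo {

    // Polls until the thread reports the expected state or the timeout expires.
    static boolean awaitState(Thread t, Thread.State expected, long timeoutMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (t.getState() != expected && System.currentTimeMillis() < deadline) {
            Thread.sleep(10);
        }
        return t.getState() == expected;
    }

    // Returns { state of a sleeping thread, state of a thread blocked on a monitor }.
    static Thread.State[] observe() {
        final Object monitor = new Object();
        Thread sleeper = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                // Interrupted by observe() once the states have been sampled.
            }
        }, "sleeper");
        Thread blocked = new Thread(() -> {
            synchronized (monitor) {
                // Nothing to do; the point is contending for the monitor.
            }
        }, "blocked");
        sleeper.setDaemon(true);
        blocked.setDaemon(true);

        Thread.State[] states = new Thread.State[2];
        try {
            // Hold the monitor so that the "blocked" thread cannot enter it.
            synchronized (monitor) {
                sleeper.start();
                blocked.start();
                awaitState(sleeper, Thread.State.TIMED_WAITING, 5_000);
                awaitState(blocked, Thread.State.BLOCKED, 5_000);
                states[0] = sleeper.getState();
                states[1] = blocked.getState();
            }
            // Monitor released: let both threads finish.
            sleeper.interrupt();
            sleeper.join();
            blocked.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while sampling thread states", e);
        }
        return states;
    }

    public static void main(String[] args) {
        Thread.State[] s = observe();
        System.out.println("sleeper: " + s[0] + ", blocked: " + s[1]);
    }
}
```

On a 1.4.2 JVM the same comparison can only be made from a thread dump (for example, one triggered by sending SIGQUIT to the Java process on Linux), since Thread.getState() is not available there.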

Frequency:
This problem has occurred in only one of our labs running the application with this JVM/OS/hardware configuration. Other labs that are similarly configured have not had this problem. In this particular lab the problem is reproducible (it has typically occurred after running 12 hours of traffic in the application).

Comments
EVALUATION Currently unable to reproduce this problem.
27-04-2006