One of our customers runs a number of different web applications in a production environment. Most of the applications run on Sun SPARC hardware under Solaris 8 with the Sun JVM for Solaris. A small number run on a Sun V20z server with an AMD64 processor, RedHat AS3 Linux, and the Sun JVM for Linux.
The product is J2SE 1.4.2_08 (the latest release).
All applications work fine in the Solaris/SPARC environment, and most applications work fine in the Linux/AMD environment.
One application, on Linux only, hangs at irregular intervals. The hang is particularly difficult to diagnose because the JVM does not respond to kill -QUIT (which normally produces a thread dump). The customer has taken a core dump of the hung process with gdb, but because the JVM builds do not include debugging symbols, they cannot make much sense of it. They do not believe this is an application issue, since the exact same application never hangs on the Solaris platform. Even if it is an application issue, it is very hard to debug: without a thread dump they have virtually no information about what might be causing the hang. They are, however, able to terminate the JVM.
Environment on which the problem is seen:
The OS is 64-bit RedHat AS3 on a Sun V20z server (64-bit AMD processor). The customer is running the 32-bit JVM.
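To confirm which JVM build and data model a given process is actually using on the V20z, a small check such as the following may help. It is a sketch, not part of the original report: the class name JvmInfo is hypothetical, and sun.arch.data.model is a Sun-specific system property that other vendors' JVMs may not set. It uses only APIs present in J2SE 1.4.2. On a 32-bit Sun JVM running on x86-64 Linux, os.arch is typically reported as i386 and the data model as 32.

```java
// Hypothetical helper: prints the system properties that identify the running JVM.
// Uses only pre-1.5 APIs, so it should compile and run on J2SE 1.4.2.
public class JvmInfo {

    // Builds a one-line summary. "sun.arch.data.model" is Sun-specific and may be
    // missing on other JVMs, so a default of "unknown" is supplied.
    static String describe() {
        String model = System.getProperty("sun.arch.data.model", "unknown");
        return "java.version=" + System.getProperty("java.version")
                + " java.vm.name=" + System.getProperty("java.vm.name")
                + " os.name=" + System.getProperty("os.name")
                + " os.arch=" + System.getProperty("os.arch")
                + " data.model=" + model;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it with the same java binary and options as the affected application shows whether the hanging process really is the 32-bit build.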
Need to diagnose the hang in the Linux JVM.
Running Linux strace (the equivalent of Solaris truss) shows only that the JVM is stuck in a futex(2) call, which suggests a deadlock (I believe futex(2) is used on Linux to implement Java locking).
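The kind of lock-ordering deadlock that a futex hang can point to is easy to reproduce in Java. The sketch below is illustrative only (the class and thread names are hypothetical): two threads take the same two monitors in opposite order, and the JVM's built-in detector then reports them. It relies on java.lang.management.ThreadMXBean.findDeadlockedThreads, which was added in Java 6, so it would not run on the 1.4.2_08 JVM in this report. It also only detects deadlocks among Java monitors and java.util.concurrent locks, not a hang inside the VM itself. On Linux with NPTL, threads blocked this way typically show up in strace as futex waits.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Illustrative lock-ordering deadlock plus detection. Requires Java 6 or later
// (findDeadlockedThreads); java.lang.management does not exist in J2SE 1.4.2.
public class LockOrderDeadlock {

    // Starts a daemon thread that holds 'first', pauses so the other thread can
    // take its first lock, then blocks trying to acquire 'second'.
    static Thread start(String name, final Object first, final Object second) {
        Thread t = new Thread(new Runnable() {
            public void run() {
                synchronized (first) {
                    sleepQuietly(200);
                    synchronized (second) {
                        // Not reached once both threads hold their first lock.
                    }
                }
            }
        }, name);
        t.setDaemon(true); // daemon, so the deadlocked threads do not keep the JVM alive
        t.start();
        return t;
    }

    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Creates a fresh pair of locks, deadlocks two threads on them, and polls the
    // JVM's detector for up to about 5 seconds. Returns the deadlocked thread IDs
    // (an empty array if none were found in time).
    static long[] runDemo() {
        Object lockA = new Object();
        Object lockB = new Object();
        start("worker-1", lockA, lockB);
        start("worker-2", lockB, lockA);

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 50; i++) {
            long[] ids = mx.findDeadlockedThreads(); // null when no deadlock exists
            if (ids != null) {
                return ids;
            }
            sleepQuietly(100);
        }
        return new long[0];
    }

    public static void main(String[] args) {
        long[] ids = runDemo();
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info == null) {
                continue; // thread exited between detection and lookup
            }
            System.out.println(info.getThreadName() + " is waiting for " + info.getLockName()
                    + ", held by " + info.getLockOwnerName());
        }
    }
}
```

Running main prints one line per deadlocked thread, naming the monitor it is waiting for and the thread that holds it, which is roughly the information a kill -QUIT thread dump would normally give.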
The complete core dump of the application in its hung state, taken with gdb, is available at:
Path: /net/hanwka-home1.sfbay/global/export/home1/27/sd158479/ariba
File names: core-2.zip and core-3.zip (two files in total).
###@###.### 2005-07-19 19:35:15 GMT
###@###.### 2005-07-19 20:05:33 GMT