JDK-4873538 : signal handler deadlocked waiting on a malloc lock
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 1.3.1_07,1.3.1_09
  • Priority: P2
  • Status: Closed
  • Resolution: Duplicate
  • OS: solaris_8
  • CPU: sparc
  • Submitted: 2003-06-03
  • Updated: 2003-10-24
  • Resolved: 2003-10-24
Related Reports
Duplicate :  
Relates :  
Relates :  
Description
The customer is running SunONE Application Server with JDK 1.3.1_07 as its JVM. They occasionally see the JVM crash.

When a SIGBUS occurred, the JVM should have dumped core and been restarted by iAS. However, it did not, and I believe this is the result of a bug in the JVM. The relevant stack trace follows:

current thread: t@268
  [1] __lwp_park(0x0, 0x0, 0x0, 0x0, 0xfe254000, 0xfe7c0600), at 0xfedb4ab0
  [2] mutex_lock_queue(0xe30e1c00, 0xfedc6b6c, 0xfe7c0600, 0xfedc6000, 0x1, 0xfe7c0608), at 0xfedb1524
  [3] slow_lock(0xfe7c0600, 0xe30e1c00, 0xfe7c0600, 0xfe7bc004, 0x0, 0x3), at 0xfedb1c00
  [4] free(0x25af648, 0x25af648, 0x45535400, 0x7efefeff, 0x81010100, 0xff00), at 0xfe742b14
  [5] tzcpy(0x25af648, 0xfe7c2938, 0x0, 0xb, 0xfe7bc004, 0xffbefef6), at 0xfe7534f4
  [6] getzname(0xffbeff01, 0xfe7bf55c, 0x0, 0xfe7bf55c, 0xffbefef6, 0x2), at 0xfe753458
  [7] _ltzset_u(0x3edb64f5, 0xfe7bc004, 0x0, 0x0, 0x0, 0x1), at 0xfe752f5c
  [8] localtime_u(0xe067d540, 0xfe7c2940, 0xe067d540, 0xfecc8000, 0xfe7bc004, 0xfebc5a48), at 0xfe752124
  [9] os::report_fatal_error(0x24fb4b8, 0xffffffff, 0xfec70c30, 0x22250, 0x0, 0xfe742464), at 0xfebc5a48
  [10] os::handle_unexpected_exception(0x24fb4b8, 0xfed38984, 0xfecdc18c, 0xfec70ff4, 0xfecc8000, 0x0), at 0xfebc5e0c
  [11] JVM_handle_solaris_signal(0x0, 0x24fb4b8, 0xe067e178, 0xfecc8000, 0xa, 0xe067e430), at 0xfea0a9bc
  [12] __sighndlr(0xa, 0xe067e430, 0xe067e178, 0xfea0a9d4, 0x0, 0x0), at 0xfedb4cc8
  [13] call_user_handler(0xe30e1c00, 0x10c, 0xfedc78c0, 0xe067e178, 0xe067e430, 0xa), at 0xfedafb00
  [14] sigacthandler(0xe30e1c00, 0xe067e430, 0xe067e178, 0xfedc6000, 0xe067e430, 0xa), at 0xfedafccc
  ---- called from signal handler with signal 10 (SIGBUS) ------
  [15] realfree(0x2ffc228, 0xfe7c2850, 0xfe7bc004, 0x2ffc0f0, 0x139, 0x2ffc0f8), at 0xfe742464
  [16] cleanfree(0x0, 0xfe7bc004, 0xfe7c27c4, 0xfe7c2844, 0xfe7c27c8, 0x0), at 0xfe742c60
  [17] _malloc_unlocked(0x1e, 0xe30e1c00, 0xfe7bc004, 0x20, 0x0, 0x0), at 0xfe741d94
  [18] malloc(0x1e, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfe741c88
  [19] operator new(0x1e, 0x0, 0x13640, 0xe4ad8c78, 0xff029978, 0x1e), at 0xff01635c

Note that when the SIGBUS occurred, the JVM's signal handler was called to process the signal. I believe the signal handler has deadlocked waiting on the malloc lock, which is already held by the JVM thread that raised the SIGBUS: the thread was inside malloc (frames 15 to 18) when the signal arrived, and the handler then reached free (frame 4) on the same thread.
###@###.### 2003-06-03
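
The mechanism described above can be illustrated with a small C sketch. It is a simulation only, not JVM or libc code: the flag heap_lock_held stands in for the allocator's internal lock, SIGUSR1 stands in for the SIGBUS, and the names on_fault and simulate_fault_inside_allocator are hypothetical. The handler only checks the flag instead of blocking, so the program terminates where the real handler hung.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Stand-in for the allocator's internal lock (hypothetical; the real
 * lock lives inside libc and is not visible to the application). */
static volatile sig_atomic_t heap_lock_held = 0;

/* Set by the handler when it finds the lock already taken. */
static volatile sig_atomic_t handler_would_deadlock = 0;

static void on_fault(int sig)
{
    (void)sig;
    /* A handler that called free() at this point would wait on a
     * non-recursive lock that its own thread already holds. This
     * sketch only records that condition instead of blocking. */
    if (heap_lock_held)
        handler_would_deadlock = 1;
}

/* Returns 1 if the handler ran while the simulated lock was held,
 * 0 if not, and -1 if the handler could not be installed. */
int simulate_fault_inside_allocator(void)
{
    struct sigaction sa;
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    handler_would_deadlock = 0;
    heap_lock_held = 1;   /* "malloc" takes its lock */
    raise(SIGUSR1);       /* fault arrives mid-allocation, same thread */
    heap_lock_held = 0;   /* released only because the sketch never blocks */
    return handler_would_deadlock;
}
```

Calling simulate_fault_inside_allocator() returns 1, which corresponds to the state in the trace: the handler runs on top of a frame that still owns the lock, so the lock cannot be released until the handler returns.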

Comments
WORK AROUND The following flag has been added to 1.3.1 to avoid this hang: -XX:+SuppressFatalErrorMessage ###@###.### 2003-11-18
18-11-2003

EVALUATION 4852773 (jdk1.2.2_15) is unrelated to the current problem. That 1.2.2_xx issue was caused by suspending a thread for GC while it was inside malloc. The GC suspension code in 1.3.1_xx is completely different: 1.3.1_xx and higher will not suspend a thread for GC while it is inside a malloc call.

In the JVM signal handler, if the signal is not an expected one, we print some error information. That information includes the current time, which we obtain by calling 'localtime'. localtime indirectly calls 'free'. Our SIGBUS originated from a problem inside malloc, so the thread already holds the malloc lock. The 'free' call tries to acquire the same lock again, which results in a deadlock. I am checking whether we can get the current time in an async-signal-safe way (i.e., without touching the malloc lock). ###@###.### 2003-06-04

This is a duplicate of 4515367, which is fixed in 1.5. ###@###.### 2003.10.24
04-06-2003
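
As an illustration of the async-signal-safe approach mentioned in the evaluation, here is a minimal C sketch. It is not the fix that went into 4515367, and it makes a simplifying assumption: it prints UTC rather than local time, so it never consults the time zone data that pulled in 'free' via localtime. The names STAMP_LEN, put_digits, format_utc_stamp and report_time_async_safe are hypothetical. It relies only on time() and write(), both of which POSIX lists as async-signal-safe, plus integer arithmetic on a stack buffer.

```c
#include <stddef.h>
#include <time.h>
#include <unistd.h>

/* Length of "YYYY-MM-DD HH:MM:SS UTC\n", without a terminating NUL. */
#define STAMP_LEN 24

/* Write `width` zero-padded decimal digits of v starting at p. */
static void put_digits(char *p, long v, int width)
{
    for (int i = width - 1; i >= 0; i--) {
        p[i] = (char)('0' + v % 10);
        v /= 10;
    }
}

/* Hypothetical helper: format t (seconds since the epoch, assumed >= 0)
 * as UTC into buf, which must hold STAMP_LEN bytes. Integer arithmetic
 * only: no locks, no heap, no time zone lookup. */
void format_utc_stamp(time_t t, char *buf)
{
    long long secs = (long long)t;
    long long days = secs / 86400;
    long rem = (long)(secs % 86400);

    /* Days since 1970-01-01 to a Gregorian date, using a year that
     * starts on 1 March so the leap day falls at the end. */
    long long z = days + 719468;
    long long era = z / 146097;                 /* 400-year cycles */
    long doe = (long)(z - era * 146097);        /* day of era */
    long yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
    long doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    long mp = (5 * doy + 2) / 153;              /* month, March = 0 */
    long day = doy - (153 * mp + 2) / 5 + 1;
    long month = mp < 10 ? mp + 3 : mp - 9;
    long year = (long)(era * 400) + yoe + (month <= 2 ? 1 : 0);

    put_digits(buf, year, 4);            buf[4] = '-';
    put_digits(buf + 5, month, 2);       buf[7] = '-';
    put_digits(buf + 8, day, 2);         buf[10] = ' ';
    put_digits(buf + 11, rem / 3600, 2); buf[13] = ':';
    put_digits(buf + 14, rem % 3600 / 60, 2); buf[16] = ':';
    put_digits(buf + 17, rem % 60, 2);   buf[19] = ' ';
    buf[20] = 'U'; buf[21] = 'T'; buf[22] = 'C'; buf[23] = '\n';
}

/* Intended to be callable from a signal handler: uses only time()
 * and write(), and touches no heap memory. */
void report_time_async_safe(int fd)
{
    char buf[STAMP_LEN];
    format_utc_stamp(time(NULL), buf);
    ssize_t n = write(fd, buf, sizeof buf);
    (void)n; /* nothing useful can be done about a failed write here */
}
```

A fatal-error handler could call report_time_async_safe(STDERR_FILENO) in place of the localtime-based output. The trade-off is that the timestamp is in UTC; producing local time would require the zone offset to be captured ahead of time, outside any signal handler.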