JDK-4654490 : Volano Mark hang on Linux 7.2 SMP
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 1.4.1
  • Priority: P3
  • Status: Closed
  • Resolution: Duplicate
  • OS: linux
  • CPU: generic
  • Submitted: 2002-03-19
  • Updated: 2002-03-27
  • Resolved: 2002-03-27
Related Reports
Duplicate :  
Description
Please see also 4650839.

Sometimes the VM could not acquire the Threads_lock, Heap_lock, or SystemDictionary_lock,
which caused vmark to hang on the client side.

To reproduce:

> java COM.volano.Main
> repeat 1000 java COM.volano.Mark -count 1

It hangs fairly quickly (usually in fewer than 500 COM.volano.Mark runs) on
Red Hat 7.2 SMP with product builds. I was not able to reproduce the hang
using Red Hat 6.2 SMP or debug builds.

CPU usage is 0% when the VM hangs. It appears that the VM cannot acquire one of
the important system locks (Threads_lock, Heap_lock, or SystemDictionary_lock)
in its pthread_mutex_lock() call. However, the HotSpot _owner field of the lock is 0x0.

In the pthread frames that handle the underlying pthread mutex, the mutex
status is non-zero, implying that the mutex is indeed locked by some thread.
By default, LinuxThreads does not record the real owner of a mutex unless
the mutex type is initialized to PTHREAD_MUTEX_ERRORCHECK_NP. I managed
to reproduce the hang with a PTHREAD_MUTEX_ERRORCHECK_NP mutex, and the __m_owner
field of the pthread mutex is again 0x0. See the following stack trace:

#0  0x40075aa5 in __sigsuspend (set=0x4c5b10c0)
    at ../sysdeps/unix/sysv/linux/sigsuspend.c:45
#1  0x40037079 in __pthread_wait_for_restart_signal (self=0x4c5b1be0)
    at pthread.c:967
#2  0x40038d39 in __pthread_alt_lock (lock=0x805069c, self=0x4c5b1be0)
    at restart.h:34
#3  0x40035c6e in __pthread_mutex_lock (mutex=0x805068c) at mutex.c:116
#4  0x4040be1f in os::Linux::safe_mutex_lock (_mutex=0x805068c)
    at /home/huanghui/main/build/linux/../../src/os_cpu/linux_i486/vm/os_linux_i486.cpp:518
#5  0x40470589 in os::Linux::Event::lock (this=0x8050688)
    at /home/huanghui/main/build/linux/../../src/os/linux/vm/os_linux.hpp:137
#6  0x404703ed in Mutex::wait_for_lock_implementation (this=0x8050660)
    at /home/huanghui/main/build/linux/../../src/os/linux/vm/mutex_linux.inline.hpp:25
#7  0x403fbc1b in Mutex::wait_for_lock_blocking_implementation (
    this=0x8050660, thread=0x807bf30)
    at /home/huanghui/main/build/linux/../../src/os/linux/vm/mutex_linux.cpp:89
#8  0x403fae61 in Mutex::lock (this=0x8050660)
    at /home/huanghui/main/build/linux/../../src/share/vm/runtime/mutex.cpp:42
#9  0x4042aab4 in SystemDictionary::find ()
    at /home/huanghui/main/build/linux/../../src/share/vm/runtime/safepoint.hpp:230
#10 0x4042ac1f in SystemDictionary::find_instance_or_array_klass ()
    at /home/huanghui/main/build/linux/../../src/share/vm/runtime/safepoint.hpp:230
#11 0x402f7dfb in ciEnv::get_klass_by_name_impl ()
   from /home/huanghui/jdk1.4.1/jre/lib/i386/client/libjvm.so
#12 0x402f8251 in ciEnv::get_klass_by_index_impl ()
   from /home/huanghui/jdk1.4.1/jre/lib/i386/client/libjvm.so
#13 0x402f82ef in ciEnv::get_klass_by_index ()
   from /home/huanghui/jdk1.4.1/jre/lib/i386/client/libjvm.so
  ... ... ... ...
(gdb) frame 4
#4  0x4040be1f in os::Linux::safe_mutex_lock (_mutex=0x805068c)
    at /home/huanghui/main/build/linux/../../src/os_cpu/linux_i486/vm/os_linux_i486.cpp:518
518           int status = pthread_mutex_lock(_mutex);
Current language:  auto; currently c++
(gdb) p *_mutex
$7 = {__m_reserved = 0, __m_count = 0, __m_owner = 0x0, __m_kind = 2,
  __m_lock = {__status = 1473963040, __spinlock = 0}}
    >>>>   __m_kind == PTHREAD_MUTEX_ERRORCHECK_NP,  __m_owner = 0x0 <<<<
(gdb) frame 8
#8  0x403fae61 in Mutex::lock (this=0x8050660)
    at /home/huanghui/main/build/linux/../../src/share/vm/runtime/mutex.cpp:42
42              wait_for_lock_blocking_implementation((JavaThread*)thread);
(gdb) p *this
$8 = {<CHeapObj> = {<No data fields>}, _lock_count = 0, _lock_event = 0x8050688,
  _supress_signal = 0, _owner = 0x0,
  _name = 0x4058a636 "SystemDictionary_lock", static INVALID_THREAD = 0x0}

Note from "$7" that __m_kind = 2, which is PTHREAD_MUTEX_ERRORCHECK_NP.
__m_lock.__status is neither 0 nor 1, so the mutex is held (with waiters queued),
yet __m_owner == 0x0.

"$8" shows that the (HotSpot) _owner field of SystemDictionary_lock is 0x0.

Comments
EVALUATION This hang is caused by a race in the Linux SMP kernel that assigns duplicate PIDs to different threads. Fixed in kernel 2.4.18. ###@###.### 2002-03-26
26-03-2002