Bug ID: JDK-4499805 Java application becomes deadlocked

Type: Bug
Component: hotspot
Sub-Component: runtime
Affected Version: 1.3.1_01,1.4.0

Priority: P1
Status: Closed
Resolution: Fixed
OS: solaris_7,solaris_8
CPU: sparc

Submitted: 2001-09-05
Updated: 2012-10-08
Resolved: 2001-09-18

Versions (Unresolved/Resolved/Fixed)

Other	Other
1.3.1_02 02 Fixed	1.4.0 Fixed

Related Reports

Duplicate :	JDK-4509128 - vm deadlock/race hazard
Relates :	JDK-4496133 - JVM dumps core

Description

Client/server application becomes deadlocked.

Both client and server are running with /usr/lib/lwp/libthread, and are doing
lots of compression and storage.


-----------------
Additional problem in this code area:
(Reported by Karen Kinnear)

I was just running b78 (from last week's code base) and turned
on 
VM option '+ShowMessageBoxOnError'
VM option '+FullGCALot'
VM option '+GCALotAtAllSafepoints'
VM option '+VerifyBeforeGC'
VM option '+VerifyAfterGC'
VM option 'FullGCALotInterval=1000'


And ran into an assertion:

# assert(_no_handle_mark_nesting == 0, "allocating handle inside NoHandleMark")

The stack trace looks like:

  [6] report_assertion_failure(code_str = 0xfee0fbd1 "_no_handle_mark_nesting == 0", file_name = 0xfee0fbee "/net/karenspc/files/merlinmain_fixes/src/share/vm/runtime/handles.cpp", line_no = 23, message = 0xfee0fc34 "allocating handle inside NoHandleMark"), line 148 in "debug.cpp"
  [7] HandleArea::allocate_handle(this = 0x39a20, obj = 0xf1f502d0), line 23 in "handles.cpp"
  [8] Handle::Handle(0xffbed284, 0xf1f502d0, 0xffbedb38, 0xf9c214c0, 0xf1800400, 0x0), at 0xfeae3830
  [9] VM_GC_Operation::acquire_pending_list_lock(this = 0xffbed434), line 66 in "vm_operations.cpp"
  [10] VM_GC_Operation::doit_prologue(this = 0xffbed434), line 91 in "vm_operations.cpp"
  [11] VMThread::execute(op = 0xffbed434), line 381 in "vmThread.cpp"
  [12] GenCollectedHeap::collect_locked(this = 0xad1a8, cause = _full_gc_alot, max_level = 1), line 446 in "genCollectedHeap.cpp"
  [13] GenCollectedHeap::collect(this = 0xad1a8, cause = _full_gc_alot), line 421 in "genCollectedHeap.cpp"
  [14] InterfaceSupport::gc_alot(), line 56 in "interfaceSupport.cpp"
  [15] InterfaceSupport::check_gc_alot(), line 62 in "interfaceSupport.hpp"
  [16] Thread::check_for_valid_safepoint_state(this = 0x39430, potential_vm_operation = 0), line 638 in "thread.cpp"
  [17] Mutex::check_prelock_state(this = 0x387f0, thread = 0x39430), line 264 in "mutex.cpp"
  [18] Mutex::lock(this = 0x387f0), line 28 in "mutex.cpp"
  [19] MutexLocker::MutexLocker(0xffbed834, 0x387f0, 0x6, 0x0, 0x0, 0x0), at 0xfe79338c
  [20] GC_locker::lock_critical(thread = 0x39430), line 84 in "gcLocker.hpp"
  [21] jni_GetStringCritical(env = 0x394c8, string = 0x128144, isCopy = (nil)), line 1604 in "jni.cpp"
dbx: warning: can't find file "/export/home3/jdk/jdk1.4/ws/control/build/solaris-sparc/tmp/java/java.lang/java/obj_g/jni_util.o"
  [22] getString646_USChars(0x394c8, 0x128144, 0x0, 0x0, 0x0, 0x0), at 0xfe36608c@




###@###.### 2001-09-11

Comments

CONVERTED DATA

BugTraq+ Release Management Values

COMMIT TO FIX: 1.3.1_02, merlin-beta3
FIXED IN: 1.3.1_02, merlin-beta3
INTEGRATED IN: 1.3.1_02, merlin-beta3

14-06-2004

EVALUATION

There was a race condition in heap_expanded() in the previous fix of bug 4411230.

###@###.### 2001-09-05

Here's Mingyao's analysis of the problem code (comments added):

  static void heap_expanded() {
    if (_jni_lock_count > 0) {
      /* Remaining JNI release critical region calls break in here.
         WON'T do a GC since _needs_gc is not set yet */
      // cast away volatile
      atomic::store((jint*)&_needs_gc, (jint)true);
    }
    /* All JNI enter critical region will be blocked from now on.
       But no JNI release critical region is going to happen any more. */
  }
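The interleaving described in the analysis above can be replayed step by step. The following is a minimal single-threaded sketch, not the HotSpot source: `LockerState`, `replay_heap_expanded_race()` and `old_entry_blocks()` are hypothetical names introduced only for illustration, and the two fields merely mirror `_jni_lock_count` and `_needs_gc` from the report.

```cpp
#include <cassert>

// Simplified model of the two GC_locker fields named in the analysis
// (hypothetical struct, not the HotSpot class).
struct LockerState {
  int  jni_lock_count = 0;  // threads currently inside a JNI critical region
  bool needs_gc = false;    // set when a GC has been deferred because of them
};

// Replays the problematic interleaving, one step at a time, on one thread.
LockerState replay_heap_expanded_race() {
  LockerState s;
  s.jni_lock_count = 1;  // thread A is inside a critical region

  // heap_expanded() evaluates its if-test and is then preempted.
  bool saw_active = (s.jni_lock_count > 0);

  // Thread A leaves its critical region. needs_gc is still false, so in
  // this model A neither runs a GC nor wakes any waiting thread.
  assert(!s.needs_gc);
  s.jni_lock_count -= 1;

  // heap_expanded() resumes and stores the flag based on its stale test.
  if (saw_active) s.needs_gc = true;
  return s;
}

// Pre-fix entry condition from lock_critical(): block while a GC is pending
// and the calling thread is not already in a critical region.
bool old_entry_blocks(const LockerState& s, bool in_critical) {
  return s.needs_gc && !in_critical;
}
```

After the replay, `needs_gc` is set while no thread remains in a critical region, so under the pre-fix condition every later entrant would block with no exiting thread left to wake it.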

11-06-2004

SUGGESTED FIX

For the first problem:

After some discussion, Mingyao and I settled on modifying
lock_critical() to add one more clause:

  while (is_jni_active() && needs_gc() && !thread->in_critical()) {
         ^^^^^^^^^^^^^^

For the second problem:

jni_GetPrimitiveArrayCritical and jni_GetStringCritical should not be
JNI_QUICK_ENTRY routines, they should be JNI_ENTRY. Because locks are
taken in these routines, they don't qualify as QUICK any more.

And for the record other fixes discussed for the first problem were:

- locking the heap_expanded method quickly gets into lock rank issues,
  as heap_expanded is called from a method which holds at least the
  ExpandHeap_lock (and maybe others, as it can be called while we're in
  or not in a safepoint)

- modifying the heap_expanded() code to not include an if statement:

    static void heap_expanded() {
      // cast away volatile
      atomic_store((jint*)&_needs_gc, (jint)_jni_lock_count);
    }

  but there was concern that maybe all the other processors wouldn't
  see the new value of _needs_gc in time.

We decided simply forcing a read of a variable that's only written under
the JNICritical_lock would do the trick.

###@###.### 2001-09-11

------- gcLocker.hpp -------
50a51,60
> //
> // Note that it would be best if this routine took the JNICritical_lock,
> // as it might write _needs_gc. However, this is difficult because
> // of lock ordering issues.
> // As a result, lock_critical() makes an additional check that we've
> // got an active jni critical region before blocking any threads.
> // (The case we're concerned about is if we determine there are JNI
> // active regions here, but are interrupted and they all exit before
> // we manage to set _needs_gc -- thus we'd have set _needs_gc when there
> // are actually no JNI active regions!)
52c62
< if (_jni_lock_count > 0) {
---
> if (is_jni_active()) {
68c78
< // all threads in critcal regions to complete, but not allowing
---
> // all threads in critical regions to complete, but not allowing
85c95,101
< while (needs_gc() && !thread->in_critical()) {
---
> // Block entering threads if we know at least one thread is in a
> // JNI critical region, we need a GC, and this thread isn't already
> // in a critical region.
> // We check that at least one thread is in a critical region before
> // blocking because blocked threads are woken up by a thread exiting
> // a JNI critical region.
> while (is_jni_active() && needs_gc() && !thread->in_critical()) {

Fix for the assertion failure:

------- jni.cpp -------
1577c1577
< JNI_QUICK_ENTRY(void*, jni_GetPrimitiveArrayCritical(JNIEnv *env, jarray array, jboolean *isCopy))
---
> JNI_ENTRY(void*, jni_GetPrimitiveArrayCritical(JNIEnv *env, jarray array, jboolean *isCopy))
1602c1602
< JNI_QUICK_ENTRY(const jchar*, jni_GetStringCritical(JNIEnv *env, jstring string, jboolean *isCopy))
---
> JNI_ENTRY(const jchar*, jni_GetStringCritical(JNIEnv *env, jstring string, jboolean *isCopy))

###@###.### 2001-09-13
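The effect of the extra is_jni_active() clause can be seen in a small stand-alone gate. This is a sketch under stated assumptions, not the HotSpot code: `CriticalGate`, `request_gc()` and `gc_runs()` are hypothetical names, a std::mutex stands in for the JNICritical_lock, and the deferred GC is modelled as a counter bumped by the last thread to leave a critical region.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>

// Toy stand-in for the GC_locker gate (hypothetical class).
class CriticalGate {
 public:
  // Models heap_expanded(): sets the flag without taking the lock,
  // so the flag may be stale with respect to the active count.
  void request_gc() { needs_gc_.store(true); }

  // Models lock_critical() with the added "is a region active?" clause.
  void lock_critical(bool already_in_critical = false) {
    std::unique_lock<std::mutex> guard(mu_);
    // Block only while some thread is in a critical region AND a GC is
    // pending AND the caller is not already inside a critical region.
    cv_.wait(guard, [&] {
      return !(active_ > 0 && needs_gc_.load() && !already_in_critical);
    });
    ++active_;
  }

  // Models leaving a critical region: the last thread out performs the
  // deferred GC (only counted here) and wakes any blocked entrants.
  void unlock_critical() {
    std::lock_guard<std::mutex> guard(mu_);
    assert(active_ > 0);
    if (--active_ == 0 && needs_gc_.load()) {
      needs_gc_.store(false);
      ++gc_runs_;
      cv_.notify_all();
    }
  }

  int gc_runs() const { return gc_runs_.load(); }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  int active_ = 0;                     // written only while holding mu_
  std::atomic<bool> needs_gc_{false};
  std::atomic<int> gc_runs_{0};
};
```

With a stale flag and no active region, lock_critical() returns immediately, and the matching unlock_critical() performs the deferred collection and clears the flag. Without the `active_ > 0` term in the wait condition, the same call would wait for a wakeup that no thread is left to send.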

13-09-2001