Bug ID: JDK-4962516 CMS thread/SLT deadlock problem

Details
Type:
Bug
Submit Date:
2003-12-03
Status:
Resolved
Updated Date:
2004-06-25
Project Name:
JDK
Resolved Date:
2004-01-28
Component:
hotspot
OS:
solaris_8,linux_redhat_9.0,solaris_2.6,generic
Sub-Component:
gc
CPU:
x86,sparc,generic
Priority:
P2
Resolution:
Fixed
Affected Versions:
1.4.2_03,5.0
Fixed Versions:
1.4.2_06 (06)

Related Reports
Backport:
Duplicate:

Sub Tasks

Description
There is a bug in the communication mechanism between the CMS
thread and the SurrogateLockerThread (SLT) which ultimately results in a
deadlock between the VMThread, the CMS thread and the SLT. I am
looking for feedback, and possibly a fix for this problem, from the CMS
developers at Sun.

The root of the problem is that the implementation of Monitor::wait can
block at SafepointSynchronize::block before waiting on the condition
variable associated with the monitor, keeping the mutex of the critical
region held. Normally this isn't a problem since any synchronization
associated with the Monitor can be postponed until after the GC. However
if the monitor itself is part of the garbage collection mechanism, then
it is a problem. This has been the nature of the CMS problem that I've
been debugging. A monitor that falls into this category is SLT_lock.
Below is the complete sequence of events that leads to deadlock between
the CMS thread (background GC), the SLT (surrogate locker thread) and
the foreground GC (VMThread). The CMS thread and the SLT synchronize
using the monitor SLT_lock; if the CMS thread is unable to proceed,
the foreground GC keeps waiting, causing the system to hang.
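
The wait ordering described above can be modeled in a minimal, hypothetical
C++ trace (this is not the actual HotSpot implementation; the names
TraceMonitor and safepoint_pending are illustrative only). The key point it
captures: the safepoint check sits inside the critical region, before the
condition-variable wait releases the monitor's mutex.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical single-threaded trace model of the Monitor::wait ordering.
struct TraceMonitor {
    bool safepoint_pending = false;       // a safepoint has been initiated
    std::vector<std::string> trace;

    // The caller is assumed to hold the monitor's mutex on entry.
    void wait(bool no_safepoint_check) {
        if (!no_safepoint_check && safepoint_pending) {
            // Blocks here with the mutex STILL held -- the window
            // that allows the 3-way deadlock to form.
            trace.push_back("blocked at safepoint, mutex held");
        }
        // Only now is the mutex released, inside the cond-var wait.
        trace.push_back("cond_wait, mutex released");
    }
};
```

With a safepoint pending, the safepointed wait (step 4c below) records the
safepoint block before the mutex release, while the no_safepoint variant
(step 2c below) goes straight to the condition-variable wait.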

----

1. Java threads initiate safepoint synchronization.

2. Meanwhile, the CMS thread is executing
ConcurrentMarkSweepThread::manipulatePLL, which does:
a) SLT_lock.lock(), followed by
b) SLT_lock.notify(), indicating a "message" has been sent to the SLT
thread.
c) It then waits using SLT_lock.wait( no_safepoint ).

3. The already-waiting SLT thread (a Java thread) is woken up (in
SurrogateLockerThread::loop) by the notification from (2b) and then
carries out the action associated with the "message".

4. The SLT thread then does
a) SLT_lock.lock() which it is able to acquire because (2c) released it.
b) Does SLT_lock.notify() to resume the CMS thread waiting at (2c).
c) Does a safepointed wait (since it is a Java thread) using
SLT_lock.wait(). However, because safepoint synchronization was already
initiated, the code blocks at SafepointSynchronize::block. The wait on
the condition variable will be executed only after returning from the
safepoint, so the mutex of the critical region is kept held. This
implementation is in Monitor::wait.

5. The CMS thread waiting at (2c) receives the condition-variable
signal from (4c), resumes execution, and attempts to grab the SLT_lock
mutex (all this within a pthread_cond_wait or equivalent call), which is
the normal monitor behavior. It is unable to acquire the lock and waits,
since the SLT thread is waiting at a safepoint while holding
the SLT_lock mutex. The CMS thread is unable to proceed with its
execution.

6. The system is at a safepoint and the VM thread starts executing the
foreground GC. The foreground CMS collection algorithm requires the
background thread to signal "okay to switch over" from background GC
to foreground GC. Because the CMS thread is still stuck in
SLT_lock.wait in (5), the foreground collector has to keep waiting.

This results in a deadlock!
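
The resulting wait-for cycle in steps (4)-(6) can be sketched as data. The
following hypothetical C++ snippet (names are illustrative, not HotSpot's)
records which thread waits on which and confirms that the cycle closes:

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical wait-for graph at the point of the hang.
// An edge X -> Y means "thread X is waiting for thread Y".
std::map<std::string, std::string> waits_for() {
    return {
        {"SLT",       "VMThread"},   // (4c) blocked at safepoint, SLT_lock mutex held
        {"CMSThread", "SLT"},        // (5)  cannot re-acquire the SLT_lock mutex
        {"VMThread",  "CMSThread"},  // (6)  foreground GC awaits "okay to switch over"
    };
}

// Follow the single outgoing edge from 'cur'; revisiting a node means deadlock.
bool has_cycle(const std::map<std::string, std::string>& g, std::string cur) {
    std::set<std::string> seen;
    while (g.count(cur)) {
        if (!seen.insert(cur).second) return true;
        cur = g.at(cur);
    }
    return false;
}
```

Removing any one edge (for example, if the SLT were not blocked at the
safepoint while holding the mutex) breaks the cycle, which is exactly what
the eventual fix arranges.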


ATG hang with b32 in C1 mode using the CMS collector after 1 hour 16 minutes. The hang could be an instance of bug 4962516.
The test machine is jtg-linux4.sfbay

###@###.### 2003-12-22

Stack trace from atg hang:
Thread 60 (Thread 1100073776 (LWP 13089)):
#0  0xffffe002 in ?? ()
#1  0x4003a5d5 in pthread_cond_wait@@GLIBC_2.3.2 ()
    from /lib/tls/libpthread.so.0
#2  0x402ef9e9 in os::Linux::safe_cond_wait(pthread_cond_t*, 
pthread_mutex_t*)
     () from /usr/j2se/jre/lib/i386/client/libjvm.so
#3  0x402dce31 in Monitor::wait(int, long) ()
    from /usr/j2se/jre/lib/i386/client/libjvm.so
#4  0x4015a23d in 
ConcurrentMarkSweepThread::manipulatePLL(SurrogateLockerThread::SLT_msg_type) 
() from /usr/j2se/jre/lib/i386/client/libjvm.so
#5  0x4014d8ec in CMSCollector::collect_in_background(int) ()
    from /usr/j2se/jre/lib/i386/client/libjvm.so
#6  0x4015952a in ConcurrentMarkSweepThread::run() ()
    from /usr/j2se/jre/lib/i386/client/libjvm.so
#7  0x402f0704 in _start(Thread*) ()
    from /usr/j2se/jre/lib/i386/client/libjvm.so
#8  0x40038484 in start_thread () from /lib/tls/libpthread.so.0


Thread 59 (Thread 1100122928 (LWP 13090)):
#0  0xffffe002 in ?? ()
#1  0x402ef9e9 in os::Linux::safe_cond_wait(pthread_cond_t*, 
pthread_mutex_t*)
     () from /usr/j2se/jre/lib/i386/client/libjvm.so
#2  0x402dce31 in Monitor::wait(int, long) ()
    from /usr/j2se/jre/lib/i386/client/libjvm.so
#3  0x4014d13d in CMSCollector::acquire_control_and_collect(int, int) ()
    from /usr/j2se/jre/lib/i386/client/libjvm.so
#4  0x4014ceb4 in ConcurrentMarkSweepGeneration::collect(int, int, 
unsigned, int, int) () from /usr/j2se/jre/lib/i386/client/libjvm.so
#5  0x4017802f in GenCollectedHeap::do_collection(int, int, unsigned, 
int, int, int, int*) () from /usr/j2se/jre/lib/i386/client/libjvm.so
#6  0x40138d31 in 
TwoGenerationCollectorPolicy::satisfy_failed_allocation(unsigned, int, 
int, int*) () from /usr/j2se/jre/lib/i386/client/libjvm.so
#7  0x40178292 in GenCollectedHeap::satisfy_failed_allocation(unsigned, 
int, int, int*) () from /usr/j2se/jre/lib/i386/client/libjvm.so
#8  0x4037c784 in VM_GenCollectForAllocation::doit() ()
    from /usr/j2se/jre/lib/i386/client/libjvm.so
#9  0x4037c4c6 in VM_Operation::evaluate() ()
    from /usr/j2se/jre/lib/i386/client/libjvm.so
#10 0x4037bb37 in VMThread::evaluate_operation(VM_Operation*) ()
    from /usr/j2se/jre/lib/i386/client/libjvm.so
#11 0x4037bd45 in VMThread::loop() ()
    from /usr/j2se/jre/lib/i386/client/libjvm.so
#12 0x4037b950 in VMThread::run() ()
    from /usr/j2se/jre/lib/i386/client/libjvm.so
#13 0x402f0704 in _start(Thread*) ()
    from /usr/j2se/jre/lib/i386/client/libjvm.so
#14 0x40038484 in start_thread () from /lib/tls/libpthread.so.0


###@###.### 2003-12-22


Comments
SUGGESTED FIX

Here are the essential changes:

*** /net/spot/archive02/ysr/clone/webrev/src/share/vm/memory/concurrentMarkSweepGeneration.cpp- Mon Jan  5 10:42:03 2004
--- concurrentMarkSweepGeneration.cpp   Mon Jan  5 10:37:12 2004

*** 1,7 ****
  #ifdef USE_PRAGMA_IDENT_SRC
! #pragma ident "@(#)concurrentMarkSweepGeneration.cpp  1.171 03/12/17 17:59:05 JVM"
  #endif
  /*
   * Copyright 2003 Sun Microsystems, Inc.  All rights reserved.
   * SUN PROPRIETARY/CONFIDENTIAL.  Use is subject to license terms.
   */
--- 1,7 ----
  #ifdef USE_PRAGMA_IDENT_SRC
! #pragma ident "@(#)concurrentMarkSweepGeneration.cpp  1.173 04/01/05 10:37:13 JVM"
  #endif
  /*
   * Copyright 2003 Sun Microsystems, Inc.  All rights reserved.
   * SUN PROPRIETARY/CONFIDENTIAL.  Use is subject to license terms.
   */

*** 1627,1636 ****
--- 1629,1665 ----
    assert_lock_strong(_permGen->freelistLock());
    PRODUCT_ONLY(ShouldNotReachHere());
    return true;
  }
  
+ // A utility class that is used by the CMS collector to
+ // temporarily "release" the foreground collector from its
+ // usual obligation to wait for the background collector to
+ // complete an ongoing phase before proceeding.
+ class ReleaseForegroundGC: public StackObj {
+  private:
+   CMSCollector* _c;
+  public:
+   ReleaseForegroundGC(CMSCollector* c) : _c(c) {
+     assert(_c->_foregroundGCShouldWait, "Else should not need to call");
+     MutexLockerEx x(CMS_lock, Mutex::_no_safepoint_check_flag);
+     // allow a potentially blocked foreground collector to proceed
+     _c->_foregroundGCShouldWait = false;
+     if (_c->_foregroundGCIsActive) {
+       CMS_lock->notify();
+     }
+     assert(!ConcurrentMarkSweepThread::cms_thread_has_cms_token(),
+            "Possible deadlock");
+   }
+ 
+   ~ReleaseForegroundGC() {
+     assert(!_c->_foregroundGCShouldWait, "Usage protocol violation?");
+     MutexLockerEx x(CMS_lock, Mutex::_no_safepoint_check_flag);
+     _c->_foregroundGCShouldWait = true;
+   }
+ };
+ 
  // There are separate collect_in_background and collect_in_foreground because of
  // the different locking requirements of the background collector and the
  // foreground collector.  There was originally an attempt to share
  // one "collect" method between the background collector and the foreground
  // collector but the if-then-else required made it cleaner to have

*** 1760,1816 ****
          abortable_preclean();
          assert(_collectorState == FinalMarking, "Collector state should "
            "have changed");
          break;
        case FinalMarking:
        {
            // If a foreground collection is in progress, it already has
          // the pending list lock.  This is similar to the situation
!         // with the Heap_lock.  See comments in stopWorldAndDo()
          // about racing for the Heap_lock. 
-           { 
-             MutexLockerEx x(CMS_lock, Mutex::_no_safepoint_check_flag);
-             // allow a potentially blocked foreground collector to proceed
-             _foregroundGCShouldWait = false;
-             if (_foregroundGCIsActive) {
-               CMS_lock->notify();
-             }
-             assert(!ConcurrentMarkSweepThread::cms_thread_has_cms_token(),
-                    "Possible deadlock");
-           }
  
          ConcurrentMarkSweepThread::manipulatePLL(
              SurrogateLockerThread::acquirePLL);
-           bool didSomeWork = false;
-           {
-             MutexLockerEx x(CMS_lock, Mutex::_no_safepoint_check_flag);
-             // For regularity, set _foregroundGCShouldWait
-           // The background collector is grabbing the locks
-           // it needs to do the collection but on releasing the
-           // locks still make the foreground collector wait for
-           // the _foregroundGCShouldWait flag.
-             _foregroundGCShouldWait = true;
            }
            if (_collectorState == FinalMarking) {
              // we didn't lose a race to FG thread
!           stopWorldAndDo(CMS_op_checkpointRootsFinal);
!             didSomeWork = true;
            } else {
              // else we did lose a race to FG thread
            assert(_collectorState == Idling, "The foreground collector"
                   " should have finished the collection");
          }
!           // Check if we need to post a notification on PLL;
!           if (didSomeWork &&
                _ref_processor->read_and_reset_notify_ref_lock()) {
              ConcurrentMarkSweepThread::manipulatePLL(
                SurrogateLockerThread::releaseAndNotifyPLL);
            } else {
              ConcurrentMarkSweepThread::manipulatePLL(
                SurrogateLockerThread::releasePLL);
            }
        }
        break;
        case Sweeping:
        // final marking in checkpointRootsFinal has been completed
          sweep(true);
        assert(_collectorState == Resetting, "Collector state change "
--- 1792,1843 ----
          abortable_preclean();
          assert(_collectorState == FinalMarking, "Collector state should "
            "have changed");
          break;
        case FinalMarking:
+         assert(_foregroundGCShouldWait, "block pre-condition");
          {
            // If a foreground collection is in progress, it already has
            // the pending list lock.  This is similar to the situation
!           // with the Heap_lock.  See comments in stop_world_and_do()
            // about racing for the Heap_lock.
  
+           // We may block while trying to communicate with the
+           // SLT thread in order to manipulate the PLL. We make
+           // sure that the foreground collector will not block
+           // waiting for us to complete communication with the
+           // SLT thread. See, for instance, bug XXXX.
+           {
+             ReleaseForegroundGC x(this);
            ConcurrentMarkSweepThread::manipulatePLL(
                SurrogateLockerThread::acquirePLL);
            }
+           bool did_some_work = false;
            if (_collectorState == FinalMarking) {
              // we didn't lose a race to FG thread
!           did_some_work = stop_world_and_do(CMS_op_checkpointRootsFinal);
            } else {
              // else we did lose a race to FG thread
            assert(_collectorState == Idling, "The foreground collector"
                   " should have finished the collection");
          }
!           // Post a notification on PLL, as necessary, taking
!           // care to make sure that the foreground collector will
!           // not stall waiting for us to return promptly from the call.
!           {
!             ReleaseForegroundGC x(this);
!             if (did_some_work &&
                  _ref_processor->read_and_reset_notify_ref_lock()) {
                ConcurrentMarkSweepThread::manipulatePLL(
                  SurrogateLockerThread::releaseAndNotifyPLL);
              } else {
                ConcurrentMarkSweepThread::manipulatePLL(
                  SurrogateLockerThread::releasePLL);
              }
            }
+       }
+         assert(_foregroundGCShouldWait, "block post-condition");
        break;
        case Sweeping:
        // final marking in checkpointRootsFinal has been completed
          sweep(true);
        assert(_collectorState == Resetting, "Collector state change "


----

*** /net/spot/archive02/ysr/clone/webrev/src/share/vm/memory/concurrentMarkSweepGeneration.hpp- Mon Jan  5 10:42:07 2004
--- concurrentMarkSweepGeneration.hpp   Mon Jan  5 10:33:08 2004

*** 1,7 ****
  #ifdef USE_PRAGMA_IDENT_HDR
! #pragma ident "@(#)concurrentMarkSweepGeneration.hpp  1.99 03/12/02 13:54:31 JVM"
  #endif
  /*
   * Copyright 2003 Sun Microsystems, Inc.  All rights reserved.
   * SUN PROPRIETARY/CONFIDENTIAL.  Use is subject to license terms.
   */
--- 1,7 ----
  #ifdef USE_PRAGMA_IDENT_HDR
! #pragma ident "@(#)concurrentMarkSweepGeneration.hpp  1.100 04/01/05 10:33:10 JVM"
  #endif
  /*
   * Copyright 2003 Sun Microsystems, Inc.  All rights reserved.
   * SUN PROPRIETARY/CONFIDENTIAL.  Use is subject to license terms.
   */

*** 386,395 ****
--- 386,396 ----
    friend class Par_PushAndMarkClosure;        //  -- ditto --
    friend class CMSKeepAliveClosure;           //  -- ditto --
    friend class CMSDrainMarkingStackClosure;   //  -- ditto --
    friend class CMSInnerParMarkAndPushClosure; //  -- ditto --
    NOT_PRODUCT(friend class ScanMarkedObjectsAgainClosure;) //  assertion on _overflow_list
+   friend class ReleaseForegroundGC;  // to access _foregroundGCShouldWait
  
   private:
    jlong _time_of_last_gc;
    void update_time_of_last_gc(jlong now) {
      _time_of_last_gc = now;

*** 502,513 ****
    enum CMS_op_type {
      CMS_op_checkpointRootsInitial,
      CMS_op_checkpointRootsFinal
    };
  
!   void doCMSOperation(CMS_op_type op);
!   bool stopWorldAndDo(CMS_op_type op);
  
    OopTaskQueueSet* task_queues() { return _task_queues; }
    int*             hash_seed(int i) { return &_hash_seed[i]; }
  
    // Support for parallelizing young gen rescan in CMS remark phase
--- 503,514 ----
    enum CMS_op_type {
      CMS_op_checkpointRootsInitial,
      CMS_op_checkpointRootsFinal
    };
  
!   void do_CMS_operation(CMS_op_type op);
!   bool stop_world_and_do(CMS_op_type op);
  
    OopTaskQueueSet* task_queues() { return _task_queues; }
    int*             hash_seed(int i) { return &_hash_seed[i]; }
  
    // Support for parallelizing young gen rescan in CMS remark phase

###@###.### 2004-01-14: Ignore the above; a simpler fix has been
engineered, but the diffs are too numerous (as indeed the above were) to list
here in their entirety. Please refer to:
   http://analemma.sfbay/net/spot/archive02/ysr/clone/webrev

Event:            putback-to
Parent workspace: /net/jano.sfbay/export/disk05/hotspot/ws/main/gc_baseline
                  (jano.sfbay:/export/disk05/hotspot/ws/main/gc_baseline)
Child workspace:  /prt-workspaces/20040114174425.ysr.clone/workspace
                  (prt-web:/prt-workspaces/20040114174425.ysr.clone/workspace)
User:             ysr

Comment:

---------------------------------------------------------

Original workspace:     neeraja:/net/spot/archive02/ysr/clone
Submitter:              ysr
Archived data:          /net/prt-archiver.sfbay/export2/archived_workspaces/main/gc_baseline/2004/20040114174425.ysr.clone/
Webrev:                 http://analemma.sfbay.sun.com/net/prt-archiver.sfbay/export2/archived_workspaces/main/gc_baseline/2004/20040114174425.ysr.clone/workspace/webrevs/webrev-2004.01.15/index.html


Fixed: 4962516 CMS thread/SLT deadlock problem

Webrev: http://analemma.sfbay/net/spot/archive02/ysr/clone/webrev

The deadlock was first identified and analyzed by Amit Nene of
HP in early December. Since this is a day-one bug that we
had never seen in practice, we thought we'd defer the fix to 1.5.1,
reasoning that the bug needed quite adversarial scheduling
to manifest. Alas, the new Linux kernel exposes the
deadlock within a few hours of running ATG. (ATG is run with a low
CMSInitiatingOccupancy setting so that it is essentially doing
CMS collections all the time, which increases the probability
of the deadlock event; the new Linux scheduler probably helps
quite a bit as well, since that is the only configuration
that shows up the problem.)

The problem arises because, when the CMS thread is communicating
with the SLT (surrogate locker thread -- a JavaThread that does
PLL (pending list lock) locking on behalf of the CMS thread
around the CMS remark phase), the VM thread may have initiated
a safepoint, and the SLT thread might suspend in SLT_lock->wait()
after having released the PLL and notified the CMS thread waiting
on the SLT_lock, but while itself holding the mutex underlying the
SLT_lock monitor. At this point, the CMS thread has not released
the "baton" to the foreground collector (the VM thread),
resulting in a 3-way deadlock between the SLT, CMS and
VM threads.

Our solution is to release the foreground gc thread
around all stop-world phases, and a fortiori around the
SLT communication above. During these stop-world phases,
the PLL lock and the Heap_lock prevent interference with
the foreground thread.
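
The "release the baton" idea can be sketched as a scope guard, modeled on
the ReleaseForegroundGC class in the suggested fix above but using standard
C++ primitives (the names ScopedBatonRelease, fg_should_wait, etc. are
illustrative, not HotSpot's):

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical RAII guard: while in scope, a blocked foreground
// collector is allowed to proceed; on destruction, the usual
// "foreground waits for background" protocol is restored.
class ScopedBatonRelease {
    bool&                    should_wait_;
    std::mutex&              mu_;
    std::condition_variable& cv_;
public:
    ScopedBatonRelease(bool& should_wait, std::mutex& mu,
                       std::condition_variable& cv)
        : should_wait_(should_wait), mu_(mu), cv_(cv) {
        std::lock_guard<std::mutex> g(mu_);
        should_wait_ = false;   // let a blocked foreground collector run
        cv_.notify_all();
    }
    ~ScopedBatonRelease() {
        std::lock_guard<std::mutex> g(mu_);
        should_wait_ = true;    // restore the usual protocol
    }
};
```

Wrapping the potentially blocking SLT communication in such a guard means
the foreground collector never stalls on the background thread while the
background thread is itself stalled behind a safepoint.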

(Our initial solution was to recognize that the CMS<->SLT
 2-way handshake protocol is subject to delay because of
 safepointing and to "release the baton" (to the foreground
 collector) around such communication. This is safe because
 CMS<->SLT communication is used for PLL lock/notify/unlock which
 is done outside of the remark phase. It can be shown
 that the temporal phase during which the PLL is held
 is not subject to interruption. The window of vulnerability,
 which we avoid with this fix, is when the PLL has been
 released by the SLT thread but before the SLT thread has
 completed the "ack" in the 2-way handshake protocol in
 its communication with the CMS thread.

 The final solution followed from an optimization suggestion
 of Jon Masamitsu when reviewing the initial solution.)

Thanks to June Zhong and Dxo-Shin Chen for testing help.

Reviewed by: Jon Masamitsu (more reviews welcome)

Approved by: server project team
Fix Verified: yes

Verification Testing: ATG on linux, -client and -server, CMS
   ongoing for > 24 hours (without the fix it would deadlock -- on
   specific Linux machines -- within ~4-5 hours)

Other testing: (linux/intel,sparc/solaris,fastdebug/product,c1/c2,cms):
  . refworkload
  . ATG
  . volano
  . quicklook -full

Files:
update: src/share/vm/memory/concurrentMarkSweepGeneration.cpp
update: src/share/vm/memory/concurrentMarkSweepGeneration.hpp

Examined files: 3150

Contents Summary:
       2   update
    3148   no action (unchanged)
2004-07-08
EVALUATION

See the Comments section.
2004-07-08
CONVERTED DATA

BugTraq+ Release Management Values

COMMIT TO FIX:
1.4.2_06
generic
tiger-beta2

FIXED IN:
1.4.2_06
tiger-beta2

INTEGRATED IN:
1.4.2_06
tiger-b36
tiger-beta2


2004-07-08


