Bug ID: JDK-6782457 CMS: Livelock in CompactibleFreeListSpace::block_size().

Details
Type: Bug
Submit Date: 2008-12-08
Status: Resolved
Updated Date: 2011-12-27
Project Name: JDK
Resolved Date: 2009-01-31
Component: hotspot
OS: generic
Sub-Component: gc
CPU: generic
Priority: P2
Resolution: Fixed
Affected Versions: 7
Fixed Versions: hs14 (b09)


Description
Running the test Compact_InternedStrings with CMS, I observe a livelock
in which:

Thread 26 has done an allocation out of the perm gen and is blocked
trying to reacquire the heap lock. The perm gen object has
not been initialized.

Thread 25 has requested a collection (VM_GenCollectForPermanentAllocation); it is
executing in VMThread::execute() and has grabbed the heap lock as
part of doit_prologue().

Thread 16 is the VM thread, and it is trying to do a compaction.
It is in the loop in CompactibleFreeListSpace::block_size(), waiting
for the block to be initialized, which will never happen.
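
The three threads form a cycle of waits. As an illustration only, the
dependency can be modelled as a wait-for graph in which each thread waits
on exactly one other thread. This is not HotSpot code; WaitsFor,
find_wait_cycle and report_scenario are hypothetical names. Following the
edges from any thread either reaches a runnable thread or returns to a
thread already visited, and the visited tail is the cycle.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical model of the report's threads (not HotSpot code).
// waits_for[x] == y means "thread x cannot proceed until thread y acts".
using WaitsFor = std::map<int, int>;

// Follows wait edges from 'start'. Returns the threads on the cycle if the
// chain revisits a thread, or an empty vector if it ends at a thread that
// is not waiting on anyone (i.e. a runnable thread).
std::vector<int> find_wait_cycle(const WaitsFor& waits_for, int start) {
  std::vector<int> path;
  int current = start;
  for (;;) {
    for (std::size_t i = 0; i < path.size(); ++i) {
      if (path[i] == current) {
        return std::vector<int>(
            path.begin() + static_cast<std::ptrdiff_t>(i), path.end());
      }
    }
    path.push_back(current);
    auto it = waits_for.find(current);
    if (it == waits_for.end()) {
      return {};  // chain ends at a runnable thread: no cycle
    }
    current = it->second;
  }
}

// The dependency chain from the description:
//   26 -> 25 : thread 26 is blocked reacquiring Heap_lock, which 25 holds
//   25 -> 16 : thread 25 waits in VMThread::execute() for the VM thread
//   16 -> 26 : the VM thread spins in block_size() on storage that
//              thread 26 has not yet initialized
WaitsFor report_scenario() {
  return {{26, 25}, {25, 16}, {16, 26}};
}
```

For example, find_wait_cycle(report_scenario(), 26) yields {26, 25, 16}.
If thread 26 no longer needs Heap_lock after obtaining its storage, the
26 -> 25 edge disappears, and the chain from 25 ends at thread 26, which is
then free to initialize the block.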

                                    

Comments
EVALUATION

See description and comments sections.
                                     
2008-12-09
SUGGESTED FIX

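The fix turns out to be fairly straightforward. We restructure the
locking code in the retry loop so that, once it has obtained (not yet
initialized) storage, the allocating thread no longer attempts to relock
the heap lock, which is what can cause it to stall. In the old code a
MutexLocker held Heap_lock for the whole loop and a MutexUnlocker released
it only around VMThread::execute(); returning the result from inside that
MutexUnlocker scope made its destructor reacquire Heap_lock. The new code
takes Heap_lock only in a nested block around the allocation attempts and
the GC-count reads, then calls VMThread::execute() and returns without it.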
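
The RAII detail behind the stall can be reproduced in a minimal sketch
with std::mutex. HeapLockModel, ScopedUnlock, old_shape and new_shape are
hypothetical names; ScopedUnlock only imitates the unlock-in-constructor,
lock-in-destructor behaviour of HotSpot's MutexUnlocker, and a plain int
stands in for the perm gen block. The model counts lock() calls made while
the caller already holds uninitialized storage.

```cpp
#include <mutex>

// Hypothetical stand-in for Heap_lock that records risky re-acquisitions.
struct HeapLockModel {
  std::mutex m;
  bool holding_raw_storage = false;  // caller holds uninitialized storage
  int relocks_with_raw_storage = 0;  // lock() calls made in that state
  void lock() {
    if (holding_raw_storage) ++relocks_with_raw_storage;
    m.lock();
  }
  void unlock() { m.unlock(); }
};

// Imitates MutexUnlocker: releases in the constructor, relocks in the
// destructor.
class ScopedUnlock {
 public:
  explicit ScopedUnlock(HeapLockModel& l) : l_(l) { l_.unlock(); }
  ~ScopedUnlock() { l_.lock(); }
  ScopedUnlock(const ScopedUnlock&) = delete;
  ScopedUnlock& operator=(const ScopedUnlock&) = delete;
 private:
  HeapLockModel& l_;
};

// Old shape: lock held across the loop, dropped only around the VM op.
int* old_shape(HeapLockModel& heap_lock, int* storage) {
  std::lock_guard<HeapLockModel> ml(heap_lock);  // like MutexLocker ml(Heap_lock)
  {
    ScopedUnlock mu(heap_lock);  // give up the lock around the VM operation
    int* obj = storage;          // stands in for op.result(): uninitialized
    heap_lock.holding_raw_storage = true;
    return obj;  // ~mu runs here and relocks while holding raw storage
  }
}

// New shape: lock only in a nested block, VM op and return run unlocked.
int* new_shape(HeapLockModel& heap_lock, int* storage) {
  {
    std::lock_guard<HeapLockModel> ml(heap_lock);
    // allocation attempts and GC-count reads would happen here
  }  // lock released before the VM operation
  int* obj = storage;  // stands in for op.result(): uninitialized
  heap_lock.holding_raw_storage = true;
  return obj;  // no destructor reacquires the lock
}
```

In this model old_shape() leaves relocks_with_raw_storage at 1 and
new_shape() leaves it at 0. Scoping the lock with a nested block, rather
than temporarily unlocking inside a locked region, means nothing reacquires
it on the way out.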

diff -r 8a0c882e46d6 src/share/vm/memory/permGen.cpp
--- a/src/share/vm/memory/permGen.cpp   Thu Dec 04 13:21:16 2008 -0800
+++ b/src/share/vm/memory/permGen.cpp   Tue Dec 09 17:35:19 2008 -0800
@@ -26,20 +26,24 @@
 #include "incls/_permGen.cpp.incl"
 
 HeapWord* PermGen::mem_allocate_in_gen(size_t size, Generation* gen) {
-  MutexLocker ml(Heap_lock);
   GCCause::Cause next_cause = GCCause::_permanent_generation_full;
   GCCause::Cause prev_cause = GCCause::_no_gc;
+  unsigned int gc_count_before, full_gc_count_before;
+  HeapWord* obj;
 
   for (;;) {
-    HeapWord* obj = gen->allocate(size, false);
-    if (obj != NULL) {
-      return obj;
-    }
-    if (gen->capacity() < _capacity_expansion_limit ||
-        prev_cause != GCCause::_no_gc) {
-      obj = gen->expand_and_allocate(size, false);
-    }
-    if (obj == NULL && prev_cause != GCCause::_last_ditch_collection) {
+    {
+      MutexLocker ml(Heap_lock);
+      if ((obj = gen->allocate(size, false)) != NULL) {
+        return obj;
+      }
+      if (gen->capacity() < _capacity_expansion_limit ||
+          prev_cause != GCCause::_no_gc) {
+        obj = gen->expand_and_allocate(size, false);
+      }
+      if (obj != NULL || prev_cause == GCCause::_last_ditch_collection) {
+        return obj;
+      }
       if (GC_locker::is_active_and_needs_gc()) {
         // If this thread is not in a jni critical section, we stall
         // the requestor until the critical section has cleared and
@@ -61,31 +65,27 @@ HeapWord* PermGen::mem_allocate_in_gen(s
           return NULL;
         }
       }
+      // Read the GC count while holding the Heap_lock
+      gc_count_before      = SharedHeap::heap()->total_collections();
+      full_gc_count_before = SharedHeap::heap()->total_full_collections();
+    }
 
-      // Read the GC count while holding the Heap_lock
-      unsigned int gc_count_before      = SharedHeap::heap()->total_collections();
-      unsigned int full_gc_count_before = SharedHeap::heap()->total_full_collections();
-      {
-        MutexUnlocker mu(Heap_lock);  // give up heap lock, execute gets it back
-        VM_GenCollectForPermanentAllocation op(size, gc_count_before, full_gc_count_before,
-                                               next_cause);
-        VMThread::execute(&op);
-        if (!op.prologue_succeeded() || op.gc_locked()) {
-          assert(op.result() == NULL, "must be NULL if gc_locked() is true");
-          continue;  // retry and/or stall as necessary
-        }
-        obj = op.result();
-        assert(obj == NULL || SharedHeap::heap()->is_in_reserved(obj),
-               "result not in heap");
-        if (obj != NULL) {
-          return obj;
-        }
-      }
-      prev_cause = next_cause;
-      next_cause = GCCause::_last_ditch_collection;
-    } else {
+    // Give up heap lock above, VMThread::execute below gets it back
+    VM_GenCollectForPermanentAllocation op(size, gc_count_before, full_gc_count_before,
+                                           next_cause);
+    VMThread::execute(&op);
+    if (!op.prologue_succeeded() || op.gc_locked()) {
+      assert(op.result() == NULL, "must be NULL if gc_locked() is true");
+      continue;  // retry and/or stall as necessary
+    }
+    obj = op.result();
+    assert(obj == NULL || SharedHeap::heap()->is_in_reserved(obj),
+           "result not in heap");
+    if (obj != NULL) {
       return obj;
     }
+    prev_cause = next_cause;
+    next_cause = GCCause::_last_ditch_collection;
   }
 }
                                     
2008-12-10
EVALUATION

CRs 6782457 and 6736295 have the same root cause, so
both will be fixed by the following:

6782457: CMS: Livelock in CompactibleFreeListSpace::block_size()
6736295: SIGSEGV in product jvm, assertion "these are the only valid states during a mark sweep" in fastdebug

webrev: http://analemma.sfbay/net/neeraja/export/ysr/gclocker/webrev

Despite the synopsis, 6782457 turned out to be more than
just a CMS-only livelock. Because it needs to parse cards
while allocation into them is in progress, CMS expects storage
that is transiently uninitialized but eventually initialized.
In this case, a mutator thread blocked at
a GC safepoint while holding on to uninitialized storage,
so CMS livelocked with the VM/GC thread busy-waiting
for that storage to be initialized. The basic problem is
that none of our current GCs can tolerate uninitialized
storage held across a safepoint, but some code restructuring
done while implementing 6539517 inadvertently broke that invariant.
In particular, CR 6736295, which can currently
affect any of SerialGC, CMS or G1, is also a
result of this bug.
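
The expectation described above, storage that is transiently
uninitialized but eventually initialized, is what makes a busy-wait
acceptable in CMS. A simplified sketch, assuming a hypothetical Block whose
header word stays 0 until the allocating thread publishes its size (this is
not HotSpot's object layout or its actual block_size() code), shows the
reader/writer pairing and what the reader sees when the writer never runs.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>

// Hypothetical heap block: the header word is 0 until the allocating
// thread initializes it. Illustration only, not HotSpot's layout.
struct Block {
  std::atomic<std::size_t> size_words{0};
};

// Reader side: keep re-reading the header until it is published.
// Unlike the real loop, this one gives up after max_spins and returns 0,
// so a stalled allocator shows up as a result instead of a hang.
std::size_t wait_for_block_size(const Block& b, long max_spins) {
  for (long i = 0; i < max_spins; ++i) {
    std::size_t s = b.size_words.load(std::memory_order_acquire);
    if (s != 0) return s;
    std::this_thread::yield();
  }
  return 0;
}

// Writer side: the allocating thread publishes the size once the block
// has been initialized.
void initialize_block(Block& b, std::size_t size) {
  b.size_words.store(size, std::memory_order_release);
}
```

In the reported livelock the writer side never runs, because the
allocating thread is blocked on Heap_lock, so a loop without a spin bound
keeps waiting indefinitely.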

The simple fix is to restructure the locking in the
perm gen allocation retry loop to avoid such blocking.

Many thanks to Jon for first finding this bug, and
to him and John for their ongoing testing help.

Testing: jck12a017 from CR 6736295; jprt;
         other testing in progress
                                     
2008-12-11
EVALUATION

http://hg.openjdk.java.net/jdk7/hotspot-gc/hotspot/rev/d249b360e026
                                     
2008-12-11
SUGGESTED FIX

http://hg.openjdk.java.net/jdk7/hotspot-gc/hotspot/rev/d249b360e026
                                     
2008-12-13