JDK-6782457 : CMS: Livelock in CompactibleFreeListSpace::block_size()
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 7
  • Priority: P2
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2008-12-08
  • Updated: 2011-12-27
  • Resolved: 2009-01-31
Version table:
  Other: 1.4.2_22, hs14 (Resolved)
  JDK 6: 6u14 (Fixed)
  JDK 7: 7 (Fixed)
  Other: hs14 (Fixed)
Description
Running the test Compact_InternedStrings with CMS, I observe a livelock in which:

Thread 26 has done an allocation out of the perm gen and is blocked trying
to reacquire the heap lock. The perm gen object has not been initialized.

Thread 25 has requested a collection (VM_GenCollectForPermanentAllocation), is
executing in VMThread::execute(), and has grabbed the heap lock as
part of the doit_prologue().

Thread 16 is the VM thread and it is trying to do a compaction.
It is in the loop in CompactibleFreeListSpace::block_size(), waiting
for the block to be initialized (which it never will be).
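
For illustration, here is a minimal, self-contained sketch of the shape of that busy-wait. This is not the actual HotSpot source; Block, size_word and block_size_sketch are made-up names standing in for a free-list-space block whose header the allocating mutator has not yet written.

    #include <cstddef>

    // Hypothetical stand-in for a block in the CMS free-list space whose
    // header has not yet been published by the allocating thread.
    struct Block {
      volatile size_t size_word;  // 0 until the allocator writes the header
    };

    // Sketch of the busy-wait: keep re-reading the header until it appears.
    // In the reported scenario the allocating thread (thread 26) is itself
    // blocked on the heap lock at the safepoint, so the header is never
    // written and this loop never terminates.
    size_t block_size_sketch(const Block* b) {
      for (;;) {
        size_t s = b->size_word;  // re-read the (possibly in-flight) header
        if (s != 0) {
          return s;
        }
      }
    }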

Comments
SUGGESTED FIX http://hg.openjdk.java.net/jdk7/hotspot-gc/hotspot/rev/d249b360e026
13-12-2008

EVALUATION http://hg.openjdk.java.net/jdk7/hotspot-gc/hotspot/rev/d249b360e026
11-12-2008

EVALUATION This and CR 6736295 have the same root cause, so they will both be fixed by the following:

6782457: CMS: Livelock in CompactibleFreeListSpace::block_size()
6736295: SIGSEGV in product jvm, assertion "these are the only valid states during a mark sweep" in fastdebug

webrev: http://analemma.sfbay/net/neeraja/export/ysr/gclocker/webrev

Despite the synopsis, 6782457 turned out to be more than just a CMS-only livelock. Because of the need to parse cards while allocating into them, CMS expects transiently uninitialized but eventually initialized storage. In this case, a mutator thread blocked at a GC safepoint while holding on to uninitialized storage, so we got into a livelock in CMS with the VM/GC thread busy-waiting for the storage to be initialized.

The basic problem is that we cannot have uninitialized storage visible to a collection for any of our current GCs, but some code restructuring done while implementing 6539517 inadvertently broke that invariant. In particular, CR 6736295, which can currently affect any of SerialGC, CMS, or G1, is also a result of this bug. The simple fix is to restructure the locking in the perm gen allocation retry loop to avoid such blocking.

Many thanks to Jon for first finding this bug, and to him and John for ongoing testing help as well.

Testing: jck12a017 from CR 6736295; jprt; other testing in progress.
11-12-2008
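
As a minimal sketch of the locking restructuring the evaluation above calls for (using std::mutex in place of HotSpot's Heap_lock, and hypothetical allocate/collect callbacks rather than the real permGen.cpp API), the essential point is that the allocation attempt and the bookkeeping it needs all happen inside one lock scope, and anything that can block for a GC runs only after that scope has closed, so the thread never stalls while holding raw, uninitialized heap storage:

    #include <cstddef>
    #include <mutex>

    static std::mutex heap_lock;  // stand-in for HotSpot's Heap_lock

    // Hypothetical retry loop: try to allocate while holding the lock; on
    // failure, release the lock and only then block for a collection.
    void* allocate_with_retry(std::size_t size,
                              void* (*allocate)(std::size_t),
                              void  (*collect)()) {
      for (;;) {
        {
          std::lock_guard<std::mutex> ml(heap_lock);
          if (void* obj = allocate(size)) {
            return obj;          // success: object handed out under the lock
          }
          // Any state needed to request the collection would be read here,
          // while the lock is still held.
        }                        // lock released before we can block
        collect();               // may block at a safepoint; safe, because this
                                 // thread holds no lock and no raw storage
      }
    }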

SUGGESTED FIX The fix turns out to be fairly straightforward. We restructure the locking code in the loop so as to avoid the attempt to relock the heap lock -- which can cause us to stall -- after having succeeded in obtaining (not yet initialized) storage.

diff -r 8a0c882e46d6 src/share/vm/memory/permGen.cpp
--- a/src/share/vm/memory/permGen.cpp   Thu Dec 04 13:21:16 2008 -0800
+++ b/src/share/vm/memory/permGen.cpp   Tue Dec 09 17:35:19 2008 -0800
@@ -26,20 +26,24 @@
 #include "incls/_permGen.cpp.incl"

 HeapWord* PermGen::mem_allocate_in_gen(size_t size, Generation* gen) {
-  MutexLocker ml(Heap_lock);
   GCCause::Cause next_cause = GCCause::_permanent_generation_full;
   GCCause::Cause prev_cause = GCCause::_no_gc;
+  unsigned int gc_count_before, full_gc_count_before;
+  HeapWord* obj;

   for (;;) {
-    HeapWord* obj = gen->allocate(size, false);
-    if (obj != NULL) {
-      return obj;
-    }
-    if (gen->capacity() < _capacity_expansion_limit ||
-        prev_cause != GCCause::_no_gc) {
-      obj = gen->expand_and_allocate(size, false);
-    }
-    if (obj == NULL && prev_cause != GCCause::_last_ditch_collection) {
+    {
+      MutexLocker ml(Heap_lock);
+      if ((obj = gen->allocate(size, false)) != NULL) {
+        return obj;
+      }
+      if (gen->capacity() < _capacity_expansion_limit ||
+          prev_cause != GCCause::_no_gc) {
+        obj = gen->expand_and_allocate(size, false);
+      }
+      if (obj != NULL || prev_cause == GCCause::_last_ditch_collection) {
+        return obj;
+      }
       if (GC_locker::is_active_and_needs_gc()) {
         // If this thread is not in a jni critical section, we stall
         // the requestor until the critical section has cleared and
@@ -61,31 +65,27 @@ HeapWord* PermGen::mem_allocate_in_gen(s
           return NULL;
         }
       }
+      // Read the GC count while holding the Heap_lock
+      gc_count_before = SharedHeap::heap()->total_collections();
+      full_gc_count_before = SharedHeap::heap()->total_full_collections();
+    }

-      // Read the GC count while holding the Heap_lock
-      unsigned int gc_count_before = SharedHeap::heap()->total_collections();
-      unsigned int full_gc_count_before = SharedHeap::heap()->total_full_collections();
-      {
-        MutexUnlocker mu(Heap_lock);  // give up heap lock, execute gets it back
-        VM_GenCollectForPermanentAllocation op(size, gc_count_before, full_gc_count_before,
-                                               next_cause);
-        VMThread::execute(&op);
-        if (!op.prologue_succeeded() || op.gc_locked()) {
-          assert(op.result() == NULL, "must be NULL if gc_locked() is true");
-          continue;  // retry and/or stall as necessary
-        }
-        obj = op.result();
-        assert(obj == NULL || SharedHeap::heap()->is_in_reserved(obj),
-               "result not in heap");
-        if (obj != NULL) {
-          return obj;
-        }
-      }
-      prev_cause = next_cause;
-      next_cause = GCCause::_last_ditch_collection;
-    } else {
+    // Give up heap lock above, VMThread::execute below gets it back
+    VM_GenCollectForPermanentAllocation op(size, gc_count_before, full_gc_count_before,
+                                           next_cause);
+    VMThread::execute(&op);
+    if (!op.prologue_succeeded() || op.gc_locked()) {
+      assert(op.result() == NULL, "must be NULL if gc_locked() is true");
+      continue;  // retry and/or stall as necessary
+    }
+    obj = op.result();
+    assert(obj == NULL || SharedHeap::heap()->is_in_reserved(obj),
+           "result not in heap");
+    if (obj != NULL) {
       return obj;
     }
+    prev_cause = next_cause;
+    next_cause = GCCause::_last_ditch_collection;
   }
 }
10-12-2008

EVALUATION See description and comments sections.
09-12-2008