United StatesChange Country, Oracle Worldwide Web Sites Communities I am a... I want to...
Bug ID: JDK-6944166 G1: explicit GCs are not always handled correctly
JDK-6944166 : G1: explicit GCs are not always handled correctly

Details
Type:
Bug
Submit Date:
2010-04-15
Status:
Closed
Updated Date:
2012-10-01
Project Name:
JDK
Resolved Date:
2011-03-08
Component:
hotspot
OS:
generic,solaris_10
Sub-Component:
gc
CPU:
x86,generic
Priority:
P3
Resolution:
Fixed
Affected Versions:
hs17,hs18,6u21
Fixed Versions:
hs19 (b05)

Related Reports
Backport:
Backport:
Backport:
Backport:
Duplicate:
Relates:
Relates:
Relates:

Sub Tasks

Description
G1 does not currently correctly handle explicit GCs and might drop them when it shouldn't. In particular, explicit full GCs that are triggered by the heap inspection tools (i.e., jmap) will be ignored if the DisableExplicitGC parameter is set to true.

                                    

Comments
SUGGESTED FIX

First, do_collection() should not return if full == true and DisableExplicitGC is set.

Second, G1 should not do a Full GC when do_collection() is called with GCCause::_gc_locker and should instead trigger an evacuation pause if possible.
                                     
2010-04-15
EVALUATION

In the do_collection() method, which will perform a Full GC, we actually do the following test:

  if (full && DisableExplicitGC) {
    return;
  }

("full" in this case means "explicit")

-----------------------------------------------------

Here's some analysis on what the above check implies (from an e-mail I sent to Ramki):

Your intuition seems to be correct. First, you are correct that a System.gc() should never enqueue a VM op if DisableExplicitGC is set:

JVM_ENTRY_NO_ENV(void, JVM_GC(void))
 JVMWrapper("JVM_GC");
 if (!DisableExplicitGC) {
   Universe::heap()->collect(GCCause::_java_lang_system_gc);
 }
JVM_END

Now, the GC locker can initiate a GC using the collect() method:

void GC_locker::jni_unlock_slow() {
   ...
   if (!is_active()) {
     _doing_gc = true;
     {
       // Must give up the lock while at a safepoint
       MutexUnlocker munlock(JNICritical_lock);
       Universe::heap()->collect(GCCause::_gc_locker);
     }
     _doing_gc = false;
   }
   ...
}

and G1 seems to initiate a VM_G1CollectFull VM op for all causes but _scavenge_alot:

void G1CollectedHeap::collect(GCCause::Cause cause) {
 ...
 switch (cause) {
   case GCCause::_scavenge_alot: {
     ...
   }
   default: {
     // In all other cases, we currently do a full gc.
     VM_G1CollectFull op(gc_count_before, cause);
     VMThread::execute(&op);
   }
 }
}

VM_G1CollectionFull is one of two places where do_full_collection() is called from:

void VM_G1CollectFull::doit() {
 JvmtiGCFullMarker jgcm;
 G1CollectedHeap* g1h = G1CollectedHeap::heap();
 GCCauseSetter x(g1h, _gc_cause);
 g1h->do_full_collection(false /* clear_all_soft_refs */);
}

the other one being:

void G1CollectedHeap::collect_as_vm_thread(GCCause::Cause cause) {
 ...
 switch (cause) {
   case GCCause::_heap_inspection:
   case GCCause::_heap_dump: {
     HandleMark hm;
     do_full_collection(false);         // don't clear all soft refs
     break;
   }
   default: // XXX FIX ME
     ShouldNotReachHere(); // Unexpected use of this function
 }
}

Finally, do_full_collection() is the only place where do_collection() is called with full == true:

void G1CollectedHeap::do_full_collection(bool clear_all_soft_refs) {
 do_collection(true /* full */, clear_all_soft_refs, 0);
}

So, could the fact that we do return from the Full GC if DisableExplicitGC is on be actually wrong and prevent Full GCs from happening when they should actually happen for reasons other than System.gc()? This is what it looks like to me. 

-----------------------------------------------------

Ramki's reply (which basically confirms that the heap inspection-induced Full GCs will not happen if DisableExplicitGC is set):

So the things that are affected are:
(1) gc locker induced full gc -- this should ideally be some form of
    "trigger a concurrent collection cycle"; not doing a full STW gc
    is, serendipitously, probably the right thing here.
(2) heap inspection induced full gc -- here jmap -histo:live or
    jmap -dump or +PrintClassHistogram
    will not do a full GC before they dump the heap; so it's apt to be
    misleading at best, and a violation of expected semantics at worst.

So while I agree that the current state of affairs is not correct,
removing the return without making other changes may, because it
would convert (1) to a full STW GC, be somewhat worse (although it
would fix the sematics of (2) which is desirable, the harm from
a full STW gc from GC locker may possibly be worse for performance;
i am using the argument that performance is more important than
the slight loss of serviceability in this case. It is possible that
that is the wrong tradeoff for some.).

*** (#1 of 1): [ UNSAVED ] ###@###.###
                                     
2010-04-15
SUGGESTED FIX

To diable abandoned pauses, we ensure that choose_collection_set() always returns false:
 
 @@ -2996,7 +2997,7 @@
   _recorded_non_young_cset_choice_time_ms =
     (non_young_end_time_sec - non_young_start_time_sec) * 1000.0;
 
-  return abandon_collection;
+  return false;
 }

and we probably have to remove some asserts that assume that the collection set is non-empty. We'll remove the code paths that are only touched by pauses being abandoned as part of another CR (6963209).
                                     
2010-06-22
EVALUATION

http://hg.openjdk.java.net/jdk7/hotspot-gc/hotspot/rev/4e5661ba9d98
                                     
2010-07-15
EVALUATION

http://hg.openjdk.java.net/jdk7/hotspot/hotspot/rev/4e5661ba9d98
                                     
2010-07-21
EVALUATION

http://hg.openjdk.java.net/jdk7/hotspot-rt/hotspot/rev/4e5661ba9d98
                                     
2010-07-25



Hardware and Software, Engineered to Work Together