JDK-7072527 : CMS: JMM GC counters overcount in some cases
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: hs22
  • Priority: P4
  • Status: Closed
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2011-07-29
  • Updated: 2013-04-24
  • Resolved: 2011-11-25
Fixed in:
  • JDK 7: 7u2 (Fixed)
  • Other: hs22 (Fixed)
Description
From Krystal Mok (###@###.###) on the hotspot-gc-dev@o.j.n list:-

Hi all,

I've been looking at a strange inconsistency of full GC count recorded by jvmstat and JMM counters. I'd like to know which ones of the following behaviors are by design, which ones are bugs, and which ones are just my misunderstanding. I apologize for making a short story long...

=====================================================

The counters involved:

* A jvmstat counter named "sun.gc.collector.1.invocations" keeps track of the number of pauses that occurred as a result of a major collection. It is used by utilities such as jstat as the source of "FGC" (full collection count), and as the old gen collection count in Visual GC. It's updated by a TraceCollectorStats object.
* A JMM counter, GCMemoryManager::_num_collections, keeps track of the number of collections that have ended. This counter is used as HotSpot's implementation of the JMX GarbageCollectorMXBean.getCollectionCount(). It's updated by either a TraceMemoryManagerStats object or a TraceCMSMemoryManagerStats object.
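
For reference, the JMM side of the comparison can be read from inside a running JVM through the standard java.lang.management API. The following is a minimal sketch; the class name GcCollectionCounts and the snapshot() helper are made up for illustration and are not part of the report. The jvmstat side is not read here; as noted above, it is what jstat reports in its FGC column.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, for illustration only.
public class GcCollectionCounts {

    /** Returns collector name -> getCollectionCount() for every collector in this JVM. */
    static Map<String, Long> snapshot() {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount() returns -1 when the count is undefined for a collector.
            counts.put(gc.getName(), gc.getCollectionCount());
        }
        return counts;
    }

    public static void main(String[] args) {
        snapshot().forEach((name, count) ->
            System.out.println(name + ": collection count = " + count));
    }
}
```

The collector names printed depend on the JVM and the GC flags in use, so no particular output is assumed here.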

To show the situation, I've made a screenshot of a VisualVM and a JConsole running side by side, both monitoring the VisualVM's own GC stats:
http://dl.iteye.com/upload/attachment/524811/913cb0e1-7add-3ac0-a718-24ca705cad22.png
(I'll upload the screenshot to somewhere else if anybody can't see it)
The VisualVM instance is running on JDK6u26, with ParNew+CMS.
In the screenshot, Visual GC reports that the old gen collection count is 20, while JConsole reports 10.

I see that there was this bug:
6580448: CMS: Full GC collection count mismatch between GarbageCollectorMXBean and jvmstat (VisualGC)
I don't think the current implementation has a bug in the sense that the two counters don't report the same number.

This behavior seems reasonable, but the naming of the value in these tools is confusing: both tools say "collections", but apparently the number in Visual GC means "number of pauses" whereas the number in JConsole means "number of collection cycles". It'd be great if the difference could be documented somewhere, if that's the intended behavior.

And then the buggy behavior. Code demo posted on gist: https://gist.github.com/1106263
Starting from JDK6u23, when using CMS without ExplicitGCInvokesConcurrent, System.gc() (or Runtime.getRuntime().gc(), or MemoryMXBean.gc() via JMX) would make the JMM GC counter increment by 2 per invocation, while the jvmstat counter is only incremented by 1. I believe the latter is correct and the former needs some fixing.

=====================================================

My understanding of the behavior shown above:

1. The concurrent GC part:

There are 2 pauses in a CMS concurrent GC cycle, one in the initial mark phase, and one in the final remark phase.
To trigger a concurrent GC cycle, the CMS thread wakes up periodically to check shouldConcurrentCollect(); it triggers a cycle when the predicate returns true, and goes back to sleep when it returns false. The whole concurrent GC cycle doesn't go through GenCollectedHeap::do_collection().

The jvmstat counter for old gen pauses is updated in CMSCollector::do_CMS_operation(CMS_op_type op), which exactly covers both pause phases.

The JMM counter, however, is updated in the concurrent sweep phase, CMSCollector::sweep(bool asynch), if there was no concurrent mode failure; or, in case of a bailout due to concurrent mode failure, it is updated in CMSCollector::do_compaction_work(bool clear_all_soft_refs) (as advertised in the code comments). So that's an increment by 1 per concurrent GC cycle, which does reflect the "number of collection cycles", fair enough.
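
The pause-versus-cycle arithmetic can be illustrated with a toy model. This is only a sketch of the counting described above, not HotSpot code: the class CmsCounterModel and its fields are hypothetical stand-ins for the two counters, and it assumes every cycle completes without a concurrent mode failure.

```java
// Toy model of the two counters during background CMS cycles (illustrative only).
public class CmsCounterModel {
    long pauseCount; // stands in for sun.gc.collector.1.invocations
    long cycleCount; // stands in for GCMemoryManager::_num_collections

    /** One stop-the-world CMS operation (initial mark or final remark). */
    void pause() {
        pauseCount++;
    }

    /** One concurrent cycle: two pauses, but only one completed collection. */
    void concurrentCycle() {
        pause();      // initial mark
        pause();      // final remark
        cycleCount++; // counted once, in the sweep phase
    }

    /** Runs the given number of cycles and returns {pauses, cycles}. */
    static long[] simulate(int cycles) {
        CmsCounterModel model = new CmsCounterModel();
        for (int i = 0; i < cycles; i++) {
            model.concurrentCycle();
        }
        return new long[] { model.pauseCount, model.cycleCount };
    }

    public static void main(String[] args) {
        long[] result = simulate(10);
        System.out.println("pauses=" + result[0] + " cycles=" + result[1]);
    }
}
```

Ten modelled cycles give 20 pauses and 10 cycles, the same 2:1 ratio as the 20 versus 10 seen in the screenshot.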

So far so good.

2. The System.gc() part:

Without ExplicitGCInvokesConcurrent set, System.gc() does a stop-the-world full GC, which consists of only one pause, so "number of pauses" would equal "number of collections" in this case.
It will go into GenCollectedHeap::do_collection(); both the jvmstat and the JMM GC counters get incremented by 1 here:

TraceCollectorStats tcs(_gens[i]->counters());
TraceMemoryManagerStats tmms(_gens[i]->kind());

But, drilling down into:
_gens[i]->collect(full, do_clear_all_soft_refs, size, is_tlab);

That'll eventually go into:
CMSCollector::acquire_control_and_collect(bool full, bool clear_all_soft_refs)

System.gc() is user requested so that'll go further into mark-sweep-compact:
CMSCollector::do_compaction_work(bool clear_all_soft_refs)
And here, it increments the JMM GC counter again (remember it was in the concurrent GC path too, to handle bailouts), even though this is still in the same collection. This leads to the "buggy behavior" mentioned earlier.

The JMM GC counter wasn't added to CMS until this fix got in:
6581734: CMS Old Gen's collection usage is zero after GC which is incorrect

The code added to CMSCollector::do_compaction_work() works fine in the concurrent GC path, but interacts badly with the original logic in GenCollectedHeap::do_collection().
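
The interaction can be modelled as two nested scoped tracers. The sketch below is a Java analogy rather than the actual C++: all class and method names are hypothetical, and it assumes each tracer bumps its counter when its scope ends, which try-with-resources imitates. The traceInside flag switches the inner tracer in do_compaction_work on or off.

```java
// Toy model of nested scoped tracers on the System.gc() path (illustrative only).
public class NestedTracerModel {
    static long jvmstatCount; // stands in for the jvmstat invocations counter
    static long jmmCount;     // stands in for GCMemoryManager::_num_collections

    /** Stand-in for TraceCollectorStats: one jvmstat increment per scope. */
    static final class CollectorTrace implements AutoCloseable {
        @Override public void close() { jvmstatCount++; }
    }

    /** Stand-in for the memory manager tracers: one JMM increment per scope. */
    static final class MemoryManagerTrace implements AutoCloseable {
        @Override public void close() { jmmCount++; }
    }

    /** Models do_compaction_work(); traceInside=true keeps the inner tracer. */
    static void doCompactionWork(boolean traceInside) {
        if (traceInside) {
            try (MemoryManagerTrace inner = new MemoryManagerTrace()) {
                // Scope ends immediately: second JMM increment for the same collection.
            }
        }
        // The mark-sweep-compact work would happen here.
    }

    /** Models do_collection(): the outer tracers wrap the whole collection. */
    static void doCollection(boolean traceInside) {
        try (CollectorTrace tcs = new CollectorTrace();
             MemoryManagerTrace tmms = new MemoryManagerTrace()) {
            doCompactionWork(traceInside);
        }
    }

    /** Runs n modelled explicit GCs and returns {jvmstatCount, jmmCount}. */
    static long[] run(int explicitGcs, boolean traceInside) {
        jvmstatCount = 0;
        jmmCount = 0;
        for (int i = 0; i < explicitGcs; i++) {
            doCollection(traceInside);
        }
        return new long[] { jvmstatCount, jmmCount };
    }

    public static void main(String[] args) {
        long[] before = run(5, true);
        long[] after = run(5, false);
        System.out.println("inner tracer kept:    jvmstat=" + before[0] + " jmm=" + before[1]);
        System.out.println("inner tracer removed: jvmstat=" + after[0] + " jmm=" + after[1]);
    }
}
```

With the inner tracer kept, five modelled System.gc() calls yield a JMM count of 10 against a jvmstat count of 5; with it removed, both counters read 5.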

=====================================================

I thought all concurrent mode failures/interrupts come from GenCollectedHeap::do_collection(). If that's the case, then it seems unnecessary to update the JMM GC counter in CMSCollector::do_compaction_work(); simply removing it should fix the problem.

With that, I'd propose the following (XS) change: (diff against HS20)

diff -r f0f676c5a2c6 src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
--- a/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp      Tue Mar 15 19:30:16 2011 -0700
+++ b/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp      Thu Jul 28 00:02:41 2011 +0800
@@ -2022,9 +2022,6 @@
                                             _intra_sweep_estimate.padded_average());
   }

-  {
-    TraceCMSMemoryManagerStats();
-  }
   GenMarkSweep::invoke_at_safepoint(_cmsGen->level(),
     ref_processor(), clear_all_soft_refs);
   #ifdef ASSERT

The same goes for the changes in:
7036199: Adding a notification to the implementation of GarbageCollectorMXBeans

=====================================================

P.S. Is there an "official" name for the counters that I referred to as "jvmstat counters" above? Is it just jvmstat, or PerfData or HSPERFDATA?

Comments
EVALUATION http://hg.openjdk.java.net/hsx/hotspot-rt/hotspot/rev/41e6ee74f879
17-08-2011

EVALUATION http://hg.openjdk.java.net/hsx/hotspot-main/hotspot/rev/41e6ee74f879
17-08-2011

EVALUATION http://hg.openjdk.java.net/hsx/hotspot-gc/hotspot/rev/41e6ee74f879
11-08-2011

EVALUATION As submitted, webrev at: http://cr.openjdk.java.net/~kevinw/7072527/webrev.01/
02-08-2011

PUBLIC COMMENTS Some follow-up email:-

Hi Kevin -- thanks for jumping on this! More inline below ...

On 07/28/11 09:33, Krystal Mok wrote:
> Hi Kevin,
>
> Thank you for taking care of this, and it's good to see the problem is verified.
>
> I think whether or not the suggested fix is sufficient depends on what paths can reach CMSCollector::do_compaction_work(). If all paths that can reach CMSCollector::do_compaction_work() come from GenCollectedHeap::do_collection(), then the fix should be good to go. Otherwise it'll need a better workaround.
>
> I believe all concurrent mode failures/interrupts (which includes the System.gc() case) does come from GenCollectedHeap::do_collection(), but I'm not exactly sure about this, could anybody please clarify on it?

Yes, i believe this is indeed the case, and my browsing of the code using cscope seemed to confirm that belief. More below ...

> Regards,
> Kris Mok
>
> On Thu, Jul 28, 2011 at 10:12 PM, Kevin Walls <###@###.###> wrote:
>
> Hi --
>
> 6580448 was marked as a duplicate of 6581734, which fixed the fact that CMS collections were just not counted at all - with CMS, only a stop the world full gc would be counted in the stats.
>
> But looks like you're right... Here is a quick variation of the testcase from 6581734 which shows the same thing, and this verifies the same, and is solved by ExplicitGCInvokesConcurrent. If there is no other feedback I can test if the removal of the TraceCMSMemoryManagerStats() call in CMSCollector::do_compaction_work is all we need...

Kevin, yes, it would be great if you could verify this and push the fix. I am not sure if the push would need to wait for the signing of OCA from Kris, but best to check with Those Who Would Know Such Things. Since the original CR has been closed, i'll open one momentarily and can make you RE (if that's OK with you). I'll be happy to serve as reviewer of the change.

As regards the jstat counter reporting two pauses per concurrent CMS cycle, I am of two minds on what the original intention was. I'd have originally regarded the double increment as a bug, but as you state it is really two pauses, even if part of a single cycle. And it makes sense to count them as two. I agree that this should be documented and left alone, given how long we have had this behaviour, and the alternative (of counting cycles, rather than pauses) may be no better (or arguably worse). There's actually an open CR for this which we can redirect into a CR to update the relevant documentation.

-- ramki

> Regards
> Kevin
>
> /*
>  * Copyright (c) 2011, Oracle and/or its affiliates. All rights reserved.
>  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
>  *
>  * This code is free software; you can redistribute it and/or modify it
>  * under the terms of the GNU General Public License version 2 only, as
>  * published by the Free Software Foundation.
>  *
>  * This code is distributed in the hope that it will be useful, but WITHOUT
>  * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>  * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
>  * version 2 for more details (a copy is included in the LICENSE file that
>  * accompanied this code).
>  *
>  * You should have received a copy of the GNU General Public License version
>  * 2 along with this work; if not, write to the Free Software Foundation,
>  * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
>  *
>  * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
>  * or visit www.oracle.com if you need additional information or have any
>  * questions.
>  */
>
> /*
>  * @test TestFullGCount.java
>  * @bug
>  * @summary
>  * @run main/othervm -XX:+UseConcMarkSweepGC TestFullGCCount
>  *
>  */
> import java.util.*;
> import java.lang.management.*;
>
> public class TestFullGCCount {
>
>     private String poolName = "CMS";
>     private String collectorName = "ConcurrentMarkSweep";
>
>     public static void main(String [] args) {
>
>         TestFullGCCount t = null;
>         if (args.length==2) {
>             t = new TestFullGCCount(args[0], args[1]);
>         } else {
>             System.out.println("Defaulting to monitor CMS pool and collector.");
>             t = new TestFullGCCount();
>         }
>         t.run();
>
>     }
>
>     public TestFullGCCount(String pool, String collector) {
>         poolName = pool;
>         collectorName = collector;
>     }
>     public TestFullGCCount() {
>     }
>
>     public void run() {
>
>         int count = 0;
>         int iterations = 20;
>         long counts[] = new long[iterations];
>         boolean diffAlways2 = true; // assume we will fail
>
>         for (int i=0; i<iterations; i++) {
>             System.gc();
>             counts[i] = checkStats();
>             if (i>0) {
>                 if (counts[i] - counts[i-1] != 2) {
>                     diffAlways2 = false;
>                 }
>             }
>         }
>         if (diffAlways2) {
>             throw new RuntimeException("FAILED: difference in count is always 2.");
>         }
>         System.out.println("Passed.");
>     }
>
>     private long checkStats() {
>         long count = 0;
>         List<MemoryPoolMXBean> pools = ManagementFactory.getMemoryPoolMXBeans();
>         List<GarbageCollectorMXBean> collectors = ManagementFactory.getGarbageCollectorMXBeans();
>         for (int i=0; i<collectors.size(); i++) {
>             GarbageCollectorMXBean collector = collectors.get(i);
>             String name = collector.getName();
>             if (name.contains(collectorName)) {
>                 System.out.println(name + ": collection count = "
>                                    + collector.getCollectionCount());
>                 count = collector.getCollectionCount();
>             }
>         }
>         return count;
>
>     }
>
> }
>
> On 27/07/11 17:12, Krystal Mok wrote:
>> Hi all,
>>
>> [The original message is quoted here in full; it is identical to the Description above.]
29-07-2011