JDK-8215523 : jstat reports incorrect values for OU for CMS GC
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 8u192
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2018-12-18
  • Updated: 2019-08-12
  • Resolved: 2019-07-01
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 14
14 b04Fixed
Related Reports
Relates :  
Relates :  
Relates :  
Description
With CMS GC, at certain points, 'OU' (Old usage) values reported in the jstat output increase first and then drop down. This is wrong because the 'old usage' drops down without any old generation collection.

Comments
Fwiw, I could pinpoint the situation a bit more: these weird spikes seem to occur when CMS expands the old gen heap, i.e. I think making the CMS data structures cover more area than before.
02-07-2019

Solution: For CMS, we can maintain an internal variable for the space usage. This internal variable gets updated only at points where the CMS space usage computation is guaranteed to be accurate. jstat and other memory usage monitoring tools get access to this stable value of used space.
21-06-2019

For the 'used' values of different memory spaces, jstat uses a periodic task to read these values and updates the corresponding jstat counters. For CMS generation, there is a sampler that periodically updates the 'used' counter: class GenerationUsedHelper : public PerfLongSampleHelper { 103 private: 104 Generation* _gen; 105 106 public: 107 GenerationUsedHelper(Generation* g) : _gen(g) { } 108 109 inline jlong take_sample() { 110 return _gen->used(); 111 } 112 }; The above used() call in turn makes a call to CompactibleFreeListSpace::used(). 376 size_t CompactibleFreeListSpace::used() const { 377 return capacity() - free(); 378 } 379 380 size_t CompactibleFreeListSpace::free() const { 381 // "MT-safe, but not MT-precise"(TM), if you will: i.e. 382 // if you do this while the structures are in flux you 383 // may get an approximate answer only; for instance 384 // because there is concurrent allocation either 385 // directly by mutators or for promotion during a GC. 386 // It's "MT-safe", however, in the sense that you are guaranteed 387 // not to crash and burn, for instance, because of walking 388 // pointers that could disappear as you were walking them. 389 // The approximation is because the various components 390 // that are read below are not read atomically (and 391 // further the computation of totalSizeInIndexedFreeLists() 392 // is itself a non-atomic computation. The normal use of 393 // this is during a resize operation at the end of GC 394 // and at that time you are guaranteed to get the 395 // correct actual value. However, for instance, this is 396 // also read completely asynchronously by the "perf-sampler" 397 // that supports jvmstat, and you are apt to see the values 398 // flicker in such cases. 399 assert(_dictionary != NULL, "No _dictionary?"); 400 return (_dictionary->total_chunk_size(DEBUG_ONLY(freelistLock())) + 401 totalSizeInIndexedFreeLists() + 402 _smallLinearAllocBlock._word_size) * HeapWordSize; 403 } As stated above in the code comments, free() can return an imprecise approximate value when allocations are happening in the CompactibleFreeListSpace at the same time, e.g. during direct allocations or promotions. That is the reason we see the OU value as 349122.2 because it was computed when objects were being promoted into the CompactibleFreeListSpace.
21-06-2019

Attached is the test program, and the jstat and gc log files generated for this test program by running the script perf_jstat.sh. Looking at the logs in detail reveals that jstat is actually reading the intermediate usage values for the CMS Gen while a ParNew collection is in progress. Let's consider the following jstat log entries. Timestamp S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU YGC YGCT FGC FGCT GCT 7.5 17472.0 17472.0 6386.3 0.0 139776.0 0.0 349568.0 27193.6 4864.0 2668.2 512.0 284.5 179 0.589 0 0.000 0.589 7.5 17472.0 17472.0 6386.3 17472.0 139776.0 139776.0 349568.0 349122.2 4864.0 2668.2 512.0 284.5 179 0.589 0 0.000 0.589 7.5 17472.0 17472.0 0.0 17472.0 139776.0 0.0 349568.0 27664.2 4864.0 2668.2 512.0 284.5 179 0.593 0 0.000 0.593 These entries correspond to the 179th ParNew collection. The first entry corresponds to the time right after the 178th collection, and we can see the Eden usage is 0. Eden starts filling up after that. The second entry corresponds to the time when 179 was ongoing, it had copied over the live objects to S1 and the Old gen but had not yet cleared the old copies of those live objects. This is evident from the fact that S0U is still 6386.3 which becomes 0 in the third log entry. Due to this intermediate reading of values, old gen usage shows a large amount of growth (349122.2 in second entry) because ParNew collection copied over objects to old gen but the actual old gen usage (27664.2 in third entry) had not been calculated yet. The third entry shows that S0 was cleared and the old gen usage had gotten computed (27664.2) at that point. The 179th young collection from gc logs: 7.533: [GC (Allocation Failure) 7.533: [ParNew: 146162K->17472K(157248K), 0.0037769 secs] 173355K->45136K(506816K), 0.0038270 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
21-06-2019