Bug ID: JDK-8013895 G1: G1SummarizeRSetStats output on Linux needs improvement

Type: Bug
Component: hotspot
Sub-Component: gc
Affected Version: hs24,hs25

Priority: P4
Status: Resolved
Resolution: Fixed
OS: linux
CPU: x86

Submitted: 2013-05-03
Updated: 2014-04-17
Resolved: 2013-06-03

JDK 7	JDK 8	Other
7u60Fixed	8Fixed	hs25Fixed

Here's how one set of stats looks like:
 Concurrent RS processed -1761455529 cards
  Of 9224262 completed buffers:
      9224262 (100.0%) by conc RS threads.
            0 (  0.0%) by mutator threads.
  Conc RS threads times(s)
          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
  Total heap region rem set sizes = 1120947K.  Max = 4047K.
  Static structures = 600K, free_lists = 5312K.
    717820015 occupied cards represented.
    Max size region = 378:(O)[0x00000004fa000000,0x00000004fb000000,0x00000004fb000000], size = 4048K, occupied = 37303K.
    Did 209966 coarsenings.

There are multiple issues as listed:
1) -ve number for cards
2) Conc RS threads times are always 0.00
3) In-spite of specifying a period for stats (in this case -XX:G1SummarizeRSetStatsPeriod=1), the data seems to be cumulative.
Also, it seems that cards processed by mutator threads are always 0 (0.0%).

For #2, I understand that it is OS related, but, I'd appreciate any information, doesn't have to be exact (since per thread times are listed, so helps with performance analysis to see how many refinement threads were used during the specified period)
For #3, I guess that the stats period was probably introduced so as not to pollute with excess data, but I take that to mean that the stats per period would be printed and not cummulative data.

I am removing the critical-request label - this is a P4 and there is no approvals needed for P4/P5 or RFEs. However the focus should be on P1-P3 at this time of the release.
25-06-2013
I agree that os::elapsedTime() is better than nothing. I do not see why not try getrusage() though first. Maybe we could always print the summary/accumulated information at VM exit (If G1SummarizeRSetStats is enabled of course) I.e. if period flag is 0 -> only summary at VM exit, and if the period is >= 0; print every period'th GC the output for the current period, plus an accumulated summary at the end What do you think?
07-05-2013
I went through the output with Monica explaining the root causes for the issues raised before the CR was submitted. The information that Monica was looking at came from a third party. For the overflow I agree with Thomas - the data type should be changed. For the times - using the current os::elapsedTime() (which is is used to implement os::elapsedVTime() on linux) would be better than what is shown now. It won't be as accurate as gethrvtime() but would probably be sufficient. I also think this information that's printed should be cleared at the start of a new period. The current default for the period flag is 0. This means that the information is printed once at the end of the Java process. What's a reasonable default?
06-05-2013
#1: overflow, use of wrong data type for G1RemSet::_conc_refinement_cards #2: as suspected, this is an OS thing and only implemented for Solaris right now; maybe getrusage(RUSAGE_THREAD, ...) could be be used? (this is a 2.6.26 call, not sure what is Hotspot minimum Linux support) #3: this is true. It might be better to present this information per period. The number of cards processed by mutator threads is the number of buffers processed by the application threads while the application is running (and also not buffers created by deferred remembered set update, i.e. cards created during GC). Buffer processing during stw pause is accounted to the rs threads - maybe this should be separate? Did you try basically disabling the rs threads by setting -XX:-G1UseAdaptiveConcRefinement and -XX:G1ConcRefinementGreenZone=0 -XX:G1ConcRefinementYellowZone=0 -XX:G1ConcRefinementRedZone=0 ? If I do that, it shows some buffers processed by the mutators. E.g. Concurrent RS processed 451889 cards Of 2058 completed buffers: 1388 ( 67.4%) by conc RS threads. 670 ( 32.6%) by mutator threads. Conc RS threads times(s) 6.91 6.74 0.04 Total heap region rem set sizes = 468K. Max = 12K. Static structures = 6K, free_lists = 43K. 309242 occupied cards represented. Max size region = 3:(O)[0xe1900000,0xe19fff60,0xe1a00000], size = 13K, occupied = 25K. Did 0 coarsenings. edit: (the example also shows some thread times, but this was a run on sparc/solaris just to show that it is possible to get nonzero numbers for mutator processing; the code to retrieve thread times on linux is missing completely)
06-05-2013
This probably affects hs24 but AFAIK we don't plan to fix this in hs24
06-05-2013

Relates :	JDK-8040804 - G1: Concurrent mark stuck in loop calling os::elapsedVTime()
Relates :	JDK-8014078 - G1: improve remembered set summary information by providing per region type information