JDK-8013895 : G1: G1SummarizeRSetStats output on Linux needs improvement

Details
Type:
Bug
Submit Date:
2013-05-03
Status:
Resolved
Updated Date:
2014-04-17
Project Name:
JDK
Resolved Date:
2013-06-03
Component:
hotspot
OS:
linux
Sub-Component:
gc
CPU:
x86
Priority:
P4
Resolution:
Fixed
Affected Versions:
hs24,hs25
Fixed Versions:
hs25 (b36)

Related Reports
Backport:
Backport:
Backport:
Backport:
Relates:
Relates:

Sub Tasks

Description
Here is what one set of stats looks like:
 Concurrent RS processed -1761455529 cards
  Of 9224262 completed buffers:
      9224262 (100.0%) by conc RS threads.
            0 (  0.0%) by mutator threads.
  Conc RS threads times(s)
          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
  Total heap region rem set sizes = 1120947K.  Max = 4047K.
  Static structures = 600K, free_lists = 5312K.
    717820015 occupied cards represented.
    Max size region = 378:(O)[0x00000004fa000000,0x00000004fb000000,0x00000004fb000000], size = 4048K, occupied = 37303K.
    Did 209966 coarsenings.

There are multiple issues, as listed:
1) Negative number of cards.
2) Conc RS thread times are always 0.00.
3) Despite specifying a period for the stats (in this case -XX:G1SummarizeRSetStatsPeriod=1), the data seems to be cumulative.
Also, it seems that cards processed by mutator threads are always 0 (0.0%).

For #2, I understand that it is OS related, but I'd appreciate any information, even if it isn't exact; since per-thread times are listed, it helps with performance analysis to see how many refinement threads were used during the specified period.
For #3, I guess the stats period was introduced so as not to pollute the output with excess data, but I would expect the stats for each period to be printed, not cumulative data.



                                    

Comments
This probably affects hs24 but AFAIK we don't plan to fix this in hs24
                                     
2013-05-06
#1: overflow; wrong data type used for G1RemSet::_conc_refinement_cards.
#2: as suspected, this is an OS thing and is only implemented for Solaris right now; maybe getrusage(RUSAGE_THREAD, ...) could be used? (This is a 2.6.26 call; I am not sure what Hotspot's minimum supported Linux version is.)
#3: this is true. It might be better to present this information per period.

The number of cards processed by mutator threads is the number of buffers processed by the application threads while the application is running (and does not include buffers created by deferred remembered set updates, i.e. cards created during GC). Buffer processing during the STW pause is accounted to the RS threads - maybe this should be separated?

Did you try basically disabling the rs threads by setting -XX:-G1UseAdaptiveConcRefinement and -XX:G1ConcRefinementGreenZone=0 -XX:G1ConcRefinementYellowZone=0 -XX:G1ConcRefinementRedZone=0 ? If I do that, it shows some buffers processed by the mutators.
E.g.

 Concurrent RS processed 451889 cards
  Of 2058 completed buffers:
         1388 ( 67.4%) by conc RS threads.
          670 ( 32.6%) by mutator threads.
  Conc RS threads times(s)
          6.91     6.74     0.04
  Total heap region rem set sizes = 468K.  Max = 12K.
  Static structures = 6K, free_lists = 43K.
    309242 occupied cards represented.
    Max size region = 3:(O)[0xe1900000,0xe19fff60,0xe1a00000], size = 13K, occupied = 25K.
    Did 0 coarsenings.

edit: the example also shows some thread times, but this was a run on sparc/solaris, just to show that it is possible to get nonzero numbers for mutator processing; the code to retrieve thread times on Linux is missing completely.
                                     
2013-05-06
I agree that os::elapsedTime() is better than nothing, though I do not see why we should not try getrusage() first.

Maybe we could always print the summary/accumulated information at VM exit (if G1SummarizeRSetStats is enabled, of course).
I.e., if the period flag is 0, print only the summary at VM exit; if the period is > 0, print the output for the current period every period'th GC, plus an accumulated summary at the end.

What do you think?
                                     
2013-05-07
URL:   http://hg.openjdk.java.net/hsx/hotspot-gc/hotspot/rev/e72f7eecc96d
User:  brutisso
Date:  2013-06-03 13:14:26 +0000

                                     
2013-06-03
URL:   http://hg.openjdk.java.net/hsx/hsx25/hotspot/rev/e72f7eecc96d
User:  amurillo
Date:  2013-06-07 19:51:24 +0000

                                     
2013-06-07
I am removing the critical-request label - this is a P4, and no approvals are needed for P4/P5 or RFEs. However, the focus should be on P1-P3 at this time in the release.
                                     
2013-06-25
I went through the output with Monica explaining the root causes for the issues raised before the CR was submitted. The information that Monica was looking at came from a third party.

For the overflow I agree with Thomas - the data type should be changed.

For the times - using the current os::elapsedTime() (which is used to implement os::elapsedVTime() on Linux) would be better than what is shown now. It won't be as accurate as gethrvtime(), but it would probably be sufficient.

I also think the information that's printed should be cleared at the start of a new period. The current default for the period flag is 0, which means the information is printed once at the end of the Java process. What's a reasonable default?


                                     
2013-05-06


