JDK-6990419 : CMS: Remaining work for 6572569: consistently skewed work distribution in (long) re-mark pauses

Details
Type:
Bug
Submit Date:
2010-10-07
Status:
Resolved
Updated Date:
2014-09-04
Project Name:
JDK
Resolved Date:
2013-07-25
Component:
hotspot
OS:
generic,linux_redhat_5.2
Sub-Component:
gc
CPU:
x86,generic
Priority:
P3
Resolution:
Fixed
Affected Versions:
hs20,6u21
Fixed Versions:
hs25 (b44)

Related Reports
Backport:
Backport:
Backport:
Backport:
Backport:
Backport:
Relates:
Relates:

Sub Tasks

Description
For SR information, please consult the SR list for 6572569.
6572569 was used for fixing a bug in CMSScavengeBeforeRemark (for JDK 7) and
for backporting the CMSScavengeBeforeRemark feature to older releases; the
sub-CRs for those backports were closed as "Fix Delivered".

There is, however, work remaining to fully address this bug. That work was
deferred at the time, and CR 6572569 was kept open for that reason. For
process reasons we should not overload one CR with multiple pieces of work,
so the balance of the work planned under that CR is being brought forward
into this CR. I have copied verbatim the contents of the Suggested Fix and
Evaluation sections of the old CR, so that they are recorded here for
if/when we get to it.

                                    

Comments
WORK AROUND

-XX:+CMSScavengeBeforeRemark partially mitigates the issue, but is not enabled by
default and can leave performance on the table under some circumstances because
of running with an Eden that is smaller than what might otherwise have been
optimal for the application.
                                     
2010-10-07
EVALUATION

Transferred verbatim from CR 6572569:
=====================================

The heap shape and workload are such that a CMS cycle starts and
finishes between two scavenges. Under these circumstances it
is possible for the Eden space parallelization to not work very
well. This can be partially worked around by means of
-XX:+CMSScavengeBeforeRemark.

Other heuristics to deal with this are also possible and will
be investigated while we await customer feedback on the efficacy
of -XX:+CMSScavengeBeforeRemark in their case.

...

One simple approach towards fixing this problem is to not operate the
phase timeout until at least one scavenge has occurred during the phase,
i.e. something along the lines of:

    if (time_spent_in_phase > MAX(max_default,2*recent_inter_scavenge_duration)
        && at_least_one_scavenge_during_phase)
    then abort_phase.

We should see if one of the customers (or a suitable in-house configuration)
can test/verify the efficacy of such a heuristic across a range of
conditions.
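
As a rough illustration of the condition above, here is a minimal
standalone C++ sketch. It is not the code that was committed for this bug.
The struct, the function name and the millisecond units are hypothetical,
and only the variable names are taken from the pseudocode.

```cpp
#include <algorithm>

// Hypothetical container for the inputs named in the pseudocode above.
// None of these are actual HotSpot identifiers; units are assumed to be ms.
struct PhaseState {
    double time_spent_in_phase;            // time elapsed in the current phase
    double max_default;                    // default cap on the phase duration
    double recent_inter_scavenge_duration; // recent time between scavenges
    bool   at_least_one_scavenge_during_phase;
};

// Returns true when the phase should be aborted under the proposed
// heuristic: the timeout only takes effect once a scavenge has occurred
// during the phase, and the limit is stretched to twice the recent
// inter-scavenge interval when that exceeds the default cap.
inline bool should_abort_phase(const PhaseState& s) {
    const double limit =
        std::max(s.max_default, 2.0 * s.recent_inter_scavenge_duration);
    return s.time_spent_in_phase > limit
        && s.at_least_one_scavenge_during_phase;
}
```

With these assumed inputs, a phase that has exceeded its limit still keeps
running until the first scavenge is observed, which is the behavior the
evaluation text asks for.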
                                     
2010-10-07
SUGGESTED FIX

Transferred verbatim from CR 6572569:
=====================================

One simple approach towards fixing this problem is to not operate the
phase timeout until at least one scavenge has occurred during the phase,
i.e. something along the lines of:

    if (time_spent_in_phase > MAX(max_default,2*recent_inter_scavenge_duration)
        && at_least_one_scavenge_during_phase)
    then abort_phase.

We should see if one of the customers (or a suitable in-house configuration)
can test/verify the efficacy of such a heuristic across a range of
conditions.

                                     
2010-10-07
An openjdk contribution has been received for this bug.
                                     
2013-07-18
URL:   http://hg.openjdk.java.net/hsx/hotspot-gc/hotspot/rev/7b06ae405d7b
User:  jmasa
Date:  2013-07-25 16:34:59 +0000

                                     
2013-07-25
URL:   http://hg.openjdk.java.net/hsx/hsx25/hotspot/rev/7b06ae405d7b
User:  amurillo
Date:  2013-08-02 14:01:30 +0000

                                     
2013-08-02
The fix improves load balancing of parallel work. Bad load balancing occurred depending on how many young collections took place during a particular phase of CMS (the precleaning phase). Reliably verifying that this change fixes the bad load balancing is difficult, because it is hard to know when the bad load balancing would have occurred.
                                     
2013-08-06


