JDK-6990419 : CMS: Remaining work for 6572569: consistently skewed work distribution in (long) re-mark pauses
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: hs20,6u21
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic,linux_redhat_5.2
  • CPU: generic,x86
  • Submitted: 2010-10-07
  • Updated: 2014-10-15
  • Resolved: 2013-07-25
Fixed versions:
  • JDK 7: 7u51 (Fixed)
  • JDK 8: 8 (Fixed)
  • Other: hs25 (Fixed)
Description
For SR information, please consult the SR list for 6572569.
6572569 was used to fix a bug in CMSScavengeBeforeRemark (for JDK 7) and
to backport the CMSScavengeBeforeRemark feature to older releases; the
backport sub-CRs were closed as "Fix Delivered".

There is however work remaining to be done to fully address this bug, which
was deferred at that time and CR 6572569 kept open for that reason. However,
for process reasons we should not be overloading one CR for these multiple
pieces of work, so the balance of the work planned under that CR is
being brought forward into this CR. I have copied verbatim the contents of
the Suggested Fix and Evaluation sections of the old CR, so that they are
on record here if and when we get to this work.

Comments
The fix improves load balancing of parallel work. Bad load balancing occurred depending on how many young collections took place during a particular phase of CMS (the precleaning phase). Reliably verifying that this change fixes the bad load balancing is difficult, because it is hard to know when the bad load balancing would have occurred.
06-08-2013

An openjdk contribution has been received for this bug.
18-07-2013

EVALUATION Transferred verbatim from CR 6572569:
=====================================
The heap shape and workload are such that a CMS cycle starts and finishes between two scavenges. Under these circumstances it is possible for the Eden space parallelization to not work very well. This can be partially worked around by means of -XX:+CMSScavengeBeforeRemark. Other heuristics to deal with this are also possible and will be investigated while we await customer feedback on the efficacy of +CMSScavengeBeforeRemark in their case.
...
One simple approach towards fixing this problem is to not operate the phase timeout until at least one scavenge has occurred during the phase, i.e. something along the lines of:
  if (time_spent_in_phase > MAX(max_default, 2*recent_inter_scavenge_duration)
      && at_least_one_scavenge_during_phase) then abort_phase.
We should see if one of the customers (or a suitable in-house configuration) can test/verify the efficacy of such a heuristic across a range of conditions.
07-10-2010

WORK AROUND -XX:+CMSScavengeBeforeRemark partially mitigates the issue, but is not enabled by default and can leave performance on the table under some circumstances because of running with an Eden that is smaller than what might otherwise have been optimal for the application.
07-10-2010

SUGGESTED FIX Transferred verbatim from CR 6572569:
=====================================
One simple approach towards fixing this problem is to not operate the phase timeout until at least one scavenge has occurred during the phase, i.e. something along the lines of:
  if (time_spent_in_phase > MAX(max_default, 2*recent_inter_scavenge_duration)
      && at_least_one_scavenge_during_phase) then abort_phase.
We should see if one of the customers (or a suitable in-house configuration) can test/verify the efficacy of such a heuristic across a range of conditions.
07-10-2010
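
For illustration only, the heuristic from the suggested fix could be sketched in C++ roughly as follows. This is not HotSpot code and not the change that was eventually committed; the function name should_abort_phase is hypothetical, the parameter names simply mirror the pseudocode above, and the choice of milliseconds as the time unit is an assumption.

```cpp
#include <algorithm>

// Hypothetical sketch of the suggested heuristic; NOT actual HotSpot code.
// All names mirror the pseudocode in the suggested fix and are assumptions.
// All times are assumed to be in milliseconds.
bool should_abort_phase(double time_spent_in_phase,
                        double max_default,
                        double recent_inter_scavenge_duration,
                        bool at_least_one_scavenge_during_phase) {
  // Timeout threshold: the larger of the default limit and twice the
  // recently observed interval between scavenges.
  const double threshold =
      std::max(max_default, 2.0 * recent_inter_scavenge_duration);

  // Abort only when the threshold has been exceeded AND at least one
  // scavenge has already happened during this phase.
  return time_spent_in_phase > threshold &&
         at_least_one_scavenge_during_phase;
}
```

With this shape, a phase that has run past the threshold is still not aborted if no scavenge has occurred in it, which is the case the heuristic is meant to cover.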