JDK-8057003 : Large reference arrays cause extremely long synchronization times
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 7u80,8u40,9
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2014-09-01
  • Updated: 2020-08-19
  • Resolved: 2016-11-24
Fix versions
JDK 9: 9 b150 (Fixed)
Other: openjdk8u272 (Fixed)
Related Reports
Blocks :  
Duplicate :  
Duplicate :  
Relates :  
Description
Large reference arrays cause extremely long synchronization or pause times on both CMS and G1 during marking/pre-cleaning/other tracing related tasks.

The problem is that large reference arrays are not split into parts during processing. The thread currently working on such an array must finish it completely before it can reach the safepoint, so the GC pause cannot start until then.
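The idea of splitting a large reference array into parts can be sketched as follows. This is an illustrative Java model only, not the HotSpot change itself: the class name ArraySliceSketch, the Slice type, the chunk size and the pollPoint callback are all hypothetical. It shows a worker scanning at most one chunk at a time and pushing the remainder back onto its queue, so it reaches a point where it could stop for a safepoint after every chunk instead of only after the whole array.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class ArraySliceSketch {
    /** A pending piece of work: scan elements [from, to) of an array. */
    static final class Slice {
        final Object[] array;
        final int from;
        final int to;

        Slice(Object[] array, int from, int to) {
            this.array = array;
            this.from = from;
            this.to = to;
        }
    }

    /**
     * Scans the array in slices of at most {@code chunk} elements
     * (chunk must be positive) and runs {@code pollPoint} after each
     * slice. Returns the number of non-null references visited.
     */
    static int scanInSlices(Object[] array, int chunk, Runnable pollPoint) {
        Deque<Slice> queue = new ArrayDeque<>();
        queue.push(new Slice(array, 0, array.length));
        int visited = 0;
        while (!queue.isEmpty()) {
            Slice s = queue.pop();
            // Compute the end of this chunk without int overflow.
            int end = (int) Math.min((long) s.from + chunk, s.to);
            if (end < s.to) {
                // Push the remainder back so it is handled as separate work.
                queue.push(new Slice(s.array, end, s.to));
            }
            for (int i = s.from; i < end; i++) {
                if (s.array[i] != null) {
                    visited++; // stand-in for "mark the referenced object"
                }
            }
            // Hypothetical stand-in for a point where the worker may yield.
            pollPoint.run();
        }
        return visited;
    }

    public static void main(String[] args) {
        Object[] refs = new Object[2500];
        Arrays.fill(refs, "x");
        int[] polls = {0};
        int visited = scanInSlices(refs, 1000, () -> polls[0]++);
        System.out.println("visited=" + visited + " polls=" + polls[0]);
    }
}
```

In this model the longest stretch of uninterrupted work is bounded by the chunk size rather than by the array length, which is the property the description says is missing.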

With G1, run the attached test program with -Xmx14G -XX:+PrintGCDetails -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 -XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=50 to see that synchronizing for the safepoint takes extremely long (see the sync column in the output below):

         vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
829.726: G1IncCollectionPause             [      21          0              0    ]      [     0     0 30576     0     6    ]  0   

There is an additional problem: after finishing the marking, at least in the example program there is a guaranteed mark stack overflow, indicating that the marking will never make progress. This may be handled separately.
Comments
Thanks, Andrew. Pushed with Contributed-by: maoliang.ml@alibaba-inc.com.
19-08-2020

Approved. As there was a fair bit of work involved here to make this applicable to 8u, I would credit this to the backport author. If they don't have an OpenJDK ID, please use your own ID as committer and a 'Contributed-by' credit.
19-08-2020

Review approval (RFR closed): https://mail.openjdk.java.net/pipermail/jdk8u-dev/2020-July/012179.html
17-07-2020

RFR is still open (no review, yet).
07-07-2020

Fix Request (8u): As described in this bug, this is a serious G1 issue that leads to 10x longer concurrent mark times and hangs Java threads for seconds. We have encountered the problem in several applications. Since G1 is widely used in 8u, we need to fix it there. The patch does not apply cleanly to 8u. RFR: https://mail.openjdk.java.net/pipermail/jdk8u-dev/2020-July/012069.html
07-07-2020

[~ddong] There is no "Fix Request" comment explaining why an OpenJDK 8u backport approval is being requested. Please add that with references to a potential webrev, RFR etc. and then re-apply the label.
01-07-2020

FC Extension Request Justification: This change fixes performance problems during marking through live objects when using G1 on large heaps where there are large arrays of references (j.l.Object). In the current implementation of the marking, processing of these objects is effectively serialized. For arrays of hundreds of MB, this delays completion of the liveness analysis for too long, causing 20+ minute full GCs after 500+ seconds of trying to finish. With this patch, in combination with JDK-8159422, marking completes in 15-30s. Risk assessment: Low. While it is a significant change, it has been run many times on benchmarks and has also completed test runs. Proposed due date: Nov 20 2016
20-10-2016

FC Extension Request Justification: This change fixes performance problems during marking through live objects when using G1 on large heaps where there are large arrays of references (j.l.Object). In the current implementation of the marking, processing of these objects is effectively serialized. For arrays of hundreds of MB, this delays completion of the liveness analysis for too long, causing 20+ minute full GCs after 500+ seconds of trying to finish. With this patch, in combination with JDK-8159422, marking completes in 15-30s. Risk assessment: Low. While it is a significant change, it has been run many times on benchmarks and has also completed test runs. Proposed due date: Sep 20 2016
08-07-2016

And there is one issue with the test application: after allocation, clearing the 8G of memory (with compressed oops) delays safepointing. Fixing this would be out of scope for this change though :)
14-06-2016

There is a second synchronization-related problem in the existing implementation: the methods that flush the stacks (drain_local/global_queue) do not call regular_clock_call(). That is, even after splitting reference arrays into parts, these methods tend to delay safepoints.
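The effect of a drain loop that never checks the clock can be illustrated with a small Java model. This is a sketch under assumptions, not the HotSpot code: the class DrainWithClockSketch, the WORK_PER_CHECK interval and the shouldYield callback are hypothetical stand-ins, with shouldYield playing the role that a periodic regular_clock_call() would play in the marking task.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.BooleanSupplier;

public class DrainWithClockSketch {
    // Hypothetical interval: check for a pending yield every 64 tasks.
    static final int WORK_PER_CHECK = 64;

    /**
     * Runs queued tasks until the queue is empty or a periodic check
     * reports that the worker should yield. Returns true if the queue
     * was fully drained, false if the worker stopped early.
     */
    static boolean drain(Deque<Runnable> queue, BooleanSupplier shouldYield) {
        int sinceCheck = 0;
        while (!queue.isEmpty()) {
            queue.pop().run();
            if (++sinceCheck >= WORK_PER_CHECK) {
                sinceCheck = 0;
                // Stand-in for the periodic clock call: without this check
                // the loop would only stop once the queue is empty.
                if (shouldYield.getAsBoolean()) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Deque<Runnable> queue = new ArrayDeque<>();
        int[] done = {0};
        for (int i = 0; i < 200; i++) {
            queue.push(() -> done[0]++);
        }
        boolean drained = drain(queue, () -> true);
        System.out.println("drained=" + drained + " done=" + done[0]
                + " remaining=" + queue.size());
    }
}
```

With the check in place, the work done between two opportunities to stop is bounded by the check interval. Without it, the bound is the queue length, which matches the delay described in this comment.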
14-06-2016

Also see the thread "Unexplained long stop the world pauses during concurrent marking step in G1 Collector" on the hotspot-gc-use mailing list. Update: link to thread http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2014-September/002037.html
19-04-2016