JDK-8166899 : Deferred card marking of large objArrays generates lots of unnecessary work
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 9
  • Priority: P3
  • Status: Open
  • Resolution: Unresolved
  • Submitted: 2016-09-29
  • Updated: 2024-10-11
Description
In the current scheme to reduce initial card marks (-XX:+ReduceInitialCardMarks), G1 marks entire objArrays as dirty, i.e. the current code pushes and enqueues the entire array as a sequence of individual cards.

This wastes a lot of time and space at every stage: splitting the array into a sequence of cards takes time, the resulting queue entries take space, and then every single card pays the usual overhead of finding the object start etc. It also adds latency during the gc pause due to handling of the most recent "deferred card marks".
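For illustration, a minimal standalone sketch (not HotSpot code) of the per-card expansion described above; the 512-byte card size matches G1's default, and all names are hypothetical:

#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch only: the deferred card mark for a large objArray is expanded
// into one queue entry per 512-byte card covering the array.
static const size_t CardShift = 9;  // 512-byte cards (G1 default)

void enqueue_deferred_marks(std::vector<size_t>& queue,
                            uintptr_t array_start, size_t array_bytes) {
  size_t first_card = array_start >> CardShift;
  size_t last_card  = (array_start + array_bytes - 1) >> CardShift;
  for (size_t card = first_card; card <= last_card; card++) {
    queue.push_back(card);  // e.g. a 1 GB objArray yields ~2 million entries
  }
}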

The following options come to mind:
1) improve the deferred card mark algorithm so that it has less overhead, or so that this overhead is not always deferred until the start of the gc pause; one option would be to disable the mechanism for (large) objArrays entirely (see the sketch after this list)
2) improve the mechanism for invalidating large areas of memory. At the moment it is very time and space intensive. Note that even with option 1), this may be interesting for cases where such large amounts of cards are generated by something other than the deferred card marking algorithm.
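A minimal standalone sketch of the first option (disabling the deferred mechanism for large objArrays); the size threshold, the region size and all names are assumptions, not the actual G1 policy:

#include <cstddef>

static const size_t RegionSizeBytes = 4 * 1024 * 1024;  // assumed 4M regions

// Hypothetical hook on the allocation slow path: decide whether the card
// marks for a newly allocated object should be deferred at all.
bool should_defer_card_marks(bool is_obj_array, size_t obj_size_bytes) {
  if (is_obj_array && obj_size_bytes >= RegionSizeBytes) {
    // Large objArray: do the card marks eagerly instead of enqueuing one
    // entry per covered card for processing at the start of the next pause.
    return false;
  }
  return true;
}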
Comments
The main problem persists: pushing a (large) array's dirty cards card by card (and then immediately merging them back onto the RS) is slow and resource intensive. Better integration with merging these cards onto the RS might improve the performance quite a bit.
21-05-2024
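A standalone sketch of one possible improvement along these lines (not HotSpot code; card size, card table layout and names are assumptions): record the whole array as a single dirty range so the merge onto the RS can treat it as one unit instead of millions of per-card entries.

#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

static const size_t CardShift = 9;     // 512-byte cards (G1 default)
static const uint8_t DirtyCard = 0;    // assumed dirty value

struct CardRange { size_t start_card; size_t num_cards; };

struct DeferredRangeQueue {
  std::vector<CardRange> ranges;

  // Dirty the whole covered card range with one memset and remember it as
  // a single (start, length) descriptor for the merge phase.
  void enqueue_object(uint8_t* card_table, uintptr_t start, size_t size) {
    size_t first = start >> CardShift;
    size_t last  = (start + size - 1) >> CardShift;
    memset(card_table + first, DirtyCard, last - first + 1);
    ranges.push_back(CardRange{first, last - first + 1});
  }
};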

Re-evaluate if this is still an issue.
15-05-2024

The code for enqueuing large areas into the DCQ itself is also very bad. Obvious shortcuts for filtering out young gen objects are not taken, e.g.:
- an object smaller than the heap region size must be contained within a single region, so a single check on the region type is sufficient to determine whether we actually need to do the work (instead of scanning the card table from start to end).
- objects cannot span generations, so a single check for that filtering is sufficient. Instead, the code even checks for is-young-gen after it has already determined that the object is not young.
- not sure what the probability is that a sufficiently large area contains a significant amount of already dirty cards (e.g. the HCC is only about 1k entries for the entire heap).
Improvements could enqueue batches and/or ranges of cards instead of single cards.
03-10-2016
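A standalone sketch of the early-exit filtering described in the comment above (the region lookup and all names are hypothetical stand-ins, not the actual G1 API):

#include <cstdint>

struct HeapRegion { bool is_young; };           // hypothetical stand-in

HeapRegion* region_containing(uintptr_t addr);  // hypothetical lookup

// Objects never span generations, so a single check on the region holding
// the object's start decides the whole object: young-gen objects need no
// deferred card marks, and no per-card scan of the covered card table
// range is necessary.
bool needs_deferred_marks(uintptr_t obj_start) {
  return !region_containing(obj_start)->is_young;
}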

Another trivial optimization: for humongous non-objArrays we only need to mark their header card, as G1 expects card marks for such objects to be imprecise anyway. This will drastically reduce the effort for enqueuing such objects.
03-10-2016
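A standalone sketch of the header-only marking suggested above (card size and names are assumptions, not the actual HotSpot code):

#include <cstddef>
#include <cstdint>

static const size_t CardShift = 9;      // 512-byte cards (G1 default)
static const uint8_t DirtyCard = 0;     // assumed dirty value

// For a humongous object that is not an objArray, card marks are imprecise
// anyway, so dirtying only the card covering the object header suffices
// instead of one card per 512 bytes of the whole object.
void mark_humongous_non_objarray(uint8_t* card_table, uintptr_t obj_start) {
  card_table[obj_start >> CardShift] = DirtyCard;
}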