Bug ID: JDK-8087198 G1 card refinement: batching, sorting

Versions (Unresolved/Resolved/Fixed)



JDK 14
14 b25: Fixed

Currently we process cards one by one, even though they always
arrive in groups of at least G1UpdateBufferSize. Depending on how lucky
we are, the cards may or may not be in a cache-friendly order. We also
have to seek the start of the card quite often and cut arrays at card
boundaries, even though in many cases the cards in a batch are
contiguous and could be scanned linearly.

It would be great if each batch could be scanned as a whole rather than card by card. This could be done more efficiently by sorting the cards by address, collapsing adjacent cards, and prefetching the next cards to be processed.
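The sorting and collapsing steps proposed above can be sketched as follows. This is only an illustration under stated assumptions, not HotSpot code: `CardIndex`, `CardRange` and `sort_and_collapse` are hypothetical names, a card is modelled as a plain index into the card table, and prefetching is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a card: its index in the card table.
using CardIndex = std::size_t;

// Half-open run of consecutive cards [first, last).
struct CardRange {
  CardIndex first;
  CardIndex last;
};

// Sort one batch of cards by address, drop duplicates, and merge
// neighbouring cards into ranges that can be scanned linearly.
std::vector<CardRange> sort_and_collapse(std::vector<CardIndex> batch) {
  std::sort(batch.begin(), batch.end());
  std::vector<CardRange> ranges;
  for (CardIndex card : batch) {
    if (!ranges.empty() && card < ranges.back().last) {
      continue;                            // duplicate of a card already covered
    }
    if (!ranges.empty() && card == ranges.back().last) {
      ++ranges.back().last;                // adjacent card: extend the current run
    } else {
      ranges.push_back({card, card + 1});  // gap: start a new run
    }
  }
  // Every emitted range covers at least one card.
  assert(std::all_of(ranges.begin(), ranges.end(),
                     [](const CardRange& r) { return r.first < r.last; }));
  return ranges;
}
```

For a batch such as `{7, 3, 4, 5, 3, 10}` this yields three runs, `[3, 6)`, `[7, 8)` and `[10, 11)`, so three linear scans would replace six individual card lookups.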

URL: https://hg.openjdk.java.net/jdk/jdk/rev/bd9dba789919
User: manc
Date: 2019-11-23 01:07:02 +0000

23-11-2019

I have implemented an updated version based on the webrev in http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2015-June/013798.html. Implementing this RFE first would help implementing epoch synchronization for JDK-8226731, making the other patch smaller.

Major differences from the 2015 webrev:
- New version does not save the MemRegions for the cards in a buffer. I noticed considerable memory overhead with BigRamTester if we save the MemRegions.
- New version handles SuspendibleThreadSetJoiner::should_yield() in a more timely fashion. Instead of forcing refining all buffered cards, the new version can abandon the buffered cards.
- New version only batches and sorts the cards, not joining and prefetching. I have not investigated whether joining and prefetching help much. I think it is OK to investigate them in a separate RFE later.

I tested performance with DaCapo and BigRamTester. DaCapo mostly looks neutral, because there is not much concurrent refinement work. BigRamTester sees some improvement in pause time, and shows that sorting the cards actually helps reduce CPU time of the concurrent refinement threads. I attached the result for BigRamTester.
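To illustrate the batch-sort-and-yield behaviour described in this comment, here is a rough sketch. It is not the code from the changeset: `refine_batch`, `RefineResult` and the two callbacks are hypothetical, the yield check stands in for SuspendibleThreadSetJoiner::should_yield(), and "abandoning" is modelled simply as handing the unprocessed cards back to the caller, which is an assumption about what happens to them.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for a card: its index in the card table.
using CardIndex = std::size_t;

// Outcome of refining one batch (hypothetical type).
struct RefineResult {
  std::size_t refined;               // cards refined before yielding
  std::vector<CardIndex> abandoned;  // cards left unprocessed, still sorted
};

// Sort a batch by address, then refine card by card, polling the yield
// request before each card so that a safepoint is not delayed by the
// rest of the batch.
RefineResult refine_batch(std::vector<CardIndex> batch,
                          const std::function<void(CardIndex)>& refine_card,
                          const std::function<bool()>& should_yield) {
  std::sort(batch.begin(), batch.end());
  RefineResult result{0, {}};
  for (std::size_t i = 0; i < batch.size(); ++i) {
    if (should_yield()) {
      // Assumption: abandoned cards are returned so the caller can
      // decide how to reprocess them later.
      result.abandoned.assign(
          batch.begin() + static_cast<std::ptrdiff_t>(i), batch.end());
      break;
    }
    refine_card(batch[i]);
    ++result.refined;
  }
  return result;
}
```

Polling before every card is the simplest choice for a sketch; a real implementation would have to weigh the cost of the check against how quickly it needs to respond to a yield request.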

12-11-2019

RFR thread starts at http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2015-June/013798.html

20-09-2016

Blocks :	JDK-8233990 - G1 card refinement: joining, prefetching
Relates :	JDK-8166899 - Deferred card marking of large objArrays generates lots of unnecessary work
Relates :	JDK-8226731 - Remove StoreLoad in G1 post barrier