JDK-6850869 : G1: RSet "scrubbing" scrubs too much
The RSet scrubbing code itself is OK; it's the information it reads that is malformed.
The RSet scrubbing code takes a bitmap as input, with one bit per region. If the bit for a particular region is 1, then it means that the region has live objects. If the bit for a region is 0, then it means that the region is totally empty and any RSet that points into it should be scrubbed of those entries.
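To make the scrubbing rule concrete, here is a minimal sketch under a deliberately simplified model. It is not G1's implementation. In this model, an RSet is a set of `(source_region, card)` pairs, each recording a card in some region that points into the RSet's owner. The region liveness bitmap is a plain list of booleans. The function name and the data layout are hypothetical.

```python
def scrub_rset(rset_entries, region_live_bitmap):
    """Drop RSet entries whose source region is marked as empty.

    Hypothetical, simplified model (not HotSpot's data structures):
      rset_entries       -- set of (source_region_index, card_index) pairs
      region_live_bitmap -- one bool per region; True means "has live objects"
    An entry survives only if the bit for the region it comes from is set.
    """
    return {
        (region, card)
        for (region, card) in rset_entries
        if region_live_bitmap[region]
    }
```

This model has a consequence that matters for this bug. The scrubber trusts the bitmap completely. A region wrongly left at 0 loses every RSet entry that originates from it, even if it still holds live data.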
The problem is that the bitmap can be set up incorrectly for continues humongous regions. When the process that iterates over the regions to populate the bitmap comes across a continues humongous region, it checks whether the bit for its corresponding starts humongous region is set. If it is, the bit for the continues humongous region is also set. However, the parallel heap region iteration chunks the regions into groups, so a continues humongous region can be processed before its corresponding starts humongous region. If this happens, we can miss setting the bit of a continues humongous region that does contain part of a live object and, hence, some RSet entries that point into that object will be scrubbed.
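The ordering dependence can be reproduced with a small, sequential simulation. This is a sketch only. The region kinds are the strings `"normal"`, `"starts"` and `"continues"`. The `order` argument stands in for the order in which chunked parallel iteration happens to visit regions. All names are hypothetical.

```python
def find_starts_humongous(kinds, i):
    """Walk back from continues humongous region i to its starts humongous region.

    Assumes a well-formed layout: every "continues" run is preceded by a "starts".
    """
    j = i
    while kinds[j] != "starts":
        j -= 1
    return j


def populate_bitmap_buggy(kinds, live, order):
    """Order-dependent population, mirroring the behaviour described above.

    kinds -- per-region kind: "normal", "starts" or "continues"
    live  -- per-region liveness from marking (ignored for "continues")
    order -- visiting order (models how parallel iteration chunks regions)
    """
    bitmap = [False] * len(kinds)
    for i in order:
        if kinds[i] == "continues":
            # Reads the starts region's bit, which may not have been set yet.
            if bitmap[find_starts_humongous(kinds, i)]:
                bitmap[i] = True
        else:
            bitmap[i] = live[i]
    return bitmap
```

With one live humongous object spanning regions 1 to 3, visiting regions in address order sets all bits. Visiting the continues regions first leaves their bits at 0. Those zero bits are exactly the bad input that causes over-scrubbing.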
The fix is to ignore continues humongous regions during the iteration (see Evaluation) and instead set (or not set) their bits when their corresponding starts humongous region is processed.
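The fix can be expressed in the same simplified model. The sketch below uses the same hypothetical region kinds, `"normal"`, `"starts"` and `"continues"`, and the same `order` parameter. It skips continues humongous regions outright. It writes their bits only while handling the starts humongous region, so the result no longer depends on visiting order. This is an illustration of the idea, not the actual G1 patch.

```python
def populate_bitmap_fixed(kinds, live, order):
    """Order-independent population (hypothetical model, not HotSpot code).

    kinds -- per-region kind: "normal", "starts" or "continues"
    live  -- per-region liveness from marking (ignored for "continues")
    order -- visiting order; any permutation yields the same bitmap
    """
    bitmap = [False] * len(kinds)
    for i in order:
        if kinds[i] == "continues":
            continue  # handled when its starts humongous region is visited
        bitmap[i] = live[i]
        if kinds[i] == "starts":
            # Propagate the object's liveness to every region it spans.
            j = i + 1
            while j < len(kinds) and kinds[j] == "continues":
                bitmap[j] = live[i]
                j += 1
    return bitmap
```

Each continues region's bit is written only by its own starts region. Each starts region is visited exactly once. So no region ever reads a bit that another region has not yet written.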
When I enabled heap verification, I got "Missing RSet Entry" failures in the first evacuation pause after a cleanup pause.
Further investigation revealed that verification would pass at the beginning of cleanup and fail at the end of it; more precisely, it would pass before, but fail after, the RSet scrubbing phase of cleanup.
I added instrumentation code to dump the contents of RSets before and after scrubbing and, indeed, the entries in question seem to have been nuked by RSet scrubbing.
I'm investigating further on what could be causing this.