JDK-8368935 : G1: Too much refinement activity after JEP 522 causing throughput regression in stress test
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 26
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • Submitted: 2025-09-30
  • Updated: 2025-10-03
JDK 26: Unresolved
Description
From mail.openjdk.org/pipermail/hotspot-gc-dev/2025-September/055035.html:

"I'm observing an interesting regression between build 26-ea+16-1649 and 
build 26-ea+17-1764 with respect to concurrent refinement. I suspect the 
cause is JEP 522.

I ran a stress test in which GC pauses are about 25ms in the steady 
state. With 26-ea+16-1649, concurrent refinement is never initiated.

With 26-ea+17-1764 (JEP 522) concurrent refinement starts after the test 
has run for a few minutes. Eventually, concurrent refinement runs back 
to back forever, wasting a CPU core and affecting overall performance."

GC logs before ("baseline") and after ("changes") are attached.
Comments
The application can be described as a single huge reference array whose slots reference newly allocated objects for a few ms before those objects are discarded. E.g.:

  Object[] ref_holder = new Object[<something-large>];
  do forever:
    int index = random(ref_holder.length);
    ref_holder[index] = new Object();
    [.. do something very quickly, or nothing]
    ref_holder[index] = null;

What matters is that the old generation itself does not change; the application only writes huge numbers of references that are nulled out again almost immediately. This results in the observed behavior: if the pre-JEP 522 refinement stays below the refinement threshold (which depends on load and machine), it does no refinement at all, while with JEP 522 G1 refines constantly.
01-10-2025
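The pattern described in the comment above could be sketched in runnable form roughly as follows. This is not the reporter's actual stress test: the class name RefChurn, the method churn, the seed, and the default sizes are all hypothetical, and the loop is bounded so that it terminates. Whether it reproduces the regression on a given machine is untested.

```java
import java.util.SplittableRandom;

// Hypothetical sketch of the access pattern; names and sizes are illustrative only.
final class RefChurn {
    /**
     * Stores a fresh object into a random slot and clears it again, repeatedly.
     * Returns the number of reference stores performed (two per iteration).
     */
    static long churn(Object[] refHolder, long iterations, long seed) {
        SplittableRandom random = new SplittableRandom(seed);
        long stores = 0;
        for (long i = 0; i < iterations; i++) {
            int index = random.nextInt(refHolder.length);
            refHolder[index] = new Object(); // reference store into the large, long-lived array
            stores++;
            // [.. do something very quickly, or nothing]
            refHolder[index] = null;         // reference is cleared again almost immediately
            stores++;
        }
        return stores;
    }

    public static void main(String[] args) {
        // Assumed defaults; the report only says "<something-large>" and "do forever".
        int size = args.length > 0 ? Integer.parseInt(args[0]) : 1 << 20;
        long iterations = args.length > 1 ? Long.parseLong(args[1]) : 10_000_000L;
        Object[] refHolder = new Object[size];
        System.out.println(churn(refHolder, iterations, 42L) + " reference stores");
    }
}
```

Running this with G1 GC logging enabled on builds before and after JEP 522 would be one way to compare refinement activity, assuming the array ends up outside the young generation.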

The stress test continuously sets and clears references, causing (useless) card mark activity. Refinement just observes this as changes and clears the cards over and over. In the previous implementation the cards were kept in the refinement buffers (and kept dirty), resulting in no refinement activity at all because the overall number of dirty cards stayed just below the refinement threshold (the 10% default value of G1RSetUpdatingPauseTimePercent, i.e. card scanning takes just a little below 20ms). This is pretty much _the_ worst case for the new refinement compared to the old one.

From one of the follow-up emails:

  Pre JEP 522 no conc refinement: 499 seconds
  Pre JEP 522 with conc refinement: 492 seconds
  Post JEP 522 no conc refinement: 488 seconds
  Post JEP 522 with conc refinement: 518 seconds

This results in around 5% less throughput.

Possible workarounds: (maybe) increase G1RSetUpdatingPauseTimePercent, or disable concurrent refinement (-XX:-G1UseConcRefinement). The latter makes the stress test slightly faster than before.
30-09-2025
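The arithmetic behind the "around 5%" and "20ms" figures in the comment above can be checked with a small sketch. The class and method names are hypothetical, and the 200 ms pause goal is an assumption (the default of -XX:MaxGCPauseMillis); the report does not say which pause goal the test used.

```java
// Hypothetical helper; only the four run times and the 10% default come from the report.
final class RefinementNumbers {
    /** Throughput loss in percent when the same work takes regressedSeconds instead of baselineSeconds. */
    static double throughputLossPercent(double baselineSeconds, double regressedSeconds) {
        return (1.0 - baselineSeconds / regressedSeconds) * 100.0;
    }

    /** Card-scanning budget in ms, taken as a percentage of the pause time goal. */
    static double cardScanBudgetMillis(double pauseGoalMillis, double updatingPausePercent) {
        return pauseGoalMillis * updatingPausePercent / 100.0;
    }

    public static void main(String[] args) {
        double regressed = 518; // post JEP 522 with concurrent refinement, in seconds
        System.out.printf("vs pre JEP 522 with conc refinement:  %.1f%%%n",
                throughputLossPercent(492, regressed));
        System.out.printf("vs pre JEP 522 no conc refinement:    %.1f%%%n",
                throughputLossPercent(499, regressed));
        System.out.printf("vs post JEP 522 no conc refinement:   %.1f%%%n",
                throughputLossPercent(488, regressed));
        // Assumption: default pause goal of 200 ms and the 10% default
        // of G1RSetUpdatingPauseTimePercent mentioned in the report.
        System.out.printf("card scan budget: %.0f ms%n", cardScanBudgetMillis(200, 10));
    }
}
```

Against the pre JEP 522 run with concurrent refinement (492 seconds), the 518-second run works out to roughly 5% less throughput, which matches the figure quoted in the comment.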