JDK-8322645 : Release Note: Parallel: Precise Parallel Scanning of Large Object Arrays for Young Collection Roots
  • Type: Sub-task
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 21.0.4-oracle,22
  • Priority: P4
  • Status: Resolved
  • Resolution: Delivered
  • Submitted: 2023-12-21
  • Updated: 2024-03-15
  • Resolved: 2024-01-10
Version table:
  • JDK 21: 21.0.4-oracle, Resolved
  • JDK 22: 22, Resolved
Description
During a young collection, Parallel GC partitions the old generation into 64 KB stripes when scanning it for references into the young generation. These stripes are the work units that worker threads scan in parallel.
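To make the stripe partitioning concrete, here is a minimal C++ sketch. It treats the old generation as a range of byte offsets [0, old_gen_bytes) and deals the 64 KB stripes out round-robin to the workers. The names `Stripe` and `stripes_for_worker`, and the round-robin assignment itself, are assumptions for illustration; HotSpot's actual assignment of stripes to workers may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Stripe size taken from the release note: 64 KB.
constexpr std::size_t kStripeBytes = 64 * 1024;

// Hypothetical work unit: the half-open byte range [begin, end),
// expressed as offsets from the start of the old generation.
struct Stripe {
  std::size_t begin;
  std::size_t end;
};

// Returns the stripes scanned by worker `worker_id` (0-based) when the old
// generation spans [0, old_gen_bytes) and stripes are dealt out round-robin
// to `num_workers` threads. Round-robin is an illustrative assumption.
std::vector<Stripe> stripes_for_worker(std::size_t old_gen_bytes,
                                       std::size_t worker_id,
                                       std::size_t num_workers) {
  assert(num_workers > 0 && worker_id < num_workers);
  const std::size_t num_stripes =
      (old_gen_bytes + kStripeBytes - 1) / kStripeBytes;  // round up
  std::vector<Stripe> result;
  for (std::size_t i = worker_id; i < num_stripes; i += num_workers) {
    const std::size_t begin = i * kStripeBytes;
    // The last stripe may be shorter than 64 KB.
    const std::size_t end = std::min(begin + kStripeBytes, old_gen_bytes);
    result.push_back(Stripe{begin, end});
  }
  return result;
}
```

For a 256 KB old generation and two workers, worker 0 gets the stripes [0, 64K) and [128K, 192K), and worker 1 gets the other two.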

Before this change, Parallel GC always scanned each stripe completely, even if only a small part of it was known to contain interesting references (references into the young generation). Additionally, each worker thread processed the objects that start in its stripe by itself, including the parts of those objects that extend into other stripes. This behavior limited parallelism when processing large objects: a single large object, potentially containing thousands of references, was scanned in full by a single thread. This caused poor scaling due to memory sharing and cache misses in the subsequent, long work-stealing phase.
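The old policy can be modeled in a few lines of C++ to show why one large object ends up on one thread. In this sketch, which is a simplified model and not HotSpot code, objects are (start, size) pairs over byte offsets, and every object is charged in full to the stripe that contains its start. `Obj` and `old_bytes_per_stripe` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kStripeBytes = 64 * 1024;

// Hypothetical heap object: start offset and size in bytes.
struct Obj {
  std::size_t start;
  std::size_t size;
};

// Simplified model of the pre-change policy: each object is scanned in full
// by the worker owning the stripe that contains the object's start, even if
// the object extends into later stripes. Returns bytes scanned per stripe.
std::vector<std::size_t> old_bytes_per_stripe(const std::vector<Obj>& objs,
                                              std::size_t num_stripes) {
  std::vector<std::size_t> work(num_stripes, 0);
  for (const Obj& o : objs) {
    const std::size_t owner = o.start / kStripeBytes;  // stripe with the start
    assert(owner < num_stripes);
    work[owner] += o.size;  // the whole object, across every stripe it covers
  }
  return work;
}
```

With a single 1 MB array starting at offset 0 and 16 stripes, all 1 MB of scanning work is charged to stripe 0 and the other 15 stripes get none, so a single worker carries all of it.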

With this change, Parallel GC workers limit their work to their own stripe and process only the interesting parts of large object arrays. This reduces the work a single thread performs per stripe, improves parallelism, and reduces the amount of work stealing. Parallel GC pause times are now on par with G1 in the presence of large object arrays, dropping by a factor of 4 to 5 in some cases.
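The new policy can be sketched the same way. Here, as a modeling assumption, the "interesting parts" of an array are the cards marked dirty in a card table with 512-byte cards, and the worker that owns a stripe scans only the portion of the array that lies inside its stripe and on a dirty card. `ObjArray`, `kCardBytes`, and `new_bytes_for_stripe` are hypothetical, and this is not HotSpot's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kStripeBytes = 64 * 1024;
// Assumption for this model: 512-byte cards, one dirty flag per card.
constexpr std::size_t kCardBytes = 512;

// Hypothetical large object array: start offset and size in bytes.
struct ObjArray {
  std::size_t start;
  std::size_t size;
};

// Simplified model of the new policy: the worker owning `stripe` scans only
// the part of `arr` that lies inside that stripe, and within it only the
// cards flagged in `dirty_cards` (indexed by card number). Returns bytes scanned.
std::size_t new_bytes_for_stripe(const ObjArray& arr,
                                 const std::vector<bool>& dirty_cards,
                                 std::size_t stripe) {
  assert(arr.size > 0);
  // Intersect the array with the stripe; empty if they do not overlap.
  const std::size_t lo = std::max(arr.start, stripe * kStripeBytes);
  const std::size_t hi =
      std::min(arr.start + arr.size, (stripe + 1) * kStripeBytes);
  std::size_t scanned = 0;
  for (std::size_t addr = lo; addr < hi;) {
    const std::size_t card = addr / kCardBytes;
    const std::size_t card_end = std::min((card + 1) * kCardBytes, hi);
    if (card < dirty_cards.size() && dirty_cards[card]) {
      scanned += card_end - addr;  // dirty card: scan its in-stripe part
    }
    addr = card_end;  // advance to the next card (clean cards are skipped)
  }
  return scanned;
}
```

For a 1 MB array at offset 0 with only cards 0 and 200 dirty, the workers for stripes 0 and 1 each scan 512 bytes and every other stripe scans nothing, so the work is no longer concentrated on the thread that owns the array's start.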

Comments
From my POV you can set this release note to Resolved->Delivered now so that it gets reviewed by the doc team and added to the release notes :) Thanks!
10-01-2024

JDK-8322397 is maybe a bit misleading because the behavior of using the mentioned flags did not change; only their interpretation changed slightly. The release note first describes the previous interpretation, then the new one.
10-01-2024

From my POV too. I'll do it just now. Let's see if I can :) Thanks, Richard.
10-01-2024

Sorry, I misread the first paragraph as new behavior. My original version started with the new behavior (following the template of JDK-8322397), which I thought made sense for a release note, optionally followed by a section describing how things worked before.

> Maybe that part starting with "Every worker thread processes the objects that
> start in that stripe by itself including..." should be moved in the next
> paragraph where the old mechanism is described and likely makes more sense
> then.

Yes sure :)
08-01-2024

Yeah, maybe the sentence is too confusing :) However, I am not sure it is right now. I wanted to express that there is no parallelization going on for large objects extending over multiple stripes, if I got that right from the old code, which had this comment:

    // Process a stripe iff it contains any obj-start
    if (!start_array->object_starts_in_range(cur_stripe_addr, cur_stripe_end_addr)) {
      continue;
    }

That is, if there was no object start in a given stripe (e.g. another object from a previous stripe covered it completely), the thread would do no work for that stripe. It follows that the thread that got the stripe containing the humongous object's start had to process the whole object by itself.

Maybe that part starting with "Every worker thread processes the objects that start in that stripe by itself including..." should be moved in the next paragraph where the old mechanism is described and likely makes more sense then. That would also make this note follow the structure I use most of the time:
  • introduction
  • background
  • old behavior
  • new behavior
  • impact

What do you think?
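The stripe filter quoted above can be illustrated with a small, self-contained model. The sketch answers the question "does any object start in [begin, end)?" with a binary search over a sorted list of start offsets. The class `SortedStartArray` and the helper `old_policy_processes_stripe` are hypothetical; the method name only mirrors the call in the quoted code, and this sketch does not reflect how HotSpot's start array is implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a start array: a sorted list of object start offsets.
class SortedStartArray {
 public:
  explicit SortedStartArray(std::vector<std::size_t> starts)
      : starts_(std::move(starts)) {
    std::sort(starts_.begin(), starts_.end());
  }

  // True iff some object starts in the half-open range [begin, end).
  bool object_starts_in_range(std::size_t begin, std::size_t end) const {
    assert(begin <= end);
    // First start offset >= begin; the range contains a start iff it is < end.
    auto it = std::lower_bound(starts_.begin(), starts_.end(), begin);
    return it != starts_.end() && *it < end;
  }

 private:
  std::vector<std::size_t> starts_;
};

// Old-policy filter from the quoted code: a stripe is processed iff it
// contains any object start.
bool old_policy_processes_stripe(const SortedStartArray& sa,
                                 std::size_t stripe_begin,
                                 std::size_t stripe_end) {
  return sa.object_starts_in_range(stripe_begin, stripe_end);
}
```

With objects starting at offsets 0 and 1 MB, the stripe [64K, 128K) contains no object start, so under the old policy no worker scans it, even though the large array at offset 0 covers it entirely.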
08-01-2024

"Every worker thread processes the objects that start in that stripe by itself including any part of objects that extend into other stripes." Hm, this sentence does not read right to me. Likely you meant "excluding", right? I'll come up with a suggestion...
08-01-2024

[~rrich]: I made a significant number of edits that I believe improve readability quite a bit. Please go over the text again; it could be that I got something wrong.
08-01-2024