JDK-8246718 : ParallelGC should not check for forward objects for copy task queue
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 15
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2020-06-07
  • Updated: 2024-10-17
  • Resolved: 2020-06-09
Version
  • JDK 15: Fixed in 15 b27
Description
When pushing items into the object copy task queue, ParallelGC has a "fast path" that checks for already forwarded objects. Such items are not pushed onto the task queue. Instead, the reference is fixed up to point to the forwardee, and any remembered set processing is done inline. G1 takes a different approach: it obtains the referenced object for the item, prefetches that object, and then pushes the item onto the queue.
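The two push strategies can be sketched roughly as follows. This is a simplified model, not HotSpot code. `Object`, `Slot`, `CopyQueue`, `push_check_forwarded`, `push_prefetch` and the no-op `remember` hook are hypothetical names. A non-null `forwardee` field stands in for a forwarded object header, and `__builtin_prefetch` is the GCC/Clang builtin, not HotSpot's own prefetch wrapper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified heap object. A non-null forwardee means the
// object has already been copied, i.e. it is "forwarded".
struct Object {
  Object* forwardee = nullptr;
  int payload = 0;
};

// A reference field (slot) that the GC must update to point at the copy.
using Slot = Object**;

struct CopyQueue {
  std::vector<Slot> tasks;        // pending slots, processed later
  std::size_t inline_fixups = 0;  // how often the fast path was taken

  // Hypothetical stand-in for remembered set processing of a fixed-up slot.
  void remember(Slot) {}

  // ParallelGC-style push: check for an already forwarded object first.
  void push_check_forwarded(Slot p) {
    assert(p != nullptr && *p != nullptr);
    Object* obj = *p;
    // Reading the object itself is likely to take a cache miss.
    if (obj->forwardee != nullptr) {
      *p = obj->forwardee;  // fast path: fix up the reference inline
      remember(p);          // and do remembered set processing inline
      ++inline_fixups;
    } else {
      tasks.push_back(p);   // slow path: defer to the task queue
    }
  }

  // G1-style push: prefetch the referenced object and always push.
  void push_prefetch(Slot p) {
    assert(p != nullptr && *p != nullptr);
    Object* obj = *p;
    __builtin_prefetch(obj);  // a hint only; does not wait for the load
    tasks.push_back(p);       // object is hopefully cached when popped
  }
};
```

In this sketch, the second variant still handles already forwarded objects. The check is deferred until the slot is popped and processed, and by then the prefetch has hopefully brought the object into cache.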

Recent measurements show that, for ParallelGC, the approach taken by G1 provides better performance. Either approach appears to be at least as good as doing neither, that is, pushing every item without a check or a prefetch. Compared with the already forwarded check, prefetch-and-push without any check ranges from neutral to significantly better, depending on the hardware configuration.

The difference seems to have two causes. First, the check is relatively expensive, because reading the referenced object is likely to take a cache miss. Second, already forwarded objects are uncommon, so the fast path is rarely taken and does not recover the cost of the check.
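Point (2) can be made concrete with a back-of-the-envelope break-even model. This model is introduced here for illustration and is not part of the measurements. The symbols c, p and s are hypothetical, and the model ignores second-order effects such as how well the prefetch hides latency. The rate p < 4% is the specjbb2015 fast path rate reported in this issue.

```latex
% c = extra cost per push of the forwarded check (dominated by a likely cache miss)
% p = fraction of pushes where the object is already forwarded
% s = work saved each time the fast path is taken
\Delta = c - p\,s
\qquad\text{(expected extra cost per push of doing the check)}

\text{The check pays off only if } p\,s > c \iff s > \frac{c}{p}.

% With the measured fast path rate p < 0.04:
p < 0.04 \implies \frac{c}{p} > 25\,c
```

Under this model, the fast path would have to save more than 25 times the cost of the check each time it is taken just to break even.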

Focusing on SPECjbb2015:

* average fast path rate < 4%

* critical-jOPS improvement of prefetch-and-push over the already forwarded check:

(1) non-NUMA x64 - no significant difference
(2) x64 2 sockets x 8 cores - 5% improvement
(3) x64 2 sockets x 8 cores (hyperthreading off) - 9.5% improvement
(4) aarch64 - 2.25% improvement 

(3) might not be a production configuration, but it is interesting because hyperthreading should help hide cache-miss latency, so disabling it exposes more of the cost of the check's cache miss.

Comments
URL: https://hg.openjdk.java.net/jdk/jdk/rev/474709480635
User: kbarrett
Date: 2020-06-09 22:51:37 +0000