JDK-8004687 : G1: Parallelize object self-forwarding and scanning during an evacuation failure
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 8,9
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2012-12-07
  • Updated: 2015-09-10
  • Resolved: 2015-07-23
JDK 9: 9 b77 (Fixed)
Description
During an evacuation pause, a GC thread tries to copy an object by first allocating space for it. When that allocation fails, we have an evacuation failure and the thread enters the code that handles such a failure.

The evacuation failure handling code first attempts to atomically forward the object to itself (in case another thread is attempting to copy the object at the same time). If successful, the thread grabs a lock and installs its own data structures into some global fields. While holding the lock, the thread pushes the failed object onto a global refs_to_scan_stack and drains this stack.

Draining the global refs_to_scan_stack involves popping the failed object, scanning its reference fields, and applying a copy closure specialized for evacuation failure to the referenced objects. This specialized closure attempts to copy the referenced object and, if that also fails, re-enters the evac failure handling code, which pushes the referenced object.

As a result, the objects that are reachable from the original failed object are self-forwarded and scanned by the thread while it holds the lock. Meanwhile, other threads that have their own failed copy wait to acquire the lock in the evac failure handling code.

Hence the evacuation failure handling mechanism is effectively serialized. The code that pushes the self-forwarded object is executed by multiple threads but not in parallel.

This contributes to the excessive object copying times seen when an evacuation failure occurs.
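
The serialized behaviour described above can be modelled roughly as follows. This is a minimal sketch, not HotSpot code: `Obj`, `SerializedEvacFailure`, `handle()` and the `scanned` counter are hypothetical names, and it assumes that every referenced object also fails to copy, so each one is self-forwarded and pushed.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a heap object; not HotSpot's oopDesc.
struct Obj {
    std::atomic<Obj*> forwardee{nullptr};  // nullptr = not yet forwarded
    std::vector<Obj*> fields;              // outgoing references
};

// Global state shared by all GC workers, as in the description.
struct SerializedEvacFailure {
    std::mutex lock;
    std::vector<Obj*> refs_to_scan_stack;
    std::size_t scanned = 0;  // illustration only: objects scanned so far

    // Called by a worker once allocation for `obj` has failed.
    void handle(Obj* obj) {
        // Atomically forward the object to itself; a losing thread backs off.
        Obj* expected = nullptr;
        if (!obj->forwardee.compare_exchange_strong(expected, obj)) return;

        // Everything below runs under the single global lock, so workers
        // with their own failed objects queue up here.
        std::lock_guard<std::mutex> guard(lock);
        refs_to_scan_stack.push_back(obj);
        while (!refs_to_scan_stack.empty()) {
            Obj* cur = refs_to_scan_stack.back();
            refs_to_scan_stack.pop_back();
            ++scanned;
            for (Obj* ref : cur->fields) {
                // Assumption: the referenced object also fails to copy.
                Obj* unforwarded = nullptr;
                if (ref->forwardee.compare_exchange_strong(unforwarded, ref)) {
                    refs_to_scan_stack.push_back(ref);
                }
            }
        }
        assert(refs_to_scan_stack.empty());
    }
};
```

The whole transitive closure of the failed object is processed inside the critical section, which is why other workers make no progress on their own failures in the meantime.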
Comments
Implemented suggestion 2). Actually the entire evacuation closure can be removed as it does the same thing as G1ParScanClosure. Even without skipping the object copying code, the change will remove the whole additional locking and improve code maintenance by reducing the required code significantly. For this reason, the object copying code skipping will be implemented in a later CR.
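
As a rough illustration of suggestion 2 (handling the failure inside the normal copy path and reusing the regular work queue), the sketch below self-forwards an object when allocation fails and queues it exactly like a copied object. It is not the committed change: `Obj`, `Worker`, `copy_or_self_forward()` and the `free_slots` budget are hypothetical, a `std::deque` stands in for the ref-to-scan work queue, and work stealing is omitted.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a heap object; not HotSpot's oopDesc.
struct Obj {
    std::atomic<Obj*> forwardee{nullptr};  // nullptr = not yet forwarded
    std::vector<Obj*> fields;              // outgoing references
};

class Worker {
public:
    // `free_slots` simulates how many objects still fit into to-space.
    explicit Worker(std::size_t free_slots) : free_slots_(free_slots) {}

    // Evacuate everything reachable from `root` using one work queue.
    void evacuate(Obj* root) {
        copy_or_self_forward(root);
        while (!queue_.empty()) {
            Obj* cur = queue_.front();
            queue_.pop_front();
            for (Obj*& ref : cur->fields) ref = copy_or_self_forward(ref);
        }
    }

    std::size_t copied() const { return copied_; }
    std::size_t failed() const { return failed_; }

private:
    Obj* copy_or_self_forward(Obj* obj) {
        assert(obj != nullptr);
        Obj* existing = obj->forwardee.load();
        if (existing != nullptr) return existing;  // already handled

        Obj* target = obj;          // default: forward the object to itself
        std::unique_ptr<Obj> copy;
        if (free_slots_ > 0) {      // "allocation" succeeded
            --free_slots_;
            copy.reset(new Obj);
            copy->fields = obj->fields;
            target = copy.get();
        }
        Obj* expected = nullptr;
        if (!obj->forwardee.compare_exchange_strong(expected, target)) {
            if (copy) ++free_slots_;  // undo the allocation
            return expected;          // another worker won the race
        }
        if (copy) {
            to_space_.push_back(std::move(copy));
            ++copied_;
        } else {
            ++failed_;
        }
        queue_.push_back(target);   // same queue for both outcomes, no lock
        return target;
    }

    std::size_t free_slots_;
    std::size_t copied_ = 0;
    std::size_t failed_ = 0;
    std::deque<Obj*> queue_;
    std::vector<std::unique_ptr<Obj>> to_space_;
};
```

With `Worker w(1)` and a chain a -> b -> c, `w.evacuate(&a)` copies a and self-forwards b and c, all through the same queue.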
14-07-2015

A couple of ideas on how to do this:

1. Use thread-specific evacuation failure data:
   * Remove the global pointer _evac_failure_closure from G1CollectedHeap.
   * Remove the global _evac_failure_scan_stack from G1CollectedHeap.
   * Remove the global _evac_failure_scan_stack push and drain operations from G1CollectedHeap.
   * Remove the routines init_for_evac_failure() and finalize_evac_failure() from G1CollectedHeap.
   * Add equivalents to the above to the G1ParScanThreadState structure:
     * add the evac_failure_scan_stack field
     * add a push operation
     * add a drain operation

   The G1ParScanThreadState constructor will allocate the initial evac_failure_scan_stack. This is the thread-specific version of init_for_evac_failure(). Allocating the evac failure scan stack could be done lazily when we handle the first evacuation failure, but that means checking if the evac failure scan stack is non-null when processing every object.

   The G1ParScanThreadState destructor will perform equivalent behavior to finalize_evac_failure(), namely verify that the evac failure scan stack is either null or empty.

   The push operation simply pushes the current oop on to the evac_failure_scan_stack. The drain operation pops objects off the evac_failure_scan_stack and applies the evacuation failure copy closure (from the PSS itself) to the reference fields in the popped object.

   The evacuation failure handling code is then accessing the data and operations from the thread-specific PSS without having to take the lock (except where it has to preserve the object header, but that will go away when 8003235 is addressed).

   Pluses:
   + Matches existing semantics, but using thread-local (or thread-specific) data.
   + Should be low space overhead (a few pointers and a growable array that starts out at 40 entries long) per worker thread.

   Minuses:
   - Uses growable arrays, which can grow very large since expansion involves doubling the capacity and copying the contents (i.e. slow).
   - No stealing. We would have to implement our own stealing strategy to maximize throughput.

2. An alternative approach might be to move the evacuation handling mechanism into the copy closures. Hence if we failed to allocate space for the object we would not try to copy it, but we would use the embedded G1ParScanClosure to scan the fields referenced by the object we failed to copy.

   Pluses:
   + Reuses a lot of existing code.
   + No special evacuation failure scan stack; should be able to use the ref to scan work queue.
   + We get stealing for free.

   Minuses:
   - Need an additional template parameter to "skip" object copying code.

In either approach there will be some amount of refactoring.
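
The first idea (a per-worker scan stack with push and drain operations) could look roughly like the sketch below. It is illustrative only: `Obj` and `WorkerState` are hypothetical stand-ins for an oop and G1ParScanThreadState, a `std::vector` replaces the growable array, and it assumes every referenced object also fails to copy.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a heap object; not HotSpot's oopDesc.
struct Obj {
    std::atomic<Obj*> forwardee{nullptr};  // nullptr = not yet forwarded
    std::vector<Obj*> fields;              // outgoing references
};

// Illustrative per-worker state; all member names are assumptions.
class WorkerState {
public:
    // Eager allocation, as proposed: room for 40 entries up front.
    WorkerState() { evac_failure_scan_stack_.reserve(40); }

    // Mirrors the proposed destructor check: the stack must be drained.
    ~WorkerState() { assert(evac_failure_scan_stack_.empty()); }

    // Entry point once allocation for `obj` has failed on this worker.
    void handle_evac_failure(Obj* obj) {
        if (!self_forward(obj)) return;  // another worker won the race
        push(obj);
        drain();                         // no global lock is taken
    }

    std::size_t scanned() const { return scanned_; }

private:
    static bool self_forward(Obj* obj) {
        Obj* expected = nullptr;
        return obj->forwardee.compare_exchange_strong(expected, obj);
    }

    void push(Obj* obj) { evac_failure_scan_stack_.push_back(obj); }

    void drain() {
        while (!evac_failure_scan_stack_.empty()) {
            Obj* cur = evac_failure_scan_stack_.back();
            evac_failure_scan_stack_.pop_back();
            ++scanned_;
            for (Obj* ref : cur->fields) {
                // Assumption: the referenced object also fails to copy.
                if (self_forward(ref)) push(ref);
            }
        }
    }

    std::vector<Obj*> evac_failure_scan_stack_;
    std::size_t scanned_ = 0;  // illustration only
};
```

Because each worker drains only its own stack, the atomic self-forwarding is the only synchronization point. The "no stealing" drawback is also visible here: a worker that self-forwards the root of a large subgraph scans all of it alone.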
11-12-2012