Bug ID: JDK-8004687 G1: Parallelize object self-forwarding and scanning during an evacuation failure

Type: Enhancement
Component: hotspot
Sub-Component: gc
Affected Version: 8,9

Priority: P4
Status: Resolved
Resolution: Fixed

Submitted: 2012-12-07
Updated: 2015-09-10
Resolved: 2015-07-23

JDK 9
9 b77 Fixed

During an evacuation pause that experiences an evacuation failure, a GC thread tries to copy an object by first attempting to allocate space for it. When this allocation fails, we have an evacuation failure and the thread enters the code that handles such a failure.

The evacuation failure handling code first attempts to atomically forward the object to itself (in case another thread was attempting to copy the object at the same time). If successful, the thread grabs a lock and installs its own data structures into some global fields. While holding the lock, the thread pushes the failed object onto a global refs_to_scan_stack and drains this stack.
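The atomic self-forwarding step can be illustrated with a compare-and-swap on a forwarding word. This is a minimal sketch under a hypothetical object model: `Obj`, its `forwardee` field and `try_self_forward()` are illustrative names, not HotSpot's, and the real mark-word encoding is more involved. Only the thread whose CAS succeeds goes on to handle the failure; a thread that loses the race uses whatever forwardee the winner installed.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical object model: nullptr means "not forwarded yet",
// otherwise the pointer is the object's new location (itself on failure).
struct Obj {
  std::atomic<Obj*> forwardee{nullptr};
};

// Try to forward obj to itself after a failed allocation.
// Returns true if this thread won the race and must handle the failure;
// *winner receives the forwardee that is now installed either way.
inline bool try_self_forward(Obj* obj, Obj** winner) {
  Obj* expected = nullptr;
  if (obj->forwardee.compare_exchange_strong(expected, obj)) {
    *winner = obj;      // we installed the self-forwarding pointer
    return true;
  }
  *winner = expected;   // another thread copied or self-forwarded it first
  return false;
}
```

A thread that gets `false` back must not treat the object as failed by itself; it simply uses `*winner`, which may be a real copy made by another thread.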

Draining the global refs_to_scan_stack involves popping the failed object, scanning its reference fields and applying a copy closure specialized for evacuation failure to the referenced objects. This specialized closure attempts to copy each referenced object and, when that copy also fails, re-enters the evacuation failure handling code, which pushes the referenced object.

As a result, the objects that are reachable from the original failed object are self-forwarded and scanned by this one thread while it holds the lock. Meanwhile, other threads whose own copies have failed are waiting to acquire the lock in the evacuation failure handling code.

Hence the evacuation failure handling mechanism is effectively serialized: the code that pushes the self-forwarded objects is executed by multiple threads, but not in parallel.
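To make the serialization concrete, the following single-lock model mirrors the steps described above: one global lock, one global scan stack, and a drain loop run entirely by the lock holder. It is a simplified sketch, not the HotSpot code: `Node`, `handle_evac_failure()` and the counter are hypothetical, every allocation is assumed to fail, and re-entering the failure handling code is modelled as a direct push onto the stack while the lock is already held.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical object model: a self-forwarding flag plus reference fields.
struct Node {
  bool self_forwarded = false;
  std::vector<Node*> fields;
};

std::mutex evac_failure_lock;                // one global lock
std::vector<Node*> evac_failure_scan_stack;  // one global scan stack
int nodes_scanned_under_lock = 0;

// Called by a GC thread after it has self-forwarded obj.
void handle_evac_failure(Node* obj) {
  // Every other thread with a failed copy blocks here until we are done.
  std::lock_guard<std::mutex> guard(evac_failure_lock);
  evac_failure_scan_stack.push_back(obj);
  while (!evac_failure_scan_stack.empty()) {
    Node* n = evac_failure_scan_stack.back();
    evac_failure_scan_stack.pop_back();
    ++nodes_scanned_under_lock;
    for (Node* ref : n->fields) {
      // The specialized closure tries to copy ref; in this model that
      // always fails, so ref is self-forwarded and pushed as well.
      if (ref != nullptr && !ref->self_forwarded) {
        ref->self_forwarded = true;
        evac_failure_scan_stack.push_back(ref);
      }
    }
  }
}
```

Whatever the number of GC threads, everything reachable from a failed object through further failed copies is scanned by a single lock holder.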

This contributes to the excessive object copying times seen when an evacuation failure occurs.

Implemented suggestion 2). In fact, the entire evacuation failure closure can be removed, as it does the same thing as G1ParScanClosure. Even without skipping the object copying code, the change removes all of the additional locking and improves maintainability by significantly reducing the amount of code. For this reason, skipping the object copying code will be implemented in a later CR.

14-07-2015
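A minimal sketch of the direction taken by the fix (suggestion 2): allocation failure is handled inside the regular copy path, and the fields of a self-forwarded object go onto the copying thread's own refs-to-scan queue instead of a lock-protected global stack. The names are loosely modelled on G1ParScanThreadState, but the struct, its members and their signatures are hypothetical simplifications: a budget counter stands in for allocation, a `std::vector` stands in for the stealable task queue, and copying is still attempted for every referent, since skipping it is left to a later CR.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical object model: a forwarding pointer plus reference fields.
struct Obj {
  std::atomic<Obj*> forwardee{nullptr};
  std::vector<Obj*> fields;
};

// Hypothetical, simplified per-thread scan state (one per GC worker).
struct ParScanThreadState {
  std::vector<Obj**> refs_to_scan;             // this thread's queue of field slots
  std::size_t alloc_budget = 0;                // copies this thread can still allocate
  std::vector<std::unique_ptr<Obj>> to_space;  // owns the copies made
  std::size_t failures = 0;                    // objects self-forwarded

  // Returns the object's new location: a copy, or obj itself on failure.
  Obj* copy_to_survivor_space(Obj* obj) {
    if (Obj* fwd = obj->forwardee.load()) return fwd;  // already handled
    std::unique_ptr<Obj> copy;
    Obj* target = obj;                 // self-forward if allocation fails
    if (alloc_budget > 0) {
      copy = std::make_unique<Obj>();
      copy->fields = obj->fields;
      target = copy.get();
    }
    Obj* expected = nullptr;
    if (!obj->forwardee.compare_exchange_strong(expected, target)) {
      return expected;                 // lost the race; our copy is discarded
    }
    if (copy) {
      --alloc_budget;
      to_space.push_back(std::move(copy));
    } else {
      ++failures;                      // evacuation failure, but no lock taken
    }
    // Success and failure share this path: push the field slots of the
    // object's new location onto this thread's own queue.
    for (Obj*& slot : target->fields) refs_to_scan.push_back(&slot);
    return target;
  }

  // Drain the queue, updating each slot to the referent's new location.
  void trim_queue() {
    while (!refs_to_scan.empty()) {
      Obj** slot = refs_to_scan.back();
      refs_to_scan.pop_back();
      if (*slot != nullptr) *slot = copy_to_survivor_space(*slot);
    }
  }
};
```

Because success and failure share one code path, there is no global failure state to protect. In G1 the queued entries could also be stolen by idle workers, which this single-threaded model does not show.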

Couple of ideas on how to do this:

1. Remove the following from G1CollectedHeap:
   * the global pointer _evac_failure_closure;
   * the global _evac_failure_scan_stack;
   * the push and drain operations for the global _evac_failure_scan_stack;
   * the routines init_for_evac_failure() and finalize_evac_failure().

   Add equivalents of the above to the G1ParScanThreadState structure:
   * add the evac_failure_scan_stack field;
   * add a push operation;
   * add a drain operation.

   The G1ParScanThreadState constructor will allocate the initial evac_failure_scan_stack. This is the thread-specific version of init_for_evac_failure(). Allocating the evac failure scan stack could be done lazily when we handle the first evacuation failure, but that means checking whether the evac failure scan stack is non-null when processing every object.

   The G1ParScanThreadState destructor will perform the equivalent of finalize_evac_failure(), namely verify that the evac failure scan stack is either null or empty.

   The push operation simply pushes the current oop onto the evac_failure_scan_stack. The drain operation pops objects off the evac_failure_scan_stack and applies the evacuation failure copy closure (from the PSS itself) to the reference fields in the popped object.

   The evacuation failure handling code then accesses the data and operations from the thread-specific PSS without having to take the lock (except where it has to preserve the object header, but that will go away when 8003235 is addressed).

   Pluses:
   + Matches the existing semantics, but using thread-local (or thread-specific) data.
   + Should have low space overhead per worker thread: a few pointers and a growable array that starts out 40 entries long.

   Minuses:
   - Uses growable arrays, which can become very large because each expansion doubles the capacity, and expansion is slow because it copies the contents.
   - No stealing. We would have to implement our own stealing strategy to maximize throughput.

2. An alternative approach might be to move the evacuation failure handling mechanism into the copy closures. If we failed to allocate space for the object, we would not try to copy it; instead we would use the embedded G1ParScanClosure to scan the fields referenced by the object we failed to copy.

   Pluses:
   + Reuses a lot of existing code.
   + No special evacuation failure scan stack; we should be able to use the refs-to-scan work queue.
   + We get stealing for free.

   Minuses:
   - Needs an additional template parameter to "skip" the object copying code.

In either approach there will be some amount of refactoring.

11-12-2012
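Suggestion 1 can be sketched as a per-thread scan state that owns its own evacuation failure scan stack, so pushes and drains need no lock. This is a hypothetical, single-threaded model: `ThreadScanState`, `handle_evac_failure()` and `drain_evac_failure_scan_stack()` are illustrative names, `std::vector` stands in for GrowableArray, every allocation is assumed to fail, and the plain `self_forwarded` flag would need the existing atomic self-forwarding once several threads run. The constructor reserves the suggested 40 entries and the destructor performs the empty-stack check.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical object model: a self-forwarding flag plus reference fields.
struct Obj {
  bool self_forwarded = false;
  std::vector<Obj*> fields;
};

// Hypothetical per-thread state for suggestion 1; nothing here is shared.
class ThreadScanState {
 public:
  // Thread-specific counterpart of init_for_evac_failure(): allocate the
  // stack eagerly so the per-object path never has to test it for null.
  ThreadScanState() { evac_failure_scan_stack_.reserve(40); }

  // Counterpart of finalize_evac_failure(): the stack must end up empty.
  ~ThreadScanState() { assert(evac_failure_scan_stack_.empty()); }

  // Failure handling for obj, assuming its allocation failed. Real code
  // would self-forward with an atomic CAS rather than a plain flag.
  void handle_evac_failure(Obj* obj) {
    if (obj->self_forwarded) return;  // already handled
    obj->self_forwarded = true;
    push_on_evac_failure_scan_stack(obj);
  }

  void push_on_evac_failure_scan_stack(Obj* obj) {
    evac_failure_scan_stack_.push_back(obj);
  }

  // Pop objects and apply the failure closure to each reference field; in
  // this model every referent also fails to copy. Returns objects scanned.
  std::size_t drain_evac_failure_scan_stack() {
    std::size_t scanned = 0;
    while (!evac_failure_scan_stack_.empty()) {
      Obj* obj = evac_failure_scan_stack_.back();
      evac_failure_scan_stack_.pop_back();
      ++scanned;
      for (Obj* ref : obj->fields) {
        if (ref != nullptr) handle_evac_failure(ref);
      }
    }
    return scanned;
  }

 private:
  std::vector<Obj*> evac_failure_scan_stack_;
};
```

As the minuses note, each thread drains only its own stack: a thread that hits one failed object with a large reachable set still does all of that work alone, because there is no stealing.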

Relates :	JDK-8003237 - G1: Reduce unnecessary (and failing) allocation attempts when handling an evacuation failure
Relates :	JDK-8003235 - G1: Parallelize displaced header restoration during evacuation failures