JDK-8159422 : Very high Concurrent Mark mark stack contention
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 9
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2016-06-14
  • Updated: 2018-06-21
  • Resolved: 2016-09-15
JDK 9: 9 b139 (Fixed)
Description
Concurrent marking uses a global mark stack to hold objects that do not fit into the threads' local mark stacks.

Threads whose local stacks are empty compete for these objects.

This causes the following problems:
  - only 16 references are pushed or popped per operation. Considering that the local mark stacks hold 128k (entries, I think), many push/pop operations are needed to fill or empty a local stack (note that the local stack is never completely filled or emptied; filling and emptying are always partial)
  - the global mark stack is protected by a single global lock, and that lock is not even specific to the mark stack: it is shared with other parts of the GC. This scales very badly on machines with 100+ threads, particularly at the "end" of marking, when all threads are emptying the global mark stack

Some applications need a mark stack with 1G entries. The two issues above are really problematic in these cases.
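
To put rough numbers on the first problem, here is a back-of-the-envelope sketch. It assumes the "128k" above really means 128 * 1024 entries and that "1G" means 2^30 entries; `transfers_needed` is a hypothetical helper for illustration, not a HotSpot function.

```cpp
#include <cstddef>

// Hypothetical helper: number of global-stack operations (each one taken
// under the global lock) needed to move `entries` references when every
// operation transfers at most `per_op` of them. Rounds up.
constexpr std::size_t transfers_needed(std::size_t entries, std::size_t per_op) {
    return (entries + per_op - 1) / per_op;
}

// Assumption: a local mark stack of 128 * 1024 entries, 16 entries per transfer.
static_assert(transfers_needed(128 * 1024, 16) == 8192,
              "moving one full local stack takes 8192 locked operations");

// Assumption: a global mark stack of 2^30 entries, drained 16 entries at a time.
static_assert(transfers_needed(std::size_t{1} << 30, 16) == 67108864,
              "draining 1G entries takes about 67 million locked operations");
```

Under these assumptions, every one of those operations has to acquire the shared lock, which is why the small transfer size and the single lock compound each other.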
Comments
The attached patch fixes everything except the ABA problem. For that I recommend using versioned pointers, since the addresses are aligned to page size anyway. All other techniques need additional infrastructure that is not available.
16-08-2016
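
For readers unfamiliar with the versioned-pointer idea mentioned in the comment above, here is a minimal sketch. It is not the attached patch. It assumes chunks are aligned to 4 KiB (so the low 12 bits of an address are free for a version tag) and that chunks are never returned to the OS while the stack is in use. `Chunk`, `VersionedStack`, `kAlign` and `kTagMask` are hypothetical names. A 12-bit tag wraps after 4096 updates, so it narrows the ABA window rather than closing it completely.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr std::uintptr_t kAlign = 4096;          // assumed page-size alignment
constexpr std::uintptr_t kTagMask = kAlign - 1;  // low 12 bits carry the version

// Hypothetical chunk type; alignas guarantees the low 12 address bits are zero.
struct alignas(4096) Chunk {
    std::atomic<Chunk*> next{nullptr};
    int payload = 0;
};

class VersionedStack {
    std::atomic<std::uintptr_t> head_{0};  // chunk address | version tag

    static Chunk* ptr(std::uintptr_t v) {
        return reinterpret_cast<Chunk*>(v & ~kTagMask);
    }
    static std::uintptr_t pack(Chunk* c, std::uintptr_t tag) {
        std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(c);
        assert((bits & kTagMask) == 0);  // chunk must be aligned
        return bits | (tag & kTagMask);
    }

public:
    void push(Chunk* c) {
        std::uintptr_t old = head_.load(std::memory_order_relaxed);
        do {
            c->next.store(ptr(old), std::memory_order_relaxed);
            // Bump the version on every update so a recycled address
            // does not compare equal to a stale head value.
        } while (!head_.compare_exchange_weak(
            old, pack(c, (old & kTagMask) + 1),
            std::memory_order_release, std::memory_order_relaxed));
    }

    Chunk* pop() {
        std::uintptr_t old = head_.load(std::memory_order_acquire);
        while (ptr(old) != nullptr) {
            Chunk* next = ptr(old)->next.load(std::memory_order_relaxed);
            if (head_.compare_exchange_weak(
                    old, pack(next, (old & kTagMask) + 1),
                    std::memory_order_acquire, std::memory_order_acquire)) {
                return ptr(old);
            }
            // On failure `old` holds the current head; retry with it.
        }
        return nullptr;  // stack is empty
    }
};
```

Without the tag, a thread that read the head, was descheduled, and resumed after the same chunk had been popped and pushed again would succeed in its compare-and-swap with a stale `next`. With the tag, the head value differs even when the address is the same.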

FC Extension Request Justification: This change fixes performance problems during marking through live objects when using G1 on large heaps that contain large arrays of references (j.l.Object). In the current implementation of marking, objects that still need processing are remembered on a global mark stack; accessing this mark stack, i.e. retrieving work from it and pushing work to it, requires taking a global lock. On large machines with hundreds of threads, access to this mark stack is a serious performance bottleneck, particularly towards the end of marking when the local mark buffers have been exhausted. With arrays of hundreds of MB in size, which cause many accesses to the global mark stack, completion of the liveness analysis is delayed for too long, resulting in full GCs of 20+ minutes after 500+ seconds of trying to finish marking. With this patch, in combination with JDK-8057003, marking completes in 15-30s.
Risk assessment: Low. While it is a significant change, it has been run many times on benchmarks, and test runs have also completed.
Proposed due date: Sep 20 2016
10-08-2016

Another significant advantage of this change is that remark pause times are likely to drop significantly in some cases, since it greatly improves the scalability of marking in general (the current lack of scalability might explain the multi-second remark pause times some people are reporting).
14-07-2016