Blocks :
|
|
Relates :
|
|
Relates :
|
|
Relates :
|
|
Relates :
|
|
Relates :
|
|
Relates :
|
|
Relates :
|
|
Relates :
|
While looking at some more demanding large microbenchmarks (e.g. BigRamtester, 20g heap, 1M regions) enqueueing dirty cards during GC in G1ParScanThreadState::update_rs incurs a significant amount of wait (idle) time. The reason is that enqueuing completed buffers takes a global lock (basically ending up in PtrQueue::handle_zero_index() and PtrQueueSet::enqueue_complete_buffer(); there is also some strange locking/unlocking going on in PtrQueue::locking_enqueue_completed_buffer()). That does not scale beyond a few threads. The problem is harder than it seems because after providing a per-thread DCQS, performance does not improve a lot. The stalling is moved to the malloc() calls done when allocating new DCQ buffers.
|