During some large benchmark runs - notably the ATG CRM demo - some inconsistencies were seen in the PrintGCDetails output between the global Parallel Other time (the interval from when all workers are started to when all workers have finished) and the per-worker parallel time (the duration of the worker's work() method). The timings were consistently around 35-37ms.
We suspect this is because some destructors are executed at the end of the work() method and outside the timing scope. We should fix that and make sure that those two measurements are consistent.
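The suspected effect can be reproduced outside the VM. The sketch below is not HotSpot code: SlowCleanup, work_unscoped, work_scoped and timing_gap_ms are hypothetical names, and the 20 ms sleep is an arbitrary stand-in for expensive destructor work. It shows that a local object's destructor runs after the return value has been computed, so a time taken inside the method misses it, while a time taken by the caller includes it.

```cpp
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

// Helper: elapsed time between two time points, in milliseconds.
static double elapsed_ms(Clock::time_point from, Clock::time_point to) {
    return std::chrono::duration<double, std::milli>(to - from).count();
}

// Hypothetical object whose destructor is expensive (assumed 20 ms here).
struct SlowCleanup {
    ~SlowCleanup() { std::this_thread::sleep_for(std::chrono::milliseconds(20)); }
};

// Problem case: the return value is computed first, then 'cleanup' is
// destroyed, so the destructor cost is not part of the returned time.
double work_unscoped() {
    Clock::time_point start = Clock::now();
    SlowCleanup cleanup;
    return elapsed_ms(start, Clock::now());
}

// Possible fix: a nested scope forces the destructor to run before the
// end time is taken, so the returned time includes it.
double work_scoped() {
    Clock::time_point start = Clock::now();
    {
        SlowCleanup cleanup;
    }  // ~SlowCleanup runs here, inside the timed region
    return elapsed_ms(start, Clock::now());
}

// Caller-side view: difference between the time seen by the caller and
// the time the worker reported for itself.
double timing_gap_ms(double (*work)()) {
    Clock::time_point start = Clock::now();
    double inner = work();
    double outer = elapsed_ms(start, Clock::now());
    return outer - inner;
}
```

Calling timing_gap_ms(work_unscoped) should report a gap of roughly the destructor cost, while timing_gap_ms(work_scoped) should report a gap close to zero.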
Stack allocate the DirtyCardToOopClosure instance in ScanRSClosure::scanCard.
An instance of DirtyCardToOopClosure was being resource allocated for every card scanned during RSet scanning. When the ResourceMark destructor ran and freed the Chunks back to the ChunkPools, the GC worker threads contended on the pthread mutex that is used by the ThreadCritical object.
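The difference between the two allocation strategies can be illustrated with a simplified model. This is a sketch under stated assumptions, not the HotSpot implementation: CardClosure, LockedPool, scan_with_pool and scan_on_stack are hypothetical names, and the pool takes its lock on both allocate and release, which only loosely models the ChunkPool/ThreadCritical interaction described above. The acquisition counter shows how many times a worker would have to go through the shared lock; with several workers running concurrently, each of those is a point where they can serialize.

```cpp
#include <cstddef>
#include <mutex>

// Hypothetical stand-in for a closure that is created once per scanned card.
struct CardClosure {
    std::size_t card;
    std::size_t apply() const { return card & 1; }  // dummy per-card work
};

// Hypothetical shared allocator guarded by a single lock (a simplification
// of a global pool protected by one process-wide mutex).
class LockedPool {
public:
    CardClosure* allocate(std::size_t card) {
        std::lock_guard<std::mutex> guard(lock_);
        ++acquisitions_;
        return new CardClosure{card};
    }
    void release(CardClosure* closure) {
        std::lock_guard<std::mutex> guard(lock_);
        ++acquisitions_;
        delete closure;
    }
    std::size_t acquisitions() const { return acquisitions_; }
private:
    std::mutex lock_;
    std::size_t acquisitions_ = 0;
};

// Before: one pool allocation and one pool release per card, each taking
// the shared lock.
std::size_t scan_with_pool(LockedPool& pool, std::size_t cards) {
    std::size_t sum = 0;
    for (std::size_t i = 0; i < cards; ++i) {
        CardClosure* closure = pool.allocate(i);
        sum += closure->apply();
        pool.release(closure);
    }
    return sum;
}

// After: the closure lives on the stack, so no shared lock is involved.
std::size_t scan_on_stack(std::size_t cards) {
    std::size_t sum = 0;
    for (std::size_t i = 0; i < cards; ++i) {
        CardClosure closure{i};
        sum += closure.apply();
    }
    return sum;
}
```

Both functions compute the same result; only the pool-based variant touches the lock, twice per card in this model.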