Other |
---|
tbdUnresolved |
While migrating our production services from CMS to G1, we found that G1's complicated write post-barrier incurs considerable CPU cost. Currently, the post-barrier for a write "p.f = q" is:

    if ((p xor q) >> LOG_REGION_BITS != 0) {  // the write crosses a region boundary
      if (q != null) {
        card_address = &card_table[addr_to_index(p)];
        if (*card_address != YOUNG) {
          store_load_fence;
          if (*card_address != DIRTY) {
            *card_address = DIRTY;
            T.dirtyCardQueue.enqueue(card_address);
          }
        }
      }
    }

For CMS, the write barrier is only:

    card_address = &card_table[addr_to_index(p)];
    *card_address = DIRTY;

The complexity of G1's write post-barrier comes from the need to support concurrent refinement threads. However, even if the user has set -XX:G1ConcRefinementThreads=0, the write post-barrier remains the same. Ideally, the write post-barrier could be much simpler when there is no concurrent refinement.

This RFE proposes adding a mode to G1 that uses a simplified write post-barrier:

    if ((p xor q) >> LOG_REGION_BITS != 0) {  // the write crosses a region boundary
      if (q != null) {
        card_address = &card_table[addr_to_index(p)];
        *card_address = DIRTY;
      }
    }

In this mode, G1 would disable concurrent refinement and the per-Java-thread dirty card queues, and would instead process all dirty cards during a collection pause. Pause times could therefore become longer, but as long as MaxGCPauseMillis is reasonably large relative to the heap size, G1's adaptive heuristics should still be able to adjust the young-gen size to meet the pause time goal.

This new mode would reduce G1's CPU usage considerably. It would be particularly helpful for certain types of workloads, e.g.:

- Workloads heavily tuned for CMS to minimize old-gen collections, and sensitive to CPU usage;
- Workloads that mainly care about throughput and CPU usage.

I have implemented a prototype for this mode and attached some preliminary results.
|
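To make the difference between the two barriers concrete, here is a minimal single-threaded Java model of the pseudocode above. It is only an illustrative sketch, not HotSpot code: the class name `PostBarrierModel`, the use of `long` heap offsets as addresses (with 0 standing in for null), the 1 MiB region size, the 512-byte card size, and the card state values are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of the current and proposed G1 post-barriers (hypothetical, not HotSpot code). */
final class PostBarrierModel {
    // Assumed geometry for illustration: 1 MiB regions and 512-byte cards.
    static final int LOG_REGION_BITS = 20;
    static final int LOG_CARD_BITS = 9;
    // Assumed card states; the real encodings differ.
    static final byte CLEAN = 0, DIRTY = 1, YOUNG = 2;

    final byte[] cardTable;
    // Stands in for the per-thread dirty card queue (T.dirtyCardQueue).
    final Deque<Integer> dirtyCardQueue = new ArrayDeque<>();

    PostBarrierModel(long heapBytes) {
        cardTable = new byte[(int) (heapBytes >>> LOG_CARD_BITS)];
    }

    /** True if p and q lie in different regions. */
    static boolean crossesRegion(long p, long q) {
        return ((p ^ q) >>> LOG_REGION_BITS) != 0;
    }

    static int addrToIndex(long p) {
        return (int) (p >>> LOG_CARD_BITS);
    }

    /** Current barrier for "p.f = q": filter, check card state, dirty, enqueue. */
    void currentBarrier(long p, long q) {
        if (!crossesRegion(p, q) || q == 0) return;
        int card = addrToIndex(p);
        if (cardTable[card] == YOUNG) return;
        // The real barrier issues a store-load fence here; this
        // single-threaded model omits it.
        if (cardTable[card] != DIRTY) {
            cardTable[card] = DIRTY;
            dirtyCardQueue.add(card);
        }
    }

    /** Proposed barrier: same filters, then an unconditional card mark. */
    void simplifiedBarrier(long p, long q) {
        if (!crossesRegion(p, q) || q == 0) return;
        cardTable[addrToIndex(p)] = DIRTY;
    }

    /** Models the pause-time work of the proposed mode: scan the whole card table. */
    int countDirtyCards() {
        int n = 0;
        for (byte c : cardTable) {
            if (c == DIRTY) n++;
        }
        return n;
    }
}
```

In this model, the current barrier records each newly dirtied card in the queue so that it can be found without scanning, while the simplified barrier leaves nothing behind except the card mark, so dirty cards can only be discovered by walking the card table. That scan is the extra work the proposal moves into the collection pause.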