There are some simple optimizations we can do to improve the performance of the concurrent marking phase.
- CMOopClosure, which is used to scan objects during marking, is not specialized. By specializing it we will be able to get a nice performance boost. We should also rename it with a G1-specific name (say: G1CMOopClosure) given that its declaration will move to an .hpp file.
- There are a couple of methods on the fast path that would benefit from being inlined: CMTask::deal_with_reference() and CMTask::push().
- We are using the wrong bitmap operations. In the parallel case we call par_at_put(), which in turn calls either par_set_bit() or par_clear_bit(). We should call the latter directly (they will also be inlined, whereas par_at_put() is not). Ditto for at_put() and set_bit() / clear_bit().
- There are places where we can use the slightly more efficient heap_region_containing_raw(), instead of heap_region_containing(), as we know that the address is in the G1 heap.
- When we check whether an object is live or not we should first check whether it's marked on the bitmap and, only if it's not, get its containing region.
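
As a rough illustration of the bitmap and liveness-check items above, here is a minimal, self-contained C++ sketch. It is not HotSpot code. ToyBitMap, ToyRegion, ToyHeap, the region_lookups counter and the two is_live_* functions are hypothetical names made up for this example. The rule that an unmarked object counts as live when it sits at or above a per-region "allocated since marking started" boundary is a simplifying assumption, as is the use of plain indices instead of addresses.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Toy stand-in for a mark bitmap (hypothetical; not HotSpot's BitMap).
class ToyBitMap {
 public:
  explicit ToyBitMap(std::size_t bits) : _bits(bits), _words((bits + 63) / 64) {}

  // Direct atomic set: no value-dependent branch, easy to inline.
  // Returns true if this call flipped the bit from 0 to 1.
  bool par_set_bit(std::size_t i) {
    assert(i < _bits);
    const std::uint64_t mask = std::uint64_t{1} << (i % 64);
    return (_words[i / 64].fetch_or(mask, std::memory_order_relaxed) & mask) == 0;
  }

  // Direct atomic clear. Returns true if this call flipped the bit from 1 to 0.
  bool par_clear_bit(std::size_t i) {
    assert(i < _bits);
    const std::uint64_t mask = std::uint64_t{1} << (i % 64);
    return (_words[i / 64].fetch_and(~mask, std::memory_order_relaxed) & mask) != 0;
  }

  // Generic entry point: branches on a value that marking code already
  // knows statically, so callers are better off calling the above directly.
  bool par_at_put(std::size_t i, bool value) {
    return value ? par_set_bit(i) : par_clear_bit(i);
  }

  bool at(std::size_t i) const {
    assert(i < _bits);
    const std::uint64_t mask = std::uint64_t{1} << (i % 64);
    return (_words[i / 64].load(std::memory_order_relaxed) & mask) != 0;
  }

 private:
  std::size_t _bits;
  std::vector<std::atomic<std::uint64_t>> _words;
};

// Assumption: objects at or above this index were allocated after marking
// started and are treated as live without being marked.
struct ToyRegion {
  std::size_t alloc_since_mark_start;
};

struct ToyHeap {
  ToyHeap(std::size_t size, std::vector<ToyRegion> rs)
      : region_size(size), regions(std::move(rs)) {}

  // Counts calls so the cost of each check order is visible.
  const ToyRegion& region_containing(std::size_t addr) const {
    ++region_lookups;
    return regions[addr / region_size];
  }

  std::size_t region_size;
  std::vector<ToyRegion> regions;
  mutable std::size_t region_lookups = 0;
};

// Region first: always pays for the region lookup.
bool is_live_region_first(const ToyHeap& heap, const ToyBitMap& bm, std::size_t addr) {
  const ToyRegion& r = heap.region_containing(addr);
  return addr >= r.alloc_since_mark_start || bm.at(addr);
}

// Bitmap first: marked objects never need the region lookup.
bool is_live_mark_first(const ToyHeap& heap, const ToyBitMap& bm, std::size_t addr) {
  if (bm.at(addr)) {
    return true;
  }
  const ToyRegion& r = heap.region_containing(addr);
  return addr >= r.alloc_since_mark_start;
}
```

Both is_live_* functions return the same answer; the difference is that the bitmap-first version skips region_containing() whenever the object is already marked, which is the saving the last item is after.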