G1 uses the BufferingOopClosure to separate the root iteration from the object copying. The oop*/narrowOop* roots are gathered in a buffer and then bulk processed. By only timing the processing part and not the root iteration, the object copying time can be measured.
Currently, the BufferingOopClosure uses StarTask, from the taskqueue code, to differentiate between oop* and narrowOop* roots. StarTask uses the least significant bit of the stored address to mark whether the location contains an oop or a narrowOop. This works for most of the roots, since their addresses are aligned and the LSB is always 0. However, oops can be embedded as immediates in JIT-compiled code, and the addresses of these are not necessarily aligned. An unaligned address may already have its LSB set, so the tag bit can no longer be told apart from the address itself. This prevents us from using StarTasks with oops in the CodeCache.
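The tagging problem can be illustrated with a minimal sketch. This is not the HotSpot StarTask code: TaggedRoot and its members are hypothetical names, and the sketch assumes that a set LSB means "narrowOop*".

```cpp
#include <cassert>
#include <stdint.h>

// Hypothetical stand-in for StarTask-style tagging (not the HotSpot code).
// Assumption: LSB set means the location holds a narrowOop.
struct TaggedRoot {
  static const uintptr_t kNarrowBit = 1;
  uintptr_t bits;

  // Store the address of a location holding a full-width oop (tag bit left as is).
  static TaggedRoot from_wide(void* location) {
    TaggedRoot t;
    t.bits = reinterpret_cast<uintptr_t>(location);
    return t;
  }

  // Store the address of a location holding a narrowOop (tag bit set).
  static TaggedRoot from_narrow(void* location) {
    TaggedRoot t;
    t.bits = reinterpret_cast<uintptr_t>(location) | kNarrowBit;
    return t;
  }

  bool is_narrow() const { return (bits & kNarrowBit) != 0; }

  // Recover the address by clearing the tag bit.
  void* address() const { return reinterpret_cast<void*>(bits & ~kNarrowBit); }
};
```

With an aligned location the round trip is lossless. With an odd address, such as an oop immediate inside compiled code, from_wide() yields a value that is_narrow() reports as narrow, and address() returns a location one byte off.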
See this comment from g1_process_strong_roots:
// Walk the code cache/strong code roots w/o buffering, because StarTask
// cannot handle unaligned oop locations.
CodeBlobToOopClosure eager_scan_code_roots(scan_non_heap_roots, true /* do_marking */);
I suggest that we replace the StarTask usage with another implementation that does not depend on address alignment, so that the BufferingOopClosure can also be used for the CodeCache scanning.
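One possible shape for such a replacement, given only as a sketch under stated assumptions: keep the two kinds of roots in separate regions of one buffer, so that no bit of the address is needed as a tag. All names below (SlotClosure, CountingClosure, TwoEndedBufferingClosure) are hypothetical, and locations are simplified to void* rather than oop*/narrowOop*.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical inner closure; locations are simplified to void*.
struct SlotClosure {
  virtual void do_wide(void* location) = 0;    // location holds a full-width oop
  virtual void do_narrow(void* location) = 0;  // location holds a narrowOop
  virtual ~SlotClosure() {}
};

// Buffers both kinds of locations in one array: wide entries grow from the
// front, narrow entries grow from the back. The kind is given by the region,
// so addresses are stored unmodified and need not be aligned.
class TwoEndedBufferingClosure {
 public:
  explicit TwoEndedBufferingClosure(SlotClosure* inner)
      : _inner(inner), _wide_top(0), _narrow_bottom(kCapacity) {}

  void do_wide(void* location) {
    if (is_full()) drain();
    _buffer[_wide_top++] = location;
  }

  void do_narrow(void* location) {
    if (is_full()) drain();
    _buffer[--_narrow_bottom] = location;
  }

  // Bulk-process everything gathered so far, then reset the buffer.
  // In the scenario described above, this is the part that would be timed.
  void drain() {
    for (size_t i = 0; i < _wide_top; i++) _inner->do_wide(_buffer[i]);
    for (size_t i = _narrow_bottom; i < kCapacity; i++) _inner->do_narrow(_buffer[i]);
    _wide_top = 0;
    _narrow_bottom = kCapacity;
  }

 private:
  static const size_t kCapacity = 1024;
  bool is_full() const { return _wide_top == _narrow_bottom; }

  SlotClosure* _inner;
  size_t _wide_top;       // next free index at the front
  size_t _narrow_bottom;  // index of the lowest used narrow entry
  void* _buffer[kCapacity];
};

// Hypothetical inner closure used to exercise the sketch.
struct CountingClosure : public SlotClosure {
  size_t wide;
  size_t narrow;
  void* last_wide;
  CountingClosure() : wide(0), narrow(0), last_wide(NULL) {}
  virtual void do_wide(void* location) { wide++; last_wide = location; }
  virtual void do_narrow(void* /* location */) { narrow++; }
};
```

A caller would pass roots through do_wide()/do_narrow() during root iteration and call drain() once at the end. Nothing reaches the inner closure until the buffer fills or drain() is called, and an odd address passed to do_wide() comes out unchanged.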