Currently, we periodically force flushing of SATB queues. Every 100 ms a flag is set in every thread, which causes that thread to enqueue its SATB buffer the next time the buffer overflows, even if the buffer does not meet the enqueue threshold after filtering. This is problematic when a thread does not overflow its SATB queue in time: the flag never takes effect, and the thread's entries are left for final-mark to deal with. The whole point of the exercise is to avoid having too much leftover work when we reach final-mark.
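To make the limitation concrete, here is a minimal Python model of the flag-based scheme described above. It is a sketch, not HotSpot code: `SatbQueue`, `capacity`, `threshold`, `force_flush` and `is_marked` are hypothetical names chosen for illustration.

```python
class SatbQueue:
    """Toy model of one thread's SATB queue (all names are hypothetical)."""

    def __init__(self, capacity, threshold):
        self.capacity = capacity      # buffer size that triggers overflow handling
        self.threshold = threshold    # minimum entries left after filtering to enqueue
        self.buffer = []              # thread-local entries
        self.force_flush = False      # set periodically (every 100 ms in the text)
        self.enqueued = []            # buffers handed over to the collector

    def record(self, ref, is_marked):
        self.buffer.append(ref)
        if len(self.buffer) >= self.capacity:
            self._handle_overflow(is_marked)

    def _handle_overflow(self, is_marked):
        # Filtering: drop entries that are already marked.
        self.buffer = [r for r in self.buffer if not is_marked(r)]
        if self.force_flush or len(self.buffer) >= self.threshold:
            if self.buffer:
                self.enqueued.append(self.buffer)
            self.buffer = []
            self.force_flush = False


already_marked = {1, 2}

# Busy thread: it overflows, so the flag forces out a buffer below threshold.
busy = SatbQueue(capacity=4, threshold=3)
busy.force_flush = True
for ref in (1, 2, 3, 4):
    busy.record(ref, already_marked.__contains__)

# Quiet thread: it never overflows, so the flag has no effect.
quiet = SatbQueue(capacity=4, threshold=3)
quiet.force_flush = True
quiet.record(7, already_marked.__contains__)
```

In this model the busy thread hands over `[3, 4]` although that is below the threshold, while the quiet thread keeps its entry and its flag stays set.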
We can do better: when concurrent mark is done, we can handshake all threads, let each of them flush its SATB queue, and then re-enter the concurrent mark loop. We repeat this until flushing yields no more work. Experiments show that it usually takes 1-3 flushes to clean out the leftover work.
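The mark-flush-repeat loop can be sketched as a small Python simulation. This illustrates the control flow only, under simplifying assumptions: the handshake is modelled as a plain loop over per-thread lists, mutators do not run in between, and `concurrent_mark`, `flush_all_satb` and `mark_until_quiescent` are hypothetical names rather than HotSpot functions.

```python
from collections import deque


def concurrent_mark(work, marked, edges):
    """Drain the mark queue, marking everything reachable from it."""
    while work:
        obj = work.popleft()
        if obj in marked:
            continue
        marked.add(obj)
        work.extend(edges.get(obj, ()))


def flush_all_satb(thread_queues, work, marked):
    """Stand-in for the handshake: each thread hands over its SATB entries.

    Returns the number of entries that are real work (not yet marked).
    """
    flushed = 0
    for queue in thread_queues:
        fresh = [ref for ref in queue if ref not in marked]
        work.extend(fresh)
        flushed += len(fresh)
        queue.clear()
    return flushed


def mark_until_quiescent(roots, edges, thread_queues):
    """Mark, flush, and repeat until a flush yields no more work."""
    marked = set()
    work = deque(roots)
    flushes = 0
    while True:
        concurrent_mark(work, marked, edges)
        flushed = flush_all_satb(thread_queues, work, marked)
        flushes += 1  # counts the final flush that found nothing, too
        if flushed == 0:
            return marked, flushes


edges = {"a": ["b"], "c": ["d"]}
thread_queues = [["c"], []]  # one thread still holds an unflushed entry
marked, flushes = mark_until_quiescent(["a"], edges, thread_queues)
```

Here the first flush hands over `c`, the second pass marks `c` and `d`, and the second flush finds nothing, so the loop ends with all four objects marked.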