When looking at it with John Cuthbertson, we noticed that ConcurrentG1Refine::clean_up_cache() might miss processing some cards. The code looks like this:
  int start_ind = _hot_cache_idx-1;
  for (int i = 0; i < _n_hot; i++) {
    int ind = start_ind - i;
    if (ind < 0) ind = ind + _hot_cache_size;
    jbyte* entry = _hot_cache[ind];
    if (entry != NULL) {
      g1rs->concurrentRefineOneCard(entry, worker_i);
    }
  }
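_hot_cache_size is the maximum size of the cache and _n_hot is the number of entries we've added to it. All the entries we need to process are at the bottom of the array, in [0, _n_hot). The loop walks backwards from _hot_cache_idx-1, and when the index goes negative it wraps around using _hot_cache_size rather than _n_hot. So when _hot_cache_idx < _n_hot < _hot_cache_size, it misses the entries in [_hot_cache_idx, _n_hot) and instead (incorrectly) processes the same number of slots at the very top of the array.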
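
To make the index arithmetic concrete, here is a minimal standalone sketch that records which slots the loop above would visit. It only reproduces the index computation and is not HotSpot code. The function names and the plain int parameters are made up for illustration, and the second function is just one possible way to walk [0, _n_hot), not necessarily the fix that should go in.

```cpp
#include <cassert>
#include <vector>

// Returns the slot indices the quoted loop would visit, in order.
// Parameters mirror _hot_cache_idx, _n_hot and _hot_cache_size
// (hypothetical stand-ins, not the real fields).
std::vector<int> visited_by_original_loop(int hot_cache_idx, int n_hot,
                                          int hot_cache_size) {
  // Assumed invariants for this sketch.
  assert(0 <= hot_cache_idx && hot_cache_idx <= hot_cache_size);
  assert(0 <= n_hot && n_hot <= hot_cache_size);

  std::vector<int> visited;
  int start_ind = hot_cache_idx - 1;
  for (int i = 0; i < n_hot; i++) {
    int ind = start_ind - i;
    if (ind < 0) ind = ind + hot_cache_size;  // wraps to the top of the array
    visited.push_back(ind);
  }
  return visited;
}

// Hypothetical alternative: visit exactly the live prefix [0, n_hot).
std::vector<int> visited_by_prefix_walk(int n_hot) {
  std::vector<int> visited;
  for (int i = 0; i < n_hot; i++) {
    visited.push_back(i);
  }
  return visited;
}
```

For example, with _hot_cache_idx = 2, _n_hot = 5 and _hot_cache_size = 8, the first function returns 1, 0, 7, 6, 5: slots 2, 3 and 4 are never visited, while slots 5, 6 and 7 (which hold no entries we added) are.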