The fix for JDK-7145569 introduced work distribution of nmethod scanning on a per-region basis: nmethods are "attached" to the regions they have references into.
While the fix helps lower the maximum scan time, code root marking remains very unbalanced in large applications: a few regions contain most of the nmethods that need to be scanned (see nmethod_distribution.png), and because the work is distributed on a per-region basis, typically one or two threads take much longer than the others.
See the maximum and average times for code root marking in the attached figure (code_root_marking_avg_max.png).
This imbalance still significantly increases overall GC pause times.
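
To illustrate why per-region distribution leaves one thread with most of the work, here is a minimal simulation sketch. It is not HotSpot code: the class CodeRootBalance, its methods, the nmethod counts, the thread count, and the chunk size are all hypothetical, and scan cost is assumed to be proportional to the number of nmethods. Splitting a region's nmethods into fixed-size chunks is shown only as a point of comparison, not as the approach taken for this issue.

```java
import java.util.ArrayList;
import java.util.List;

class CodeRootBalance {
    /**
     * Hands each work item, in order, to the currently least-loaded thread
     * and returns the largest per-thread load (the "slowest" thread).
     */
    static long maxThreadLoad(List<Long> items, int threads) {
        long[] load = new long[threads];
        for (long item : items) {
            int least = 0;
            for (int t = 1; t < threads; t++) {
                if (load[t] < load[least]) {
                    least = t;
                }
            }
            load[least] += item;
        }
        long max = 0;
        for (long l : load) {
            max = Math.max(max, l);
        }
        return max;
    }

    /** Splits every region's nmethod count into work items of at most chunkSize. */
    static List<Long> splitIntoChunks(List<Long> nmethodsPerRegion, long chunkSize) {
        List<Long> chunks = new ArrayList<>();
        for (long count : nmethodsPerRegion) {
            for (long left = count; left > 0; left -= chunkSize) {
                chunks.add(Math.min(left, chunkSize));
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Hypothetical skewed distribution: one region holds most nmethods.
        List<Long> regions = List.of(1000L, 10L, 10L, 10L, 10L, 10L, 10L, 10L);
        int threads = 4;

        // Work unit = whole region: one thread is stuck with the hot region.
        System.out.println("per-region max load: "
                + maxThreadLoad(regions, threads));                      // 1000

        // Work unit = chunk of 10 nmethods: load is close to total / threads.
        System.out.println("chunked max load:    "
                + maxThreadLoad(splitIntoChunks(regions, 10), threads)); // 270
    }
}
```

With 1070 nmethods in total and 4 threads, the ideal per-thread load is about 268. Under per-region distribution the slowest thread still scans 1000 nmethods, which mirrors the gap between maximum and average times described above.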