Even after fixing JDK-8165313, a significant amount of time is spent in the Free Collection Set Serial part of the GC.
For example, running gcbasher with -Xmx20g -Xms20g -XX:NewSizePercent=65 -XX:MaxNewSizePercent=80 -XX:G1RegionSize=1m, the time spent in that phase is still around 20ms (instead of >100ms) with ~14k regions.
Running with the analyzer enabled, the work is distributed roughly as follows (accumulated times of a single run, in ms):
G1SerialFreeCollectionSetClosure::doHeapRegion: 107
G1CollectedHeap::free_region: 38
TruncatedSeq::add(double): 18
SurvRateGroup::record_surviving_words: 9
G1ContiguousSpace::used: 2
(I suspect that the TruncatedSeq::add() time actually belongs to SurvRateGroup::record_surviving_words(), but was not attributed to it correctly.)
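
To put the numbers above in proportion, here is a rough sketch of the arithmetic. It assumes (this is not stated by the profile itself) that the doHeapRegion figure is inclusive of its callees and that the four callee figures do not overlap each other; if that does not hold, the "self time" below is meaningless. The variable names are only for illustration.

```python
# Accumulated times in ms, copied from the profile above.
closure_total_ms = 107  # G1SerialFreeCollectionSetClosure::doHeapRegion
callees_ms = {
    "G1CollectedHeap::free_region": 38,
    "TruncatedSeq::add(double)": 18,
    "SurvRateGroup::record_surviving_words": 9,
    "G1ContiguousSpace::used": 2,
}

# ASSUMPTION: the closure figure is inclusive and the callee figures are
# disjoint, so the remainder is time spent in the closure body itself.
callee_sum_ms = sum(callees_ms.values())
closure_self_ms = closure_total_ms - callee_sum_ms

# If TruncatedSeq::add() really belongs to record_surviving_words(),
# the survivor rate bookkeeping accounts for both figures together.
surv_rate_ms = (callees_ms["TruncatedSeq::add(double)"]
                + callees_ms["SurvRateGroup::record_surviving_words"])

def share(ms):
    """Percentage of the closure's accumulated time, one decimal place."""
    return round(100.0 * ms / closure_total_ms, 1)

for name, ms in callees_ms.items():
    print(f"{name}: {ms} ms ({share(ms)}%)")
print(f"closure self time: {closure_self_ms} ms ({share(closure_self_ms)}%)")
print(f"survivor rate bookkeeping combined: {surv_rate_ms} ms ({share(surv_rate_ms)}%)")
```

Under those assumptions, the listed callees account for 67 of the 107 ms, leaving about 40 ms in the closure itself, and the survivor rate bookkeeping would add up to 27 ms if the two figures are merged.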