Even after fixing JDK-8165313, a significant amount of time is spent in the Free Collection Set Serial part of the GC.
E.g. with -Xmx20g -Xms20g -XX:NewSizePercent=65 -XX:MaxNewSizePercent=80 -XX:G1RegionSize=1m on gcbasher, the time spent in that phase is still around 20ms (down from >100ms before the fix) with ~14k regions.
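As a rough sanity check on the ~14k figure, the region counts implied by these settings can be computed directly. This is only a sketch: the class and method names are made up for illustration, and it assumes that the ~14k regions are mostly young regions in the collection set and that the young generation percentages apply to the total number of heap regions.

```java
public class RegionCount {
    // Number of regions when the heap is split into equally sized regions.
    static long totalRegions(long heapBytes, long regionBytes) {
        return heapBytes / regionBytes;
    }

    // Number of regions covered by the given percentage of the heap
    // (integer arithmetic, rounds down).
    static long regionsForPercent(long totalRegions, int percent) {
        return totalRegions * percent / 100;
    }

    public static void main(String[] args) {
        long heap = 20L * 1024 * 1024 * 1024; // -Xmx20g -Xms20g
        long region = 1L * 1024 * 1024;       // -XX:G1RegionSize=1m
        long total = totalRegions(heap, region);

        // Assumption: young generation bounds are percentages of all regions.
        System.out.println("total regions:  " + total);
        System.out.println("young at 65%:   " + regionsForPercent(total, 65));
        System.out.println("young at 80%:   " + regionsForPercent(total, 80));
    }
}
```

Under these assumptions the heap has 20480 regions and the young generation spans between 13312 and 16384 of them, which is consistent with a collection set of ~14k regions.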
Running with analyzer enabled, the work distribution is roughly as follows (accumulated times of a single run, in ms):
(I suspect that the TruncatedSeq::add() measurements actually belong to SurvRateGroup::record_surviving_words(), but were not attributed correctly.)