A customer opened a case reporting a suspected GC bug. They observe:
1) The increase in average collection time for the new generation over the 4-day period is 176%. If you exclude data from the low-volume day (Sunday), the increase from Monday to Wednesday is 50.3%.
2) The new-generation collection times do not consistently drop after a CMS collection. In 8 out of 13 cases, the time of a new-generation collection increased after a CMS collection.
3) During the time that the average new-generation collection time increased by 50%, the old-generation fragmentation increased by 49%. From Monday to Tuesday, the average new-generation collection time and the old-generation fragmentation increased by 28% and 14%, respectively. Two data points are not enough to draw much of a conclusion about the correlation, and the relationship may not be linear either. But it seems reasonable to conclude that fragmentation may be an issue. Note that the time to mark the old generation does not correlate with the old-generation fragmentation at all.
4) After a server restart, during a period of reasonable (~600 chats) activity, the new-generation collection times drop significantly.
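The arithmetic behind observations 1 and 2 can be sketched as follows. This is a minimal illustration, not the customer's tooling: the class name GcTrend, both method names, and every number in main are hypothetical, and the real inputs would come from the parsed GC logs.

```java
final class GcTrend {
    /** Percentage increase from baseline to current, e.g. 40 ms -> 60 ms is 50%. */
    static double percentIncrease(double baseline, double current) {
        if (baseline <= 0) {
            throw new IllegalArgumentException("baseline must be positive");
        }
        return (current - baseline) / baseline * 100.0;
    }

    /**
     * Counts how many CMS cycles were followed by a slower new-generation
     * collection. before[i] and after[i] are the new-generation pause times
     * (ms) measured just before and just after the i-th CMS collection.
     */
    static int countIncreasesAfterCms(double[] before, double[] after) {
        if (before.length != after.length) {
            throw new IllegalArgumentException("arrays must have equal length");
        }
        int increases = 0;
        for (int i = 0; i < before.length; i++) {
            if (after[i] > before[i]) {
                increases++;
            }
        }
        return increases;
    }

    public static void main(String[] args) {
        // Hypothetical averages for illustration only; not the customer's data.
        double mondayAvgMs = 40.0;
        double wednesdayAvgMs = 60.0;
        System.out.printf("Mon -> Wed increase: %.1f%%%n",
                percentIncrease(mondayAvgMs, wednesdayAvgMs));

        // Hypothetical pause times around three CMS cycles.
        double[] before = {40.0, 45.0, 43.0};
        double[] after = {45.0, 43.0, 50.0};
        System.out.println(countIncreasesAfterCms(before, after)
                + " of " + before.length + " CMS cycles were followed by a slower collection");
    }
}
```

With real log data, the same count would reproduce the "8 out of 13" figure, and the same ratio would reproduce the day-over-day percentages.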
After the above comments were written, the customer tried scheduling a Full GC in the middle of the night, to ensure that the old generation was compacted and the fragmentation cleared. After the Full GC, there was no decrease in the time spent in new-generation collection, so the only way to bring the numbers down is to restart the server instance. This implies that the theory that fragmentation is connected with the increasing times was not correct.
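The case does not say how the nightly Full GC was triggered. One possible approach, sketched here under that assumption, is an in-process scheduler that calls System.gc() at a fixed time of day. The class NightlyFullGc, its methods, and the 03:00 target are all hypothetical. Note that System.gc() is only a request: on HotSpot it normally causes a compacting full collection, but it does nothing with -XX:+DisableExplicitGC and starts a concurrent (non-compacting) cycle with -XX:+ExplicitGCInvokesConcurrent.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class NightlyFullGc {
    /** Time from 'now' until the next occurrence of 'target' (today or tomorrow). */
    static Duration delayUntil(LocalDateTime now, LocalTime target) {
        LocalDateTime next = now.with(target);
        if (!next.isAfter(now)) {
            next = next.plusDays(1); // target already passed today
        }
        return Duration.between(now, next);
    }

    /** Requests a GC once a day at 'target'; a fixed 24h period ignores DST shifts. */
    static ScheduledExecutorService schedule(LocalTime target) {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor(r -> {
                    Thread t = new Thread(r, "nightly-full-gc");
                    t.setDaemon(true); // do not keep the JVM alive for this task
                    return t;
                });
        long initialDelayMs = delayUntil(LocalDateTime.now(), target).toMillis();
        scheduler.scheduleAtFixedRate(System::gc, initialDelayMs,
                TimeUnit.DAYS.toMillis(1), TimeUnit.MILLISECONDS);
        return scheduler;
    }

    public static void main(String[] args) {
        // Hypothetical target time; pick a low-traffic hour for the real server.
        schedule(LocalTime.of(3, 0));
        System.out.println("First run in "
                + delayUntil(LocalDateTime.now(), LocalTime.of(3, 0)));
    }
}
```

Comparing the new-generation pause times in the GC log immediately before and after the scheduled run is what shows whether compaction made any difference.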