During young GCs, we seem to scan the entire system dictionary as part of the root set scan. For applications with many loaded classes (or many short-lived classes that will only be reclaimed during the next full GC / GC cycle), scanning the system dictionary can take a considerable amount of time.
There is little reason to scan the whole system dictionary during young GCs. Each entry in the dictionary points to a class object and its class loader. All class objects reside in the perm gen, which a young GC does not collect, so only class loaders that may still be in the young gen (i.e., recently allocated ones) really need to be discovered by the young GC.
It might be worth keeping track of all newly-allocated system dictionary entries whose class loaders are potentially in the young gen, and only scanning those during young GCs. When a young GC discovers that the class loader of a tracked entry has been promoted to the old gen, the entry can be dropped from the tracked set, since it no longer needs to be scanned during subsequent young GCs.
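A minimal sketch of this idea, written in C++ since that is what HotSpot uses. All names here (`Object`, `DictionaryEntry`, `YoungLoaderEntries`, `record`, `scan_and_prune`) are hypothetical and not HotSpot's real classes; an `in_young_gen` flag stands in for an address-range check against the young gen, a `std::vector` is used for simplicity, and the visitor stands in for the young GC's root closure (which may promote the loader).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a heap object; the flag replaces a real
// "is this address in the young gen?" check.
struct Object {
  bool in_young_gen;
};

// Hypothetical dictionary entry: the class lives in the perm gen (never
// moved by a young GC); the loader may have been allocated recently.
struct DictionaryEntry {
  Object* klass;
  Object* loader;  // nullptr models the bootstrap loader
};

// Tracks only the entries whose class loader may still be in the young gen.
class YoungLoaderEntries {
 public:
  // Called when a new entry is added to the dictionary.
  void record(DictionaryEntry* e) {
    if (e->loader != nullptr && e->loader->in_young_gen) {
      entries_.push_back(e);
    }
  }

  // Called during a young GC instead of walking the whole dictionary.
  // Entries whose loader ended up in the old gen are compacted away.
  template <typename Visitor>
  void scan_and_prune(Visitor visit) {
    std::size_t kept = 0;
    for (DictionaryEntry* e : entries_) {
      visit(e);  // may promote e->loader
      if (e->loader->in_young_gen) {
        entries_[kept++] = e;  // still young: scan again next time
      }
    }
    entries_.resize(kept);
  }

  std::size_t size() const { return entries_.size(); }

 private:
  std::vector<DictionaryEntry*> entries_;
};
```

Entries removed from the dictionary by class unloading would also have to be purged from the tracked set; the sketch leaves that out.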
There are a few related tables that are also iterated over during system dictionary iteration (constraints, placeholders, etc.). The largest is probably the dictionary itself, so handling it first would likely give the biggest gain.
There are a couple of ways to track newly-allocated system dictionary entries: add two extra words to each entry to link them into a list (next and prev), or track them in a growable array. Even though we do not really like growable arrays, I think it may be the best approach here, given that the number of newly-allocated entries should mostly be small.
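For comparison, a minimal sketch of the linked-list alternative, again in C++ with hypothetical names (`Object`, `DictionaryEntry`, `YoungEntryList`, `young_next`, `young_prev`) that are not HotSpot's. It shows the trade-off: every entry pays for two extra link words, but unlinking a promoted or unloaded entry is O(1) without compacting an array.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a heap object (flag replaces an address check).
struct Object {
  bool in_young_gen;
};

// Hypothetical entry carrying the two extra link words.
struct DictionaryEntry {
  Object* klass = nullptr;
  Object* loader = nullptr;
  DictionaryEntry* young_next = nullptr;
  DictionaryEntry* young_prev = nullptr;
};

// Intrusive doubly-linked list of entries whose loader may be young.
class YoungEntryList {
 public:
  void add(DictionaryEntry* e) {
    if (e->loader == nullptr || !e->loader->in_young_gen) return;
    e->young_prev = nullptr;
    e->young_next = head_;
    if (head_ != nullptr) head_->young_prev = e;
    head_ = e;
    ++size_;
  }

  // O(1) unlink, e.g. after promotion or when the entry is unloaded.
  void remove(DictionaryEntry* e) {
    if (e->young_prev != nullptr) e->young_prev->young_next = e->young_next;
    else head_ = e->young_next;
    if (e->young_next != nullptr) e->young_next->young_prev = e->young_prev;
    e->young_next = e->young_prev = nullptr;
    --size_;
  }

  // Young GC: visit tracked entries, unlinking those whose loader was promoted.
  template <typename Visitor>
  void scan_and_prune(Visitor visit) {
    DictionaryEntry* e = head_;
    while (e != nullptr) {
      DictionaryEntry* next = e->young_next;  // save before a possible unlink
      visit(e);
      if (!e->loader->in_young_gen) remove(e);
      e = next;
    }
  }

  std::size_t size() const { return size_; }

 private:
  DictionaryEntry* head_ = nullptr;
  std::size_t size_ = 0;
};
```

With the number of young entries expected to be small, the array version's compaction cost is likely negligible, which is the argument for avoiding the per-entry overhead.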