During strong root processing in the GC, the interned string table is treated as a single root, so the task of scanning the entire table is claimed and performed by a single thread.
With some workloads we have seen the string table scan dominate the GC pause: the other worker threads wait in the termination protocol until the thread that claimed the string table completes its scan.
A better approach is to have multiple GC worker threads scan the table in parallel, similar to how we use multiple GC worker threads to scan the Java threads' stacks.
The string table is a regular hash table in which each bucket is the head of a singly linked list. A simple approach would be to have each participating worker claim a chunk of buckets and scan the strings in those buckets. Scan times could still be unbalanced if the lengths of the bucket chains differ widely. Fortunately, this does not seem to be a problem with the hash functions in use: we see a more or less uniform distribution.
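
The chunk-claiming idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the HotSpot implementation: Entry, Table, parallel_scan and chunk_size are hypothetical names, std::thread stands in for the GC worker threads, and counting entries stands in for applying the GC closure to each string.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a string table entry: one node in a bucket chain.
struct Entry {
    const char* value;
    Entry* next;
};

// Hypothetical table: each bucket is the head of a singly linked list.
struct Table {
    std::vector<Entry*> buckets;
};

// Scan the table with num_workers threads. Each worker repeatedly claims the
// next chunk of chunk_size buckets by atomically advancing a shared index,
// then walks every chain in that chunk. Returns the total number of entries
// visited by all workers.
std::size_t parallel_scan(const Table& table, unsigned num_workers,
                          std::size_t chunk_size) {
    assert(chunk_size > 0);  // a zero-sized chunk would never make progress

    std::atomic<std::size_t> next_bucket{0};  // first unclaimed bucket index
    std::atomic<std::size_t> visited{0};
    const std::size_t limit = table.buckets.size();

    auto worker = [&]() {
        std::size_t local = 0;
        for (;;) {
            // Claim [start, end); fetch_add hands each caller a distinct start.
            const std::size_t start = next_bucket.fetch_add(chunk_size);
            if (start >= limit) {
                break;  // nothing left to claim
            }
            const std::size_t end = std::min(start + chunk_size, limit);
            for (std::size_t i = start; i < end; ++i) {
                for (const Entry* e = table.buckets[i]; e != nullptr; e = e->next) {
                    ++local;  // placeholder for the real per-string work
                }
            }
        }
        visited.fetch_add(local);
    };

    std::vector<std::thread> threads;
    for (unsigned i = 0; i < num_workers; ++i) {
        threads.emplace_back(worker);
    }
    for (auto& t : threads) {
        t.join();
    }
    return visited.load();
}
```

Because workers claim chunks dynamically rather than being given a fixed slice up front, a worker that lands on short chains simply claims more chunks. A smaller chunk_size evens out the load further, at the cost of more atomic operations on the shared index.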