United StatesChange Country, Oracle Worldwide Web Sites Communities I am a... I want to...
Bug ID: JDK-8015237 Parallelize string table scanning during strong root processing
JDK-8015237 : Parallelize string table scanning during strong root processing

Details
Type:
Enhancement
Submit Date:
2013-05-22
Status:
Resolved
Updated Date:
2013-10-04
Project Name:
JDK
Resolved Date:
2013-06-25
Component:
hotspot
OS:
Sub-Component:
gc
CPU:
Priority:
P4
Resolution:
Fixed
Affected Versions:
hs23,hs24,hs25
Fixed Versions:
hs24 (b51)

Related Reports
Backport:
Backport:
Backport:
Backport:
Backport:
Relates:
Relates:

Sub Tasks

Description
During strong root processing of the GC, the intern string table is considered a single root and the task to scan the entire table is claimed and performed by a single thread.

With some work loads we have seen the scanning of the string table to be the main dominator of the GC and the other worker threads wait in the termination protocol until the thread that successfully claims the string table completes the scan.

A better approach is to have multiple GC worker threads scan the table in parallel - similar to how we use multiple GC worker threads to scan the Java threads' stacks.

The string table is a regular hash table with each bucket being the head of a singly linked list. A simple approach would be to have each participating worker claim a chunk of buckets and scan the strings in those buckets. This still has the potential of having unbalanced scan times if the lengths of the bucket chains are massively unbalanced. Fortunately this does not seem to be a problem for the hash functions and we see a more or less uniform distribution. 
                                    

Comments
Exactly as in description.

Ran some performance numbers with Kitchensink to gauge the synchronization overhead:

	0-threads	1-thread	1-thread-cas	1-thread-chunked
Min:	0.200	0.700	0.300	0.300
Max:	6.900	12.200	8.800	11.200
Avg:	0.658	1.637	0.785	0.794

The 0-threads column is the serial code (i.e. a single thread performing the scan without synchronization); the 1-thread column uses a single thread claim one bucket at a time (using an atomic add instruction). The 1-thread-cas column uses a a single thread which claims 20 buckets at a time using a CAS operation. The 1-thread-chunked uses a single thread to claim 20 buckets using an atomic add.

The overheads are very similar but the code for the atomic add case is much simpler.
                                     
2013-05-22
JBS created a backport for the push of these changes into hs25:

JDK-8016915 - Parallelize string table scanning during strong root processing

I've modified this CR to remove the multiple fix versions and will use this CR # to push the changes to hs24.
                                     
2013-06-19
7u40-critical-request justification:

The changes for this CR are needed in 7u40 (hs24) to help eliminate (or reduce) performance issues seen when running FMW with G1.
                                     
2013-06-19
I will approve this - provided that 1) there's SQE-OK and 2) the release team reviews and doesn't have any concerns. Please get SQE to review and mark with SQE-OK if they agree to this fix 
                                     
2013-06-20
is it possible to create regression test here?
                                     
2013-06-20
Any of the existing GC tests with CMS/ParNew or G1 with ParallelGCThreads > 1 should test the code.
                                     
2013-06-20
I just noticed that this is a P4 - you do not need Release Team approval for P4 or P5 bugs.  Please remove the critical-request label so it no longer shows up on the critical request list.
                                     
2013-06-20
URL:   http://hg.openjdk.java.net/hsx/hsx24/hotspot/rev/76e9dce17c31
User:  johnc
Date:  2013-06-25 00:18:18 +0000

                                     
2013-06-25



Hardware and Software, Engineered to Work Together