JDK-8158871 : Long response times with G1 and StringDeduplication
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 8u92,9
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2016-06-06
  • Updated: 2018-06-21
  • Resolved: 2016-06-27
  • Fixed in JDK 8: 8u112
  • Fixed in JDK 9: 9 b127
Description
Using JDK 8u92 with -XX:+UseG1GC -XX:+UseStringDeduplication -Xms13g -Xmx13g,
several observations were made:

1) During load testing, application response times are significantly impacted
(rising to 14 sec; normal response times are 1 sec) the next time string
deduplication runs AFTER a concurrent G1 marking cycle; furthermore, that
particular string deduplication takes 14 sec to run.

2) All other string deduplications appear normal with durations in the <0.2
sec range.

3) The impact can be lessened by increasing the number of parallel GC threads,
but it remains quite significant (14 sec impact with 16 threads, 3 sec impact
with 83 threads).

4) The impact occurs 100% of the time, just after a concurrent G1 cycle, when
the next string deduplication runs.

When string deduplication was disabled, application response times remained
normal throughout the load test, even after G1 concurrent cycles.


Comments
Currently testing a patch that does the cache trimming without the need to hold the StringTable_lock, and only temporarily joins the SuspendibleThreadSet to bulk-fetch entries to delete.
08-06-2016

I wrote a test to reproduce the pattern where lots of deduplicated String objects die all at once. On Solaris/SPARC I was able to provoke a 3.7 second long cache trimming (freeing several million cache entries). The default malloc()/free() used on Solaris is known for performing poorly in multi-threaded environments, so if other threads in the VM (or JNI threads) were doing malloc()/free() concurrently with the dedup thread, one can imagine that this could severely prolong the time to trim the cache.
08-06-2016
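
A minimal sketch of what such a reproducer might look like (not the actual test referenced above): it allocates many String objects with duplicated contents, keeps them alive long enough for the deduplication thread to process them, then drops them all at once so that millions of dedup table entries die together after the next concurrent marking cycle. The class name, counts, sleep-based pacing, and the flags in the comments are illustrative assumptions.

```java
// Hypothetical reproducer sketch (not the actual test from this bug report).
// Suggested flags: -XX:+UseG1GC -XX:+UseStringDeduplication -Xms13g -Xmx13g
//                  -XX:+PrintStringDeduplicationStatistics   (JDK 8 syntax)
import java.util.ArrayList;
import java.util.List;

public class DedupMassDeathRepro {
    public static void main(String[] args) throws Exception {
        List<String> retained = new ArrayList<>();
        // Create millions of distinct String objects that share identical
        // contents, so string deduplication can collapse their backing arrays.
        for (int i = 0; i < 5_000_000; i++) {
            retained.add(new String("some-duplicated-payload-" + (i % 100)));
        }
        // Give the dedup thread time to process the candidates (strings must
        // also survive enough GCs to reach the deduplication age threshold).
        Thread.sleep(30_000);
        // Drop all references at once: after the next concurrent marking cycle
        // millions of dedup table entries become dead and must be removed,
        // filling the table entry cache that later has to be trimmed.
        retained.clear();
        retained = null;
        System.gc();
        Thread.sleep(60_000);
    }
}
```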

Had a look at this. My current theory is that it's the trimming of the table entry cache that is causing the long dedup time, potentially blocking safepoints and String.intern() calls, which in turn would cause application stalls. From the log, you can see that after the first concurrent GC phase about 2 GB of old space is declared dead. The following dedup cycle then finds about 5.5 million dead strings, which are removed from the dedup table. This causes the table entry cache to fill up, and it later needs to be trimmed by the dedup thread. When trimming the cache the dedup thread grabs the StringDedupTable_lock, which effectively blocks String.intern() calls. Further, the dedup thread is still inside the SuspendibleThreadSet while trimming, which effectively blocks safepoints from happening. This would normally not be a problem if it weren't for the fact that we're trimming a very large number of entries (5.5 million), so trimming might take a long time. I will try to write a reproducer for this to see if I can prove my theory.
08-06-2016
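
To make the application-visible side of this theory concrete, here is a hedged sketch of how one might probe it from Java: a thread that repeatedly calls String.intern() and reports its worst-case latency. If trimming is indeed done while holding the lock that intern() also needs, this latency would spike during the long dedup cycle. The class name, value set, and reporting format are assumptions for illustration, not part of the original analysis.

```java
// Hypothetical probe sketch: measures worst-case String.intern() latency,
// which the theory above predicts will spike while the dedup thread trims
// its entry cache. Intended to run alongside the load test.
public class InternLatencyProbe implements Runnable {
    private volatile long worstNanos;

    @Override
    public void run() {
        int i = 0;
        while (!Thread.currentThread().isInterrupted()) {
            String s = "probe-" + (i++ % 1_000);  // small, recycled value set
            long start = System.nanoTime();
            s.intern();
            long elapsed = System.nanoTime() - start;
            if (elapsed > worstNanos) {
                worstNanos = elapsed;
                System.out.printf("new worst intern() latency: %.3f ms%n",
                                  elapsed / 1_000_000.0);
            }
        }
    }

    public static void main(String[] args) {
        new Thread(new InternLatencyProbe(), "intern-latency-probe").start();
    }
}
```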