United StatesChange Country, Oracle Worldwide Web Sites Communities I am a... I want to...
JDK-8180048 : Interned string and symbol table leak memory during parallel unlinking

Details
Type:
Bug
Submit Date:
2017-05-09
Status:
Closed
Updated Date:
2017-08-17
Project Name:
JDK
Resolved Date:
2017-05-15
Component:
hotspot
OS:
other
Sub-Component:
gc
CPU:
generic
Priority:
P1
Resolution:
Fixed
Affected Versions:
8,9
Fixed Versions:

Related Reports
Backport:
Backport:
Backport:
Blocks:

Sub Tasks

Description
FULL PRODUCT VERSION :
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)

java version "9-ea"
Java(TM) SE Runtime Environment (build 9-ea+168)
Java HotSpot(TM) 64-Bit Server VM (build 9-ea+168, mixed mode)

ADDITIONAL OS VERSION INFORMATION :
Linux 2.6.32-642.11.1.el6.x86_64 #1 SMP Wed Oct 26 10:25:23 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux, Red Hat Enterprise Linux Server release 6.8 (Santiago)
Windows 7 Enterprise 64-bit, Service pack 1

A DESCRIPTION OF THE PROBLEM :
Other bugs, for example JDK-8015332, state that G1 only collects unused interned strings when it performs a full GC. In our test case they are not even connected with a full GC, they are not collected at all. Furthermore the maximum metaspace size is not honored, native memory is allocated for the interned strings even after the maximum has been reached.

With the classic concurrent mark sweep the memory is reclaimed correctly, but not with G1.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Use the Java code below to generate and intern strings and to trigger full garbage collections. Run the program like this:
-------------------
java -XX:NativeMemoryTracking=detail -XX:+UnlockDiagnosticVMOptions -XX:+PrintNMTStatistics -XX:MetaspaceSize=100m -XX:MaxMetaspaceSize=100m -XX:+UseStringDeduplication -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Xloggc:gc-jdk8-marksweep.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xms256m -Xmx256m -cp classes StringInterner
java -XX:NativeMemoryTracking=detail -XX:+UnlockDiagnosticVMOptions -XX:+PrintNMTStatistics -XX:+UseG1GC -XX:MetaspaceSize=100m -XX:MaxMetaspaceSize=100m -Xloggc:gc-jdk8-g1.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xms256m -Xmx256m -cp classes StringInterner
-------------------
Check the output and search for Symbol. We got:

Symbol (reserved=27549KB, committed=27549KB)
Symbol (reserved=424789KB, committed=424789KB)

Note that the memory for symbols (interned strings etc) is 27M for concurrent mark sweep and 424M for G1 when the maximum metaspace size was configured to be 100M.

Just in case, double-check the gc logs to make sure that there were full gc:s performed. We got four lines similar to:

2017-05-09T10:29:24.738+0000: 24.028: [Full GC (System.gc())  124M->386K(256M), 0.0570296 secs]

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
1) The interned strings should be garbage collected when G1 performs a full gc, so we should get a similar size for Symbol with G1 as we got with concurrent mark sweep.

2) The maximum amount of native memory allocated should never be larger than the configured maximum metaspace size, 100M.
ACTUAL -
The native memory occupied by interned strings continued to grow with G1 passed the maximum metaspace size even though full garbage collections were performed.

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
public class StringInterner {
    public static volatile String lastString;

    public static void main(String[] args) {
        for (int iterations = 0; iterations < 40;) {
            String baseName = UUID.randomUUID().toString();
            for (int i = 0; i < 1_000_000; i++) {
                lastString = (baseName + i).intern();
            }
            if (++iterations % 10 == 0) {
                System.gc();
            }
            LockSupport.parkNanos(500_000_000);
        }
    }
}

---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
Use another garbage collector, in our case concurrent mark sweep.


                                    

Comments
Verified by test manually by Thomas.
                                     
2017-07-26
Some comment about the second expectation of that test:
- NMT tracks native memory allocated by the VM not managed in Metaspace. Which means that they are not limited by MaxMetaspaceSize in general.
- the issue is that some C++ helper data structure for managing the interned string list leaks in G1; it is shown in the "Symbol" category in NMT output, but that has nothing to do with Metaspace.
                                     
2017-05-11
During unlinking of stale entries from the hash table we move old entries to a free list of entries. This move to the free list is not thread safe, so entries are lost (and not reused in the future) when doing string and symbol table unlinking in parallel.


                                     
2017-05-10
Impact = H (memory leak with interned strings with g1)
Likelihood = M 
Workaround = H (None / use another GC / use   -XX:ParallelGCThreads=1)
HMH => P1
                                     
2017-05-10
Deleted comment
                                     
2017-05-10
Below are the observations with UseConcMarkSweepGC when MetaspaceSize=100m
8u152 ==
 Symbol (reserved=20206KB, committed=20206KB)
                            (malloc=19686KB #2440)
                            (arena=520KB #1)
9 ea b168 ==
 Symbol (reserved=39187KB, committed=39187KB)
                            (malloc=38284KB #5981)
                            (arena=903KB #1)

==
With G1GC when MetaspaceSize=100m
8u152 ==
 Symbol (reserved=794118KB, committed=794118KB)
                            (malloc=793598KB #99179)
                            (arena=520KB #1)
9 ea b168 ==
 Symbol (reserved=813491KB, committed=813491KB)
                            (malloc=812588KB #102769)
                            (arena=903KB #1)
==

Observation with G1GC,  native memory occupied by interned strings continued to grow and full GC is unable to reclaim.
                                     
2017-05-10



Hardware and Software, Engineered to Work Together