Bug ID: JDK-8079274 Long ParNew Pauses When Large Static Object Previously Allocated

JDK-8079274 : Long ParNew Pauses When Large Static Object Previously Allocated

Type: Enhancement
Component: hotspot
Sub-Component: gc
Affected Version: 8u45

Priority: P4
Status: Resolved
Resolution: Duplicate
OS: linux_ubuntu
CPU: x86

Submitted: 2015-04-28
Updated: 2015-11-03
Resolved: 2015-11-03

Versions (Unresolved/Resolved/Fixed)

The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.

JDK 9
9Resolved

Related Reports

Duplicate :

JDK-8081629 - CMS split_block() does not correctly fix up block-offset-table for large blocks

Description

FULL PRODUCT VERSION :
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)


FULL OS VERSION :
Dev:
Linux jlent 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

EC2:
Linux ip-10-79-138-239 3.14.35-28.38.amzn1.x86_64 #1 SMP Wed Mar 11 22:50:37 UTC 2015

A DESCRIPTION OF THE PROBLEM :
I have created a simple program which allocates one 500M static object and then allocates many small objects. Some of the small objects references are cached.  What I observe is long and increasing ParNewGC pauses. There are several odd variables:

1) Can not reproduce using a 1G heap.  A 2G heaps (and larger) do trigger the issue.
2) If the size of the static object is (approximately) less than 500,000,00 bytes or greater than 540,000,000 bytes I can not reproduce this issue.
3) Not reproducible with Oracle 6.  Is reproducible with OpenJDK 6, Oracle & OpenJDK 7, and Oracle & OpenJDK 8.

Problem manifested itself as a session timeout when starting Kafka with log compaction enabled.  Kafka log compaction just happens to allocate a 500M static object at startup.

THE PROBLEM WAS REPRODUCIBLE WITH -Xint FLAG: Yes

THE PROBLEM WAS REPRODUCIBLE WITH -server FLAG: Yes

REGRESSION.  Last worked in version 6u45

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1) Compile the program below.
2) Run it with the following JVM options:

-verbose:gc 
-XX:+PrintGCTimeStamps 
-XX:+PrintGCDetails 
-XX:+UseParNewGC 
-XX:+UseConcMarkSweepGC 
-Xmx2G 
-Xms2G

3) Note the ParNew GC times

EXPECTED VERSUS ACTUAL BEHAVIOR :
0.512: [GC (Allocation Failure) 0.512: [ParNew: 272640K->722K(306688K), 0.0067778 secs] 784640K->512722K(2063104K), 0.0068382 secs] [Times: user=0.02 sys=0.01, real=0.01 secs]
... 15 removed ...
11.678: [GC (Allocation Failure) 11.678: [ParNew: 274869K->2229K(306688K), 1.8754075 secs] 790029K->517662K(2063104K), 1.8754565 secs] [Times: user=7.50 sys=0.00, real=1.87 secs] 
13.591: [GC (Allocation Failure) 13.591: [ParNew: 274869K->2229K(306688K), 2.0347897 secs] 790302K->517934K(2063104K), 2.0348399 secs] [Times: user=8.13 sys=0.00, real=2.04 secs] 
15.663: [GC (Allocation Failure) 15.663: [ParNew^C: 274869K->2229K(306688K), 2.2072114 secs] 790574K->518207K(2063104K), 2.2072618 secs] [Times: user=8.83 sys=0.00, real=2.21 secs] 


REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
import java.util.ArrayList;

public class LongParNewPause {

    static byte[] bigStaticObject;

    public static void main(String[] args) throws Exception {
        int bigObjSize    = args.length > 0 ? Integer.parseInt(args[0]) : 524288000;
        int littleObjSize = args.length > 1 ? Integer.parseInt(args[1]) : 100;
        int saveFraction  = args.length > 2 ? Integer.parseInt(args[2]) : 1000;

        bigStaticObject = new byte[bigObjSize];
        ArrayList<byte[]> holder = new ArrayList<byte[]>();

        int i = 0;
        while (true) {
            byte[] local = new byte[littleObjSize];
            if (i++ % saveFraction == 0) {
                holder.add(local);
            }
        }
    }
}
---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
Configure Kafka to use a log de-dupe buffer outside the "problem" range:

log.cleaner.dedupe.buffer.size=

The real problem is that this issue was very difficult to debug.  Also I am concerned this is just a special case of a more generic issue.

Comments

The long pause is cause by a bug which will be fixed under 8081629. This can be closed as a duplicate.
04-06-2015
Here's what happens: To be able to parallelize the GC work ParNew divides areas that has to be scanned into smaller chunks. Each worker thread is then given a chunk to look at. Area to be scanned divided into chunks: \|----------\|----------\|----------\|----------\|----------\|----------\| When processing a chunk, the worker thread must first find out if there is an object spanning the border of the chunk. To do that the thread will scan from the start of the chunk towards lower addresses to find the start of an object. Object spanning the border between chunks: \|----------\|----------\|-------Ooo\|oo--------\|----------\|----------\| In this example the thread who is given the chunk that starts in the middle of the object will have to scan to the left of the chunk until it finds the capital O where the object starts. If the object is large, this will take time. In the case described in this issue the object is really large and spans several chunks. This means that all worker threads have to scan a long way to find the start of the object several times. This is what consumes the time and causes the long GC pauses. The default size of these chunks is fairly small. To increase it use -XX:ParGCCardsPerStrideChunk. The default is 256, setting it to larger numbers will get rid of the long pause in this example. I went as high as 1m but you need to tune this with your application to see what works the best for you. Setting high values may affect other aspects of your GC experience. It seems that you have found the exact border where the object spans exactly as many chunks as possible. A larger object probably triggers some higher level code that are able to skip scanning of this part of the memory. So, why doesn't this happen with DefNew in 1.6 and 1.7? Well, the answer is that DefNew does not divide the area into chunks since it does not parallelize this work. It simply start from the bottom (left) of the area and scans forward. When it finds an object it can immediately see the size of this object and jump past it to continue scanning. This can be emulated with ParNew as well by setting -XX:ParallelGCThreads=1 and disable the parallelism. Try it out and you will see that the long GC pauses goes away. This will however also affect CMS so I'm not sure if it is a satisfactory solution in your case. This is not something that we can easily change in the ParNew collector. It is this way by design. There may be ways to mark chunks so that they only need to be scanned once and by doing so avoid the repeated scanning that consumes the time here. I'm not sure if that is something we want to do considering that we are actively working on replacing these old collectors with G1. In any case this is not a bug, but would rather be an enhancement request. I'll leave it open for now and bring this case up with the rest of the team to see if it something that we should consider fixing. I doubt such a change would be allowed into 8u60 though since it would be a high risk change with lots of potential new bugs. If it is fixed it will most likely be for JDK 9 or later.
06-05-2015
ILW = Medium (temporary pausetime regression), Low (only for specific use cases), Medium (change allocation pattern, may not always be an option) = P4
06-05-2015
Thank you for the clarification! Now it's more clear that this is a real issue with ParNew. I can reproduce this locally (on OSX) with a slightly different size of the big object: java -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Xmx4G -G LongParNewPause 536870849 100 10 Java HotSpot(TM) 64-Bit Server VM warning: The UseParNewGC flag is deprecated and will likely be removed in a future release 3.993: #0: [GC (Allocation Failure) 3.993: #0: [ParNew: 545344K->2527K(613440K), 0.4250711 secs] 545344K->526817K(4126208K), 0.4252561 secs] [Times: user=2.42 sys=0.01 real=0.43 secs] 4.617: #1: [GC (Allocation Failure) 4.622: #1: [ParNew: 547871K->56537K(613440K), 0.2918433 secs] 1072161K->580827K(4126208K), 0.2974037 secs] [Times: user=1.42 sys=0.00 real=0.29 secs] 5.116: #2: [GC (Allocation Failure) 5.116: #2: [ParNew: 601881K->68096K(613440K), 0.9766083 secs] 1126171K->648717K(4126208K), 0.9768086 secs] [Times: user=4.78 sys=0.17 real=0.97 secs] 6.289: #3: [GC (Allocation Failure) 6.290: #3: [ParNew: 613440K->68096K(613440K), 0.8197730 secs] 1194061K->703088K(4126208K), 0.8202328 secs] [Times: user=4.05 sys=0.00 real=0.82 secs] 7.307: #4: [GC (Allocation Failure) 7.308: #4: [ParNew: 613440K->68096K(613440K), 0.8278834 secs] 1248432K->755794K(4126208K), 0.8285652 secs] [Times: user=4.17 sys=0.01 real=0.83 secs] 8.331: #5: [GC (Allocation Failure) 8.332: #5: [ParNew: 613440K->68096K(613440K), 0.8651425 secs] 1301138K->819509K(4126208K), 0.8657880 secs] [Times: user=4.41 sys=0.00 real=0.87 secs] 9.399: #6: [GC (Allocation Failure) 9.400: #6: [ParNew: 613440K->68096K(613440K), 1.0110336 secs] 1364853K->880159K(4126208K), 1.0116370 secs] [Times: user=5.39 sys=0.01 real=1.01 secs] ... java -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Xmx4G -Xms4G LongParNewPause 536870848 100 10 Java HotSpot(TM) 64-Bit Server VM warning: The UseParNewGC flag is deprecated and will likely be removed in a future release 4.220: #0: [GC (Allocation Failure) 4.221: #0: [ParNew: 545344K->2528K(613440K), 0.4260047 secs] 545344K->526818K(4126208K), 0.4261896 secs] [Times: user=2.40 sys=0.01 real=0.43 secs] 4.848: #1: [GC (Allocation Failure) 4.854: #1: [ParNew: 547872K->56589K(613440K), 0.3017437 secs] 1072162K->580879K(4126208K), 0.3079897 secs] [Times: user=1.49 sys=0.01 real=0.31 secs] 5.359: #2: [GC (Allocation Failure) 5.359: #2: [ParNew: 601933K->68096K(613440K), 17.9330653 secs] 1126223K->648782K(4126208K), 17.9332855 secs] [Times: user=139.42 sys=0.15 real=17.93 secs] 23.492: #3: [GC (Allocation Failure) 23.492: #3: [ParNew: 613440K->68096K(613440K), 0.8217865 secs] 1194126K->702840K(4126208K), 0.8221725 secs] [Times: user=4.01 sys=0.00 real=0.82 secs] 24.510: #4: [GC (Allocation Failure) 24.510: #4: [ParNew: 613440K->68096K(613440K), 0.8355326 secs] 1248184K->755842K(4126208K), 0.8360701 secs] [Times: user=4.19 sys=0.01 real=0.84 secs] ... I'll try to add a few more timers to see where we spend the time.
06-05-2015
Further information from the original reporter: Just read the comment from Jesper Wilhelmsson. Not sure what his fix is, but, his comment makes me think he considers this a tuning issue rather than a real bug. The Bug Report constrained the amount of data I could add to it (size limits on the input fields) and I don't think I picked the best example. Below is a better example showing that reducing the size of the bigStaticObject by one byte can totally destroy the efficiency of the ParNew code for a couple of iterations. I don't think the size of the static object should have any impact on the ParNew collection times - it goes straight to Tenured memory and has no references to New memory. When I decrease the size of the static object from 536870385 to 536870384 bytes: - Second ParNew collection time increases from 0.07 sec to 19.25 secs - Third ParNew collection time increases from 0.04 secs to 50.21 secs That is what I am concerned about and I think tripped up Kafka. I really don't care how the GC performs once the program has run for a while. It is purposely very simplistic. In my current version I actually limit the number of iterations. If I don't limit the loop and just let it run the ParNew eventually stop (and the ParNew last time is not that long) and CMS/Full GCs start running all the time until the heap exhausts. That is expected. Not sure if I have the ability to comment on the bug, but, perhaps you can pass this on to Jesper for me if you think it would help. $ /usr/lib/jvm/java-8-oracle/bin/java -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Xmx4G -Xms4G LongParNewPause 536870385 100 10 0.337: [GC (Allocation Failure) 0.337: [ParNew: 272640K->27713K(306688K), 0.0230783 secs] 796927K->552000K(4160256K), 0.0231369 secs] [Times: user=0.06 sys=0.01, real=0.03 secs] 0.394: [GC (Allocation Failure) 0.394: [ParNew: 300353K->34048K(306688K), 0.0684373 secs] 824640K->586484K(4160256K), 0.0684841 secs] [Times: user=0.09 sys=0.01, real=0.07 secs] 0.496: [GC (Allocation Failure) 0.496: [ParNew: 306688K->34048K(306688K), 0.0332530 secs] 859124K->614019K(4160256K), 0.0332978 secs] [Times: user=0.06 sys=0.02, real=0.04 secs] 0.562: [GC (Allocation Failure) 0.562: [ParNew: 306688K->34048K(306688K), 0.0334176 secs] 886659K->641255K(4160256K), 0.0334716 secs] [Times: user=0.08 sys=0.01, real=0.04 secs] 0.627: [GC (Allocation Failure) 0.627: [ParNew: 306688K->34048K(306688K), 0.0401139 secs] 913895K->670588K(4160256K), 0.0401641 secs] [Times: user=0.10 sys=0.00, real=0.04 secs] 0.702: [GC (Allocation Failure) 0.702: [ParNew: 306688K->34048K(306688K), 0.0461561 secs] 943228K->705488K(4160256K), 0.0461994 secs] [Times: user=0.10 sys=0.01, real=0.05 secs] 0.781: [GC (Allocation Failure) 0.781: [ParNew: 306688K->34048K(306688K), 0.0333572 secs] 978128K->731333K(4160256K), 0.0334229 secs] [Times: user=0.10 sys=0.00, real=0.04 secs] 0.848: [GC (Allocation Failure) 0.848: [ParNew: 306688K->34048K(306688K), 0.0431808 secs] 1003973K->762836K(4160256K), 0.0432255 secs] [Times: user=0.11 sys=0.01, real=0.04 secs] 0.923: [GC (Allocation Failure) 0.923: [ParNew: 306688K->34048K(306688K), 0.0410440 secs] 1035476K->795475K(4160256K), 0.0410885 secs] [Times: user=0.12 sys=0.00, real=0.04 secs] 0.996: [GC (Allocation Failure) 0.996: [ParNew: 306688K->34048K(306688K), 0.0365335 secs] 1068115K->825838K(4160256K), 0.0365772 secs] [Times: user=0.08 sys=0.02, real=0.03 secs] 1.064: [GC (Allocation Failure) 1.064: [ParNew: 306688K->34048K(306688K), 0.0443399 secs] 1098478K->851517K(4160256K), 0.0443830 secs] [Times: user=0.08 sys=0.00, real=0.05 secs] 1.143: [GC (Allocation Failure) 1.143: [ParNew: 306688K->34048K(306688K), 0.0755879 secs] 1124157K->893291K(4160256K), 0.0756329 secs] [Times: user=0.17 sys=0.01, real=0.08 secs] 1.251: [GC (Allocation Failure) 1.251: [ParNew: 306688K->34048K(306688K), 0.0409888 secs] 1165931K->922137K(4160256K), 0.0410516 secs] [Times: user=0.10 sys=0.00, real=0.04 secs] 1.324: [GC (Allocation Failure) 1.324: [ParNew: 306688K->34048K(306688K), 0.0304864 secs] 1194777K->946216K(4160256K), 0.0305517 secs] [Times: user=0.08 sys=0.00, real=0.04 secs] 1.386: [GC (Allocation Failure) 1.386: [ParNew: 306688K->34048K(306688K), 0.0362617 secs] 1218856K->974303K(4160256K), 0.0363361 secs] [Times: user=0.10 sys=0.01, real=0.04 secs] 1.454: [GC (Allocation Failure) 1.454: [ParNew: 306688K->34048K(306688K), 0.0346689 secs] 1246943K->1004688K(4160256K), 0.0347150 secs] [Times: user=0.08 sys=0.00, real=0.04 secs] 1.521: [GC (Allocation Failure) 1.521: [ParNew: 306688K->34048K(306688K), 0.0384377 secs] 1277328K->1029661K(4160256K), 0.0385067 secs] [Times: user=0.10 sys=0.01, real=0.04 secs] 1.594: [GC (Allocation Failure) 1.594: [ParNew: 306688K->34048K(306688K), 0.0609185 secs] 1302301K->1079308K(4160256K), 0.0609655 secs] [Times: user=0.17 sys=0.01, real=0.07 secs] 1.687: [GC (Allocation Failure) 1.687: [ParNew: 306688K->34048K(306688K), 0.0414997 secs] 1351948K->1105610K(4160256K), 0.0415548 secs] [Times: user=0.11 sys=0.01, real=0.04 secs] 1.760: [GC (Allocation Failure) 1.760: [ParNew: 306688K->34048K(306688K), 0.0334070 secs] 1378250K->1132711K(4160256K), 0.0334522 secs] [Times: user=0.08 sys=0.01, real=0.03 secs] ^CHeap par new generation total 306688K, used 143105K [0x00000006c0000000, 0x00000006d4cc0000, 0x00000006d4cc0000) eden space 272640K, 40% used [0x00000006c0000000, 0x00000006c6a80760, 0x00000006d0a40000) from space 34048K, 100% used [0x00000006d0a40000, 0x00000006d2b80000, 0x00000006d2b80000) to space 34048K, 0% used [0x00000006d2b80000, 0x00000006d2b80000, 0x00000006d4cc0000) concurrent mark-sweep generation total 3853568K, used 1098663K [0x00000006d4cc0000, 0x00000007c0000000, 0x00000007c0000000) Metaspace used 2429K, capacity 4486K, committed 4864K, reserved 1056768K class space used 262K, capacity 386K, committed 512K, reserved 1048576K $ /usr/lib/jvm/java-8-oracle/bin/java -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Xmx4G -Xms4G LongParNewPause 536870384 100 10 0.332: [GC (Allocation Failure) 0.332: [ParNew: 272640K->27168K(306688K), 0.0255901 secs] 796927K->551455K(4160256K), 0.0256492 secs] [Times: user=0.06 sys=0.00, real=0.02 secs] 0.393: [GC (Allocation Failure) 0.393: [ParNew: 299808K->34048K(306688K), 19.2543397 secs] 824095K->584454K(4160256K), 19.2543968 secs] [Times: user=55.50 sys=0.26, real=19.25 secs] 19.680: [GC (Allocation Failure) 19.680: [ParNew: 306688K->34048K(306688K), 50.2081711 secs] 857094K->612972K(4160256K), 50.2082153 secs] [Times: user=147.98 sys=0.05, real=50.21 secs] 69.921: [GC (Allocation Failure) 69.921: [ParNew: 306688K->34048K(306688K), 0.0366180 secs] 885612K->640255K(4160256K), 0.0366632 secs] [Times: user=0.10 sys=0.01, real=0.04 secs] 69.989: [GC (Allocation Failure) 69.989: [ParNew: 306688K->34048K(306688K), 0.0363555 secs] 912895K->669850K(4160256K), 0.0364063 secs] [Times: user=0.08 sys=0.01, real=0.03 secs] 70.059: [GC (Allocation Failure) 70.059: [ParNew: 306688K->34048K(306688K), 0.0370623 secs] 942490K->699271K(4160256K), 0.0371055 secs] [Times: user=0.09 sys=0.02, real=0.04 secs] 70.129: [GC (Allocation Failure) 70.129: [ParNew: 306688K->34048K(306688K), 0.0485681 secs] 971911K->730787K(4160256K), 0.0486117 secs] [Times: user=0.08 sys=0.01, real=0.05 secs] 70.211: [GC (Allocation Failure) 70.211: [ParNew: 306688K->34048K(306688K), 0.0574823 secs] 1003427K->769759K(4160256K), 0.0575261 secs] [Times: user=0.11 sys=0.01, real=0.06 secs] 70.300: [GC (Allocation Failure) 70.300: [ParNew: 306688K->34048K(306688K), 0.0370087 secs] 1042399K->796661K(4160256K), 0.0370700 secs] [Times: user=0.10 sys=0.00, real=0.04 secs] 70.370: [GC (Allocation Failure) 70.370: [ParNew^C: 306688K->34048K(306688K), 0.0442745 secs] 1069301K->823489K(4160256K), 0.0443201 secs] [Times: user=0.12 sys=0.01, real=0.04 secs] Heap par new generation total 306688K, used 50406K [0x00000006c0000000, 0x00000006d4cc0000, 0x00000006d4cc0000) eden space 272640K, 6% used [0x00000006c0000000, 0x00000006c0ff9b58, 0x00000006d0a40000) from space 34048K, 100% used [0x00000006d0a40000, 0x00000006d2b80000, 0x00000006d2b80000) to space 34048K, 0% used [0x00000006d2b80000, 0x00000006d2b80000, 0x00000006d4cc0000) concurrent mark-sweep generation total 3853568K, used 789441K [0x00000006d4cc0000, 0x00000007c0000000, 0x00000007c0000000) Metaspace used 2431K, capacity 4486K, committed 4864K, reserved 1056768K class space used 262K, capacity 386K, committed 512K, reserved 1048576K jlent@jlent:~/junk/parnew$
06-05-2015
Please note that you are comparing two different GC algorithms here. 1.6 and 1.7 uses DefNew + CMS, while 1.8+ is using ParNew + CMS. The combination DefNew + CMS was deprecated in JDK 8. See JDK-8046163. Different collectors work well with different applications. Which GC algorithm to choose depends very much on the behavior of the application. Your test program is allocating new data in a loop where it never disposes of any data. This means that it is keeping more and more objects alive over time. The work done by some collector algorithms is proportional to the size of the live data set which implies that the time to collect will increase with more live data. In this particular application ParNew does not perform as well as DefNew did. I tried the test program with the latest G1 (which is the GC we are actively working on today), -XX:+UseG1GC, and the pause times are low and do not increase over time with this GC algorithm.
05-05-2015
Moving across JDK for investigation by dev team.
04-05-2015