JDK-8062128 : Dramatic difference between UseConcMarkSweepGC and UseG1GC
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 8u25
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • OS: windows_2008
  • CPU: x86
  • Submitted: 2014-10-18
  • Updated: 2024-09-25
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
Other
tbdUnresolved
Related Reports
Relates :  
Relates :  
Relates :  
Relates :  
Description
A DESCRIPTION OF THE REQUEST :
There is a dramatic divergence in performance between ConcMarkSweep and G1 garbage collection options for large lists as per the attached example.


JUSTIFICATION :
Such a dramatic change in garbage collection performance can lead to improper configuration on machines, leading to decreased performance.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
ConcMarkSweep and G1 be more reasonably aligned in the program execution times. I would expect the execution times to be approximately within the same order of magnitude.
ACTUAL -
Notice that the first iteration of the loop takes 8.5 seconds with ConcMarkSweep, and a whopping 215 seconds with G1!


Running on my machine with various options produces the below:

Options: -Xmx8G -verbose:gc -XX:+UseConcMarkSweepGC

[GC (Allocation Failure)  4197K->489K(253440K), 0.0012620 secs]
[Full GC (Allocation Failure)  489K->475K(253440K), 0.0023629 secs]
[GC (Allocation Failure)  332635K->358164K(515652K), 0.9329661 secs]
[GC (CMS Initial Mark)  358164K(515652K), 0.0043027 secs]
[GC (Allocation Failure)  428180K->463104K(533700K), 0.4298466 secs]
[GC (Allocation Failure)  533120K->568127K(638532K), 0.4589084 secs]
[GC (Allocation Failure)  638143K->673149K(743556K), 0.5026516 secs]
[GC (Allocation Failure)  743165K->778172K(848580K), 0.4950669 secs]
[GC (Allocation Failure)  848188K->883194K(953604K), 0.4370889 secs]
[GC (Allocation Failure)  953210K->988217K(1058628K), 0.4436002 secs]
[GC (Allocation Failure)  1058233K->1129204K(1199940K), 0.4727909 secs]
[GC (Allocation Failure)  1199220K->1243104K(1313988K), 0.4291697 secs]
[GC (Allocation Failure)  1313120K->1348127K(1419012K), 0.4385637 secs]
[GC (Allocation Failure)  1418143K->1453236K(1524228K), 0.4223058 secs]
[GC (Allocation Failure)  1523252K->1558259K(1629060K), 0.4296635 secs]
[GC (Allocation Failure)  1628275K->1663281K(1734084K), 0.4225988 secs]
[GC (Allocation Failure)  1733297K->1768347K(1839300K), 0.4338428 secs]
[GC (Allocation Failure)  1838363K->1873370K(1944324K), 0.4220629 secs]
[GC (Allocation Failure)  1876171K->1886424K(1957188K), 0.0878145 secs]
[Full GC (Allocation Failure)  1886424K->1835652K(1957188K), 5.6155671 secs]
[GC (CMS Initial Mark)  2237047K(3366112K), 0.0003521 secs]
67108864|8531
[GC (Allocation Failure)  2501508K->2605004K(3366112K), 1.1743133 secs]
[GC (Allocation Failure)  2877644K->3100149K(3373216K), 1.4558667 secs]
[GC (Allocation Failure)  3214657K->3269438K(3542368K), 0.8864740 secs]
33554432|4012
[GC (Allocation Failure)  3542078K->3582163K(3855592K), 0.8005835 secs]
[GC (Allocation Failure)  3838444K->3924795K(4197736K), 1.3208573 secs]
16777216|1925
[GC (Allocation Failure)  4191172K->4170552K(4443496K), 0.9668430 secs]
8388608|844
4194304|393
2097152|175
1048576|70
524288|29
262144|11
131072|5
65536|1
32768|1
16384|1
8192|2
4096|1
2048|1
1024|0
512|0
256|0
128|0
64|0
32|0
16|0
8|0
4|0
2|0




Options: -Xmx8G -verbose:gc -XX:+UseG1GC

[GC pause (G1 Humongous Allocation) (young) (initial-mark) 1433K->592K(256M), 0.0012598 secs]
[GC concurrent-root-region-scan-start]
[GC concurrent-root-region-scan-end, 0.0009750 secs]
[GC concurrent-mark-start]
[GC concurrent-mark-end, 0.0009256 secs]
[GC remark, 0.0006834 secs]
[GC cleanup 257M->257M(512M), 0.0005585 secs]
[GC pause (G1 Evacuation Pause) (young) 270M->269M(512M), 0.0311023 secs]
[GC pause (G1 Evacuation Pause) (young) 291M->295M(1024M), 0.0530180 secs]
[GC pause (G1 Evacuation Pause) (young) 341M->343M(2048M), 0.1020062 secs]
[GC pause (G1 Evacuation Pause) (young) 437M->444M(3278M), 0.2061966 secs]
[GC pause (G1 Evacuation Pause) (young) 592M->596M(4262M), 0.3261134 secs]
[GC pause (G1 Evacuation Pause) (young) 786M->794M(5048M), 0.4208658 secs]
[GC pause (G1 Evacuation Pause) (young) 1018M->1021M(5678M), 0.4913003 secs]
[GC pause (G1 Evacuation Pause) (young) 1271M->1277M(6182M), 0.5466667 secs]
67108864|215648
[GC pause (G1 Evacuation Pause) (young) 2061M->2064M(6584M), 1.1163806 secs]
33554432|72415
[GC pause (G1 Evacuation Pause) (young) 2608M->2612M(6584M), 2.0054681 secs]
16777216|22609
[GC pause (G1 Evacuation Pause) (young) 3026M->3031M(6584M), 1.6127237 secs]
[GC pause (G1 Humongous Allocation) (young) (initial-mark) 3119M->3123M(6584M), 0.4395586 secs]
[GC concurrent-root-region-scan-start]
[GC concurrent-root-region-scan-end, 0.1803907 secs]
[GC concurrent-mark-start]
[GC concurrent-mark-end, 0.8080920 secs]
[GC remark, 0.0020432 secs]
[GC cleanup 3172M->240M(6584M), 0.0300690 secs]
[GC concurrent-cleanup-start]
[GC concurrent-cleanup-end, 0.2658931 secs]
8388608|6026
4194304|573
2097152|195
1048576|82
524288|35
262144|16
131072|4
65536|3
32768|1
16384|0
8192|0
4096|0
2048|0
1024|0
512|0
256|0
128|0
64|0
32|0
16|0
8|0
4|0
2|0



---------- BEGIN SOURCE ----------
public class DelayInducer {

    public static void main(String[] inarr) {
        final Integer numTimes = 1;
        final Integer bitSize = 26;
        final Random rand = new Random();
        List<Integer> lengths = new ArrayList<>(bitSize);
        for (int i = bitSize; i > 0; --i) {
            lengths.add(1 << i);
        }
        for (Integer n : lengths) {
            final List<Integer> newList = new ArrayList<>(n);
            IntStream.range(0, n + 1).forEach((int i) -> {
                newList.add(i);
            });
            Instant start = Instant.now();
            for (int j = 0; j < numTimes; ++j) {
                for (int i = 0; i < newList.size(); ++i) {
                    int sizeLeft = newList.size() - i;
                    int shuffleWith = rand.nextInt(sizeLeft) + i;
                    Collections.swap(newList, i, shuffleWith);
                }
            }
            Duration d = Duration.between(start, Instant.now());
            System.out.println(String.format("%d|%d", n, d.toMillis()));
        }
    }
}
---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
use ConcMarkSweep


Comments
It has been shown that the problem (for this microbenchmark) is the concurrent refinement. CMS Baseline: 5893 ms. Parallel Baseline: 4962 ms. G1 Baseline: 15867 ms. Throughput remember sets helps (jdk8u20-throughputrset) (execution time goes down to 4559 ms) Disabling concurrent refinement helps (execution time goes down to 4312 ms) There is an enhancement for adding a flag for disabling concurrent refinement: JDK-8134303
03-11-2015

The example program for this problem is ArraysSort.java, which is very similar to DelayInducer
03-08-2015

Part of the performance difference might be JDK-8131668 where TLABs end up contending on the freelist_lock when trying to reload TLABs.
03-08-2015

The original Webbug was reported as an RFE and we agree that this is something that should be improved. Moving this to 9.
26-11-2014