JDK-4961036 : ParNew: performance problem with mm6 benchmark
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 5.0
  • Priority: P3
  • Status: Closed
  • Resolution: Duplicate
  • OS: solaris
  • CPU: sparc
  • Submitted: 2003-11-28
  • Updated: 2013-11-01
  • Resolved: 2006-05-31
JDK 7: Resolved
Related Reports
Duplicate: JDK-5089317
Description
Name: elR10090			Date: 11/28/2003


I'm filing this bug to emphasize the issue revealed
by the test:
    nsk/stress/memory/memleak006
and reported with the bug:
    4956026 Increase the timeout.

While evaluating the test bug 4956026, I've found that
Tiger-b29, 64-bit Server VM, spends a lot of time in
vain on needless attempts to collect garbage.

I've rewritten the test to make the failure scenario
a bit clearer. Please see the updated variant as the
test mm6.java below. The specific feature of the test
mm6 is that it explicitly allocates the x[j] elements
of the huge 2-dimensional array x[][].

If the int[][] array x is successfully allocated as
a new int[len][] array of int[] references, the test
mm6 starts a cycle to allocate x[j] for each j as a
new int[len] array. As the value of len*len is far
too large anyway (millions of gigabytes), this cycle
must result in an OutOfMemoryError at some j'th
iteration.
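As a rough illustration (this helper class is hypothetical and
not part of the test itself), even the smallest len that the
test reaches would need a heap of millions of gigabytes:

    // Hypothetical size check, not part of the mm6 test: estimates the
    // memory a fully populated int[len][len] would need, ignoring object
    // headers and the int[][] reference array itself.
    public class mm6SizeCheck {
        public static void main(String[] args) {
            long len = 1000000000L;                       // initial len in mm6
            for (int i = 0; i < 5; i++, len /= 2) {
                double bytes = (double) len * len * 4.0;  // 4 bytes per int
                double gb = bytes / (1024.0 * 1024.0 * 1024.0);
                System.out.println("len = " + len + " -> ~" + (long) gb + " GB");
            }
        }
    }

Even at len = 62500000 (the fifth halving), this yields roughly
15 million GB, so the inner allocation loop can never complete.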

The point is that Tiger's VM spends much more time to
detect the lack of memory than Mantis does. Here are
the logs of my runs against Tiger-b29 and Mantis-b25.
I've used an 8-CPU (SPARC/v9) machine with 8 GB of RAM:

    % time /home/latkin/jdk1.5/bin/java -d64 -Xmx4g -verbose:gc mm6
    i = 0, j=-1, len = 1000000000
    i = 1, j=-1, len = 500000000
    [GC 1953309K->1953269K(2799808K), 388.3033920 secs]
    [GC 1953269K->1953285K(2802880K), 438.2987816 secs]
    [Full GC 1953285K->1953233K(2205504K), 41.0378874 secs]
    [GC 1953233K->1953233K(2802880K), 441.4371139 secs]
    [Full GC 1953233K->1953232K(2205632K), 37.1156034 secs]
    i = 2, j=0, len = 250000000
    [GC 1953355K->1953232K(2808640K), 447.7919364 secs]
    [GC 1953232K->1953232K(2808704K), 425.2679931 secs]
    [Full GC 1953232K->1953232K(2211520K), 41.9634991 secs]
    [GC 1953232K->1953232K(2819968K), 429.7352058 secs]
    [Full GC 1953232K->1953232K(2222848K), 33.4922985 secs]
    i = 3, j=-1, len = 125000000
    [GC 2686119K->2685654K(2819968K), 488.1438908 secs]
    [Full GC 2685654K->2685654K(2819968K), 47.8311575 secs]
    [GC 2685654K->2685654K(2841216K), 397.0494616 secs]
    [Full GC 2685654K->2685654K(2841216K), 45.3557303 secs]
    i = 4, j=1, len = 62500000
    12923.0u 22.0s 1:02:28 345% 0+0k 0+0io 0pf+0w

    % time /home/latkin/jdk1.4.2/bin/java -d64 -Xmx4g -verbose:gc mm6
    i = 0, j=-1, len = 1000000000
    i = 1, j=-1, len = 500000000
    [GC 155K->108K(3520K), 0.0165322 secs]
    [Full GC 108K->108K(3520K), 0.0280476 secs]
    [GC 1953233K->1953233K(1956648K), 0.0790172 secs]
    [Full GC 1953233K->1953233K(1956648K), 63.9245000 secs]
    [Full GC 2929796K->2929796K(4153216K), 34.4460004 secs]
    [Full GC 2929796K->2929795K(4153216K), 41.1711967 secs]
    i = 2, j=1, len = 250000000
    [Full GC 2929811K->2929795K(4153216K), 40.5874700 secs]
    [Full GC 2929795K->2929795K(4153216K), 56.6698979 secs]
    i = 3, j=-1, len = 125000000
    [Full GC 3906374K->3906358K(4153216K), 48.6680474 secs]
    [Full GC 3906358K->3906358K(4153216K), 44.0415935 secs]
    i = 4, j=2, len = 62500000
    247.0u 33.0s 7:35 61% 0+0k 0+0io 0pf+0w

Here I must confess that my experiment was not completely pure;
about 20-40% of the machine's resources were occupied by other
users' applications. Even so, such a dramatic difference (5-10
times) looks to be caused by a problem with Tiger's performance.

Following is the source code for the test mm6.java:

    import java.io.*;

    public class mm6 {
        public static void main(String args[]) {
            System.exit(run(args, System.out) + 95);
        }

        public static int run(String args[], PrintStream out) {
            int lim1 = 5;          // number of attempts; len is halved after each OOME
            int len = 1000000000;
            int i = -1;
            int [][][] a = new int [lim1][][];

            for (i = 0; i < lim1; i++) {
                int j = -1;
                try {
                    a[i] = new int [len][];      // array of len references to int[]
                    for (j = 0; j < len; j++)
                        a[i][j] = new int [len]; // must eventually exhaust the heap
                    out.println("# FAIL: this is impossible!");
                    return 2;
                } catch (OutOfMemoryError e) {
                    out.println("i = "+i+", j="+j+", len = "+len);
                    if (len > 20)
                        len /= 2;                // retry with a smaller dimension
                    else
                        break;
                }
            }
            a = null;
            return 0;
        }
    }
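To reproduce, the test can be compiled and run as in the logs
above; for example (installation paths will differ):

    % javac mm6.java
    % java -d64 -Xmx4g -verbose:gc mm6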
======================================================================
###@###.### 10/9/04 00:20 GMT

Comments
EVALUATION This is a duplicate of 5089317 which was fixed in 5.0_u05.
31-05-2006

CONVERTED DATA
BugTraq+ Release Management Values
COMMIT TO FIX: dragon
14-06-2004

EVALUATION I cannot reproduce this bug. I ran on a 48x400 v9 machine with 15 GB of RAM:

    cast:/net/ghopper/LocalBuild> /java/re/jdk/1.5/promoted/all/b29/binaries/solaris-sparcv9/bin/java -d64 -XX:+UseSerialGC -Xmx4G -XX:+PrintGC mm6
    i = 0, j=-1, len = 1000000000
    i = 1, j=-1, len = 500000000
    [GC 212K->107K(3520K), 0.1377678 secs]
    [Full GC 107K->107K(3520K), 0.0428596 secs]
    [GC 1953232K->1953232K(1956648K), 0.1047516 secs]
    [Full GC 1953232K->1953232K(1956648K), 37.6026186 secs]
    [Full GC 2929794K->2929794K(4153216K), 37.5770322 secs]
    [Full GC 2929794K->2929793K(4153216K), 49.8737898 secs]
    i = 2, j=1, len = 250000000
    [Full GC 2956107K->2929793K(4153216K), 37.5942054 secs]
    [Full GC 2929793K->2929793K(4153216K), 37.5752928 secs]
    i = 3, j=-1, len = 125000000
    [Full GC 3919858K->3906356K(4153216K), 48.7844412 secs]
    [Full GC 3906356K->3906356K(4153216K), 47.0462827 secs]
    i = 4, j=2, len = 62500000

    cast:/net/ghopper/LocalBuild> /java/re/jdk/1.5/promoted/all/b29/binaries/solaris-sparcv9/bin/java -d64 -XX:+UseParallelGC -Xmx4G -XX:+PrintGC mm6
    i = 0, j=-1, len = 1000000000
    i = 1, j=-1, len = 500000000
    [GC 1953309K->1953317K(2801152K), 33.4989684 secs]
    [GC 1953317K->1953317K(2804224K), 33.4065668 secs]
    [Full GC 1953317K->1953232K(2210304K), 37.5015470 secs]
    [GC 1953232K->1953232K(2805248K), 33.5422727 secs]
    [Full GC 1953232K->1953231K(2215424K), 37.4967021 secs]
    i = 2, j=0, len = 250000000
    [GC 1953374K->1953231K(2809344K), 33.4548923 secs]
    [GC 1953231K->1953231K(2809344K), 33.4132600 secs]
    [Full GC 1953231K->1953231K(2219520K), 37.5102507 secs]
    [GC 1953231K->1953231K(2816448K), 34.0660775 secs]
    [Full GC 1953231K->1953231K(2230720K), 37.5578730 secs]
    i = 3, j=-1, len = 125000000
    [GC 2686020K->2685653K(2817536K), 37.0207628 secs]
    [Full GC 2685653K->2685653K(2817536K), 47.0408785 secs]
    [GC 2685653K->2685653K(2828544K), 36.8932572 secs]
    [Full GC 2685653K->2685653K(2828544K), 54.4039259 secs]
    i = 4, j=1, len = 62500000

Is it possible your machine was paging?
###@###.### 2003-12-02

Name: elR10090			Date: 12/16/2003

At last, I've got a chance to do some experiments on a clean machine, not loaded by other applications.

Results in brief:
* indeed, the issue is not reproduced with -XX:+UseSerialGC,
* the issue is best shown with -XX:+UseConcMarkSweepGC,
* the GC performance issue is reproduced 100% stably for me,
* I haven't observed any paging activity during my experiments,
* there is no 1.4.2 to 1.5 performance degradation; actually, -XX:+UseConcMarkSweepGC works twice as slowly on Mantis as on Tiger.

Here are details of the machine configuration:
* MEMORY: 8192MB
* MODEL: SUNW,Ultra-Enterprise
* 8xCPU: SUNW,UltraSPARC-II
* CPU: SUNW,ffb
* SunOS RELEASE: 5.7

I've used the /usr/openwin/bin/perfmeter tool to see the CPU load level and the disk and page activities. The CPU load was about 0% unless the test was running. While the test was running, CPU load was 100% for 1 CPU except during the GC phase. During the GC phase (actually, while [ParNew ...] messages were being printed), all 8 CPUs were 100% loaded. The perfmeter showed paging and disk activities as negligible (about 0) anyway.

I've tried the "mm6" test against Mantis-b25 (rc) and Tiger-b30, 64-bit Server VM, mixed mode; both with the -XX:+UseConcMarkSweepGC option. I've tried the test with heap space from 1 GB to 2 GB, so that the test requested only about 1/8 to 1/4 of the installed RAM and thus avoided provoking machine activity with swap space.
Overall test execution time looked as follows:

                -Xmx512m    -Xmx1g    -Xmx1536m    -Xmx2g
    ------------------------------------------------------
    Mantis      1 min       6 min     8 min        33 min
    Tiger       40 sec      3 min     4 min        17 min
    ------------------------------------------------------

Please note that the concurrent collector works dramatically slower than the serial collector. It took my machine only about 1 minute to run the test with -Xmx2g with the -XX:+UseSerialGC option. Mainly, the time is spent by the concurrent GC on the "ParNew" cycles, e.g.:

    % time /home/latkin/jdk1.5/bin/java -d64 -XX:+UseConcMarkSweepGC -Xmx2G -XX:+PrintGC -showversion mm6
    java version "1.5.0-beta"
    Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0-beta-b30)
    Java HotSpot(TM) 64-Bit Server VM (build 1.5.0-beta-b30, mixed mode)
    i = 0, j=-1, len = 1000000000
    i = 1, j=-1, len = 500000000
    [ParNew 254K->126K(16320K), 0.0544375 secs]
    [Full GC 126K->109K(16320K), 0.0609698 secs]
    [GC 1953234K(1969448K), 0.0009601 secs]
    [ParNew 1953234K->1953234K(1969448K), 266.1158923 secs]
    [Full GC 1953234K->1953234K(1969448K), 32.1781718 secs]
    [Full GC 1953234K->1953233K(2097088K), 33.8078265 secs]
    i = 2, j=0, len = 250000000
    [ParNew 1953886K->1953233K(2097088K), 266.7678709 secs]
    [Full GC 1953233K->1953233K(2097088K), 24.9683117 secs]
    [Full GC 1953233K->1953233K(2097088K), 24.9534391 secs]
    i = 3, j=-1, len = 125000000
    [ParNew 1953886K->1953233K(2097088K), 268.1086409 secs]
    [Full GC 1953233K->1953233K(2097088K), 32.3615685 secs]
    [Full GC 1953233K->1953233K(2097088K), 25.0150138 secs]
    i = 4, j=-1, len = 62500000
    6469.0u 91.0s 16:58 643% 0+0k 0+0io 0pf+0w

Here we see 3 "ParNew" attempts, each taking about 4 minutes. Analogous attempts also take rather a long time with a smaller heap, e.g.:

    % time /home/latkin/jdk1.5/bin/java -d64 -XX:+UseConcMarkSweepGC -Xmx512m -XX:+PrintGC -showversion mm6
    java version "1.5.0-beta"
    Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0-beta-b30)
    Java HotSpot(TM) 64-Bit Server VM (build 1.5.0-beta-b30, mixed mode)
    i = 0, j=-1, len = 1000000000
    i = 1, j=-1, len = 500000000
    [ParNew 254K->126K(16320K), 0.0175627 secs]
    [Full GC 126K->109K(16320K), 0.0625521 secs]
    [Full GC 109K->108K(495552K), 0.0557375 secs]
    i = 2, j=-1, len = 250000000
    [ParNew 761K->108K(524224K), 0.0028878 secs]
    [Full GC 108K->108K(524224K), 0.0581807 secs]
    [Full GC 108K->108K(524224K), 0.0571065 secs]
    i = 3, j=-1, len = 125000000
    [GC 489042K(524224K), 0.0009095 secs]
    [ParNew 489042K->488390K(524224K), 16.6240846 secs]
    [Full GC 488390K->488390K(524224K), 8.0328821 secs]
    [Full GC 488390K->488390K(524224K), 6.2194249 secs]
    i = 4, j=0, len = 62500000
    147.0u 10.0s 0:42 370% 0+0k 0+0io 0pf+0w

So, to summarize:
* It was my mistake to claim that GC works slower for Tiger than for Mantis. I'm sorry.
* Actually, thanks to Tiger's new GC ergonomics feature, Tiger just activates a different kind of GC and tries to utilize more RAM if the host machine is server-class.
* But unfortunately, the performance of the concurrent GC looks dramatically slow compared to the serial GC.
* To me, this looks like a performance bug in the concurrent GC. Somehow, the bug concerns the "ParNew" phase of the GC cycle.
* This bug may lead to substantial performance degradation if a rather large amount of RAM is assigned to the Java heap.
* The degradation may affect a user application even if the application is simple and never requests a huge heap or GC concurrency.
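As a side note, one way to confirm which collector the ergonomics
actually selected on a given machine (assuming a HotSpot build that
supports this flag) is:

    % java -d64 -XX:+PrintCommandLineFlags -version

which prints the ergonomics-chosen VM flags, including the selected
garbage collector.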
======================================================================

This doesn't really appear to be a regression; it is more of an RFE to make UseConcMarkSweepGC faster. I'm forwarding it to the appropriate people so they can take a look.
###@###.### 2003-12-16
16-12-2003