JDK-7188263 : G1: Excessive c_heap (malloc) consumption
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: hs24
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2012-07-31
  • Updated: 2013-09-18
  • Resolved: 2013-06-20
Fix Version
  • Other: hs24 (Fixed)
Description
G1 uses a lot of C-heap (malloc) memory per GC thread (8Mb), which sometimes leads to an OOM error. I collected some data using PrintMalloc:

src/share/vm/memory/universe.cpp
     }
   }
 
+  PrintMalloc=true;
   jint status = Universe::initialize_heap();
+  PrintMalloc=false;
   if (status != JNI_OK) {
     return status;
   }
@@ -1154,7 +1156,11 @@
   }
 
   // ("weak") refs processing infrastructure initialization
+  PrintMalloc=true;
+  tty->cr();
+  tty->print_cr("post_initialize");
   Universe::heap()->post_initialize();
+  PrintMalloc=false;
 
   GC_locker::unlock();  // allow gc after bootstrapping


sparc

G1:

java -d64 -XX:CICompilerCount=1 -XX:ParallelGCThreads=10 -Xms20g -Xmx20g -XX:+UseG1GC -version
os::malloc 2032 bytes --> 0x00000001001686d8
...
os::malloc 25 bytes --> 0x000000010ec0d388
-------
total 246,045,104 bytes

java -d64 -XX:CICompilerCount=1 -XX:ParallelGCThreads=20 -Xms20g -Xmx20g -XX:+UseG1GC -version
os::malloc 2032 bytes --> 0x00000001001686d8
...
os::malloc 25 bytes --> 0x00000001132b8048
-------
total 320,141,680 bytes
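
A quick back-of-the-envelope check on the two runs above: the 20-thread run mallocs about 74MB more than the 10-thread run, i.e. roughly 7MB of C heap per additional GC thread, in the same ballpark as the ~8Mb/thread figure in the description. A minimal standalone sketch of that arithmetic (plain C++, not HotSpot code; the totals are copied from the logs above):

#include <cstdio>

// Back-of-the-envelope check using the two PrintMalloc totals above
// (ParallelGCThreads=10 vs. 20); figures are taken from the logs in this report.
int main() {
  const long long total_10 = 246045104LL;   // bytes with ParallelGCThreads=10
  const long long total_20 = 320141680LL;   // bytes with ParallelGCThreads=20
  const double per_thread = (total_20 - total_10) / 10.0;
  std::printf("extra C heap per GC thread: %.0f bytes (~%.1f MB)\n",
              per_thread, per_thread / (1024.0 * 1024.0));
  // Prints ~7409658 bytes, i.e. about 7MB per additional GC thread.
  return 0;
}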


For comparison, ParallelOldGC does most of its malloc allocation in the post_initialize() phase:

java -d64  -XX:-ZapUnusedHeapArea -XX:CICompilerCount=1 -XX:ParallelGCThreads=20 -Xms20g -Xmx20g -XX:+UseParallelOldGC -version > PS_20_malloc.log2
os::free 800 bytes --> 0x000000010015aaa8
os::free 200 bytes --> 0x0000000100164138
os::free 80 bytes --> 0x0000000100171f88
java version "1.6.0_25"
Java(TM) SE Runtime Environment (build 1.6.0_25-b06)
Java HotSpot(TM) 64-Bit Server VM (build 24.0-b18-internal-jvmg, mixed mode)

os::malloc 152 bytes --> 0x0000000100167a48
...
os::malloc 25 bytes --> 0x0000000100199e68

post_initialize
os::malloc 112 bytes --> 0x00000001002e6f98
...
os::malloc 1048576 bytes --> 0x0000000104290ef8
os::malloc 472 bytes --> 0x0000000104390f38
os::malloc 1048576 bytes --> 0x0000000104391148
------------------------------------------------
total 68,854,192 bytes
See public comments.

Comments
Nothing to verify
26-07-2013

The fixes for JDK-7197666 and JDK-8016556 have reduced the amount of memory that G1 allocates on the C heap quite a bit. The main remaining source of C-heap allocations in G1 is the remembered sets, and their refactoring is now tracked separately in JDK-8017163, so this bug will be closed and the remaining work tracked there.
20-06-2013

Triaged for hs24
17-06-2013

PUBLIC COMMENTS
When we look at the PrintMalloc output for G1, the largest allocation amounts are coming from the work queues and the global marking stack. With a TASKQUEUE_SIZE of 131072 (128K) entries, we get a global marking stack of 134217728 bytes (128Mb). Each task queue is:

## -> CMTaskQueue[0]
os::malloc 88 bytes --> 0x0000000100b66b68
## sizeof(E): 8, N: 131072
os::malloc 1048576 bytes --> 0x0000000100b66bf8
## <- CMTaskQueue[0]

G1 is particularly affected since we have a separate task queue for each of the GC worker threads (ParallelGCThreads) and a separate task queue for the maximum number of concurrent marking threads (which may be up to ParallelGCThreads). So if you have a "large" T-series SPARC machine with a large number of GC threads, we are eating a fair amount of C heap (see the illustrative sketch at the end of this report).

For the global marking stack (and perhaps some of the other concurrent marking structures) we could use a virtual memory backing store. For the task queues we could reduce their size. The task queues that are used during the STW GC pause make use of the overflow mechanism built into the task queue itself. The concurrent marking task queues don't use that overflow mechanism; instead they push a sequence of references onto the global marking stack, and if the global mark stack overflows, the concurrent marking portion of the marking cycle is restarted. These two actions should significantly reduce the start-up footprint of G1.
14-09-2012
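
Putting the figures from that comment together gives a rough picture of G1's up-front C-heap footprint. The sketch below is illustrative only: the thread count is taken from the 20-thread run above, and the "two queues per GC thread" accounting (one STW worker queue plus one concurrent marking queue) is an assumption based on the comment text, not actual HotSpot code:

#include <cstdio>

// Illustrative calculation based on the figures quoted in the public comment.
// This is not HotSpot code; the exact accounting in the VM differs (headers,
// padding, other marking data structures, etc.).
int main() {
  const size_t entry_size        = 8;          // sizeof(E) for an oop entry
  const size_t taskqueue_size    = 131072;     // TASKQUEUE_SIZE (128K entries)
  const size_t global_mark_stack = 134217728;  // 128Mb, as quoted in the comment
  const size_t gc_threads        = 20;         // example value, as in the runs above

  // Assumed accounting: one queue per STW GC worker plus one per concurrent
  // marking thread (up to ParallelGCThreads), each 131072 * 8 bytes = 1Mb.
  const size_t queue_bytes = 2 * gc_threads * taskqueue_size * entry_size;

  std::printf("task queues      : %zu bytes (~%zu MB)\n", queue_bytes, queue_bytes >> 20);
  std::printf("global mark stack: %zu bytes (%zu MB)\n", global_mark_stack, global_mark_stack >> 20);
  std::printf("combined         : ~%zu MB malloc'ed at startup\n",
              (queue_bytes + global_mark_stack) >> 20);
  return 0;
}

With ParallelGCThreads=20 this comes to roughly 168MB (about 40MB of task queues plus the 128MB global mark stack), a large share of the ~320MB total malloc'ed in the 20-thread run above.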