JDK-6493287 : Unproductive CPU-spinning GCs on heap exhaustion; please throw OOME instead
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 6
  • Priority: P3
  • Status: Closed
  • Resolution: Not an Issue
  • OS: generic
  • CPU: generic
  • Submitted: 2006-11-14
  • Updated: 2010-08-19
  • Resolved: 2007-02-27
Description
While investigating
6490770: Supposedly fixed memory leak leads to 100% VM CPU usage
two Linux test machines died with CPU usage so high that it was not even
possible to kill the offending Java process.

The following program has a memory leak in the Java libraries.
------------------------------------------
import java.util.concurrent.*;

public class Bug3 {
    private static int intArg(String[] args, int i, int defaultValue) {
        return args.length > i ? Integer.parseInt(args[i]) : defaultValue;
    }

    public static void main(String[] args) {
        final BlockingQueue<Object> q = new LinkedBlockingQueue<Object>();
        final int threads = intArg(args, 0, 10);   // number of polling threads
        final int millis  = intArg(args, 1, 100);  // poll timeout in milliseconds

        // Each timed poll on the shared, always-empty queue leaks a small
        // amount of memory inside the libraries (see 6490770), so the heap
        // slowly fills with live data.
        for (int i = 0; i < threads; i++)
            new Thread() { public void run() {
                while (true) {
                    try { q.poll(millis, TimeUnit.MILLISECONDS); }
                    catch (Throwable t) { throw new Error(t); }}}}.start();
    }
}
------------------------------------------
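
An aside: the reproducer above depends on a library leak.  Assuming the
pathology only needs a heap that is almost entirely live plus a trickle of
reclaimable garbage per iteration, a self-contained sketch along these lines
(SlowLeak and its allocation sizes are hypothetical, not a tested reproducer)
should show the same near-exhaustion GC spinning when run with a small heap,
e.g. java -Xmx2m -Xms2m SlowLeak:

------------------------------------------
import java.util.ArrayList;
import java.util.List;

public class SlowLeak {
    // Strongly reachable forever, so no collection can reclaim it.
    static final List<byte[]> retained = new ArrayList<byte[]>();

    public static void main(String[] args) {
        while (true) {
            retained.add(new byte[64]);     // leaked: slowly fills the heap with live data
            byte[] scratch = new byte[16];  // discarded at once: gives each GC
                                            // a little something to reclaim
        }
    }
}
------------------------------------------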

When the heap is exhausted, the load average (as measured by "uptime") and
the frequency of GCs (as measured by -verbose:gc) both jump dramatically.


Seen on Linux and Solaris (though Solaris is much more robust in the face of
many threads demanding to be run), with both the Serial and Parallel
collectors, and with both -server and -client.  E.g. run like this:

$ jver -v 6 jr -server -verbose:gc -XX:+UseParallelGC -Xms3m -Xmx3m Bug3 10 1
Using JDK /java/re/jdk/6/promoted/latest/binaries/solaris-sparcv9
==> javac -source 1.6 -Xlint:all Bug3.java
==> java -server -verbose:gc -XX:+UseParallelGC -Xms3m -Xmx3m -esa -ea Bug3 10 1

These tests take a long time to run before the heap is exhausted.

linux-i586 check6:

$ jver 6 jr -verbose:gc -Xmx2m -Xms2m -XX:+UseSerialGC -server  Bug3 10 10
==> javac -source 1.6 -Xlint:all Bug3.java
==> java -verbose:gc -Xmx2m -Xms2m -XX:+UseSerialGC -server -esa -ea Bug3 10 10
[GC 896K->628K(1984K), 0.0156240 secs]
[GC 1524K->1483K(2496K), 0.0207210 secs]
[Full GC 1483K->1483K(2496K), 0.0581490 secs]
[GC 2379K->2324K(3436K), 0.0138730 secs]
# Here I ^C; load average has gone to 7.

Similarly for:
$ jver 6 jr -verbose:gc -Xmx2m -Xms2m -XX:+UseParallelGC -client  Bug3 10 10

On Linux I do not see a frenzy of output from -verbose:gc, but I do on
Solaris.  I suspect the verbose:gc output on Linux simply isn't being
printed successfully.
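
One way to test that suspicion would be to send the GC log to a file
instead of stdout with -Xloggc, e.g.:

$ java -Xloggc:gc.log -Xmx2m -Xms2m -XX:+UseSerialGC -server Bug3 10 10

If the frenzy of collections shows up in gc.log but not on the console,
the output is being produced but not delivered.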

On Solaris, I get this output:

$ jver 6 jr -verbose:gc -Xmx2200k -Xms2200k -XX:+UseSerialGC -server  Bug3 10 30
==> javac -source 1.6 -Xlint:all Bug3.java
==> java -verbose:gc -Xmx2200k -Xms2200k -XX:+UseSerialGC -server -esa -ea Bug3 10 30
[GC 1792K->1453K(3328K), 0.0409150 secs]
[Full GC 1453K->1453K(3328K), 0.0400240 secs]
[GC 3128K->2987K(4856K), 0.0503332 secs]
[Full GC 2987K->2987K(4856K), 0.0906168 secs]
[GC 4983K->4507K(7096K), 0.0814396 secs]
[Full GC 6619K->5540K(7096K), 0.1808850 secs]
[Full GC 6967K->6035K(7936K), 0.1391880 secs]
[Full GC 7935K->6537K(7936K), 0.2170006 secs]
[Full GC 7935K->6800K(7936K), 0.1249482 secs]
[Full GC 7935K->7035K(7936K), 0.1788394 secs]
[Full GC 7935K->7256K(7936K), 0.2360202 secs]
[Full GC 7935K->7419K(7936K), 0.2703050 secs]
[Full GC 7935K->7509K(7936K), 0.1307258 secs]
[Full GC 7935K->7574K(7936K), 0.3245770 secs]
[Full GC 7935K->7624K(7936K), 0.1750772 secs]
[Full GC 7935K->7655K(7936K), 0.1913908 secs]
[Full GC 7935K->7709K(7936K), 0.2180826 secs]
[Full GC 7935K->7730K(7936K), 0.1334918 secs]
[Full GC 7935K->7750K(7936K), 0.2484308 secs]
[Full GC 7935K->7780K(7936K), 0.1406920 secs]
[Full GC 7935K->7803K(7936K), 0.2278106 secs]
[Full GC 7935K->7814K(7936K), 0.2625662 secs]
[Full GC 7935K->7841K(7936K), 0.1850372 secs]
[Full GC 7935K->7847K(7936K), 0.1443554 secs]
[Full GC 7935K->7850K(7936K), 0.2105556 secs]
[Full GC 7935K->7855K(7936K), 0.2375956 secs]
[Full GC 7935K->7856K(7936K), 0.1593658 secs]
[Full GC 7935K->7857K(7936K), 0.1466886 secs]
[Full GC 7935K->7859K(7936K), 0.1348994 secs]
[Full GC 7935K->7872K(7936K), 0.1697830 secs]
[Full GC 7935K->7881K(7936K), 0.3186660 secs]
[Full GC 7935K->7886K(7936K), 0.1604742 secs]
[Full GC 7935K->7894K(7936K), 0.2483718 secs]
[Full GC 7935K->7897K(7936K), 0.2059152 secs]
[Full GC 7935K->7899K(7936K), 0.2144390 secs]
[Full GC 7935K->7899K(7936K), 0.1493318 secs]
[Full GC 7935K->7899K(7936K), 0.1454098 secs]
[Full GC 7935K->7914K(7936K), 0.1987240 secs]
[Full GC 7935K->7915K(7936K), 0.1354908 secs]
[Full GC 7935K->7915K(7936K), 0.1417004 secs]
[Full GC 7935K->7918K(7936K), 0.1793320 secs]
[Full GC 7935K->7918K(7936K), 0.1934992 secs]
[Full GC 7935K->7918K(7936K), 0.1477912 secs]
[Full GC 7935K->7923K(7936K), 0.1387894 secs]
[Full GC 7935K->7926K(7936K), 0.1406162 secs]
[Full GC 7935K->7928K(7936K), 0.2726940 secs]
[Full GC 7935K->7928K(7936K), 0.1427668 secs]
[Full GC 7935K->7928K(7936K), 0.1310112 secs]
[Full GC 7935K->7928K(7936K), 0.2180804 secs]
[Full GC 7935K->7929K(7936K), 0.2701286 secs]
[Full GC 7935K->7929K(7936K), 0.1469068 secs]
[Full GC 7935K->7929K(7936K), 0.1436338 secs]
[Full GC 7935K->7930K(7936K), 0.2601764 secs]
[Full GC 7935K->7930K(7936K), 0.3005986 secs]
[Full GC 7935K->7930K(7936K), 0.1370202 secs]
[Full GC 7935K->7930K(7936K), 0.1430302 secs]
[Full GC 7935K->7930K(7936K), 0.1948008 secs]
[Full GC 7935K->7930K(7936K), 0.2450178 secs]
[Full GC 7935K->7931K(7936K), 0.1523952 secs]
[Full GC 7935K->7931K(7936K), 0.2538496 secs]
[Full GC 7935K->7931K(7936K), 0.2523624 secs]
[Full GC 7935K->7933K(7936K), 0.2655210 secs]
[Full GC 7935K->7933K(7936K), 0.2356198 secs]
[Full GC 7935K->7933K(7936K), 0.1775052 secs]
[Full GC 7935K->7933K(7936K), 0.1864234 secs]
[Full GC 7935K->7933K(7936K), 0.1488638 secs]
[Full GC 7935K->7933K(7936K), 0.2692090 secs]
[Full GC 7935K->7933K(7936K), 0.2173580 secs]
[Full GC 7935K->7933K(7936K), 0.1416144 secs]
[Full GC 7935K->7933K(7936K), 0.2482622 secs]
[Full GC 7935K->7933K(7936K), 0.2727218 secs]
[Full GC 7935K->7933K(7936K), 0.1343704 secs]
[Full GC 7935K->7933K(7936K), 0.1438122 secs]
[Full GC 7935K->7933K(7936K), 0.2276202 secs]
[Full GC 7935K->7933K(7936K), 0.3102684 secs]
[Full GC 7935K->7933K(7936K), 0.1440892 secs]
[Full GC 7935K->7933K(7936K), 0.2725500 secs]
[Full GC 7935K->7933K(7936K), 0.1342194 secs]
[Full GC 7935K->7935K(7936K), 0.2581534 secs]
[Full GC 7935K->7935K(7936K), 0.2205910 secs]
[Full GC 7935K->7935K(7936K), 0.2814272 secs]
[Full GC 7935K->7935K(7936K), 0.3107006 secs]
[Full GC 7935K->7935K(7936K), 0.2859620 secs]
[Full GC 7935K->7935K(7936K), 0.3437136 secs]
[Full GC 7935K->7935K(7936K), 0.1455586 secs]
[Full GC 7935K->7935K(7936K), 0.3266726 secs]
[Full GC 7935K->7935K(7936K), 0.2792878 secs]
[Full GC 7935K->7935K(7936K), 0.2019102 secs]
[Full GC 7935K->7935K(7936K), 0.2835920 secs]
[Full GC 7935K->7935K(7936K), 0.2474744 secs]
[Full GC 7935K->7935K(7936K), 0.2727414 secs]
[Full GC 7935K->7935K(7936K), 0.2555420 secs]
[Full GC 7935K->7935K(7936K), 0.2852250 secs]
[Full GC 7935K->7934K(7936K), 0.2697394 secs]
[Full GC 7935K->7935K(7936K), 0.1420054 secs]
[Full GC 7935K->7935K(7936K), 0.2935430 secs]
[Full GC 7935K->7935K(7936K), 0.3216336 secs]
[Full GC 7935K->7935K(7936K), 0.2806712 secs]
[Full GC 7935K->7935K(7936K), 0.2571510 secs]
..... etc......

On Solaris the stability of the system itself isn't affected, but instead
of repeated unproductive GCs we'd really like to see an OutOfMemoryError here.
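
An aside: the throughput collector already has a policy along these lines,
the GC overhead limit.  With -XX:+UseGCOverheadLimit (on by default with
-XX:+UseParallelGC in 6), the VM throws "java.lang.OutOfMemoryError: GC
overhead limit exceeded" once more than GCTimeLimit percent of total time
(default 98) is spent in GC while less than GCHeapFreeLimit percent of the
heap (default 2) is recovered, e.g.:

$ java -verbose:gc -XX:+UseParallelGC -XX:GCTimeLimit=98 -XX:GCHeapFreeLimit=2 -Xms3m -Xmx3m Bug3 10 1

Whether it fires for this particular reproducer hasn't been tested here.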

Comments
EVALUATION
I ran with the serial collector until there were almost continuous collections:

java -client -Xmx2200k -Xms2200k -XX:+UseSerialGC \
     -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC \
     -server -esa -ea \
     Bug3 10 30

At that point all the space in the tenured generation was full of live data.

Tenured generation before:
 tenured generation   total 5504K, used 5503K [0xf72a0000, 0xf7800000, 0xf7800000)
   the space 5504K,  99% used [0xf72a0000, 0xf77ffff0, 0xf7800000, 0xf7800000)

Tenured generation after:
 tenured generation   total 5504K, used 5503K [0xf72a0000, 0xf7800000, 0xf7800000)
   the space 5504K,  99% used [0xf72a0000, 0xf77ffff0, 0xf7800000, 0xf7800000)

The collection of the young generation was still recovering garbage that
was in from-space.

Before GC:
 def new generation   total 2432K, used 2432K [0xf7000000, 0xf72a0000, 0xf72a0000)
  eden space 2176K, 100% used [0xf7000000, 0xf7220000, 0xf7220000)
  from space  256K, 100% used [0xf7220000, 0xf7260000, 0xf7260000)
  to   space  256K,   0% used [0xf7260000, 0xf7260000, 0xf72a0000)

After GC:
 def new generation   total 2432K, used 2431K [0xf7000000, 0xf72a0000, 0xf72a0000)
  eden space 2176K, 100% used [0xf7000000, 0xf7220000, 0xf7220000)
  from space  256K,  99% used [0xf7220000, 0xf725ff00, 0xf7260000)
  to   space  256K,   0% used [0xf7260000, 0xf7260000, 0xf72a0000)

These are from collections toward the end (before I killed the job) and are
typical of the later collections.  256 bytes are being recovered, and that
is what keeps the VM from throwing an out-of-memory error.

A similar analysis of the parallel collector showed a larger number of
bytes being collected (13632) on alternate collections:

Before GC:
  eden space 2048K, 100% used [0xfb560000,0xfb760000,0xfb760000)
After GC:
  eden space 2048K,  99% used [0xfb560000,0xfb75cac0,0xfb760000)

Also, after the full collections start, the survivor spaces are empty after
each collection.  This difference from the serial collector is likely the
result of the parallel collector not doing direct allocation out of
from-space: objects are only copied to the survivor spaces during a
scavenge, and no more scavenges are occurring.  Some policy change may be
appropriate to make better use of the survivor spaces although, because
only full collections are being done, it is reasonable that the survivor
spaces are not used.  The heap is not exhausted, so an out-of-memory error
should not be thrown.
20-11-2006
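
A sketch of an application-side fallback, assuming one would rather fail
fast than spin: the java.lang.management API (available since 5.0) can
report when a collection finishes with a heap pool still nearly full.
GcWatchdog and its 98% threshold below are hypothetical, not anything the
VM provides:

------------------------------------------
import java.lang.management.*;
import javax.management.*;

public class GcWatchdog {
    // Installs a listener that fires when a GC finishes with a heap pool
    // still more than ~98% full -- the "unproductive GC" state in this report.
    public static void install() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP
                    && pool.isCollectionUsageThresholdSupported()) {
                long max = pool.getUsage().getMax();  // -1 if undefined
                if (max > 0)
                    pool.setCollectionUsageThreshold((long) (max * 0.98));
            }
        }
        // The platform MemoryMXBean is documented to be a NotificationEmitter.
        NotificationEmitter emitter =
            (NotificationEmitter) ManagementFactory.getMemoryMXBean();
        emitter.addNotificationListener(new NotificationListener() {
            public void handleNotification(Notification n, Object handback) {
                if (MemoryNotificationInfo.MEMORY_COLLECTION_THRESHOLD_EXCEEDED
                        .equals(n.getType())) {
                    System.err.println("GC no longer reclaiming memory; giving up");
                    System.exit(1);   // or signal the application to shed load
                }
            }
        }, null, null);
    }
}
------------------------------------------

Calling GcWatchdog.install() at the top of Bug3.main would turn the
spinning seen above into a prompt exit.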