JDK-5025281 : Allow System.gc() to trigger concurrent (not stop-the-world) full collections
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 5.0u6, 6
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic, linux, solaris_8
  • CPU: generic, x86, sparc
  • Submitted: 2004-04-01
  • Updated: 2004-08-30
  • Resolved: 2004-08-30
The Version table provides details of the release in which this issue/RFE is addressed.

Unresolved: Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed: Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

JDK 6 (6 mustang): Fixed
Related Reports: one duplicate and six related issues.
Description

Name: rmT116609			Date: 04/01/2004


A DESCRIPTION OF THE REQUEST :
Currently, System.gc() always forces a full, stop-the-world, collection regardless of the collector policy being used.

When the concurrent mark-sweep (CMS) collector is in use, it would be better to have System.gc() trigger a concurrent full collection instead of a stop-the-world collection.

As some callers of System.gc() (e.g. NIO and RMI's DGC) rely on System.gc() only returning after the collection is completed, System.gc() would still need to block until the collection completes; the change is that when CMS is in use, only the calling thread would be blocked during the concurrent phases of GC, not all threads.
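The blocking semantics described above can be illustrated with a small sketch (not part of the original report; class and method names are illustrative): System.gc() returns only after the collection completes, so the calling thread is blocked for the duration, while a worker thread can keep making progress during any concurrent phases.

```java
// Sketch: System.gc() blocks the *calling* thread until the collection
// completes; under this RFE with CMS, only this caller would wait out a
// mostly concurrent collection, while other threads keep running except
// during the brief stop-the-world phases. Names are illustrative.
public class GcCallerBlocking {
    // Returns how long (in ms) the calling thread spent inside System.gc().
    static long measureGcBlock() {
        long start = System.nanoTime();
        System.gc();  // returns only after the requested collection completes
        return (System.nanoTime() - start) / 1000000L;
    }

    public static void main(String[] args) {
        final long[] ticks = {0};
        Thread worker = new Thread(new Runnable() {
            public void run() {
                while (!Thread.currentThread().isInterrupted()) {
                    ticks[0]++;  // makes progress during concurrent GC phases
                    Thread.yield();
                }
            }
        });
        worker.setDaemon(true);
        worker.start();

        System.out.println("caller blocked in System.gc() for ~"
                + measureGcBlock() + " ms");
        worker.interrupt();
        System.out.println("worker ticks while main ran: " + ticks[0]);
    }
}
```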

JUSTIFICATION :
The CMS collector is used mostly for applications where GC pauses must be kept low -- i.e. latency-sensitive applications. A stop-the-world full collection causes an unacceptably large pause for these applications -- part of tuning the CMS collector for an application involves making sure a stop-the-world full collection never occurs in normal operation.

The current System.gc() implementation always forces a full collection. The only way to avoid this currently is to pass -XX:+DisableExplicitGC to turn System.gc() into a no-op. Avoiding calls to System.gc() entirely is not possible as some calls are made from within the standard Java libraries (e.g. NIO and RMI DGC).

However, completely disabling System.gc() in this way does not work well either, as callers of System.gc() generally do so for a reason -- e.g. NIO calls it to reclaim direct buffers when it runs out of direct buffer space, DGC calls it to get a more accurate view of live RMI references.

The attached testcase shows some of the problems with NIO and -XX:+DisableExplicitGC. Without -XX:+DisableExplicitGC, full stop-the-world GCs are forced. With it, NIO buffer allocation fails.

Note that the attached testcase exposes another NIO-related bug (submitted with review ID 233528) which can cause spurious OutOfMemoryErrors even when System.gc() is enabled.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
-XX:+DisableExplicitGC should not be needed to avoid stop-the-world collections. The testcase should run with concurrent collections only.
ACTUAL -
With System.gc() enabled:

oliver@flood:~/nio-bugs$ java -verbose:gc -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:MaxDirectMemorySize=1M NIOAllocate 1000 10000
Allocating 10000 NIO Direct buffers of 1000 bytes each.
0 buffers allocated.
1001 buffers allocated.
[GC 246K->207K(16320K), 0.0162750 secs]
[Full GC 207K->206K(16320K), 0.0348480 secs]
Caught OOME allocating buffer #1049
2002 buffers allocated.
[GC 330K->329K(16320K), 0.0063360 secs]
[Full GC 329K->206K(16320K), 0.0292560 secs]
Caught OOME allocating buffer #2097
3003 buffers allocated.
[GC 329K->329K(16320K), 0.0050900 secs]
[Full GC 329K->206K(16320K), 0.0291510 secs]
Caught OOME allocating buffer #3145
4004 buffers allocated.
[GC 329K->329K(16320K), 0.0053420 secs]
[Full GC 329K->206K(16320K), 0.0297570 secs]
Caught OOME allocating buffer #4193
5005 buffers allocated.
[GC 329K->329K(16320K), 0.0050510 secs]
[Full GC 329K->206K(16320K), 0.0291440 secs]
Caught OOME allocating buffer #5241
6006 buffers allocated.
[GC 329K->329K(16320K), 0.0053410 secs]
[Full GC 329K->206K(16320K), 0.0288150 secs]
Caught OOME allocating buffer #6289
7007 buffers allocated.
[GC 329K->329K(16320K), 0.0059880 secs]
[Full GC 329K->206K(16320K), 0.0287610 secs]
Caught OOME allocating buffer #7337
8008 buffers allocated.
[GC 329K->329K(16320K), 0.0051440 secs]
[Full GC 329K->206K(16320K), 0.0293400 secs]
Caught OOME allocating buffer #8385
9009 buffers allocated.
[GC 329K->329K(16320K), 0.0051870 secs]
[Full GC 329K->206K(16320K), 0.0287630 secs]
Caught OOME allocating buffer #9433
Done.

With System.gc() disabled:

oliver@flood:~/nio-bugs$ java -verbose:gc -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:MaxDirectMemorySize=1M -XX:+DisableExplicitGC NIOAllocate 1000 10000
Allocating 10000 NIO Direct buffers of 1000 bytes each.
0 buffers allocated.
1001 buffers allocated.
Caught OOME allocating buffer #1049
Caught OOME allocating buffer #1049
Caught OOME allocating buffer #1049
Caught OOME allocating buffer #1049
Caught OOME allocating buffer #1049
Caught OOME allocating buffer #1049
Caught OOME allocating buffer #1049
Caught OOME allocating buffer #1049
Caught OOME allocating buffer #1049
Caught OOME allocating buffer #1049
Caught OOME allocating buffer #1049
Caught OOME allocating buffer #1049
Caught OOME allocating buffer #1049
Caught OOME allocating buffer #1049
Caught OOME allocating buffer #1049
Caught OOME allocating buffer #1049
[.. continues indefinitely ..]


---------- BEGIN SOURCE ----------
public class NIOAllocate {
    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("syntax: java NIOAllocate <buffer size> <buffer count>");
            return;
        }

        int bufferSize = Integer.parseInt(args[0]);
        int bufferCount = Integer.parseInt(args[1]);
        int progressSize = bufferCount / 10 + 1;
        
        System.err.println("Allocating " + bufferCount + " NIO Direct buffers of " + bufferSize + " bytes each.");

        for (int i = 0; i < bufferCount; ++i) {
            if (i % progressSize == 0)
                System.err.println(i + " buffers allocated.");
            
            try {
                // The buffer becomes garbage immediately; its backing direct
                // memory is reclaimed only after the ByteBuffer is collected.
                java.nio.ByteBuffer testBuffer = java.nio.ByteBuffer.allocateDirect(bufferSize);
            } catch (OutOfMemoryError oome) {
                System.err.println("Caught OOME allocating buffer #" + (i+1));
                Thread.sleep(500); // give a (possibly concurrent) collection time to run
                --i; // Try again.
            }
        }

        System.err.println("Done.");
    }
}

---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
None. Currently we must avoid using any code that requires calls to System.gc() (e.g. NIO direct buffers) to operate correctly, and run with -XX:+DisableExplicitGC.
(Incident Review ID: 233534) 
======================================================================

Comments
SUGGESTED FIX
The URL for the webrev has changed from that listed above to: http://analemma.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/main/post_tiger_gc_baseline/2004/20040616131632.ysr.dragon_work/workspace/webrevs/webrev-2004.06.16/index.html
03-04-2007

CONVERTED DATA
BugTraq+ Release Management Values
COMMIT TO FIX: dragon, mustang
FIXED IN: dragon, mustang
INTEGRATED IN: mustang
31-08-2004

SUGGESTED FIX
Event: putback-to
Parent workspace: /net/jano.sfbay/export/disk05/hotspot/ws/main/post_tiger_gc_baseline (jano.sfbay:/export/disk05/hotspot/ws/main/post_tiger_gc_baseline)
Child workspace: /prt-workspaces/20040616131632.ysr.dragon_work/workspace (prt-web:/prt-workspaces/20040616131632.ysr.dragon_work/workspace)
User: ysr
Comment:
---------------------------------------------------------
Original workspace: neeraja:/net/spot/archive02/ysr/dragon_work
Submitter: ysr
Archived data: /net/prt-archiver.sfbay/export2/archived_workspaces/main/post_tiger_gc_baseline/2004/20040616131632.ysr.dragon_work/
Webrev: http://analemma.sfbay.sun.com/net/prt-archiver.sfbay/export2/archived_workspaces/main/post_tiger_gc_baseline/2004/20040616131632.ysr.dragon_work/workspace/webrevs/webrev-2004.06.16/index.html

Fixed 5025281: Allow System.gc() to trigger concurrent (not stop-world) full collections
Fixed 4780073: Can/Should/Could a partial gc request short-circuit a full gc request?
http://analemma.sfbay/net/spot/archive02/ysr/dragon_work/webrev

For the first, an RFE, we define a satisfying collection as any collection of the old generation that starts following the System.gc() request. This may be either a concurrent collection or a stop-the-world collection. A new VM operation, VM_GenCollectFullConcurrent, implements the guts of this RFE. A young collection is first done, unless one has already been done following the request, followed by a concurrent collection, unless one has already started following the request. The caller is blocked until such a satisfying collection is completed.

For the second, a soft "bug", we endow CollectedHeap with a full_gc_count (in addition to the existing gc_count, which does not discriminate between young and full GCs). We use this new count to decide whether a fresh full collection needs to be initiated. This prevents a young collection from inadvertently short-circuiting a full GC request.
Similarly, the PrintClassHistogram option is always made to succeed, so that a young collection will not inadvertently short-circuit such a class-histogram request. Furthermore, the PrintClassHistogram option becomes a no-op for non-GenCollectedHeaps (previously it would do a futile safepoint), pending RFE 5023697.

Reviewed by: John Coomes, Jon Masamitsu, Fred Oliver (partial)
Fix Verified: y
Verification Testing: used PrintGCDetails to verify that a System.gc() caused the expected behaviour with the new flag
Other Testing (with and without -XX:+ExplicitGCInvokesConcurrent):
. PRT
. refworkload with fastdebug, product (sparc only)
. vmark
. preliminary testing by feature requestor (Opencloud, a Sun partner)

Files:
update: src/share/vm/includeDB_core
update: src/share/vm/gc_implementation/parallelScavenge/parallelScavengeHeap.cpp
update: src/share/vm/gc_implementation/parallelScavenge/psMarkSweep.cpp
update: src/share/vm/gc_implementation/parallelScavenge/psScavenge.cpp
update: src/share/vm/gc_interface/collectedHeap.cpp
update: src/share/vm/gc_interface/collectedHeap.hpp
update: src/share/vm/memory/concurrentMarkSweepGeneration.cpp
update: src/share/vm/memory/concurrentMarkSweepGeneration.hpp
update: src/share/vm/memory/genCollectedHeap.cpp
update: src/share/vm/memory/genCollectedHeap.hpp
update: src/share/vm/runtime/arguments.cpp
update: src/share/vm/runtime/globals.hpp
update: src/share/vm/runtime/mutexLocker.cpp
update: src/share/vm/runtime/mutexLocker.hpp
update: src/share/vm/runtime/timer.hpp
update: src/share/vm/runtime/vmThread.cpp
update: src/share/vm/runtime/vm_operations.cpp
update: src/share/vm/runtime/vm_operations.hpp

Examined files: 3215
Contents Summary: 18 update, 3197 no action (unchanged)
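The full_gc_count bookkeeping described in this putback can be sketched as follows. This is a simplified, hypothetical Java rendering of the C++ logic; apart from the gc_count/full_gc_count names taken from the comment, all names are illustrative. The point is that a full-GC request is satisfied only when the full-collection count advances, so a young collection cannot short-circuit it.

```java
// Hypothetical sketch of the "satisfying collection" check described above:
// a System.gc() request records the full-collection count at submission time
// and is satisfied only once a *full* collection (not merely a young one)
// has started after the request.
class CollectedHeapSketch {
    private int gcCount;      // all collections (young + full)
    private int fullGcCount;  // full collections only

    synchronized void youngCollectionStarted() { gcCount++; }
    synchronized void fullCollectionStarted()  { gcCount++; fullGcCount++; }

    synchronized int currentFullGcCount() { return fullGcCount; }

    // Compare against fullGcCount, not the undiscriminating gcCount, so a
    // young collection cannot inadvertently satisfy a full-GC request.
    synchronized boolean fullGcRequestSatisfied(int countAtRequest) {
        return fullGcCount > countAtRequest;
    }
}
```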
31-08-2004

EVALUATION
###@###.### 2004-04-08: This is a good suggestion which we have considered in the past. While this is simple to do, it is not clear whether applications will miss the "prompter, but more intrusive" and stronger semantics offered by the current implementation. It may be worthwhile to consider a weaker version, System.gc(weak?), that does what is asked in this RFE, in addition to the existing stronger semantics; that way we offer both choices to users. However, that is an API change/addition. Basically, my problem is that I do not know how customers make use of System.gc(). A CAP survey may yield some clues as to whether a change such as this would be mostly beneficial or not. In any case, this is too big a change in terms of customer impact (not in terms of implementation work), I think, for Tiger at this late stage. We should take this up for either Dragonfly or Mustang, with an appropriate customer survey well before the fact.

If you want to vote on this proposal, you have three choices, plus a don't-care vote:
(1) current (status quo)
(2) that proposed by this customer (offer "weak" semantics only for concurrent collectors)
(3) my refinement of the customer's proposal (offer "weak" and "strong" semantics for concurrent collectors; implies an API change; will require certain libraries to be rewritten to use the "weak" version of the call)
(4) don't care among the above
You can cast your vote on the JDC (make sure you include your choice in your vote; write-in ballots welcome!).

###@###.### 2004-04-20: The polls are now closed; thanks to all those who participated in this exercise in democracy ;-) Based on your (collective) feedback, we have decided to implement this via the least disruptive and most useful route, namely via a -XX command-line option, subject to approval by the appropriate committee.

###@###.### 2004-05-21: libjvm.so's available for testing by customer.
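The "-XX command-line option" chosen here is named in the putback testing notes above as -XX:+ExplicitGCInvokesConcurrent. The submitter's testcase could then be rerun with it; the command line below is illustrative (a config fragment, assuming a CMS-era JDK):

```shell
# Rerun the submitted testcase with explicit GCs made concurrent
# (flag name taken from the putback testing notes; requires a JDK
# that still ships the CMS collector).
java -verbose:gc -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
     -XX:MaxDirectMemorySize=1M -XX:+ExplicitGCInvokesConcurrent \
     NIOAllocate 1000 10000
```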
21-05-2004