JDK-5025281 : Allow System.gc() to trigger concurrent (not stop-the-world) full collections

Details
Type:
Enhancement
Submit Date:
2004-04-01
Status:
Resolved
Updated Date:
2004-08-30
Project Name:
JDK
Resolved Date:
2004-08-30
Component:
hotspot
OS:
solaris_8,linux,generic
Sub-Component:
gc
CPU:
x86,sparc,generic
Priority:
P4
Resolution:
Fixed
Affected Versions:
5.0u6,6
Fixed Versions:

Description

Name: rmT116609			Date: 04/01/2004


A DESCRIPTION OF THE REQUEST :
Currently, System.gc() always forces a full, stop-the-world collection regardless of the collector policy in use.

When the concurrent mark-sweep (CMS) collector is in use, it would be better to have System.gc() trigger a concurrent full collection instead of a stop-the-world collection.

As some callers of System.gc() (e.g. NIO and RMI's DGC) rely on System.gc() only returning after the collection is completed, System.gc() would still need to block until the collection completes; the change is that when CMS is in use, only the calling thread would be blocked during the concurrent phases of GC, not all threads.
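The blocking contract described above can be illustrated with a minimal sketch (hypothetical helper, not part of the attached testcase): System.gc() must not return until the requested collection has completed, whether the collection runs stop-the-world or, under this proposal, concurrently.

```java
// Minimal sketch of the System.gc() blocking contract. The helper name
// timedExplicitGc is invented for illustration.
public class GcBlockingSketch {
    // Returns how long the calling thread was blocked inside System.gc().
    static long timedExplicitGc() {
        long t0 = System.nanoTime();
        System.gc(); // blocks this thread until the collection completes
        return (System.nanoTime() - t0) / 1_000_000; // milliseconds
    }

    public static void main(String[] args) {
        // Create some short-lived garbage so the collection has work to do.
        for (int i = 0; i < 100_000; i++) {
            byte[] junk = new byte[64];
        }
        System.out.println("System.gc() blocked caller for "
                + timedExplicitGc() + " ms");
    }
}
```

Under the requested behavior, only this calling thread would wait out the concurrent phases; all other application threads would keep running.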

JUSTIFICATION :
The CMS collector is used mostly for applications where GC pauses must be kept low -- i.e. latency-sensitive applications. A stop-the-world full collection causes an unacceptably large pause for these applications -- part of tuning the CMS collector for an application involves making sure a stop-the-world full collection never occurs in normal operation.

The current System.gc() implementation always forces a full collection. The only way to avoid this currently is to pass -XX:+DisableExplicitGC to turn System.gc() into a no-op. Avoiding calls to System.gc() entirely is not possible as some calls are made from within the standard Java libraries (e.g. NIO and RMI DGC).

However, completely disabling System.gc() in this way does not work well either, as callers of System.gc() generally do so for a reason -- e.g. NIO calls it to reclaim direct buffers when it runs out of direct buffer space, DGC calls it to get a more accurate view of live RMI references.

The attached testcase shows some of the problems with NIO and -XX:+DisableExplicitGC. Without -XX:+DisableExplicitGC, full stop-the-world GCs are forced. With it, NIO buffer allocation fails.

Note that the attached testcase exposes another NIO-related bug (submitted with review ID 233528) which can cause spurious OutOfMemoryErrors even when System.gc() is enabled.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
-XX:+DisableExplicitGC should not be needed to avoid stop-the-world collections. The testcase should run with concurrent collections only.
ACTUAL -
With System.gc() enabled:

oliver@flood:~/nio-bugs$ java -verbose:gc -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:MaxDirectMemorySize=1M NIOAllocate 1000 10000
Allocating 10000 NIO Direct buffers of 1000 bytes each.
0 buffers allocated.
1001 buffers allocated.
[GC 246K->207K(16320K), 0.0162750 secs]
[Full GC 207K->206K(16320K), 0.0348480 secs]
Caught OOME allocating buffer #1049
2002 buffers allocated.
[GC 330K->329K(16320K), 0.0063360 secs]
[Full GC 329K->206K(16320K), 0.0292560 secs]
Caught OOME allocating buffer #2097
3003 buffers allocated.
[GC 329K->329K(16320K), 0.0050900 secs]
[Full GC 329K->206K(16320K), 0.0291510 secs]
Caught OOME allocating buffer #3145
4004 buffers allocated.
[GC 329K->329K(16320K), 0.0053420 secs]
[Full GC 329K->206K(16320K), 0.0297570 secs]
Caught OOME allocating buffer #4193
5005 buffers allocated.
[GC 329K->329K(16320K), 0.0050510 secs]
[Full GC 329K->206K(16320K), 0.0291440 secs]
Caught OOME allocating buffer #5241
6006 buffers allocated.
[GC 329K->329K(16320K), 0.0053410 secs]
[Full GC 329K->206K(16320K), 0.0288150 secs]
Caught OOME allocating buffer #6289
7007 buffers allocated.
[GC 329K->329K(16320K), 0.0059880 secs]
[Full GC 329K->206K(16320K), 0.0287610 secs]
Caught OOME allocating buffer #7337
8008 buffers allocated.
[GC 329K->329K(16320K), 0.0051440 secs]
[Full GC 329K->206K(16320K), 0.0293400 secs]
Caught OOME allocating buffer #8385
9009 buffers allocated.
[GC 329K->329K(16320K), 0.0051870 secs]
[Full GC 329K->206K(16320K), 0.0287630 secs]
Caught OOME allocating buffer #9433
Done.

With System.gc() disabled:

oliver@flood:~/nio-bugs$ java -verbose:gc -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:MaxDirectMemorySize=1M -XX:+DisableExplicitGC NIOAllocate 1000 10000
Allocating 10000 NIO Direct buffers of 1000 bytes each.
0 buffers allocated.
1001 buffers allocated.
Caught OOME allocating buffer #1049
Caught OOME allocating buffer #1049
Caught OOME allocating buffer #1049
Caught OOME allocating buffer #1049
Caught OOME allocating buffer #1049
Caught OOME allocating buffer #1049
Caught OOME allocating buffer #1049
Caught OOME allocating buffer #1049
Caught OOME allocating buffer #1049
Caught OOME allocating buffer #1049
Caught OOME allocating buffer #1049
Caught OOME allocating buffer #1049
Caught OOME allocating buffer #1049
Caught OOME allocating buffer #1049
Caught OOME allocating buffer #1049
Caught OOME allocating buffer #1049
[.. continues indefinitely ..]


---------- BEGIN SOURCE ----------
public class NIOAllocate {
    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("syntax: java NIOAllocate <buffer size> <buffer count>");
            return;
        }

        int bufferSize = Integer.parseInt(args[0]);
        int bufferCount = Integer.parseInt(args[1]);
        int progressSize = bufferCount / 10 + 1;
        
        System.err.println("Allocating " + bufferCount + " NIO Direct buffers of " + bufferSize + " bytes each.");

        for (int i = 0; i < bufferCount; ++i) {
            if (i % progressSize == 0)
                System.err.println(i + " buffers allocated.");
            
            try {
                // The buffer becomes unreachable immediately; its native
                // memory is only reclaimed after a GC runs its cleaner.
                java.nio.ByteBuffer testBuffer = java.nio.ByteBuffer.allocateDirect(bufferSize);
            } catch (OutOfMemoryError oome) {
                System.err.println("Caught OOME allocating buffer #" + (i+1));
                Thread.sleep(500);
                --i; // Try again.
            }
        }

        System.err.println("Done.");
    }
}

---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
None. Currently we must avoid using any code that requires calls to System.gc() (e.g. NIO direct buffers) to operate correctly, and run with -XX:+DisableExplicitGC.
(Incident Review ID: 233534) 
======================================================================

                                    

Comments
EVALUATION

###@###.### 2004-04-08: This is a good suggestion
which we have considered in the past. While this is simple to do,
it is not clear if applications will miss the "prompter, but
more intrusive" and stronger semantics offered by the
current implementation. It may be worthwhile to consider
a weaker version of System.gc(weak?) that does what's
asked in this RFE, in addition to the existing stronger
semantics; that way we offer both choices to users. However,
that's an API change/addition. Basically, my
problem is that I do not know how customers make use of
System.gc(). A CAP survey may yield some clues as to whether
a change such as this would be mostly beneficial or not.

In any case, this is too big a change, in terms of customer
impact (not in terms of implementation work), I think, for Tiger
at this late stage. We should take this up for either
Dragonfly or for Mustang, with an appropriate
customer survey well before the fact.

If you want to vote on this proposal, you have three choices,
plus a don't care vote:
(1) current (status quo)
(2) that proposed by this customer (offer "weak" semantics only for 
    concurrent collectors)
(3) my refinement of customer's proposal (offer "weak" _and_ "strong"
    semantics for concurrent collectors; implies API change; will
    require certain libraries to be rewritten to use "weak" version
    of call)
(4) don't care among the above

You can cast your vote on JDC (make sure you include your
choice in your vote; write-in ballots welcome!).


###@###.### 2004-04-20: The polls are now closed; thanks
to all those who participated in this exercise in democracy ;-)
Based on your (collective) feedback, we have decided to implement
this via the least disruptive and most useful route, namely,
via a -XX command-line option, subject to approval by the
appropriate committee.

###@###.### 2004-05-21: libjvm.so's available for
testing by customer.
                                     
2004-05-21
SUGGESTED FIX

Event:            putback-to
Parent workspace: /net/jano.sfbay/export/disk05/hotspot/ws/main/post_tiger_gc_baseline
                  (jano.sfbay:/export/disk05/hotspot/ws/main/post_tiger_gc_baseline)
Child workspace:  /prt-workspaces/20040616131632.ysr.dragon_work/workspace
                  (prt-web:/prt-workspaces/20040616131632.ysr.dragon_work/workspace)
User:             ysr

Comment:

---------------------------------------------------------

Original workspace:     neeraja:/net/spot/archive02/ysr/dragon_work
Submitter:              ysr
Archived data:          /net/prt-archiver.sfbay/export2/archived_workspaces/main/post_tiger_gc_baseline/2004/20040616131632.ysr.dragon_work/
Webrev:                 http://analemma.sfbay.sun.com/net/prt-archiver.sfbay/export2/archived_workspaces/main/post_tiger_gc_baseline/2004/20040616131632.ysr.dragon_work/workspace/webrevs/webrev-2004.06.16/index.html

Fixed 5025281: Allow System.gc() to trigger concurrent (not stop-world) full collections
Fixed 4780073: Can/Should/Could a partial gc request short-circuit a full gc request?

  http://analemma.sfbay/net/spot/archive02/ysr/dragon_work/webrev

For the first, an RFE, we define a satisfying collection as any
collection of the old gen that starts following the System.gc()
request. This may be either a concurrent collection or a stop-world
collection. A new vm operation, VM_GenCollectFullConcurrent,
implements the guts of this RFE. A young collection is first
done, unless one has already been done following the request,
followed by a concurrent collection, unless one has already
started following the request. The caller is blocked until such
a satisfying collection is completed.
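The real implementation is the HotSpot C++ vm operation VM_GenCollectFullConcurrent; the sequence described above can be rendered as a Java sketch for illustration (every name below is invented, and the single-threaded model elides the races the real counter snapshots guard against):

```java
// Hypothetical model of the "satisfying collection" sequence: young
// collection first (unless one already ran after the request), then a
// concurrent old-gen collection (unless one already started), then the
// caller waits for that collection to complete.
public class ExplicitGcModel {
    int youngGcCount = 0;  // young collections completed
    int fullGcCount = 0;   // old-gen collections started

    void collectYoung() { youngGcCount++; }
    void startOldCollection() { fullGcCount++; }

    void explicitGcConcurrent() {
        // Snapshot the counters at the time of the System.gc() request.
        int youngAtRequest = youngGcCount;
        int fullAtRequest = fullGcCount;
        if (youngGcCount == youngAtRequest) {   // no young GC since request?
            collectYoung();
        }
        if (fullGcCount == fullAtRequest) {     // no old-gen GC started since?
            startOldCollection();
        }
        // The real code blocks the caller here until fullGcCount advances,
        // i.e. until a satisfying collection has completed.
    }
}
```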

For the second, a soft-"bug", we endow CollectedHeap with
a full_gc_count (in addition to the existing gc_count which
does not discriminate between young and full gc's). We use
full_gc_count to decide whether a fresh full collection needs to be
initiated. This prevents a young collection from inadvertently
short-circuiting a full gc request. Similarly, the
PrintClassHistogram option is always made to succeed, so that
a young collection will not, inadvertently, short-circuit such
a class histogram request. Furthermore, the PrintClassHistogram
option becomes a no-op for non-GenCollectedHeap's (previously
it would do a futile safepoint), pending RFE 5023697.
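The short-circuiting problem the second fix addresses can be sketched with two counters (hypothetical names; the real fields live in HotSpot's CollectedHeap): a shared gc_count is bumped by any collection, so a young collection can make a pending full-GC request look already satisfied, whereas a dedicated full_gc_count cannot be fooled that way.

```java
// Hypothetical sketch of why an undiscriminating gc_count can
// short-circuit a full-GC request, and how a separate full_gc_count
// fixes it.
public class GcCountSketch {
    int gcCount = 0;      // all collections, young or full
    int fullGcCount = 0;  // full (old-gen) collections only

    void youngCollection() { gcCount++; }
    void fullCollection()  { gcCount++; fullGcCount++; }

    // Old check: satisfied by ANY collection after the request.
    boolean fullGcSatisfiedOld(int gcCountAtRequest) {
        return gcCount > gcCountAtRequest;
    }

    // Fixed check: satisfied only by a full collection after the request.
    boolean fullGcSatisfiedNew(int fullGcCountAtRequest) {
        return fullGcCount > fullGcCountAtRequest;
    }
}
```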

Reviewed by: John Coomes, Jon Masamitsu
             Fred Oliver (partial)

Fix Verified: y

Verification Testing:
 used PrintGCDetails to verify that a System.gc() caused
 the expected behaviour with the new flag

Other Testing: (w/ and w/o -XX:+ExplicitGCInvokesConcurrent)
 . PRT
 . refworkload with fastdebug, product (sparc only)
 . vmark
 . preliminary testing by feature requestor (Opencloud, a Sun partner)

Files:
update: src/share/vm/includeDB_core
update: src/share/vm/gc_implementation/parallelScavenge/parallelScavengeHeap.cpp
update: src/share/vm/gc_implementation/parallelScavenge/psMarkSweep.cpp
update: src/share/vm/gc_implementation/parallelScavenge/psScavenge.cpp
update: src/share/vm/gc_interface/collectedHeap.cpp
update: src/share/vm/gc_interface/collectedHeap.hpp
update: src/share/vm/memory/concurrentMarkSweepGeneration.cpp
update: src/share/vm/memory/concurrentMarkSweepGeneration.hpp
update: src/share/vm/memory/genCollectedHeap.cpp
update: src/share/vm/memory/genCollectedHeap.hpp
update: src/share/vm/runtime/arguments.cpp
update: src/share/vm/runtime/globals.hpp
update: src/share/vm/runtime/mutexLocker.cpp
update: src/share/vm/runtime/mutexLocker.hpp
update: src/share/vm/runtime/timer.hpp
update: src/share/vm/runtime/vmThread.cpp
update: src/share/vm/runtime/vm_operations.cpp
update: src/share/vm/runtime/vm_operations.hpp

Examined files: 3215

Contents Summary:
      18   update
    3197   no action (unchanged)
                                     
2004-08-31
CONVERTED DATA

BugTraq+ Release Management Values

COMMIT TO FIX:
dragon
mustang

FIXED IN:
dragon
mustang

INTEGRATED IN:
mustang


                                     
2004-08-31
SUGGESTED FIX

The URL for the webrev has changed from that listed above to:

http://analemma.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/main/post_tiger_gc_baseline/2004/20040616131632.ysr.dragon_work/workspace/webrevs/webrev-2004.06.16/index.html
                                     
2007-04-03


