JDK-4615723 : CMS: deal with CMS marking stack overflow
Type:Bug
Component:hotspot
Sub-Component:gc
Affected Version:1.4.1,5.0
Priority:P2
Status:Closed
Resolution:Other
OS:generic
CPU:generic
Submitted:2001-12-19
Updated:2012-10-03
Resolved:2012-10-03
Been in resolved state for more than ten years. Closing.
03-10-2012
CONVERTED DATA
BugTraq+ Release Management Values
COMMIT TO FIX:
tiger
FIXED IN:
tiger
INTEGRATED IN:
tiger-b22
tiger-b25
tiger-beta
14-06-2004
EVALUATION
See comments section.
###@###.### 2003-08-05: This becomes more important
if CMS is expected to "replace" the train (in the sense of -Xincgc)
because client applications wouldn't want to pay the cost of
the increased footprint from the marking stack. I am therefore
raising the priority to P2 for Tiger and committing this
to 1.5.
11-06-2004
WORK AROUND
###@###.### 2003-08-11: -XX:-CMSParallelRemarkEnabled
would work around the silent work-list overflow.
However, there is no workaround for stack overflow during concurrent
marking or precleaning other than to run with a larger marking stack.
11-06-2004
SUGGESTED FIX
The first part has been putback:
Event: putback-to
Parent workspace: /net/jano/export/disk05/hotspot/ws/main/gc_baseline
(jano:/export/disk05/hotspot/ws/main/gc_baseline)
Child workspace: /export/imgr_home/ws/20030923170909.ysr.ovflw2
(balvenie:/export/imgr_home/ws/20030923170909.ysr.ovflw2)
User: ysr
Comment:
Original workspace: neeraja:/net/spot/archive02/ysr/ovflw2
Parent workspace: /net/jano/export/disk05/hotspot/ws/main/gc_baseline
Submitter: ysr
imgr data: /net/balvenie.sfbay/export/imgr_home/archive/main/gc_baseline/2003/20030923170909.ysr.ovflw2
Partial: 4615723 CMS: deal with CMS marking stack overflow
webrev: http://analemma.sfbay/net/spot/archive02/ysr/ovflw/webrev
This putback addresses stack overflow during the
concurrent phases: marking and precleaning.
Overflow during the remark phase will be addressed
in a second putback.
To recover from overflow during marking, we
discard the stack contents, remembering the
least address thus discarded. Upon completion
of the forward traversal, during which the
restart address may need to be updated,
we do another traversal starting at the
remembered address, doing more iterative
retraversals as necessary.
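The recovery scheme above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the code from the putback: the heap is a vector in which an object's "address" is its index, and the names BoundedMarker, push_or_overflow, traverse_from and kNoRestart are hypothetical. On overflow the stack is emptied, the least discarded address is remembered, and the forward sweep is repeated from that address until no restart is pending.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Toy heap (assumption): an object's address is its index; heap[a] lists the
// addresses that object a refers to.
using Heap = std::vector<std::vector<std::size_t>>;

constexpr std::size_t kNoRestart = std::numeric_limits<std::size_t>::max();

struct BoundedMarker {
    const Heap& heap;
    std::vector<bool> marked;        // stand-in for the mark bitmap
    std::vector<std::size_t> stack;  // bounded marking stack
    std::size_t capacity;
    std::size_t restart = kNoRestart;  // least address discarded so far
    std::size_t overflows = 0;

    BoundedMarker(const Heap& h, std::size_t cap)
        : heap(h), marked(h.size(), false), capacity(cap) {
        assert(cap > 0);  // the scheme needs a non-zero stack
    }

    // Push addr, or on overflow discard the whole stack and remember the
    // least address among the discarded entries (including addr itself).
    void push_or_overflow(std::size_t addr) {
        if (stack.size() < capacity) { stack.push_back(addr); return; }
        std::size_t least = addr;
        for (std::size_t a : stack) least = std::min(least, a);
        stack.clear();
        restart = std::min(restart, least);
        ++overflows;
    }

    // Mark the unmarked referents of obj. Referents at or above the finger
    // are reached later by the forward sweep, so only lower ones are pushed.
    void scan(std::size_t obj, std::size_t finger) {
        for (std::size_t ref : heap[obj]) {
            if (marked[ref]) continue;
            marked[ref] = true;
            if (ref < finger) push_or_overflow(ref);
        }
    }

    // Forward traversal over marked objects, starting at address start.
    void traverse_from(std::size_t start) {
        for (std::size_t finger = start; finger < heap.size(); ++finger) {
            if (!marked[finger]) continue;
            scan(finger, finger);
            while (!stack.empty()) {
                std::size_t obj = stack.back();
                stack.pop_back();
                scan(obj, finger);
            }
        }
    }

    void mark_from_roots(const std::vector<std::size_t>& roots) {
        for (std::size_t r : roots) marked[r] = true;
        traverse_from(0);
        // Retraverse from the remembered address until no overflow is pending.
        while (restart != kNoRestart) {
            std::size_t from = restart;
            restart = kNoRestart;
            traverse_from(from);
        }
    }
};
```

In this model every overflow is triggered by a newly marked object, so the number of retraversals is bounded by the number of objects, which mirrors the termination argument for the mutation-free case.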
In case of overflow during precleaning,
we remember to revisit the object by marking
the page in the mod union table on which
the discarded object lives.
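A minimal sketch of the precleaning case follows, again as an illustration rather than the putback's code. The names ModUnionTable and PrecleanStack are hypothetical, addresses are modeled as byte offsets into the heap, and the 512-byte granule (kCardShift = 9) is an assumption made for the example. An object that does not fit on the stack is dropped, and the table entry covering its address is marked so that a later pass revisits it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One bit per fixed-size heap granule. The 512-byte granule is an assumption.
struct ModUnionTable {
    static constexpr std::size_t kCardShift = 9;
    std::vector<bool> dirty;

    explicit ModUnionTable(std::size_t heap_bytes)
        : dirty((heap_bytes + (std::size_t{1} << kCardShift) - 1) >> kCardShift,
                false) {}

    void mark(std::size_t addr) { dirty[addr >> kCardShift] = true; }
    bool is_dirty(std::size_t addr) const { return dirty[addr >> kCardShift]; }
};

// Bounded stack used while precleaning (hypothetical name).
struct PrecleanStack {
    std::vector<std::size_t> entries;
    std::size_t capacity;
    ModUnionTable& mut;

    PrecleanStack(std::size_t cap, ModUnionTable& table)
        : capacity(cap), mut(table) {}

    // Returns true if addr was pushed. On overflow the object is dropped and
    // its granule is marked in the mod union table for a later revisit.
    bool push(std::size_t addr) {
        if (entries.size() < capacity) {
            entries.push_back(addr);
            return true;
        }
        mut.mark(addr);
        return false;
    }
};
```

Unlike the marking case, nothing already on the stack is discarded here: only the object that failed to fit is deferred.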
In the absence of mutation, both are
guaranteed to terminate even with a
bounded (non-zero) marking stack size.
In the presence of mutation, termination
is still guaranteed because (for example) the
concurrent collector eventually loses the race
to a foreground collection.
The current stack setting of 8K (down from the
original 8M) prevents overflow with all programs
in refWorkload. In order to let customers
assess the impact of frequent stack overflow,
which can kill performance, we currently emit
a warning upon each such event.
Reviewed by: Ross Knippel, Jon Masamitsu
Verified fix: y
Verification testing:
. run with artificially low default and/or
max size to induce frequent stack overflow
Other testing: (CMS, ? artificially small stack size/max)
. imgr all platforms
. refworkload
. volanotest, atg
passed  linux    i486     product      SPECjvm98  GeoMean   37.98   59.37
passed  linux    i486     product1     SPECjvm98  GeoMean   44.71   49.51
passed  linux    i486     productcore  SPECjvm98  GeoMean   94.63   94.63
passed  solaris  i486     product      SPECjvm98  GeoMean   36.34   56.81
passed  solaris  i486     product1     SPECjvm98  GeoMean   44.63   49.34
passed  solaris  i486     productcore  SPECjvm98  GeoMean  104.42  104.42
passed  solaris  sparc    product      SPECjvm98  GeoMean   22.95   33.62
passed  solaris  sparc    product1     SPECjvm98  GeoMean   22.71   24.90
passed  solaris  sparc    productcore  SPECjvm98  GeoMean   40.68   40.68
passed  solaris  sparcv9  product      SPECjvm98  GeoMean   24.57   35.18
passed  solaris  sparcv9  productcore  SPECjvm98  GeoMean   41.15   41.15
passed  windows  i486     compiler2    SPECjvm98  GeoMean   49.64   96.37
passed  windows  i486     compiler1    SPECjvm98  GeoMean   56.12   72.18
passed  windows  i486     core         SPECjvm98  GeoMean  118.22  118.22
passed  windows  ia64     core         SPECjvm98  GeoMean   15.55   15.55
Files:
update: src/share/vm/memory/concurrentMarkSweepGeneration.cpp
update: src/share/vm/memory/concurrentMarkSweepGeneration.hpp
update: src/share/vm/memory/concurrentMarkSweepGeneration.inline.hpp
update: src/share/vm/memory/genOopClosures.hpp
update: src/share/vm/runtime/globals.hpp
Examined files: 2882
Contents Summary:
5 update
2877 no action (unchanged)
###@###.### 2003-10-10: The following putback to gc_baseline
completes work on this bug.
Event: putback-to
Parent workspace: /net/jano.sfbay/export/disk05/hotspot/ws/main/gc_baseline
(jano.sfbay:/export/disk05/hotspot/ws/main/gc_baseline)
Child workspace: /prt-workspaces/20031010014636.ysr.ovflw2/workspace
(prt-web:/prt-workspaces/20031010014636.ysr.ovflw2/workspace)
User: ysr
Comment:
---------------------------------------------------------
Original workspace: neeraja:/net/spot/archive02/ysr/ovflw2
Submitter: ysr
Archived data: /net/prt-archiver.sfbay/export2/archived_workspaces/main/gc_baseline/2003/20031010014636.ysr.ovflw2/
Webrev: http://analemma.sfbay.sun.com/net/prt-web.sfbay/prt-workspaces/20031010014636.ysr.ovflw2/workspace/webrevs/webrev-2003.10.10/index.html
Fixed: 4615723 CMS: deal with CMS marking stack overflow
webrev: http://analemma.sfbay/net/spot/archive02/ysr/ovflw2/webrev
This putback completes the handling of marking-stack/work-queue
overflow during the stop-world remark phase (including overflow
encountered during reference processing).
Overflow objects are linked on a global
(per-collector) overflow list, via the
mark-word. Non-prototypical mark-words are
spooled into a C-heap growable array (this
will be revisited in the future), and restored
at the end of a phase.
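The overflow list described above can be sketched as an intrusive list threaded through each object's header word. This is an illustrative model only: Obj, OverflowList and the value of kPrototypeMark are hypothetical stand-ins, and a std::vector plays the role of the C-heap growable array. A header word that differs from the prototype is saved before being overwritten with the link, and is written back once the list has been drained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Toy object with a single header ("mark") word.
struct Obj {
    std::uintptr_t mark;
};

// Assumed value of the default header; any fixed constant works for the sketch.
constexpr std::uintptr_t kPrototypeMark = 0x1;

class OverflowList {
public:
    // Link obj at the head of the list, reusing its mark word as the
    // next pointer. A non-prototypical mark is spooled first.
    void push(Obj* obj) {
        if (obj->mark != kPrototypeMark) {
            preserved_.emplace_back(obj, obj->mark);
        }
        obj->mark = reinterpret_cast<std::uintptr_t>(head_);
        head_ = obj;
    }

    // Unlink and return the head, resetting its mark to the prototype.
    // Returns nullptr when the list is empty.
    Obj* pop() {
        if (head_ == nullptr) return nullptr;
        Obj* obj = head_;
        head_ = reinterpret_cast<Obj*>(obj->mark);
        obj->mark = kPrototypeMark;
        return obj;
    }

    // At the end of the phase, write the spooled marks back.
    void restore_preserved_marks() {
        assert(head_ == nullptr);  // list must be drained first
        for (const auto& entry : preserved_) {
            entry.first->mark = entry.second;
        }
        preserved_.clear();
    }

    std::size_t preserved_count() const { return preserved_.size(); }

private:
    Obj* head_ = nullptr;
    std::vector<std::pair<Obj*, std::uintptr_t>> preserved_;
};
```

Because the link lives in the object itself, the list needs no storage proportional to its length. Only the objects whose header carried extra state cost side storage.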
The main change was that, in the parallel
case, we needed to make sure that each grey
object was handled by a unique thread, so
that in the event of an overflow an oop
would not be linked multiple times into the
overflow list.
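One way to obtain the "unique thread per grey object" property is to let threads compete for the object's mark bit with an atomic read-modify-write, so that exactly one of them sees the bit change from clear to set. The sketch below illustrates that idea. The names ParMarkBitmap, par_mark and claim_and_push are hypothetical, and the sketch is not taken from the putback.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class ParMarkBitmap {
public:
    explicit ParMarkBitmap(std::size_t bits) : words_((bits + 63) / 64) {}

    // Atomically set the bit. Returns true only for the one caller that
    // changed it from 0 to 1; every other caller gets false.
    bool par_mark(std::size_t bit) {
        const std::uint64_t mask = std::uint64_t{1} << (bit % 64);
        const std::uint64_t old =
            words_[bit / 64].fetch_or(mask, std::memory_order_acq_rel);
        return (old & mask) == 0;
    }

    bool is_marked(std::size_t bit) const {
        const std::uint64_t mask = std::uint64_t{1} << (bit % 64);
        return (words_[bit / 64].load(std::memory_order_acquire) & mask) != 0;
    }

private:
    std::vector<std::atomic<std::uint64_t>> words_;
};

// Only the thread that wins the claim queues the object, so the object can
// end up on at most one work queue (or on the overflow list) at most once.
inline bool claim_and_push(ParMarkBitmap& bitmap, std::size_t obj,
                           std::vector<std::size_t>& local_queue) {
    if (!bitmap.par_mark(obj)) return false;  // another thread owns it
    local_queue.push_back(obj);
    return true;
}
```

Each worker would call claim_and_push with its own local queue. A thread that loses the claim simply moves on, which is what keeps an object from being linked twice.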
Work queue overflow handling allowed us to reduce
the work queue size to 8K from the former 32K.
(We increased the single marking-stack size from
the former 8K to 32K to avoid the occasional
overflow in _209_db).
Reviewed by: Jon Masamitsu
Verified fix: y
Verification testing:
. run with CMSMarkStackOverflowALot
(as well as small CMSMarkStackOverflowInterval)
to induce frequent marking-stack/work-queue overflow
Other testing: (CMS, ? simulated overflow)
. spec
. refworkload
. volanotest
Files:
update: src/share/vm/memory/concurrentMarkSweepGeneration.cpp
update: src/share/vm/memory/concurrentMarkSweepGeneration.hpp
update: src/share/vm/memory/concurrentMarkSweepGeneration.inline.hpp
update: src/share/vm/memory/genOopClosures.hpp
update: src/share/vm/memory/referenceProcessor.cpp
update: src/share/vm/oops/oop.hpp
update: src/share/vm/oops/oop.inline.hpp
update: src/share/vm/runtime/globals.hpp
update: src/share/vm/utilities/bitMap.cpp
update: src/share/vm/utilities/bitMap.hpp
update: src/share/vm/utilities/taskqueue.hpp
Examined files: 2977
Contents Summary:
11 update
2966 no action (unchanged)