JDK-6722116 : CMS: Incorrect overflow handling when using parallel concurrent marking

Details
Type:
Bug
Submit Date:
2008-07-03
Status:
Resolved
Updated Date:
2011-12-17
Project Name:
JDK
Resolved Date:
2008-10-07
Component:
hotspot
OS:
generic,linux_redhat_5.2
Sub-Component:
gc
CPU:
generic,unknown
Priority:
P3
Resolution:
Fixed
Affected Versions:
hs14,6u13
Fixed Versions:
hs14 (b06)

Description
The following description is taken from the Evaluation field of 6578335,
during the investigation of which this problem was first discovered:

There was a third bug found which relates to the handling of
"second ring overflow" when using parallel concurrent marking
-- the overflow of the global overflow stack (which itself handles
the overflow from the local work queues). The intention was
that this second ring overflow should use the "restart mechanism"
to restart marking from the least overflown address.
That mechanism was not completely extended to the parallel
concurrent marking case: the restart_addr was not pushed
all the way through to the task that controls parallel
concurrent marking. Because the state of that task was only
partially updated, we can, and often will, end up missing the
scan of some of the addresses at the upper end of the
CMS-collected generations. Because second-ring overflow is a
very rare event in practice, this appears not to have been
detected before (or at least not until the first two bugs
mentioned above were moved out of our way).
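
To make the intended restart mechanism concrete, here is a minimal
single-threaded sketch. It is a toy model and not HotSpot code: ToyHeap,
ToyMarker, the two size limits and every method name are hypothetical,
and "addresses" are simply array indices. A bounded local queue spills
into a bounded overflow stack; when that also overflows, the object is
dropped, the least dropped address is remembered, and the bitmap sweep
is later restarted from that address.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model, not HotSpot code: object i lives at "address" i and
// refs[i] lists the addresses it points to.
struct ToyHeap {
    std::vector<std::vector<std::size_t>> refs;
};

// Sentinel meaning "no second-ring overflow happened in this pass".
const std::size_t kNoRestart = static_cast<std::size_t>(-1);

struct ToyMarker {
    const ToyHeap& heap;
    std::size_t local_limit;                  // capacity of the local work queue
    std::size_t overflow_limit;               // capacity of the global overflow stack
    std::vector<bool> marked;                 // mark bitmap, one bit per address
    std::vector<std::size_t> local_queue;
    std::vector<std::size_t> overflow_stack;
    std::size_t restart_addr = kNoRestart;    // least address dropped so far
    int restarts = 0;

    ToyMarker(const ToyHeap& h, std::size_t local, std::size_t overflow)
        : heap(h), local_limit(local), overflow_limit(overflow),
          marked(h.refs.size(), false) {}

    void push(std::size_t addr) {
        if (local_queue.size() < local_limit) {
            local_queue.push_back(addr);          // normal case
        } else if (overflow_stack.size() < overflow_limit) {
            overflow_stack.push_back(addr);       // first-ring overflow
        } else if (addr < restart_addr) {
            restart_addr = addr;                  // second-ring overflow: drop,
        }                                         // keep the least dropped address
    }

    bool pop(std::size_t& addr) {
        std::vector<std::size_t>& src =
            local_queue.empty() ? overflow_stack : local_queue;
        if (src.empty()) return false;
        addr = src.back();
        src.pop_back();
        return true;
    }

    // Mark the referents of addr. Newly marked objects behind the finger
    // must be queued; those ahead of it are reached by the sweep itself.
    void scan_object(std::size_t addr, std::size_t finger) {
        for (std::size_t target : heap.refs[addr]) {
            assert(target < marked.size());
            if (marked[target]) continue;
            marked[target] = true;
            if (target < finger) push(target);
        }
    }

    void sweep_from(std::size_t start) {
        for (std::size_t finger = start; finger < marked.size(); ++finger) {
            if (!marked[finger]) continue;
            scan_object(finger, finger);
            std::size_t addr;
            while (pop(addr)) scan_object(addr, finger);
        }
    }

    void mark_from(const std::vector<std::size_t>& roots) {
        for (std::size_t r : roots) marked[r] = true;
        std::size_t start = 0;
        for (;;) {
            restart_addr = kNoRestart;
            sweep_from(start);
            if (restart_addr == kNoRestart) break;
            start = restart_addr;   // restart from the least overflown address
            ++restarts;
        }
    }
};
```

In this model every dropped object is already marked in the bitmap, so
a sweep that restarts at the least dropped address is guaranteed to
revisit it. The bug described above concerns the parallel case, where
the restart address did not reach the marking task; the sketch only
shows what the mechanism is meant to achieve.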

The obvious workaround is to switch off parallel concurrent
marking via -XX:-CMSConcurrentMTEnabled.

                                    

Comments
SUGGESTED FIX

changeset:   301:ebeb6490b814
parent:      299:387a62b4be60
user:        ysr
date:        Tue Aug 26 14:54:48 2008 -0700
summary:     6722116: CMS: Incorrect overflow handling when using parallel concurrent marking
                                     
2008-08-28
EVALUATION

http://hg.openjdk.java.net/jdk7/hotspot-gc/hotspot/rev/ebeb6490b814
                                     
2008-08-27
SUGGESTED FIX

The following is under testing and review:

http://analemma.sfbay.sun.com/net/neeraja/export/ysr/cms_cmt/webrev/
                                     
2008-08-22
WORK AROUND

-XX:-CMSConcurrentMTEnabled

Otherwise, increasing the size of the marking stack via
-XX:CMSMarkStackSize and -XX:CMSMarkStackSizeMax
would reduce the probability of hitting this bug.
                                     
2008-07-03
EVALUATION

This bug has existed since 6.0, when parallel concurrent marking was
first introduced. Because it involves not the first but the second
level of overflow, it is much less frequent (other than under really
high stress conditions), so customers are not likely to run into it
very often (I think).
                                     
2008-07-03
SUGGESTED FIX

The restart_addr should be pushed down into the state of the
ConcurrentMarkingTask and, further, the do_scan_and_mark()
methods should use that restart address as the starting point
when re-starting their marking work following an overflow-and-restart
(rather than the bottom of the space as currently done).
                                     
2008-07-03
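
The suggestion above can be sketched roughly as follows. This is an
illustration under stated assumptions and not the actual changeset:
ToySpace, ToyConcMarkingTask, reset() and claim_chunk() are hypothetical
names, and addresses are plain integers. The point is only that the
restart address is stored in the task's state and that workers begin
claiming chunks from it rather than from the bottom of the space.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for one CMS-collected space: [bottom, end).
struct ToySpace {
    std::size_t bottom;
    std::size_t end;
};

class ToyConcMarkingTask {
public:
    ToyConcMarkingTask(ToySpace space, std::size_t chunk_size)
        : space_(space), chunk_size_(chunk_size),
          restart_addr_(space.bottom), global_finger_(space.bottom) {
        assert(chunk_size > 0);
    }

    // Called before marking is (re)started. A fresh start passes
    // space.bottom; an overflow-and-restart passes the restart address.
    void reset(std::size_t restart_addr) {
        assert(restart_addr >= space_.bottom && restart_addr <= space_.end);
        restart_addr_ = restart_addr;
        global_finger_.store(restart_addr);
    }

    std::size_t restart_addr() const { return restart_addr_; }

    // Worker side: claim the next chunk [begin, end) to scan.
    // Returns false once the whole space has been handed out.
    bool claim_chunk(std::size_t& begin, std::size_t& end) {
        std::size_t cur = global_finger_.load();
        while (cur < space_.end) {
            std::size_t next = std::min(cur + chunk_size_, space_.end);
            // On failure, cur is reloaded with the current finger value.
            if (global_finger_.compare_exchange_weak(cur, next)) {
                begin = cur;
                end = next;
                return true;
            }
        }
        return false;
    }

private:
    ToySpace space_;
    std::size_t chunk_size_;
    std::size_t restart_addr_;               // starting point for this pass
    std::atomic<std::size_t> global_finger_; // next unclaimed address
};
```

With a space of [0, 100), a chunk size of 32 and a restart address of
40, workers in this sketch claim [40, 72) and then [72, 100), so nothing
below the restart address is rescanned and nothing above it is skipped.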