EVALUATION
Fixed in 6u3 b02. See the parent CR for related CRs.
17-07-2007
SUGGESTED FIX
Event: putback-to
Parent workspace: /net/jano2.sfbay/export2/hotspot/ws/1.6/update3/baseline
(jano2.sfbay:/export2/hotspot/ws/1.6/update3/baseline)
Child workspace: /net/prt-web.sfbay/prt-workspaces/20070717115529.ysr.hx3/workspace
(prt-web:/net/prt-web.sfbay/prt-workspaces/20070717115529.ysr.hx3/workspace)
User: ysr
Comment:
---------------------------------------------------------
Job ID: 20070717115529.ysr.hx3
Original workspace: karachi:/net/spot/workspaces/ysr/hx3
Submitter: ysr
Archived data: /net/prt-archiver.sfbay/data/archived_workspaces/2007/20070717115529.ysr.hx3/
Webrev: http://prt-web.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/2007/20070717115529.ysr.hx3/workspace/webrevs/webrev-2007.07.17/index.html
Fixed 6558100: CMS crash when -XX:+ParallelRefProcEnabled is set
Partial 6572569: CMS: consistently skewed work distribution indicated in (long) re-mark pauses
http://analemma.sfbay/net/spot/workspaces/ysr/hx3/webrev
Approved for 6u3b02 by HotSpot P-Team (Penni Henry)
(6558100)
When CMS marking (either during parallel rescan or parallel reference processing)
runs out of space on the per-worker work queues, the overflowed grey objects
are tracked by chaining them together through their mark words. We had two
bugs here. First, the method that took a prefix of the overflow list was not
re-attaching the intended suffix correctly (this affects all JVMs going
back to 1.4.2_14). Second, the parallel reference processing code was
entirely neglecting to process the overflow list (this affects JVMs going
back to 5.0). The crucial debugging breakthrough came when Poonam used
the SA (Serviceability Agent) to track down the objects that CMS remark was
declaring as unreachable but unmarked, and found that they occurred in long
chains linked via their mark words. The promoted bit was not set in those
mark words, which distinguished them from the promoted-object chains that
ParNew uses and identified them as broken fragments of an erstwhile
overflow list.
Many thanks to Poonam Bajaj and Thomas Viessmann for crucial
debugging help. The customer has since run a version of 6u2 with
the fix (thanks, Poonam) and verified that the previous crash did not
reproduce in more than 2 days of running (previously the crash
occurred within about 4 hours).
Some debugging code was added, and some asserts were relaxed
to allow for the possibility of examining an object lying at the end
of the overflow list. This latter issue will be revisited more thoroughly
and cleaned up under a separate bug ID.
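A minimal, single-threaded C++ sketch of the overflow-list mechanism described for 6558100 follows. It assumes a simplified object whose single link field stands in for the mark word; Obj, OverflowList, take_prefix and drain are hypothetical names, not the HotSpot code, and concurrent pushes and takes are not modeled. The sketch shows the two properties the fix restores: taking a prefix must leave the unconsumed suffix attached to the list head, and marking must keep consuming the overflow list until it is empty.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a heap object. While the object sits on the
// overflow list, its header (the "mark word") is reused as the link field.
struct Obj {
  Obj* mark_link = nullptr;
  bool scanned = false;  // set once the grey object's fields are scanned
};

// Global overflow list of grey objects, chained through mark_link.
// Sequential simplification: no CAS or locking.
struct OverflowList {
  Obj* head = nullptr;

  void push(Obj* o) {
    o->mark_link = head;
    head = o;
  }

  // Detach up to n objects from the front and return them as a separate,
  // nullptr-terminated chain. The unconsumed suffix must stay reachable
  // from head; otherwise its objects are never scanned.
  Obj* take_prefix(std::size_t n) {
    if (head == nullptr || n == 0) return nullptr;
    Obj* prefix = head;
    Obj* last = head;
    for (std::size_t i = 1; i < n && last->mark_link != nullptr; ++i) {
      last = last->mark_link;
    }
    head = last->mark_link;     // re-attach the suffix to the list
    last->mark_link = nullptr;  // terminate the detached prefix
    return prefix;
  }
};

// Scan every object on the overflow list, chunk objects at a time.
// chunk must be positive, otherwise take_prefix returns nullptr at once
// and nothing is drained. A marking phase that never calls this (as in
// the second bug) leaves every overflowed object unscanned.
std::size_t drain(OverflowList& list, std::size_t chunk) {
  assert(chunk > 0);
  std::size_t processed = 0;
  while (Obj* chain = list.take_prefix(chunk)) {
    while (chain != nullptr) {
      Obj* next = chain->mark_link;
      chain->mark_link = nullptr;  // restore the header (simplified)
      chain->scanned = true;       // "scan" the grey object
      ++processed;
      chain = next;
    }
  }
  return processed;
}
```

The real parallel version has to preserve the same suffix invariant while other workers push and take concurrently; this sketch only shows the invariant itself.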
(6572569)
When CMSScavengeBeforeRemark is set, we were assuming that a scavenge
would necessarily have preceded the remark and that the heap would
therefore already be in a parsable state. However, the scavenge may
not have been done because, for instance, a JNI critical section was
held. The main CR here will need further work to deal with the issue
found at the customer, but this is a fix for the problem with
CMSScavengeBeforeRemark, which is a temporary workaround for this
customer's performance issue as described in the bug report.
Thanks to Chris Phillips for testing and backport help with 5uXX where
the problem manifested most readily.
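The following C++ sketch illustrates the control-flow point behind the CMSScavengeBeforeRemark fix: remark should act on whether a scavenge actually ran, not on whether one was requested. Heap, try_scavenge, make_parsable and remark are hypothetical names, and the JNI critical section is reduced to a boolean; this is an illustration under those assumptions, not the HotSpot implementation.

```cpp
#include <cassert>

// Hypothetical heap model; none of these names are HotSpot APIs.
struct Heap {
  bool jni_critical_held = false;  // a thread is inside a JNI critical section
  bool scavenged = false;          // did a scavenge actually run?
  bool parsable = false;           // can the heap be walked object by object?

  // A requested scavenge can be skipped, e.g. while a JNI critical section
  // is held. Report whether it actually happened.
  bool try_scavenge() {
    if (jni_critical_held) return false;
    scavenged = true;
    parsable = true;  // per the report, a completed scavenge leaves the heap parsable
    return true;
  }

  // Stand-in for whatever work makes the heap parsable without a scavenge.
  void make_parsable() { parsable = true; }
};

// Remark prologue. The buggy pattern keyed off the flag alone:
//   if (!scavenge_before_remark) heap.make_parsable();
// which skips make_parsable() when the requested scavenge did not happen.
void remark(Heap& heap, bool scavenge_before_remark) {
  const bool did_scavenge = scavenge_before_remark && heap.try_scavenge();
  if (!did_scavenge) {
    heap.make_parsable();
  }
  assert(heap.parsable);
  // ... the remark rescan would follow here ...
}
```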
Reviewed by: Jon Masamitsu & Andrey Petrusenko
Fix Verified: y
Verification Testing:
6558100: GCBasher on CMS with CMSMarkStackOverflowALot enabled
6572569: GCBasher on CMS with CMSScavengeBeforeRemark & no survivor spaces
Other testing:
PRT (also with CMS stress options)
refworkload, runThese -quick and -testbase
Note added in proof: Some late-breaking big-apps testing with the
stress flags yesterday revealed an as-yet-undiagnosed issue when
running Tomcat and ATG. Thanks to Ashwin for finding this issue,
which is being tracked under CR 6578335.
Files:
update: src/share/vm/gc_implementation/concurrentMarkSweep/compactibleFreeListSpace.cpp
update: src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
update: src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.hpp
Examined files: 4209
Contents Summary:
3 update
4206 no action (unchanged)