Bug ID: JDK-6324141 CMS: nightly testing asserts during concurrent marking

Type: Bug
Component: hotspot
Sub-Component: gc
Affected Version: 6

Priority: P2
Status: Resolved
Resolution: Fixed
OS: generic,solaris_10
CPU: generic,x86

Submitted: 2005-09-14
Updated: 2010-08-19
Resolved: 2005-10-27

Versions (Unresolved/Resolved/Fixed)

JDK 6: 6 b58 (Fixed)

Related Reports

Duplicate :	JDK-6334261 - assert(s > 0,"Bad size calculated") with -XX:+UseConcMarkSweep
Relates :	JDK-6574315 - Compact_InternedStrings fails with "missing Printezis mark?"

Description

There have been several assertion failures in concurrent marking code that was recently touched.

Specifically, where previously we would defer scanning of objects
allocated since the start of marking to the precleaning phase,
we now do such scanning more eagerly. This was an enabling step
towards the CMS "clean on enter" optimization, which, by the way,
has been enabled in the single-threaded case.

Since then, we have seen assertion failures in related code.

Here are some instances of these assertions:
--------------------------------------------------
http://vmsqe.sfbay/nightly/mantis/DTWS/results/09-11-05/ClientVM/Solsparc/mixed/Gc_Baseline-Xconc/RT_PLUMHALL-NIGHTLY-Gc_Baseline-Xconc-ClientVM-mixed-Solsparc-2005-09-11-21-25-03/log

# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc:
SuppressErrorAt=/concurrentMarkSweepGeneration.cpp:6414]
#
# An unexpected error has been detected by HotSpot Virtual Machine:
#
#  Internal Error (/net/prt-solsparc-q1-18/tmp/PrtBuildDir/workspace/src/share/vm/memory/concurrentMarkSweepGeneration.cpp, 6414 [ Patched ]), pid=8109, tid=6
#
# Java VM: Java HotSpot(TM) Client VM (20050908154901.jcoomes.gc_merge-debug mixed mode)
#
# Error: assert(_finger > ptr,"we just incremented it above")
# An error report file with more information is saved as hs_err_pid8109.log
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
#

---------------------------------------------------------------------------------
# Error: assert(_finger > ptr,"we just incremented it above")

# Error: assert(_markBitMap.isMarked(addr) && _markBitMap.isMarked(addr + 1),"missing Printezis mark?")

# An unexpected error has been detected by HotSpot Virtual Machine:
#
#  Internal Error (/net/prt-solsparc-q1-10/tmp/PrtBuildDir/workspace/src/share/vm/memory/concurrentMarkSweepGeneration.cpp, 5759 [ Patched ]), pid=19697, tid=9
#
# Java VM: Java HotSpot(TM) Client VM (20050920144303.ysr.MT-debug mixed mode)
#
# Error: assert(_markBitMap.isMarked(addr) && _markBitMap.isMarked(addr + 1),"missing Printezis mark?")

---------------  T H R E A D  ---------------

Current thread (0x000f8f00):  GCTaskThread [id=9]

Stack: 
[error occurred during error reporting, step 110, id 0xe0000000]

Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0xd3c61c];;  __1cHVMErrorOreport_and_die6M_v_+0x7d8
V  [libjvm.so+0x422870];;  __1cYreport_assertion_failure6Fpkci1_v_+0x6c
V  [libjvm.so+0xd3b5f4];;  __1cHVMErrorGreport6MpnMoutputStream__v_+0x510
V  [libjvm.so+0xd3c61c];;  __1cHVMErrorOreport_and_die6M_v_+0x7d8
V  [libjvm.so+0x422870];;  __1cYreport_assertion_failure6Fpkci1_v_+0x6c
V  [libjvm.so+0x3e7480];;  __1cMCMSCollectorbFblock_size_using_printezis_bits6kMpnIHeapWord__I_+0x198
V  [libjvm.so+0x391ac4];;  __1cYCompactibleFreeListSpaceKblock_size6kMpnIHeapWord_pnMCMSCollector__I_+0x400
V  [libjvm.so+0x3d7da4];;  __1cSCMSConcMarkingTaskQdo_scan_and_mark6MipnYCompactibleFreeListSpace__v_+0x1f4
V  [libjvm.so+0x3d74a4];;  __1cSCMSConcMarkingTaskEwork6Mi_v_+0x2d8
V  [libjvm.so+0xd73468];;  __1cbAYieldingFlexibleGangWorkerEloop6M_v_+0x10c
V  [libjvm.so+0xada7c0];;  java_start+0x14c

Comments

SUGGESTED FIX

The following fixes the balance of this bug:

Event: putback-to
Parent workspace: /net/jano.sfbay/export/disk05/hotspot/ws/main/gc_baseline
(jano.sfbay:/export/disk05/hotspot/ws/main/gc_baseline)
Child workspace: /net/prt-web.sfbay/prt-workspaces/20051020145817.ysr.perm/workspace
(prt-web:/net/prt-web.sfbay/prt-workspaces/20051020145817.ysr.perm/workspace)
User: ysr

Comment:
---------------------------------------------------------
Job ID: 20051020145817.ysr.perm
Original workspace: neeraja:/net/spot/archive02/ysr/perm
Submitter: ysr
Archived data: /net/prt-archiver.sfbay/data/archived_workspaces/main/gc_baseline/2005/20051020145817.ysr.perm/
Webrev: http://analemma.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/main/gc_baseline/2005/20051020145817.ysr.perm/workspace/webrevs/webrev-2005.10.20/index.html

Fixed 6324141: CMS: nightly testing asserts during concurrent marking

http://analemma.sfbay/net/spot/archive02/ysr/perm/webrev

For CMS (precleaning and concurrent parallel marking) to work
correctly, we need the following essential properties:

(1) every card mark corresponding to an oop update must strictly
    follow the update
(2) an allocated but not yet fully initialized object should be
    recognizable as "unparsable" when it's in a state where we
    cannot safely iterate over all of its (already-)oop-containing
    fields. For example, such a window of vulnerability is present
    between the installation of the klass pointer and the subsequent
    installation of the C++ vtbl pointer in perm gen objects.
(3) unparsable oops should not be published in parsable objects
(4) all card marks for oop updates should have been made before a
    safepoint is allowed
(5) all allocated objects should be safely parsable at a safepoint

The changes in this putback enforce the first three constraints
in the handful of places where this was previously not the case.
We also added assertions to check for the last constraint (i.e.
that safepoints cannot happen while objects are not safely parsable).

Since these changes also fix the previous problems
(CR 5040363/4975054) with precleaning of the perm gen, we are also
enabling perm gen precleaning in this putback.

We piggybacked the conversion of CMSScavengeBeforeRemark into
a product flag in this putback since it might turn out to be
useful for eBay-like performance tuning when running with large Edens.

Fix Verified: yes
Verification Testing: runThese -full fastdebug and product on
    {solaris/amd64, solaris/sparcv9, linux/amd64}.
Other testing: {cloudscape, ATG, GCBasher, refWorkload} on
    {6X,24X}sparcv9/solaris.
    prt, refworkload (no change in performance)

Reviewed by: John Coomes, Tom Rodriguez
Approved by: Karl Jense / Mustang Core Team (Low risk)

Files:
update: src/share/vm/includeDB_ci
update: src/share/vm/includeDB_core
update: src/share/vm/gc_interface/collectedHeap.inline.hpp
update: src/share/vm/memory/concurrentMarkSweepGeneration.cpp
update: src/share/vm/oops/arrayKlass.cpp
update: src/share/vm/oops/arrayKlass.hpp
update: src/share/vm/oops/arrayKlassKlass.cpp
update: src/share/vm/oops/constMethodKlass.cpp
update: src/share/vm/oops/constMethodKlass.hpp
update: src/share/vm/oops/constMethodOop.cpp
update: src/share/vm/oops/constMethodOop.hpp
update: src/share/vm/oops/constantPoolKlass.cpp
update: src/share/vm/oops/constantPoolOop.hpp
update: src/share/vm/oops/instanceKlassKlass.cpp
update: src/share/vm/oops/klass.hpp
update: src/share/vm/oops/klassVtable.cpp
update: src/share/vm/oops/methodDataKlass.cpp
update: src/share/vm/oops/methodKlass.cpp
update: src/share/vm/oops/objArrayKlass.cpp
update: src/share/vm/oops/oop.inline.hpp
update: src/share/vm/oops/symbolKlass.cpp
update: src/share/vm/oops/typeArrayKlass.cpp
update: src/share/vm/runtime/globals.hpp

Examined files: 3709

Contents Summary:
      23   update
    3686   no action (unchanged)
24-10-2005
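
Illustration of property (1) from the comment above. This is a minimal sketch only, not HotSpot code: ToyHeap, store_ref, preclean_card and the 64-slot card size are all hypothetical names and values chosen for the example. It assumes a mutator that writes a reference slot and then dirties the covering card, and a concurrent precleaner that first cleans a dirty card and then scans the slots it covers.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy model of a card-marked heap (not the HotSpot data structures).
class ToyHeap {
public:
    static constexpr std::size_t kSlotsPerCard = 64;  // assumed card size
    static constexpr std::uint8_t kClean = 0;
    static constexpr std::uint8_t kDirty = 1;

    explicit ToyHeap(std::size_t num_slots)
        : slots_(num_slots),
          cards_((num_slots + kSlotsPerCard - 1) / kSlotsPerCard) {
        for (auto& s : slots_) s.store(0, std::memory_order_relaxed);
        for (auto& c : cards_) c.store(kClean, std::memory_order_relaxed);
    }

    // Mutator side. The reference update comes first; the card mark
    // strictly follows it (release), so whoever observes the dirty card
    // also observes the updated slot.
    void store_ref(std::size_t slot, std::uintptr_t value) {
        slots_[slot].store(value, std::memory_order_relaxed);
        cards_[slot / kSlotsPerCard].store(kDirty, std::memory_order_release);
    }

    // Precleaner side. Clean the card first, then scan its slots. If a
    // mutator updates a slot after the card was cleaned, its later card
    // mark re-dirties the card and the update is picked up on a later pass.
    // Returns true if the card was dirty and has been scanned.
    template <typename Visitor>
    bool preclean_card(std::size_t card, Visitor visit) {
        if (cards_[card].exchange(kClean, std::memory_order_acq_rel) != kDirty) {
            return false;
        }
        std::size_t begin = card * kSlotsPerCard;
        std::size_t end = begin + kSlotsPerCard;
        if (end > slots_.size()) end = slots_.size();
        for (std::size_t i = begin; i < end; ++i) {
            visit(i, slots_[i].load(std::memory_order_relaxed));
        }
        return true;
    }

private:
    std::vector<std::atomic<std::uintptr_t>> slots_;  // reference fields
    std::vector<std::atomic<std::uint8_t>> cards_;    // one byte per card
};
```

If the two stores in store_ref were swapped, the precleaner could clean the card and scan the old slot value before the update landed, leaving a clean card over an unscanned reference. That is the kind of lost update the ordering requirement is meant to rule out.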
SUGGESTED FIX http://analemma.sfbay.sun.com/net/spot/archive02/ysr/perm/webrev/ for fixes (under review and testing).
05-10-2005
SUGGESTED FIX

Event: putback-to
Parent workspace: /net/jano.sfbay/export/disk05/hotspot/ws/main/gc_baseline
(jano.sfbay:/export/disk05/hotspot/ws/main/gc_baseline)
Child workspace: /net/prt-web.sfbay/prt-workspaces/20050928224532.ysr.MT/workspace
(prt-web:/net/prt-web.sfbay/prt-workspaces/20050928224532.ysr.MT/workspace)
User: ysr

Comment:
---------------------------------------------------------
Original workspace: karachi:/net/spot/scratch/ysr/MT
Submitter: ysr
Archived data: /net/prt-archiver.sfbay/data/archived_workspaces/main/gc_baseline/2005/20050928224532.ysr.MT/
Webrev: http://analemma.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/main/gc_baseline/2005/20050928224532.ysr.MT/workspace/webrevs/webrev-2005.09.29/index.html

Partial 6324141: CMS: nightly testing asserts during concurrent marking

http://analemma.sfbay/net/spot/scratch/ysr/MT/webrev

This is a regression introduced as a result of recent work on
parallel concurrent marking. The problem was that a (parallel)
concurrent marker thread working on a chunk needs to start its
scan of the marking bitmap at a block boundary so as to be able
to tell a mark bit from a Printezis bit (which is not at a block
boundary). In order to do so, it would look up the block offset
table to find a block boundary and then walk the blocks forward
into the chunk to get to the first block starting in the chunk
of interest. Unfortunately, as explained in my earlier email,
it's possible for blocks to declare themselves as "not free"
(i.e. allocated) yet not have the P-bits set. In this case,
we cannot determine a size for the block. Our solution is to
recognize this situation and let the bitmap scan begin at such
a "failure point" (if you will), which might scan more of the
bitmap than strictly necessary.

A more efficient solution might be to ensure that blocks that
have not yet had their P-bits set continue to show the correct
block size (and to read said block size via an appropriate
interlocked read of the size between the P-bits on a TSO machine).
However, that latter solution would have required a close audit
of a lot of allocation code, and is deferred to the future.

The current fix is relatively low-risk in that it touches only
new code and does not affect any existing code in non-parallel
concurrent marking.

This fix is marked "partial" because the bug report includes
two assertion violations, and this fix is for one of those
two. The second one is under investigation.

Reviewed by: John Coomes

Fix Verified: yes
Verification Testing: RT_PLUMHALL and RT_QUICK with class unloading
    enabled in CMS [fastdebug and product]
Other testing: prt, refworkload

Files:
update: src/share/vm/memory/compactibleFreeListSpace.cpp
update: src/share/vm/memory/compactibleFreeListSpace.hpp
update: src/share/vm/memory/concurrentMarkSweepGeneration.cpp
update: src/share/vm/memory/concurrentMarkSweepGeneration.hpp

Examined files: 3706

Contents Summary:
       4   update
    3702   no action (unchanged)
26-09-2005
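
Illustration of the "failure point" approach described in the comment above. This is a rough sketch under stated assumptions, not the code from the putback: ToySpace, bot_lookup, scan_start and the 16-word table granule are hypothetical. Addresses are modelled as word indices, the block offset table as a coarse lookup that returns some block boundary at or before the requested address, and a block whose size cannot yet be determined (allocated, but P-bits not set) as a block of size 0.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical toy model of a free-list space (not the HotSpot classes).
struct ToySpace {
    static constexpr std::size_t kBotGranule = 16;  // assumed table granularity

    // Block start -> block size in words. A size of 0 models an allocated
    // block whose size cannot be determined yet (P-bits not set).
    // The first block is assumed to start at address 0.
    std::map<std::size_t, std::size_t> blocks;

    // Stand-in for the block offset table: returns a block boundary at or
    // before addr, possibly several blocks before it.
    std::size_t bot_lookup(std::size_t addr) const {
        std::size_t rounded = addr - (addr % kBotGranule);
        auto it = blocks.upper_bound(rounded);
        assert(it != blocks.begin());
        --it;
        return it->first;
    }

    // Address at which the bitmap scan for the chunk beginning at
    // chunk_start should begin. Normally this is the first block boundary
    // at or after chunk_start. If a block of unknown size is met on the
    // way, the walk stops there and the scan starts at that failure point,
    // which covers more of the bitmap than strictly necessary.
    std::size_t scan_start(std::size_t chunk_start) const {
        std::size_t cur = bot_lookup(chunk_start);
        while (cur < chunk_start) {
            auto it = blocks.find(cur);
            if (it == blocks.end() || it->second == 0) {
                return cur;  // failure point: still a block boundary
            }
            cur += it->second;
        }
        return cur;
    }
};
```

In either outcome the returned address is a block boundary, which is what the marker thread needs in order to tell a mark bit from a Printezis bit. The cost of the fallback is limited to rescanning the part of the bitmap between the failure point and the chunk start.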
EVALUATION See comment#2.
22-09-2005