JDK-6692899 : CMS: many vm.parallel_class_loading tests fail with assert "missing Printezis mark"
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: hs11,hs12
  • Priority: P3
  • Status: Closed
  • Resolution: Fixed
  • OS: generic,solaris_9
  • CPU: generic,sparc
  • Submitted: 2008-04-23
  • Updated: 2011-03-08
  • Resolved: 2011-03-08
The Version table provides details related to the release in which this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

  • JDK 6: 6u18 Fixed
  • JDK 7: 7 Fixed
  • Other: hs15 Fixed
Related Reports
Duplicate :  
Relates :  
Relates :  
Relates :  
Description
#!/usr/bin/sh
#===============================================================================
# Generated rerun script
#===============================================================================
JAVA_HOME=/net/sqenfs-1.sfbay/export1/comp/vm/jdk/7/nightly/gc_baseline/linux-amd64
export JAVA_HOME

pwdir=`pwd`
#! sh
#
#

test_work_dir="/net/sqenfs-2.sfbay/export2/results/vm/gtee/JDK7/NIGHTLY/VM/2008-04-22/GC_Baseline-Xinc/vm/linux-amd64/server/mixed/vm-linux-amd64_server_mixed_vm.parallel_class_loading.testlist2008-04-22-21-53-16/ResultDir/inner-simple_copy_1"
STRESS_OPTIONS=""
RAS_OPTIONS=""
JAVA="$JAVA_HOME/bin/java"
TEMP=""
JAVA_OPTS="-d64 -server -Xmixed -DHANGINGJAVA30148 -XX:-PrintVMOptions -Xincgc -XX:+CMSClassUnloadingEnabled"
TESTBASE="/net/sqenfs-1.sfbay/export1/tools/gtee/suites/6-vm/vm"
SEPARATOR=":"
TEST_ARGS=
LD_LIBRARY_PATH="$JAVA_HOME/jre/lib/amd64:$JAVA_HOME/jre/lib/amd64/server"
COMMON_LIBS_LOCATION="/net/sqenfs-1.sfbay/export1/tools/gtee/suites/6-vm/vm/bin"
test_name="inner-simple"
SHELL="/bin/ksh"
test_case_name="inner-simple"
CLASSPATH="/net/sqenfs-1.sfbay/export1/tools/gtee/suites/6-vm/vm/bin/classes:$JAVA_HOME/lib/tools.jar"
PATH="$JAVA_HOME/bin:/bin:/usr/bin"
DISPLAY="gtee.sfbay:0"
HOME="/import/gtee"
ARCH="linux-amd64"
PS=":"
SystemRoot=""
LIBJSIG_PATH="$JAVA_HOME/jre/lib/amd64/server/libjsig.so"
ROOTDIR=""
WINDIR=""
TIMEOUT="30"


#
export SHELL
export DISPLAY
export LIBJSIG_PATH
export SystemRoot
export TESTBASE
export RAS_OPTIONS
export HOME
export ROOTDIR
export LD_LIBRARY_PATH
export CLASSPATH
export TEMP
export WINDIR
export PATH
TEST_DEST_DIR="inner-simple_copy_1"
TESTNAME="${test_case_name}"
testName="runtime/ParallelClassLoading/stress-redefine/freeLock/loadClass//inner-simple"
TESTDIR="${test_work_dir}"
testWorkDir="${test_work_dir}/"
export testWorkDir
tlogOutFile="${test_work_dir}/${test_name}.tlog"
testErrFile="${test_work_dir}/${test_name}.err"
EXECUTE_CLASS="${test_name}"
EXECUTE_CLASS="runtime.ParallelClassLoading.shared.ClassLoadingController"
LD_LIBRARY_PATH="${COMMON_LIBS_LOCATION}/lib/${ARCH}/runtime/ParallelClassLoading/shared${SEPARATOR}${LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH
JAVA_OPTS="${JAVA_OPTS} -cp ${TESTBASE}/bin/classes -agentlib:redefineClasses"
ITERATIONS="100"
DEBUG="false"
THREADS="5"
HIERARCHY_CLASS_DIR="${TESTBASE}/bin/classes/runtime/ParallelClassLoading/shared/hierarchies/static-load/static-init/inner/simple"
LOADING_CLASS="custom.C%"
PROVOKE_TYPE="loadClass"
FREE_LOCK_CLASS="custom.A.*"
TEST_ARGS="${TEST_ARGS} -ITERATIONS ${ITERATIONS} -DEBUG ${DEBUG} -THREADS_COUNT ${THREADS} -classDir ${HIERARCHY_CLASS_DIR} -class ${LOADING_CLASS} -provoke ${PROVOKE_TYPE} -provoke newInstance -regexForFreeingLock ${FREE_LOCK_CLASS} ${STRESS_OPTIONS}"
APPLICATION_TIMEOUT="${TIMEOUT}"
CLASSPATH="${test_work_dir}${PS}${CLASSPATH}"
export CLASSPATH
${JAVA} ${JAVA_OPTS} ${EXECUTE_CLASS} ${TEST_ARGS}
# Test level exit status: 134
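The exit status 134 is consistent with the assertion failure shown below: sh and ksh88-style shells report a process killed by signal N as 128 + N, and 134 - 128 = 6 is SIGABRT on Linux and Solaris, which is what one expects when the VM aborts after reporting the failed assert. A minimal C++17 sketch of that decoding; the helper name is hypothetical and not part of the test harness.

```cpp
#include <cassert>
#include <csignal>
#include <optional>

// Hypothetical helper (not part of the test harness): recovers the signal
// number from an exit status reported by sh, bash or ksh88-style shells,
// which encode "terminated by signal N" as 128 + N. ksh93 uses 256 + N
// instead and is not handled here.
std::optional<int> signal_from_shell_status(int status) {
  assert(status >= 0 && status <= 255 && "shell exit statuses are 0..255");
  if (status > 128) {
    return status - 128;  // e.g. 134 - 128 = 6
  }
  return std::nullopt;  // values up to 128 are treated as ordinary exit codes
}
```

For this run, signal_from_shell_status(134) yields 6, i.e. SIGABRT, rather than an ordinary test failure exit code.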
See, for example:-

http://sqeweb/nfs/results/vm/gtee/JDK7/NIGHTLY/VM/2008-04-22/GC_Baseline-Xinc/vm/linux-amd64/server/mixed/vm-linux-amd64_server_mixed_vm.parallel_class_loading.testlist2008-04-22-21-53-16/analysis.html#New_Failures

A specific example might be the following:-

http://sqeweb/nfs/results/vm/gtee/JDK7/NIGHTLY/VM/2008-04-22/GC_Baseline-Xinc/vm/linux-amd64/server/mixed/vm-linux-amd64_server_mixed_vm.parallel_class_loading.testlist2008-04-22-21-53-16/ResultDir/inner-simple_copy_1/hs_err_pid31634.log

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/tmp/jprt-jprtadm/temp/P1/B/120347.ap159146/source/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp:6134), pid=31634, tid=1079138656
#  Error: assert(_markBitMap.isMarked(addr) && _markBitMap.isMarked(addr + 1),"missing Printezis mark?")
#
# Java VM: OpenJDK 64-Bit Server VM (12.0-b02-2008-04-17-120347.ap159146.hotspot-permoom-6539517-fastdebug mixed mode linux-amd64)
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
#

---------------  T H R E A D  ---------------

Current thread (0x00002aaaac6b7400):  ConcurrentGCThread [stack: 0x0000000040425000,0x0000000040526000] [id=31641]

Stack: [0x0000000040425000,0x0000000040526000],  sp=0x0000000040524810,  free space=1022k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x9c4592];;  _ZN7VMError14report_and_dieEv+0x262
V  [libjvm.so+0x3df4fe];;  _Z24report_assertion_failurePKciS0_+0x6e
V  [libjvm.so+0x3b3164];;  _ZNK12CMSCollector31block_size_using_printezis_bitsEP8HeapWord+0x154
V  [libjvm.so+0x3b385d];;  _ZNK12CMSCollector27next_card_start_after_blockEP8HeapWord+0x3d
V  [libjvm.so+0x3ac9a1];;  _ZN12CMSCollector19preclean_card_tableEP29ConcurrentMarkSweepGenerationP38ScanMarkedObjectsAgainCarefullyClosure+0x521
V  [libjvm.so+0x3aae26];;  _ZN12CMSCollector13preclean_workEbb+0x776
V  [libjvm.so+0x3a9f65];;  _ZN12CMSCollector8precleanEv+0x2a5
V  [libjvm.so+0x3a1f78];;  _ZN12CMSCollector21collect_in_backgroundEb+0xb88
V  [libjvm.so+0x3c2c80];;  _ZN25ConcurrentMarkSweepThread3runEv+0x3a0
V  [libjvm.so+0x7eb8ed];;  _Z10java_startP6Thread+0x16d
Dropped " on linux x86" from the synopsis, since the failures described above
happen on almost all platforms (although perhaps more frequently
on x86); added `assert "missing Printezis mark"' to the synopsis.

Comments
EVALUATION http://hg.openjdk.java.net/jdk7/hotspot-gc/hotspot/rev/0af8b0718fc9
12-01-2009

SUGGESTED FIX This fixes one aspect of the assert: the interference between concurrent direct allocation in the old gen and the precleaning code. The precleaner encounters an unparsable object but beats the allocator to the bit map lock and tries to read the Printezis bits, thereby locking out the writer of those bits (i.e. the allocating thread):-

--- a/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp  Wed Aug 20 23:05:04 2008 -0700
+++ b/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp  Mon Aug 25 16:01:17 2008 -0700
@@ -4560,11 +4560,11 @@ size_t CMSCollector::preclean_mod_union_
     if (!dirtyRegion.is_empty()) {
       assert(numDirtyCards > 0, "consistency check");
       HeapWord* stop_point = NULL;
+      stopTimer();
+      CMSTokenSyncWithLocks ts(true, gen->freelistLock(),
+                               bitMapLock());
+      startTimer();
       {
-        stopTimer();
-        CMSTokenSyncWithLocks ts(true, gen->freelistLock(),
-                                 bitMapLock());
-        startTimer();
         verify_work_stacks_empty();
         verify_overflow_empty();
         sample_eden();
@@ -4581,10 +4581,6 @@ size_t CMSCollector::preclean_mod_union_
         assert((CMSPermGenPrecleaningEnabled && (gen == _permGen)) ||
                (_collectorState == AbortablePreclean && should_abort_preclean()),
                "Unparsable objects should only be in perm gen.");
-
-        stopTimer();
-        CMSTokenSyncWithLocks ts(true, bitMapLock());
-        startTimer();
         _modUnionTable.mark_range(MemRegion(stop_point, dirtyRegion.end()));
         if (should_abort_preclean()) {
           break; // out of preclean loop

The following diff may be useful for further debugging of this bug but should obviously be removed before the changes are checked in:-

@@ -6130,8 +6124,15 @@ void CMSCollector::verify_ok_to_terminat
 #endif
 
 size_t CMSCollector::block_size_using_printezis_bits(HeapWord* addr) const {
-  assert(_markBitMap.isMarked(addr) && _markBitMap.isMarked(addr + 1),
-         "missing Printezis mark?");
+//  assert(_markBitMap.isMarked(addr) && _markBitMap.isMarked(addr + 1),
+//         "missing Printezis mark?");
+  if (!(_markBitMap.isMarked(addr) && _markBitMap.isMarked(addr + 1))) {
+    tty->print_cr("Missing Printezis bits for " PTR_FORMAT, addr);
+    tty->print_cr("_markBitMap.isMarked(addr)=%d, _markBitMap.isMarked(addr+1)=%d",
+                  _markBitMap.isMarked(addr), _markBitMap.isMarked(addr+1));
+    oop(addr)->print();
+    assert(false, "Missing Printezis mark!!!");
+  }
   HeapWord* nextOneAddr = _markBitMap.getNextMarkedWordAddress(addr + 2);
   size_t size = pointer_delta(nextOneAddr + 1, addr);
   assert(size == CompactibleFreeListSpace::adjustObjectSize(size),
25-08-2008
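A toy model of the Printezis-bit encoding that block_size_using_printezis_bits() decodes. It assumes, consistently with the arithmetic in the diff above, that a block whose header cannot yet be parsed is tagged by marking its first two words (addr and addr + 1) and its last word, so its size is the distance to the next marked word at or after addr + 2, plus one. Heap addresses are modelled as word indices, and all class and function names are hypothetical stand-ins, not HotSpot code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy mark bit map (assumption: "addresses" are word indices, one bit per
// heap word, loosely modelled on CMS's _markBitMap). Not HotSpot code.
class ToyMarkBitMap {
 public:
  explicit ToyMarkBitMap(std::size_t words) : bits_(words, false) {}
  void mark(std::size_t w) { bits_.at(w) = true; }
  bool is_marked(std::size_t w) const { return bits_.at(w); }
  // First marked word at or after `from`; returns the map size if none.
  std::size_t next_marked(std::size_t from) const {
    for (std::size_t w = from; w < bits_.size(); ++w) {
      if (bits_[w]) return w;
    }
    return bits_.size();
  }

 private:
  std::vector<bool> bits_;
};

// Allocator side (hypothetical): tag a not-yet-parsable block of `size`
// words by marking its first two words and its last word.
void set_printezis_bits(ToyMarkBitMap& bm, std::size_t addr, std::size_t size) {
  assert(size >= 3 && "toy encoding needs at least three words");
  bm.mark(addr);
  bm.mark(addr + 1);
  bm.mark(addr + size - 1);
}

bool has_printezis_bits(const ToyMarkBitMap& bm, std::size_t addr) {
  return bm.is_marked(addr) && bm.is_marked(addr + 1);
}

// Precleaner side: same arithmetic as the diff (next marked word at or after
// addr + 2, plus one, minus addr). The assert mirrors the one this bug trips
// when the allocator has not yet published the bits.
std::size_t block_size_using_printezis_bits(const ToyMarkBitMap& bm, std::size_t addr) {
  assert(has_printezis_bits(bm, addr) && "missing Printezis mark?");
  std::size_t next_one = bm.next_marked(addr + 2);
  return next_one + 1 - addr;
}
```

For a 5-word block at word 10, the bits at 10, 11 and 14 are set and the decoded size is 14 + 1 - 10 = 5. If the precleaner reads the map before set_printezis_bits() has run, it hits exactly the "missing Printezis mark?" condition.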

WORK AROUND When using class redefinition, we need to (temporarily) disable CMS Perm Gen Precleaning with -XX:-CMSPermGenPrecleaningEnabled. This prevents CMS from incorrectly trying to parse unparsable class pools (and perhaps other class-redefinition-related structures) while they are under construction. The right fix is to strengthen the associated is_parsable() methods, which may have bit-rotted with respect to class redefinition.
28-04-2008
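The WORK AROUND proposes strengthening is_parsable() so that a concurrent scanner never walks a structure that is still under construction. One common way to make such a check reliable (an assumption here, not necessarily what HotSpot does) is to publish parsability with a release store once construction completes and to test it with an acquire load before scanning. The class below is a hypothetical stand-in for a class-redefinition structure such as a constant pool.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a metadata object (e.g. a constant pool being
// built during class redefinition) that a concurrent collector may scan.
class PoolUnderConstruction {
 public:
  explicit PoolUnderConstruction(std::size_t length) : length_(length) {}

  // Writer: fill every slot, then publish parsability with release ordering
  // so a reader that observes is_parsable() == true also sees the slots.
  void finish_construction(int fill_value) {
    assert(!parsable_.load(std::memory_order_relaxed) && "constructed twice");
    slots_.assign(length_, fill_value);
    parsable_.store(true, std::memory_order_release);
  }

  bool is_parsable() const { return parsable_.load(std::memory_order_acquire); }

  // Reader: walks the slots only when parsable; otherwise returns false so
  // the caller can defer the region instead of parsing garbage.
  bool try_scan(long& sum) const {
    if (!is_parsable()) return false;
    sum = 0;
    for (int v : slots_) sum += v;
    return true;
  }

 private:
  std::size_t length_;
  std::vector<int> slots_;
  std::atomic<bool> parsable_{false};
};
```

Deferring on a false is_parsable() matches what the precleaning code in the SUGGESTED FIX diff does when it hits an unparsable object: it records the rest of the dirty region with _modUnionTable.mark_range() for later rescanning rather than trying to size the object.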

EVALUATION With the above modifications we still have a few issues that appear directly related to the class redefinition done in these tests. If class redefinition is switched off, the tests run fine. I suspect that class redefinition is being done in a way that interferes with concurrent CMS activities; I will need to understand the class redefinition steps thoroughly to determine the extent of this interference. Meanwhile, CMS activity should be locked out during class redefinition, and a simple temporary workaround along those lines will be attempted.
26-04-2008

EVALUATION P-bits may be missing when CMSCollector::next_card_start_after_block() is called. Will ensure, via appropriately coarse locking, that this is not the case. Will check which older releases may be prone to this problem and file sub-CRs.
25-04-2008
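The coarse locking described above can be illustrated with a toy model in which the allocator holds the free-list lock for the whole allocation and takes the bit-map lock only to publish the Printezis bits (this lock ordering is an assumption of the model). A precleaner that holds both locks across its whole dirty-region step, as the SUGGESTED FIX does with CMSTokenSyncWithLocks, can never observe a half-finished allocation; one that took only the bit-map lock could run between the two writes in allocate() and see a block without its P-bits. Struct and function names are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Toy model (assumption): one block that the allocator carves out and then
// tags with Printezis bits. The two mutexes stand in for CMS's
// freelistLock() and bitMapLock(); this is not HotSpot code.
struct ToyOldGen {
  std::mutex freelist_lock;
  std::mutex bitmap_lock;
  bool block_allocated = false;  // block exists but its header is unparsable
  bool pbits_set = false;        // Printezis bits published for the block

  // Allocator: holds the free-list lock for the whole allocation and takes
  // the bit-map lock only to publish the Printezis bits.
  void allocate() {
    std::lock_guard<std::mutex> fl(freelist_lock);
    assert(!block_allocated && "toy model allocates a single block");
    block_allocated = true;
    std::lock_guard<std::mutex> bm(bitmap_lock);
    pbits_set = true;
  }

  // Precleaner with coarse locking: holding both locks means an allocation
  // is either not started or complete, never half done.
  bool preclean_sees_missing_pbits() {
    std::scoped_lock both(freelist_lock, bitmap_lock);
    return block_allocated && !pbits_set;
  }
};

// Races the allocator against the precleaner `rounds` times and counts how
// often the precleaner observed a block without its Printezis bits.
int count_missing_pbits(int rounds) {
  int missing = 0;
  for (int i = 0; i < rounds; ++i) {
    ToyOldGen gen;
    bool seen = false;
    std::thread alloc([&] { gen.allocate(); });
    std::thread pre([&] { seen = gen.preclean_sees_missing_pbits(); });
    alloc.join();
    pre.join();
    if (seen) ++missing;
  }
  return missing;
}
```

With coarse locking the count is always zero, however the two threads interleave. The price is that allocation into the old gen is blocked for the duration of each precleaning step, which is why the diff keeps the stopTimer()/startTimer() bracketing around the lock acquisition.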