Bug ID: JDK-6883834 ParNew: assert(!_g->to()->is_in_reserved(obj),"Scanning field twice?") with LargeObjects tests

Type: Bug
Component: hotspot
Sub-Component: gc
Affected Version: hs17,hs19,6,7

Priority: P2
Status: Closed
Resolution: Fixed
OS: generic,linux_redhat_3.0,solaris_7
CPU: generic,x86,sparc

Submitted: 2009-09-19
Updated: 2013-06-22
Resolved: 2011-07-18

Versions (Unresolved/Resolved/Fixed)

JDK 6	JDK 7	Other
6u60: Fixed	7: Fixed	hs21: Fixed

Related Reports

Duplicate :	JDK-6997046 - assert(!_g->to()->is_in_reserved(obj)) failed: Scanning field twice?
Duplicate :	JDK-7035489 - (i)CMS: VM crashes with SIGSEGV during StringTable
Relates :	JDK-7036837 - cardtable interface and class hierarchy simplification
Relates :	JDK-7037276 - Unnecessary double traversal of dirty card windows

Description

When running the test gc/gctests/LargeObjects/large003 on linux with a 64-bit fastdebug VM, the following assert fails intermittently (about 1/3 of the time, or less):

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/export/ws/hs/hotspot-gc/src/share/vm/gc_implementation/parNew/parOopClosures.inline.hpp:73), pid=11422, tid=1075845440
#  Error: assert(!_g->to()->is_in_reserved(obj),"Scanning field twice?")
#
# JRE version: 7.0-b63
# Java VM: Java HotSpot(TM) 64-Bit Server VM (17.0-b01-hotspot-gc.20090915100844-fastdebug mixed mode linux-amd64 compressed oops)
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
#

The above VM is a local build of the openjdk hotspot-gc/hotspot repo at this revision:

changeset:   944:68ef3fdcdb76 
tag:         tip 
user:        ysr 
date:        Thu Sep 10 16:46:17 2009 -0700 
summary:     6872136: CMS: confusing message may be printed when a collector is switched off implicitly 

which is essentially jdk7-b72.

Full hs_err log is attached.  The attached large003.tlog file can be edited and used to reproduce the failure.
gc/gctests/LargeObjects/large005 also fails in the same way.
Failing tests:-

gc/gctests/LargeObjects/large003
gc/gctests/LargeObjects/large004
gc/gctests/LargeObjects/large005
Fails on 32-bit too (see entry 5 of "Comments" for link); so removed "64-bit" from
Synopsis. Also added "CMS:" prefix to synopsis.
Also gc/gctests/LargeObjects/large001.

See: http://sqeweb.sfbay.sun.com/nfs/tools/gtee/results/JDK7/NIGHTLY/VM/2010-09-07/Main_Baseline/vm/solaris-sparc/client/mixed/solaris-sparc_vm_client_mixed_vm.gc.testlist/ResultDir/large003/hs_err_pid29094.log

This fails on solaris/sparc with -client, so I am changing the synopsis
accordingly.
gc.gctests.LargeObjects.large001.large001
gc/gctests/LargeObjects/large003
gc/gctests/LargeObjects/large005
Dropped "CMS" from synopsis, seeing as to how this is a ParNew bug
which does not need CMS for it to manifest.

Comments

EVALUATION http://hg.openjdk.java.net/jdk7/hotspot/hotspot/rev/fc2b798ab316
13-05-2011
EVALUATION http://hg.openjdk.java.net/jdk7/hotspot-gc/hotspot/rev/fc2b798ab316
10-05-2011
SUGGESTED FIX Here's a minimal fix, which preserves the erstwhile lack of scaling of the original algorithm:- cr.openjdk.java.net/~ysr/6883834/webrev.3 A more scalable implementation will be the subject of a future RFE (TBF).
05-05-2011
EVALUATION Since this is a day-one bug, update releases might want to pick up this fix, once it has undergone adequate testing in the current release. I'll file a subCR for 6-pool to be backported at the discretion of sustaining.
04-05-2011
EVALUATION A minimal fix has been implemented which fixes the correctness problem. Performance and scalability improvements are deferred to a later RFE post-JDK-7.
04-05-2011
EVALUATION Looks like a day-one (of ParNew) bug, which was shy until recently -- it became bolder (with certain tests or apps involving classes with large constant pools) due to the transoceanic (read trans-generational) migration of interned strings.
05-04-2011
SUGGESTED FIX Fix the large dirty non-array object splitting protocol so that each of its dirty parts is scanned precisely once (no more and no less).
05-04-2011
WORK AROUND -XX:+JavaObjectsInPerm avoids this problem by moving the "offending" reference back into the perm gen so that the appropriate cards do not contain references that need to be relocated. However, this does not eliminate the more fundamental underlying bug in the parallel card scanning in general.
04-04-2011
EVALUATION There appears to be a subtle bug in the parallel card-scanning code: a very large non-array object (which could be imprecisely marked) that spans multiple work chunks, and that has a dirty card in a chunk other than the one in which the object starts, ends up being scanned more than once. The precise shape of the fix is not yet known. Because this code is shared between several collectors, the fix is likely to be subtle and non-trivial, and it will require extensive testing and correctness proofs (given that this bug has apparently been in the product for so long without being noticed).
04-04-2011
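The double scan described in the evaluation above can be illustrated with a toy model. This is only a sketch under stated assumptions and not HotSpot's card-scanning code: the chunk size, the `objects` map, and every function name below are hypothetical, and the real failure mode is subtler than this. The model assumes one worker per chunk, and that a worker which finds a dirty card scans the whole covering object. It then contrasts that with a scheme in which each worker scans only the dirty parts inside its own chunk.

```python
from collections import Counter

CARDS_PER_CHUNK = 4  # hypothetical chunk size, measured in cards


def chunk_of(card):
    """Index of the work chunk that owns a given card."""
    return card // CARDS_PER_CHUNK


def covering_object(objects, card):
    """Name of the object whose card range [first, last] contains card, or None."""
    for name, (first, last) in objects.items():
        if first <= card <= last:
            return name
    return None


def naive_scan(objects, dirty_cards):
    """One worker per chunk; a worker that finds a dirty card scans the WHOLE
    covering object. Returns how many times each object was scanned."""
    scans = Counter()
    for chunk in sorted({chunk_of(c) for c in dirty_cards}):
        scanned_here = set()  # objects this worker has already scanned
        for card in sorted(c for c in dirty_cards if chunk_of(c) == chunk):
            name = covering_object(objects, card)
            if name is not None and name not in scanned_here:
                scanned_here.add(name)
                scans[name] += 1
    return scans


def scan_each_dirty_part_once(objects, dirty_cards):
    """Each worker scans only the dirty cards inside its own chunk.
    Returns how many times each (object, card) part was scanned."""
    parts = Counter()
    for chunk in sorted({chunk_of(c) for c in dirty_cards}):
        for card in sorted(c for c in dirty_cards if chunk_of(c) == chunk):
            name = covering_object(objects, card)
            if name is not None:
                parts[(name, card)] += 1
    return parts


# A large object covering cards 1..9 spans chunks 0, 1 and 2.
# Its first card (chunk 0) and one later card (chunk 1) are dirty.
objects = {"big": (1, 9)}
dirty = {1, 6}
print(naive_scan(objects, dirty))
print(scan_each_dirty_part_once(objects, dirty))
```

In this model the naive scheme scans the object once per chunk that holds one of its dirty cards, while the per-part scheme visits each dirty card once. That is the property the suggested fix asks for, though the actual change in the webrev may achieve it differently.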
WORK AROUND For a reason that is not yet completely clear, -XX:+JavaObjectsInPerm seems to avoid the assertion violation. However, this might just be papering over the problem, which might still lie in the GC rather than in the runtime/JVM.
30-03-2011