JDK-4907039 : GC tests crash on linux-ia64 at concurrentMarkSweepGeneration.cpp with -Xconcgc

Details
Type: Bug
Submit Date: 2003-08-14
Status: Resolved
Updated Date: 2004-03-23
Project Name: JDK
Resolved Date: 2003-11-14
Component: hotspot
OS: generic,solaris_10
Sub-Component: gc
CPU: itanium,generic
Priority: P2
Resolution: Fixed
Affected Versions: 1.4.2,1.4.2_04,5.0
Fixed Versions: 5.0 (b18)

Related Reports
Backport:

Sub Tasks

Description
Tests failed :

$TESTBASE/testbase/testbase_vm/src/misc/gc/gctests/MTasyncGC MTasyncGC
$TESTBASE/testbase/testbase_vm/src/misc/gc/gctests/LoadUnloadGC classes.LoadUnloadGC

VM         : Server VM
Mode       : -Xcomp, -Xint, -Xmixed
Platform   : linux-ia64 (siliconium, vanadium2, qeia2two)
		On siliconium the tests fail within minutes.
JDK failed : JDK1.5-build_14 with build15 PIT binaries

To reproduce : 
* execute rerun.sh at /net/jano.sfbay/export/disk20/GammaBase/Bugs/{BugID}


 Error : 

......................................
[Enter:classes.LoadUnloadGC]
......................................
[Enter:MTasyncGC]
#Passed; MTasyncGC; (1,1,0,0,0)
[Exit:MTasyncGC]
# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc:  SuppressErrorAt=/concurrentMarkSweepGeneration.cpp:4422
#
# An unexpected error has been detected by HotSpot Virtual Machine:
#
#  Internal Error (/net/cocoa.east/export/home/main_baseline/src/share/vm/memory/concurrentMarkSweepGeneration.cpp, 4422), pid=24083, tid=1026
#
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug_200308071842-debug interpreted mode)
#
# Error: assert(!_parallel || _mark_stack->isEmpty(),"pre-condition (eager drainage)")
# An error report file with more information is saved as hs_err_pid24083.log
# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc:  SuppressErrorAt=/thread.hpp:390
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/cgi-bin/bugreport.cgi
#
Current thread is 0x402
Dumping core ...
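
The failing check has the shape `!A || B`, that is, "A implies B". The sketch below uses hypothetical helper names to contrast the condition as reported in the log with a variant whose flag polarity is flipped. The flipped form is only an assumption, based on the fix notes saying the `_parallel` flag was used backwards in the assert; the corrected assert itself is not shown in this report.

```cpp
// Hypothetical helpers; the names are illustrative and not taken from HotSpot.

// The check as it appears in the failing build:
// "parallel implies the mark stack is empty".
bool precondition_as_reported(bool parallel, bool mark_stack_empty) {
    return !parallel || mark_stack_empty;
}

// Assumed correction with the flag's polarity flipped:
// "serial implies the mark stack is empty".
bool precondition_flipped(bool parallel, bool mark_stack_empty) {
    return parallel || mark_stack_empty;
}
```

The two forms disagree only when the mark stack is non-empty: the reported form rejects the parallel case, while the flipped form rejects the serial case.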

                                    

Comments
PUBLIC COMMENTS

verified on tiger beta2 b43

###@###.### 2004-03-23
                                     
2004-03-23
SUGGESTED FIX



###@###.### 2003-08-25: fix putback to gc_baseline/Tiger:

Subject: Code Manager notification (putback-to)
   Date: Mon, 25 Aug 2003 15:14:40 -0700 (PDT)
   From: "Y. S. Ramakrishna" <###@###.###>
     To: ###@###.###

Event:            putback-to
Parent workspace: /net/jano/export/disk05/hotspot/ws/main/gc_baseline
                  (jano:/export/disk05/hotspot/ws/main/gc_baseline)
Child workspace:  /export/imgr_home/ws/20030825104821.ysr.prbug
                  (balvenie:/export/imgr_home/ws/20030825104821.ysr.prbug)
User:             ysr

Comment:
Original workspace:  neeraja:/net/jano.sfbay/export/hotspot/users1/ysr/prbug
Parent workspace:    /net/jano/export/disk05/hotspot/ws/main/gc_baseline
Submitter:           ysr
imgr data:           /net/balvenie.sfbay/export/imgr_home/archive/main/gc_baseline/2003/20030825104821.ysr.prbug

4907039 GC test crashes on linux-ia64 at concurrentMarkSweepGeneration.cpp with -Xconcgc

Webrev: http://analemma.sfbay/net/jano/export/hotspot/users1/ysr/prbug/webrev

The problem was that the ScanMarkedObjectsAgainClosure
which is used to rescan the objects on dirty cards at the
"final checkpoint" had gotten the use of the _parallel boolean
field backwards in the assert. As it turns out, there was a
potentially more pernicious bug, in that the use of the _parallel
boolean was also backwards in the code that uses it to decide whether
an embedded closure is a single-threaded closure or its
multi-threaded avatar. Thanks to Jon Masamitsu for finding
this bug via inspection. Our concern, then, was that this would
cause errors because of the incorrect type cast implicit in the
use of the union that holds the embedded closure. However,
providing direct refutation of the adage, we found that two
wrongs do indeed make a right, or at least a "lesser wrong"
-- running with the nominally incorrect code revealed that,
due to a performance bug because of which we had failed to
"devirtualize/specialize" all the way to the final target,
we ended up calling the correct do_oop() method of the
actually embedded closure, rather than that of
the type it had been incorrectly cast to in the earlier step.
This was because although we had specialized/devirtualized
the oop_iterate() methods for the two closures involved
in the above code all the way to the do_oop_nv() method,
the do_oop_nv() methods themselves were virtually dispatching
to the do_oop() methods which did the work:

  void ClosureType::do_oop_nv(oop* p) { do_oop(p); }  // unqualified call: still virtual

Thanks to John Coomes for figuring this out. This was
fixed by using fully qualified method names at
this and a handful of other places (all in CMS) where
we had not intended virtual dispatch.
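
The difference between the unqualified and the fully qualified call can be sketched in a few lines of C++. This is a minimal illustration, not the HotSpot code: `SerialClosure` and `ParallelClosure` are hypothetical stand-ins, and a base-class reference to a derived object models the "incorrectly cast" closure in a way that stays well-defined.

```cpp
#include <string>

// Hypothetical stand-ins for the two closures; names are illustrative only.
struct SerialClosure {
    virtual ~SerialClosure() = default;
    virtual std::string do_oop() { return "serial"; }

    // Shape of the original code: the unqualified call is dispatched
    // virtually, so the dynamic type of the object decides what runs.
    std::string do_oop_nv_virtual() { return do_oop(); }

    // Shape of the fix: the qualified name suppresses virtual dispatch
    // and binds statically to SerialClosure::do_oop().
    std::string do_oop_nv_qualified() { return SerialClosure::do_oop(); }
};

struct ParallelClosure : SerialClosure {
    std::string do_oop() override { return "parallel"; }
};
```

With `ParallelClosure par; SerialClosure& mistyped = par;`, calling `mistyped.do_oop_nv_virtual()` still reaches the parallel `do_oop()`, which is the accidental rescue described above, while `mistyped.do_oop_nv_qualified()` runs the serial version. Once the calls are qualified, the static type has to be right, which is why the inverted `_parallel` test had to be fixed in the same change.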

The elimination of the virtual dispatch from these
do_oop() methods resulted in no overall performance change.

I also made a temporary fix for a silent stack overflow bug
(found independently while working on parallel reference processing
and running an SQE GC test, FinalizerGCXX) in parallel remark by:
 . increasing the size of the work-queue to reduce the
   probability of task queue overflow
 . noticing when task queues overflow and aborting with
   an error.
A more complete fix for this latter problem will appear under
4615723.
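
The second point, noticing overflow instead of losing work silently, might look roughly like the sketch below. It assumes a simple fixed-capacity LIFO queue; `BoundedTaskQueue` and `push_or_abort` are hypothetical names and do not reflect the actual interface in taskqueue.hpp.

```cpp
#include <array>
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical fixed-capacity work queue; not the HotSpot taskqueue.hpp API.
template <typename T, std::size_t N>
class BoundedTaskQueue {
public:
    // Returns false and records the event when the queue is full,
    // rather than dropping the task without notice.
    bool push(const T& task) {
        if (size_ == N) {
            overflowed_ = true;
            return false;
        }
        slots_[size_++] = task;
        return true;
    }

    // Pops the most recently pushed task; returns false if the queue is empty.
    bool pop(T& out) {
        if (size_ == 0) return false;
        out = slots_[--size_];
        return true;
    }

    bool overflowed() const { return overflowed_; }
    std::size_t size() const { return size_; }

private:
    std::array<T, N> slots_{};
    std::size_t size_ = 0;
    bool overflowed_ = false;
};

// Caller-side policy matching the temporary fix: abort loudly on overflow.
template <typename T, std::size_t N>
void push_or_abort(BoundedTaskQueue<T, N>& q, const T& task) {
    if (!q.push(task)) {
        std::fprintf(stderr, "task queue overflow (capacity %zu)\n", N);
        std::abort();
    }
}
```

Raising `N` only lowers the probability of overflow; the explicit check is what turns a silent loss of work into a visible failure.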

Reviewed by: Jon Masamitsu, John Coomes

Fix verified: y
Verification testing:
  /net/jano.sfbay/export/disk20/GammaBase/Bugs//4907039/rerun.sh
    on linux/ia64

Other testing:
 imgr: all platforms, CMS, +/-CMSParallelRemarkEnabled
 refWorkload: sparc/solaris/c2/CMS
 volano test 24 hours: sparc/solaris/cs/CMS

passed    linux   i486    product       SPECjvm98    GeoMean   37.70    58.25
passed    linux   i486    product1      SPECjvm98    GeoMean   44.51    49.38
passed    linux   i486    productcore   SPECjvm98    GeoMean   97.05    97.05
passed    linux   ia64    product       SPECjvm98    GeoMean   23.06    37.25
passed    solaris i486    product       SPECjvm98    GeoMean   36.94    57.47
passed    solaris i486    product1      SPECjvm98    GeoMean   44.92    49.57
passed    solaris i486    productcore   SPECjvm98    GeoMean  104.45   104.45
passed    solaris sparc   product       SPECjvm98    GeoMean   22.58    32.94
passed    solaris sparc   product1      SPECjvm98    GeoMean   22.70    24.85
passed    solaris sparc   productcore   SPECjvm98    GeoMean   40.32    40.32
passed    solaris sparcv9 product       SPECjvm98    GeoMean   23.70    35.07
passed    solaris sparcv9 productcore   SPECjvm98    GeoMean   41.92    41.92
passed    windows i486    compiler2     SPECjvm98    GeoMean   45.50    97.59
passed    windows i486    compiler1     SPECjvm98    GeoMean   57.82    73.25
passed    windows i486    core          SPECjvm98    GeoMean  114.07   114.07
passed    windows ia64    core          SPECjvm98    GeoMean   15.86    15.86

Files:
update: src/share/vm/memory/concurrentMarkSweepGeneration.cpp
update: src/share/vm/memory/genOopClosures.hpp
update: src/share/vm/utilities/taskqueue.hpp

Examined files: 2854

Contents Summary:
       3   update
    2851   no action (unchanged)

                                     
2004-06-11
EVALUATION

See suggested fix section.
                                     
2004-06-11
CONVERTED DATA

BugTraq+ Release Management Values

COMMIT TO FIX:
tiger

FIXED IN:
tiger

INTEGRATED IN:
tiger-b18


                                     
2004-06-14


