JDK-4907039 : GC test crashes on linux-ia64 at concurrentMarkSweepGeneration.cpp with -Xconcgc
  • Type: Bug
  • Status: Resolved
  • Resolution: Fixed
  • Component: hotspot
  • Sub-Component: gc
  • Priority: P2
  • Affected Version: 1.4.2,1.4.2_04,5.0
  • OS: generic,solaris_10
  • CPU: generic,itanium
  • Submit Date: 2003-08-14
  • Updated Date: 2004-03-23
  • Resolved Date: 2003-11-14

Test failed : 

$TESTBASE/testbase/testbase_vm/src/misc/gc/gctests/MTasyncGC MTasyncGC
$TESTBASE/testbase/testbase_vm/src/misc/gc/gctests/LoadUnloadGC classes.LoadUnloadGC

VM         : Server VM
Mode       : -Xcomp, -Xint, -Xmixed
Platform   : linux-ia64 (siliconium, vanadium2, qeia2two) 
		(try siliconium first; it fails within minutes)
JDK failed : JDK1.5-build_14 with build15 PIT binaries

To reproduce : 
* execute rerun.sh at /net/jano.sfbay/export/disk20/GammaBase/Bugs/{BugID}

 Error : 

#Passed; MTasyncGC; (1,1,0,0,0)
# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc:  SuppressErrorAt=/concurrentMarkSweepGeneration.cpp:4422
# An unexpected error has been detected by HotSpot Virtual Machine:
#  Internal Error (/net/cocoa.east/export/home/main_baseline/src/share/vm/memory/concurrentMarkSweepGeneration.cpp, 4422), pid=24083, tid=1026
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug_200308071842-debug interpreted mode)
# Error: assert(!_parallel || _mark_stack->isEmpty(),"pre-condition (eager drainage)")
# An error report file with more information is saved as hs_err_pid24083.log
# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc:  SuppressErrorAt=/thread.hpp:390
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/cgi-bin/bugreport.cgi
Current thread is 0x402
Dumping core ...

CONVERTED DATA

BugTraq+ Release Management Values

COMMIT TO FIX: tiger
FIXED IN: tiger
INTEGRATED IN: tiger-b18

EVALUATION

See suggested fix section.

SUGGESTED FIX

###@###.### 2003-08-25: fix putback to gc_baseline/Tiger:

Subject: Code Manager notification (putback-to)
Date: Mon, 25 Aug 2003 15:14:40 -0700 (PDT)
From: "Y. S. Ramakrishna" <###@###.###>
To: ###@###.###

Event:            putback-to
Parent workspace: /net/jano/export/disk05/hotspot/ws/main/gc_baseline
                  (jano:/export/disk05/hotspot/ws/main/gc_baseline)
Child workspace:  /export/imgr_home/ws/20030825104821.ysr.prbug
                  (balvenie:/export/imgr_home/ws/20030825104821.ysr.prbug)
User:             ysr

Comment:

Original workspace: neeraja:/net/jano.sfbay/export/hotspot/users1/ysr/prbug
Parent workspace:   /net/jano/export/disk05/hotspot/ws/main/gc_baseline
Submitter:          ysr
imgr data:          /net/balvenie.sfbay/export/imgr_home/archive/main/gc_baseline/2003/20030825104821.ysr.prbug

4907039 GC test crashes on linux-ia64 at concurrentMarkSweepGeneration.cpp with -Xconcgc

Webrev: http://analemma.sfbay/net/jano/export/hotspot/users1/ysr/prbug/webrev

The problem was that the ScanMarkedObjectsAgainClosure, which is used to rescan the objects on dirty cards at the "final checkpoint", had gotten the use of the _parallel boolean field backwards in the assert.

As it turns out, there was a potentially more pernicious bug, in that the use of the _parallel boolean was also backwards in the code that uses it to decide whether an embedded closure is a single-threaded closure or its multi-threaded avatar. Thanks to Jon Masamitsu for finding this bug via inspection. Our concern, then, was that this would cause errors because of the incorrect type cast implicit in the use of the union that holds the embedded closure.
However, providing direct refutation of the adage, we found that two wrongs do indeed make a right, or at least a "lesser wrong" -- running with the nominally incorrect code revealed that, due to a performance bug because of which we had failed to "devirtualize/specialize" all the way to the final target, we ended up calling the correct do_oop() method of the actually embedded closure, rather than that of the type it had been incorrectly cast to in the earlier step.

This was because although we had specialized/devirtualized the oop_iterate() methods for the two closures involved in the above code all the way to the do_oop_nv() method, the do_oop_nv() methods themselves were virtually dispatching to the do_oop() methods which did the work:

    void ClosureType::do_oop_nv() { do_oop(); }

Thanks to John Coomes for figuring this out. This was fixed by using fully qualified method names at this and a handful of other places (all in CMS) where we had not intended virtual dispatch. The elimination of the virtual dispatch from these do_oop() methods resulted in no overall performance change.

I also made a temporary fix for a silent stack overflow bug (found independently while working on parallel reference processing and running an SQE GC test, FinalizerGCXX) in parallel remark by:

  . increasing the size of the work-queue to reduce the probability of task queue overflow
  . noticing when task queues overflow and aborting with an error.

A more complete fix for this latter problem will appear under 4615723.
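The dispatch behaviour described above can be illustrated outside HotSpot with a minimal C++ sketch. The class and method names below (BaseClosure, DerivedClosure, do_oop_nv_virtual, do_oop_nv_direct, demo) are hypothetical and chosen only for illustration; they are not the actual CMS closures, and the sketch deliberately leaves out the incorrect union cast. It shows only that an unqualified call to a virtual method from inside a "non-virtual" wrapper still reaches the method of the object's dynamic type, whereas a fully qualified call is bound at compile time.

```cpp
#include <cassert>
#include <string>

// Hypothetical closure hierarchy; the names are illustrative, not HotSpot's.
struct BaseClosure {
    virtual ~BaseClosure() = default;

    virtual std::string do_oop() { return "BaseClosure::do_oop"; }

    // Unqualified call: still dispatched through the vtable, so the
    // method of the object's dynamic type runs.
    std::string do_oop_nv_virtual() { return do_oop(); }

    // Fully qualified call: bound statically to BaseClosure::do_oop,
    // with no virtual dispatch.
    std::string do_oop_nv_direct() { return BaseClosure::do_oop(); }
};

struct DerivedClosure : BaseClosure {
    std::string do_oop() override { return "DerivedClosure::do_oop"; }
};

// Exercises both wrappers through a base-class reference to a derived object.
void demo() {
    DerivedClosure derived;
    BaseClosure& as_base = derived;

    // The unqualified wrapper reaches the derived override.
    assert(as_base.do_oop_nv_virtual() == "DerivedClosure::do_oop");

    // The qualified wrapper stays in the base class.
    assert(as_base.do_oop_nv_direct() == "BaseClosure::do_oop");
}
```

Calling demo() from a program's main() runs both checks. In this sketch the qualified form is what a method intended to be "non-virtual" would use, which matches the kind of change the fix describes.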
Reviewed by: Jon Masamitsu, John Coomes

Fix verified: y

Verification testing:
  /net/jano.sfbay/export/disk20/GammaBase/Bugs//4907039/rerun.sh on linux/ia64

Other testing:
  imgr: all platforms, CMS, +/-CMSParallelRemarkEnabled
  refWorkload: sparc/solaris/c2/CMS
  volano test 24 hours: sparc/solaris/cs/CMS

  passed  linux    i486     product      SPECjvm98  GeoMean   37.70   58.25
  passed  linux    i486     product1     SPECjvm98  GeoMean   44.51   49.38
  passed  linux    i486     productcore  SPECjvm98  GeoMean   97.05   97.05
  passed  linux    ia64     product      SPECjvm98  GeoMean   23.06   37.25
  passed  solaris  i486     product      SPECjvm98  GeoMean   36.94   57.47
  passed  solaris  i486     product1     SPECjvm98  GeoMean   44.92   49.57
  passed  solaris  i486     productcore  SPECjvm98  GeoMean  104.45  104.45
  passed  solaris  sparc    product      SPECjvm98  GeoMean   22.58   32.94
  passed  solaris  sparc    product1     SPECjvm98  GeoMean   22.70   24.85
  passed  solaris  sparc    productcore  SPECjvm98  GeoMean   40.32   40.32
  passed  solaris  sparcv9  product      SPECjvm98  GeoMean   23.70   35.07
  passed  solaris  sparcv9  productcore  SPECjvm98  GeoMean   41.92   41.92
  passed  windows  i486     compiler2    SPECjvm98  GeoMean   45.50   97.59
  passed  windows  i486     compiler1    SPECjvm98  GeoMean   57.82   73.25
  passed  windows  i486     core         SPECjvm98  GeoMean  114.07  114.07
  passed  windows  ia64     core         SPECjvm98  GeoMean   15.86   15.86

Files:
  update: src/share/vm/memory/concurrentMarkSweepGeneration.cpp
  update: src/share/vm/memory/genOopClosures.hpp
  update: src/share/vm/utilities/taskqueue.hpp

Examined files: 2854

Contents Summary:
     3  update
  2851  no action (unchanged)

PUBLIC COMMENTS

Verified on tiger beta2 b43.
###@###.### 2004-03-23