JDK-6392907 : Very slow FullGC with -XX:+UseParallelOldGC when running GC stress tests
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 5.0u17-rev,6
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • OS: generic,solaris_10
  • CPU: generic,sparc
  • Submitted: 2006-03-02
  • Updated: 2012-02-01
  • Resolved: 2006-07-20
The Version table provides details about the release in which this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

JDK 6
6 b92  Fixed
Related Reports
Relates :  
Description
In nightly testing, these tests take a very long time to complete. Running the same tests manually with -XX:+PrintGCDetails shows that some full GCs take 300 seconds or more to complete. See the comments section for details on how to reproduce it.

Comments
SUGGESTED FIX This was fixed as a side-effect of the performance improvements done under 6433606; see the suggested fix there.
01-07-2006

EVALUATION Some runs with the analyzer show that the primary problem is the unnecessary bitmap iteration done during compaction. The caller-callee output from the analyzer (er_print) for the function BitMap::find_next_one_bit() shows:

    Callers and callees sorted by metric: Attributed User CPU Time

         Attr. User CPU      Excl. User CPU      Incl. User CPU
           sec.       %        sec.       %        sec.       %    Name
       1546.202   45.18       0.430    0.01    1550.234   41.26    ParCompactionManager::copy(oopDesc*,HeapWord*,HeapWord*)
       1423.776   41.60       1.011    0.03    2975.031   79.18    MoveAndUpdateClosure::do_bit_cond(unsigned,HeapWord*,HeapWord*)
        451.866   13.20       1.691    0.05    3429.919   91.29    PSParallelCompact::dest_chunk_prologue(ParCompactionManager*,MoveAndUpdateClosure*,unsigned,unsigned*,unsigned*)
          0.660    0.02       0.040    0.00       1.131    0.03    ParMarkBitMap::iterate(BitMapTerminateClosure*,BitMapTerminateClosure*,unsigned,unsigned,unsigned)const
          0.030    0.00       0.040    0.00       0.080    0.00    ParallelCompactData::start_of_2nd_chunk_live(unsigned,unsigned)
          0.010    0.00          0.      0.       0.100    0.00    ParMarkBitMap::iterate(BitMapTerminateClosure*,unsigned,unsigned)const
       3422.544  100.00    3422.544   91.09    3422.544   91.09    *BitMap::find_next_one_bit(unsigned,unsigned)const

The calls to find_next_one_bit() from copy() and do_bit_cond() are unnecessary. Together they account for about 2970 of the 3422 seconds attributed to find_next_one_bit() (roughly 87%), and find_next_one_bit() itself accounts for 91% of the total user CPU time. This was also discovered independently and fixed while working on general performance improvements for parallel compaction. That workspace also eliminates dest_chunk_prologue() and the associated bitmap iteration done there. The caller-callee output from binaries built in the new workspace shows:

    Callers and callees sorted by metric: Attributed User CPU Time

         Attr. User CPU      Excl. User CPU      Incl. User CPU
           sec.       %        sec.       %        sec.       %    Name
          0.831  100.00       0.010    0.04       1.241    4.93    ParMarkBitMap::iterate(ParMarkBitMapClosure*,ParMarkBitMapClosure*,unsigned,unsigned,unsigned)const
          0.831  100.00       0.831    3.30       0.831    3.30    *BitMap::find_next_one_bit(unsigned,unsigned)const

With these changes, the time attributed to find_next_one_bit() drops from 3422.544 seconds to 0.831 seconds.
07-03-2006
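The evaluation attributes most of the full-GC time to redundant bitmap searches. The C++ sketch below is a toy model, not HotSpot code: SimpleBitMap, ToyHeap, header_size, and both walk functions are hypothetical names. It assumes one begin bit and one end bit per live object, a layout chosen here purely for illustration, and contrasts locating an object's last word by scanning the end bitmap with reading a size recorded elsewhere (standing in for an object header). It does not reproduce the actual copy()/do_bit_cond() call paths or the change made under 6433606; it only shows why a search whose cost grows with object size becomes expensive when it is repeated for every live object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified mark bitmap (one bit per heap word).
// Illustrative only: this is not HotSpot's BitMap or ParMarkBitMap.
class SimpleBitMap {
 public:
  explicit SimpleBitMap(std::size_t bits) : words_((bits + 63) / 64, 0) {}

  void set_bit(std::size_t i) { words_[i / 64] |= uint64_t{1} << (i % 64); }

  // Index of the first set bit in [beg, end), or end if there is none.
  // Every 64-bit word inspected is counted in words_examined.
  std::size_t find_next_one_bit(std::size_t beg, std::size_t end) {
    std::size_t i = beg;
    while (i < end) {
      const std::size_t w = i / 64;
      ++words_examined;
      const uint64_t word = words_[w] >> (i % 64);
      if (word != 0) {
        const std::size_t hit = i + trailing_zeros(word);
        return hit < end ? hit : end;
      }
      i = (w + 1) * 64;  // continue at the first bit of the next word
    }
    return end;
  }

  std::size_t words_examined = 0;

 private:
  static unsigned trailing_zeros(uint64_t x) {  // precondition: x != 0
    unsigned n = 0;
    while ((x & 1) == 0) { x >>= 1; ++n; }
    return n;
  }
  std::vector<uint64_t> words_;
};

// Hypothetical heap: live objects of obj_words words laid out back to back.
// beg_bits marks each object's first word, end_bits its last word, and
// header_size[start] stands in for a size read from the object header.
struct ToyHeap {
  SimpleBitMap beg_bits, end_bits;
  std::vector<std::size_t> header_size;
  std::size_t words;

  ToyHeap(std::size_t heap_words, std::size_t obj_words)
      : beg_bits(heap_words), end_bits(heap_words),
        header_size(heap_words, 0), words(heap_words) {
    for (std::size_t s = 0; s + obj_words <= heap_words; s += obj_words) {
      beg_bits.set_bit(s);
      end_bits.set_bit(s + obj_words - 1);
      header_size[s] = obj_words;
    }
  }
};

struct WalkStats { std::size_t objects; std::size_t words_examined; };

// Walk A: find each object's last word by searching end_bits; the cost of
// this search grows with the size of the object.
WalkStats walk_searching_end_bits(std::size_t heap_words, std::size_t obj_words) {
  ToyHeap h(heap_words, obj_words);
  std::size_t objects = 0;
  std::size_t cur = h.beg_bits.find_next_one_bit(0, h.words);
  while (cur < h.words) {
    const std::size_t last = h.end_bits.find_next_one_bit(cur, h.words);
    assert(last < h.words);  // every object has an end bit in this toy layout
    ++objects;
    cur = h.beg_bits.find_next_one_bit(last + 1, h.words);
  }
  return {objects, h.beg_bits.words_examined + h.end_bits.words_examined};
}

// Walk B: same traversal, but the size comes from the hypothetical header,
// so end_bits is never searched.
WalkStats walk_using_header_size(std::size_t heap_words, std::size_t obj_words) {
  ToyHeap h(heap_words, obj_words);
  std::size_t objects = 0;
  std::size_t cur = h.beg_bits.find_next_one_bit(0, h.words);
  while (cur < h.words) {
    const std::size_t last = cur + h.header_size[cur] - 1;
    ++objects;
    cur = h.beg_bits.find_next_one_bit(last + 1, h.words);
  }
  return {objects, h.beg_bits.words_examined + h.end_bits.words_examined};
}
```

For a 2^20-word toy heap filled with 4096-word objects, both walks visit 256 objects, but walk_searching_end_bits examines 16640 bitmap words while walk_using_header_size examines 256: the end-bit scan costs about one word per 64 words of live data, whereas the size lookup costs a constant amount per object.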