JDK-6298694 : bad performance with big object in heap
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 5.0,5.0u6,6
  • Priority: P3
  • Status: Closed
  • Resolution: Fixed
  • OS: generic,linux,solaris_10
  • CPU: generic,x86,sparc
  • Submitted: 2005-07-19
  • Updated: 2011-01-31
  • Resolved: 2006-03-14
Version table:
  • Other: 5.0u8 (Fixed)
  • JDK 6: 6 b74 (Fixed)
Allocating a big, long-lived array in the heap significantly reduces the performance of the Parallel and Conc GCs. In Mustang, Parallel works fine, but Conc is still very slow.


See my reproduction source in the attachment. SerialGC is more than 10 times faster than Conc.
java -XX:+PrintGCDetails -Xmx768m -XX:+Use<GC> gctest.Main
###@###.### 2005-07-19 13:56:08 GMT
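
The attached reproduction source is not included in this report. As a rough illustration of the kind of program described above, here is a minimal sketch, assuming the big object is a reference array that stays live while many small short-lived objects are allocated and stored into it. The class name BigArrayChurn, the sizes, and the slot layout are all hypothetical and are not taken from the attachment.

```java
// Hypothetical sketch, NOT the attached gctest.Main: one big long-lived
// reference array plus a stream of small short-lived allocations.
final class BigArrayChurn {
    // Number of array slots that receive references (assumed value).
    static final int SLOTS = 1024;

    // Allocates one large array, then repeatedly stores freshly allocated
    // small objects into SLOTS positions spread evenly across it.
    // bigArrayLength must be at least SLOTS.
    static Object[] churn(int bigArrayLength, int iterations) {
        Object[] big = new Object[bigArrayLength];
        int spacing = bigArrayLength / SLOTS;
        for (int i = 0; i < iterations; i++) {
            // Each store overwrites an older reference, so only SLOTS
            // small objects are live at any time.
            big[(i % SLOTS) * spacing] = new byte[64];
        }
        return big;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        Object[] big = churn(32 * 1024 * 1024, 200000000);
        long ms = (System.nanoTime() - start) / 1000000;
        System.out.println("elapsed ms: " + ms + ", array length: " + big.length);
    }
}
```

A program like this could be run with the command line above, substituting each collector flag in turn and comparing the pause times reported by -XX:+PrintGCDetails.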

EVALUATION I've looked at the execution profiles for the VM after the fix for the BOT was implemented, and a large part of the remaining cost is in the promotion handling code. Specifically, the code that links promoted objects together in a list may need some work. Nothing there looked like it related specifically to big objects, so this bug is being moved to fix delivered.

SUGGESTED FIX The following fix was putback by Jon to gc_baseline on 2/17 and will integrate into Mustang b75 (nee b74):
-----------------------------------------------------------------------------
Event: putback-to
Parent workspace: /net/jano.sfbay/export/disk05/hotspot/ws/main/gc_baseline
  (jano.sfbay:/export/disk05/hotspot/ws/main/gc_baseline)
Child workspace: /net/prt-web.sfbay/prt-workspaces/20060217102834.jmasa.gc_baseline_6298694/workspace
  (prt-web:/net/prt-web.sfbay/prt-workspaces/20060217102834.jmasa.gc_baseline_6298694/workspace)
User: jmasa

Comment:
---------------------------------------------------------
Job ID: 20060217102834.jmasa.gc_baseline_6298694
Original workspace: arches:/net/karachi/bigtmp/jmasa/gc_baseline_6298694
Submitter: jmasa
Archived data: /net/prt-archiver.sfbay/data/archived_workspaces/main/gc_baseline/2006/20060217102834.jmasa.gc_baseline_6298694/
Webrev: http://analemma.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/main/gc_baseline/2006/20060217102834.jmasa.gc_baseline_6298694/workspace/webrevs/webrev-2006.02.17/index.html

Partial 6298694: bad performance with big object in heap

This is partial because there may be more improvements to be made for large objects.

Change the initialization of the block offset table to use the logarithmic offsets. Line 98 in the webrev for blockOffsetTable.cpp is the significant change. The rest of the changes are clean up. Removed some debugging code (old version of a method that was being used for verification). Renamed a parameter to fix_up_alloced_region() from "start_card" to "first_card_to_fix" since "start_card" often is the first card set for a newly allocated block. Clarified (hopefully) the specification for fix_up_alloced_region().

Reviewed by: Ramki (partial), John, and Tony.
Approved for putback by: Dave C.

Fix verified (y/n): y

Verification testing: Ran the test program attached to the CR and noted a decrease in the minor collection pause of about 2/3 (approximately 10s to approximately 3.5s).

Other testing:
runThese -quick -testbase_vm -testbase_gc with sparc product and fastdebug builds
refworkload reference_server runs were done with sparc product.

Files:
update: src/share/vm/memory/blockOffsetTable.cpp
update: src/share/vm/memory/blockOffsetTable.hpp

Examined files: 3790

Contents Summary:
    2 update
    3788 no action (unchanged)

EVALUATION The root cause of this bug is that the block offset table (BOT) is initialized for single card offsets in the constructor for BlockOffsetArray. This does not lead to a correctness issue. The performance problem arises from the free list allocation done by CMS. If a chunk is split in order to do an allocation, it is assumed that the BOT for the original chunk is correct and the BOT for the remainder after the split is updated. In some situations (for example, the first initialization of large chunks out of the dictionary) this leaves the BOT using the single card offsets instead of the logarithmic offsets. This probably eventually works itself out but is particularly obvious in the test case for the problem. The initialization of the BOT needs to be fixed. Since the contiguous space version of the BOT will set the BOT for blocks as allocations move to the right in the heap, initializing for logarithmic strides is probably OK, but that should be verified.
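
To illustrate why single card offsets are slow for a big block, here is a simplified sketch of the two initializations. It is not the HotSpot code: it models one block that starts at card 0, ignores the within-card word offsets that the real BOT also stores, and assumes a skip base of 16. The class and method names (BotSketch, singleCardTable, logarithmicTable, stepsToBlockStart) are hypothetical.

```java
// Simplified, hypothetical model of a block offset table for ONE block
// that begins at card 0. Entry 0 means "the block starts on this card";
// entry k > 0 means "skip back BASE^(k-1) cards and look again".
final class BotSketch {
    static final int BASE = 16; // assumed skip base for this sketch

    // Every card after the first says "go back one card".
    static int[] singleCardTable(int cards) {
        int[] table = new int[cards];
        for (int i = 1; i < cards; i++) {
            table[i] = 1;
        }
        return table;
    }

    // Each card stores the largest power of BASE that does not
    // overshoot the block start.
    static int[] logarithmicTable(int cards) {
        int[] table = new int[cards];
        for (int i = 1; i < cards; i++) {
            int k = 1;
            long stride = 1;
            while (stride * BASE <= i) {
                stride *= BASE;
                k++;
            }
            table[i] = k;
        }
        return table;
    }

    // Counts how many table lookups are needed to walk from the given
    // card back to the card where the block starts.
    static int stepsToBlockStart(int[] table, int card) {
        int steps = 0;
        while (table[card] != 0) {
            int back = 1;
            for (int p = 1; p < table[card]; p++) {
                back *= BASE;
            }
            card -= back;
            steps++;
        }
        return steps;
    }
}
```

In this model, finding the start of a block of 2^20 cards from its last card takes 1,048,575 steps with the single card table and 75 steps with the logarithmic table, which is the kind of difference that would show up when a collector repeatedly looks up the start of a very large object.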