JDK-6298694 : bad performance with big object in heap

Details
Type: Bug
Submit Date: 2005-07-19
Status: Closed
Updated Date: 2011-01-31
Project Name: JDK
Resolved Date: 2006-03-14
Component: hotspot
OS: linux,generic,solaris_10
Sub-Component: gc
CPU: x86,sparc,generic
Priority: P3
Resolution: Fixed
Affected Versions: 5.0,5.0u6,6
Fixed Versions:

Description
Allocating a big, long-lived array in the heap significantly reduces the performance of the Parallel and Conc GC. In Mustang, Parallel works fine, but Conc is still very slow.

http://forum.java.sun.com/thread.jspa?messageID=3801578

See my reproduction source in the attachment. SerialGC is more than 10 times faster than Conc.
java -XX:+PrintGCDetails -Xmx768m -XX:+Use<GC> gctest.Main
###@###.### 2005-07-19 13:56:08 GMT
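
The attached source is not reproduced on this page. As a rough idea of the kind of workload the description implies, here is a minimal sketch, assuming one large array that stays live while small objects are allocated and a few of them survive long enough to be promoted. The class name, sizes, and loop counts are hypothetical and are not taken from the attached gctest.Main.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical workload, not the attached gctest.Main.
final class BigObjectChurn {

    // Keeps one big array live while allocating small, mostly short-lived
    // objects. Returns the number of small objects still retained at the end.
    static int run(int bigArrayLength, int iterations) {
        long[] big = new long[bigArrayLength];       // big, long-lived object
        List<byte[]> survivors = new ArrayList<>();
        for (int i = 0; i < iterations; i++) {
            byte[] small = new byte[1024];           // short-lived garbage
            small[0] = (byte) i;
            if (i % 64 == 0) {
                survivors.add(small);                // a few objects live longer
            }
            if (survivors.size() > 10_000) {
                survivors.clear();                   // bound the live set
            }
            big[i % big.length] += small[0];         // keep the big array in use
        }
        return survivors.size();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int retained = run(32 * 1024 * 1024, 5_000_000);  // assumed sizes
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("retained=" + retained + " elapsed ms=" + millis);
    }
}
```

Running a program like this with the command line above, once per collector, and comparing the pause times reported by -XX:+PrintGCDetails is the comparison the description refers to.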

                                    

Comments
EVALUATION

The root cause of this bug is that the block offset table (BOT) is initialized
for single card offsets in the constructor for BlockOffsetArray.  This
does not lead to a correctness issue.  The performance problem arises from
the free list allocation done by CMS.  If a chunk is split in order to
do an allocation, it is assumed that the BOT for the original chunk is
correct and only the BOT for the remainder after the split is updated.  For
some situations (for example the first initialization of large chunks
out of the dictionary) this leaves the BOT using the single card offsets
instead of the logarithmic offsets, so finding the start of a block that
spans many cards means stepping back one card at a time.  This probably
eventually works itself out but is particularly obvious in the test case
for the problem.
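
To make the cost difference concrete, here is a minimal sketch of the two table layouts. It assumes a single block that starts at card 0 and covers the whole table, and back-skip strides that are powers of 16. The class and method names are hypothetical, and the real BlockOffsetArray encodes its entries differently; this only models how many back-skips a lookup takes.

```java
// Simplified model of a block offset table, not the HotSpot implementation.
final class BotSketch {
    // Back-skip strides are powers of 16 in this model (an assumption).
    static final int LOG_BASE = 4;

    // Entry i is the number of cards to step back from card i; 0 marks the
    // card where the block starts. Here every card steps back exactly one card.
    static int[] singleCardTable(int cards) {
        int[] table = new int[cards];
        for (int i = 1; i < cards; i++) {
            table[i] = 1;
        }
        return table;
    }

    // Same layout, but each card steps back by the largest power of 16
    // that does not overshoot the block start at card 0.
    static int[] logarithmicTable(int cards) {
        int[] table = new int[cards];
        for (int i = 1; i < cards; i++) {
            int stride = 1;
            while (((long) stride << LOG_BASE) <= i) {
                stride <<= LOG_BASE;
            }
            table[i] = stride;
        }
        return table;
    }

    // Counts the back-skips needed to walk from 'card' to the block start.
    static int stepsToBlockStart(int[] table, int card) {
        int steps = 0;
        while (table[card] != 0) {
            card -= table[card];
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        int cards = 1_000_000;
        int last = cards - 1;
        System.out.println("single card: "
                + stepsToBlockStart(singleCardTable(cards), last));
        System.out.println("logarithmic: "
                + stepsToBlockStart(logarithmicTable(cards), last));
    }
}
```

In this model a lookup from the last card of a block takes one step per card with single card entries, but only a few dozen steps with logarithmic entries, which is why a very large array makes the difference so visible.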

The initialization of the BOT needs to be fixed.  Since the contiguous space
version of the BOT will set the BOT for blocks as allocations move to
the right in the heap, initializing for logarithmic strides is probably
ok but that should be verified.
                                     
2006-02-02
SUGGESTED FIX

The following fix was putback by Jon to gc_baseline on 2/17
and will integrate into Mustang b75 (nee b74):
-----------------------------------------------------------------------------

Event:            putback-to
Parent workspace: /net/jano.sfbay/export/disk05/hotspot/ws/main/gc_baseline
                  (jano.sfbay:/export/disk05/hotspot/ws/main/gc_baseline)
Child workspace:  /net/prt-web.sfbay/prt-workspaces/20060217102834.jmasa.gc_baseline_6298694/workspace
                  (prt-web:/net/prt-web.sfbay/prt-workspaces/20060217102834.jmasa.gc_baseline_6298694/workspace)
User:             jmasa

Comment:

---------------------------------------------------------

Job ID:                 20060217102834.jmasa.gc_baseline_6298694
Original workspace:     arches:/net/karachi/bigtmp/jmasa/gc_baseline_6298694
Submitter:              jmasa
Archived data:          /net/prt-archiver.sfbay/data/archived_workspaces/main/gc_baseline/2006/20060217102834.jmasa.gc_baseline_6298694/
Webrev:                 http://analemma.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/main/gc_baseline/2006/20060217102834.jmasa.gc_baseline_6298694/workspace/webrevs/webrev-2006.02.17/index.html

Partial 6298694: bad performance with big object in heap

This is partial because there may be more improvements to be
made for large objects.

Change the initialization of the block offset table to
use the logarithmic offsets.  Line 98 in the webrev for blockOffsetTable.cpp
is the significant change.  The rest of the changes are clean up.

Removed some debugging code (old version of a method that was
being used for verification).

Renamed a parameter to fix_up_alloced_region() from "start_card"
to "first_card_to_fix" since "start_card" often is the first card
set for a newly allocated block.  Clarified (hopefully) the 
specification for fix_up_alloced_region().

Reviewed by: Ramki (partial) , John, and Tony.

Approved for putback by: Dave C.

Fix verified (y/n): y

Verification testing:
	Ran the test program attached to the CR and noted a
	decrease in the minor collection pause of about
	two thirds (approximately 10s to approximately 3.5s).

Other testing:

	runThese -quick -testbase_vm -testbase_gc with sparc product
	and fastdebug builds

	refworkload reference_server runs were done with sparc product.

Files:
update: src/share/vm/memory/blockOffsetTable.cpp
update: src/share/vm/memory/blockOffsetTable.hpp

Examined files: 3790

Contents Summary:
       2   update
    3788   no action (unchanged)
                                     
2006-02-25
EVALUATION

I've looked at the execution profiles for the VM
after the fix for the BOT was implemented, and
a large part of the remaining cost is in the promotion handling code.
Specifically, the code that links promoted objects together
in a list may need some work.  There was nothing there that
looked like it related specifically to big objects so this bug
is being moved to fix delivered.
                                     
2006-03-14


