Bug ID: JDK-6468290 Divide and allocate out of eden on a per cpu basis

Details
Type: Enhancement
Submit Date: 2006-09-07
Status: Resolved
Updated Date: 2012-10-18
Project Name: JDK
Resolved Date: 2007-04-24
Component: hotspot
OS: generic
Sub-Component: gc
CPU: generic
Priority: P4
Resolution: Fixed
Affected Versions: 7
Fixed Versions: hs10 (b12)

Description
This is a potential performance improvement on NUMA architectures.  The goal
is to improve thread-to-memory affinity.

                                    

Comments
EVALUATION

Implement the suggestion in the comments section and measure any improvement.
                                     
2006-09-07
SUGGESTED FIX

Event:            putback-to
Parent workspace: /net/jano/export/disk05/hotspot/ws/main/gc_baseline
                  (jano:/export/disk05/hotspot/ws/main/gc_baseline)
Child workspace:  /net/prt-web.sfbay/prt-workspaces/20061205113425.iv159533.gc_baseline.numa/workspace
                  (prt-web:/net/prt-web.sfbay/prt-workspaces/20061205113425.iv159533.gc_baseline.numa/workspace)
User:             iv159533

Comment:

---------------------------------------------------------

Job ID:                 20061205113425.iv159533.gc_baseline.numa
Original workspace:     jano:/home/iv159533/gc_baseline.numa
Submitter:              iv159533
Archived data:          /net/prt-archiver.sfbay/data/archived_workspaces/main/gc_baseline/2006/20061205113425.iv159533.gc_baseline.numa/
Webrev:                 http://analemma.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/main/gc_baseline/2006/20061205113425.iv159533.gc_baseline.numa/workspace/webrevs/webrev-2006.12.05/index.html

Partial 6468290: Divide and allocate out of eden on a per cpu basis

This is the initial partial fix. It implements a NUMA-aware allocator for
eden with the parallel scavenger. CPU hotplugging is not yet supported.
The allocator is essentially a modification of MutableSpace that preserves
its interfaces but implements different functionality. The space (eden) is
split into chunks, one for each locality group. Each thread allocates in
the chunk corresponding to its home locality group. When any chunk fills
up, a young generation collection occurs. Eden resizing for the adaptive
size policy is also supported.
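
The chunked allocation scheme described above can be illustrated with a
minimal sketch. This is not the code from mutableNUMASpace.cpp: the names
Chunk, NumaEdenSketch and kAllocationFailed are hypothetical, eden is modelled
as a range of byte offsets split evenly between groups, and a full chunk is
reported to the caller rather than triggering a real collection.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sentinel returned when the home chunk cannot satisfy a request.
constexpr std::size_t kAllocationFailed = static_cast<std::size_t>(-1);

// Hypothetical stand-in for one per-locality-group chunk of eden.
struct Chunk {
    std::size_t bottom;  // first offset of the chunk
    std::size_t top;     // next free offset (bump pointer)
    std::size_t end;     // one past the last offset of the chunk
};

// Illustrative model only; it does not mirror the HotSpot class layout.
class NumaEdenSketch {
public:
    // Assumption: eden is split into equally sized chunks, one per group.
    NumaEdenSketch(std::size_t eden_bytes, std::size_t lgrp_count) {
        assert(lgrp_count > 0);
        const std::size_t chunk_bytes = eden_bytes / lgrp_count;
        for (std::size_t i = 0; i < lgrp_count; ++i) {
            const std::size_t bottom = i * chunk_bytes;
            chunks_.push_back(Chunk{bottom, bottom, bottom + chunk_bytes});
        }
    }

    // Bump-allocate in the chunk of the thread's home locality group.
    // Returns kAllocationFailed when that chunk is full, which is the point
    // at which a young generation collection would be requested.
    std::size_t allocate(std::size_t home_lgrp, std::size_t bytes) {
        assert(home_lgrp < chunks_.size());
        Chunk& chunk = chunks_[home_lgrp];
        if (chunk.end - chunk.top < bytes) {
            return kAllocationFailed;
        }
        const std::size_t result = chunk.top;
        chunk.top += bytes;
        return result;
    }

    // Eden is empty after a young collection, so every chunk starts over.
    void reset_after_collection() {
        for (Chunk& chunk : chunks_) {
            chunk.top = chunk.bottom;
        }
    }

private:
    std::vector<Chunk> chunks_;
};
```

Note that in this model a single full chunk fails the allocation even if
other chunks still have free space. That is the fragmentation cost which
adaptive chunk sizing is meant to reduce.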

For more information and performance results please refer to:

http://j2se.sfbay.sun.com/web/bin/view/HotspotGC/NUMA

Webrev: http://javaweb.sfbay/~iv159533/webrev.numa/

Fix verified (y/n): y

Testing: refworkload, PRT (with -XX:+UseNUMA).

Reviewed by: Jon


Files:
update: src/cpu/amd64/vm/assembler_amd64.cpp
update: src/cpu/i486/vm/assembler_i486.cpp
update: src/cpu/sparc/vm/assembler_sparc.cpp
update: src/os/linux/vm/os_linux.cpp
update: src/os/solaris/vm/os_solaris.cpp
update: src/os/solaris/vm/os_solaris.hpp
update: src/os/win32/vm/os_win32.cpp
update: src/share/vm/gc_implementation/includeDB_gc_parallelScavenge
update: src/share/vm/gc_implementation/includeDB_gc_shared
update: src/share/vm/gc_implementation/parallelScavenge/parallelScavengeHeap.cpp
update: src/share/vm/gc_implementation/parallelScavenge/parallelScavengeHeap.hpp
update: src/share/vm/gc_implementation/parallelScavenge/psYoungGen.cpp
update: src/share/vm/gc_implementation/shared/immutableSpace.hpp
update: src/share/vm/gc_implementation/shared/mutableSpace.cpp
update: src/share/vm/gc_implementation/shared/mutableSpace.hpp
update: src/share/vm/runtime/arguments.cpp
update: src/share/vm/runtime/globals.hpp
update: src/share/vm/runtime/os.hpp
update: src/share/vm/runtime/thread.cpp
update: src/share/vm/runtime/thread.hpp
update: src/share/vm/utilities/globalDefinitions_gcc.hpp
update: src/share/vm/utilities/globalDefinitions_sparcWorks.hpp
create: src/share/vm/gc_implementation/shared/mutableNUMASpace.cpp
create: src/share/vm/gc_implementation/shared/mutableNUMASpace.hpp

Examined files: 3925

Contents Summary:
       2   create
      22   update
    3901   no action (unchanged)


========================================================

Event:            putback-to
Parent workspace: /net/jano/export/disk05/hotspot/ws/main/gc_baseline
                  (jano:/export/disk05/hotspot/ws/main/gc_baseline)
Child workspace:  /net/prt-web.sfbay/prt-workspaces/20070320194522.iv159533.gc_baseline.numa.adaptive/workspace
                  (prt-web:/net/prt-web.sfbay/prt-workspaces/20070320194522.iv159533.gc_baseline.numa.adaptive/workspace)
User:             iv159533

Comment:

---------------------------------------------------------

Job ID:                 20070320194522.iv159533.gc_baseline.numa.adaptive
Original workspace:     jano.SFBay.Sun.COM:/home/iv159533/gc_baseline.numa.adaptive
Submitter:              iv159533
Archived data:          /net/prt-archiver.sfbay/data/archived_workspaces/main/gc_baseline/2007/20070320194522.iv159533.gc_baseline.numa.adaptive/
Webrev:                 http://prt-web.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/main/gc_baseline/2007/20070320194522.iv159533.gc_baseline.numa.adaptive/workspace/webrevs/webrev-2007.03.21/index.html

Fixed 6468290: Divide and allocate out of eden on a per cpu basis

  The idea behind the adaptive sizing is to reduce the loss of space in
the eden due to fragmentation. The main cause of fragmentation is
uneven allocation rates among threads. Differences in allocation rate
between the locality groups may be caused either by application specifics
or by uneven LWP distribution by the OS. In addition, an application can
have fewer threads than the number of locality groups.
  In order to resize the chunks we measure the allocation rate of the
application between collections. After that we reshape the chunks to
reflect the allocation rate pattern. The AdaptiveWeightedAverage filter
is used to smooth the measurements. The NUMASpaceResizeSpeed parameter
controls the adaptation speed by restricting the number of bytes
that can be moved during the adaptation phase.
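
  The resizing step can be sketched as follows. This is an illustration
under stated assumptions, not the putback's code: SmoothedRate is a plain
exponentially weighted average standing in for AdaptiveWeightedAverage
(whose exact weighting is not reproduced here), and move_limit_bytes is a
hypothetical stand-in for the limit derived from NUMASpaceResizeSpeed.
Chunk sizes are kept as doubles to keep the arithmetic simple.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simple exponentially weighted average of allocation-rate samples.
// Assumption: stands in for HotSpot's AdaptiveWeightedAverage.
class SmoothedRate {
public:
    explicit SmoothedRate(double weight) : weight_(weight) {}

    // The first sample seeds the average; later samples are blended in.
    void sample(double value) {
        average_ = seeded_ ? (1.0 - weight_) * average_ + weight_ * value
                           : value;
        seeded_ = true;
    }

    double average() const { return average_; }

private:
    double weight_;
    double average_ = 0.0;
    bool seeded_ = false;
};

// Moves chunk sizes toward targets proportional to the smoothed allocation
// rates. If reaching the targets would move more than move_limit_bytes
// between chunks, every change is scaled down by the same factor, so the
// total eden size is preserved.
std::vector<double> reshape_chunks(const std::vector<double>& sizes,
                                   const std::vector<double>& rates,
                                   double move_limit_bytes) {
    assert(sizes.size() == rates.size());
    double eden = 0.0;
    double total_rate = 0.0;
    for (std::size_t i = 0; i < sizes.size(); ++i) {
        eden += sizes[i];
        total_rate += rates[i];
    }
    if (total_rate <= 0.0) {
        return sizes;  // nothing was allocated, keep the current layout
    }

    std::vector<double> deltas(sizes.size());
    double moved = 0.0;  // bytes that would change owner (growth side only)
    for (std::size_t i = 0; i < sizes.size(); ++i) {
        deltas[i] = eden * rates[i] / total_rate - sizes[i];
        if (deltas[i] > 0.0) {
            moved += deltas[i];
        }
    }

    std::vector<double> result(sizes);
    for (std::size_t i = 0; i < sizes.size(); ++i) {
        const double step = (moved > move_limit_bytes)
                                ? deltas[i] * move_limit_bytes / moved
                                : deltas[i];
        result[i] += step;
    }
    return result;
}
```

For example, two 500-byte chunks with smoothed rates of 300 and 100 have
targets of 750 and 250 bytes. With a limit of 100 bytes per step, one
adaptation phase in this sketch yields 600 and 400.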
  The page scanner is another addition. It is designed to address the
problem of pages allocated in the wrong locality group. This typically
happens due to a shortage of pages in the target locality group. The
page scanner scans the pages right after the collection and frees remote
pages in the hope that subsequent reallocation will be more successful.
This approach proved useful on heavily loaded systems where multiple
processes compete for memory.
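
  The page-scanning idea can be modelled in a few lines. This sketch makes
no OS calls and does not reflect the real implementation: the placement
vector, kUnbacked and free_remote_pages are hypothetical names, and
"freeing" a page is modelled by marking it unbacked so that the next touch
could place it in the right group.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical marker: the page has no physical memory behind it.
constexpr int kUnbacked = -1;

// placement[i] is the locality group currently backing page i of a chunk.
// Frees every page backed by a group other than home_lgrp and returns the
// number of pages freed. Assumption: called right after a collection, when
// the chunk holds no live objects.
std::size_t free_remote_pages(std::vector<int>& placement, int home_lgrp) {
    std::size_t freed = 0;
    for (int& lgrp : placement) {
        if (lgrp != kUnbacked && lgrp != home_lgrp) {
            lgrp = kUnbacked;  // model of releasing the misplaced page
            ++freed;
        }
    }
    return freed;
}
```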

  SPECjbb2005 improvement over the baseline allocator (32-bit/64-bit):
8%/9% on x4100, 30%/40% on x4600, NA/280% on E25K.

For details and experimental results please refer to the wiki page.

Webrev: http://javaweb.sfbay/~iv159533/webrev.numa.adaptive
Wiki: http://j2se.sfbay.sun.com/web/bin/view/HotspotGC/AdaptiveNUMAChunkSizing

Fix verified (y/n): y

Testing: refworkload, PRT with -XX:+UseNUMA

Reviewed by: Jon, John, Andrey



Files:
update: src/os/linux/vm/os_linux.cpp
update: src/os/solaris/vm/os_solaris.cpp
update: src/os/solaris/vm/os_solaris.hpp
update: src/os/win32/vm/os_win32.cpp
update: src/share/vm/gc_implementation/includeDB_gc_shared
update: src/share/vm/gc_implementation/parallelScavenge/parallelScavengeHeap.cpp
update: src/share/vm/gc_implementation/parallelScavenge/parallelScavengeHeap.hpp
update: src/share/vm/gc_implementation/parallelScavenge/psScavenge.cpp
update: src/share/vm/gc_implementation/shared/mutableNUMASpace.cpp
update: src/share/vm/gc_implementation/shared/mutableNUMASpace.hpp
update: src/share/vm/gc_implementation/shared/mutableSpace.cpp
update: src/share/vm/gc_implementation/shared/mutableSpace.hpp
update: src/share/vm/gc_interface/collectedHeap.hpp
update: src/share/vm/memory/genCollectedHeap.cpp
update: src/share/vm/memory/genCollectedHeap.hpp
update: src/share/vm/memory/threadLocalAllocBuffer.cpp
update: src/share/vm/memory/threadLocalAllocBuffer.hpp
update: src/share/vm/memory/threadLocalAllocBuffer.inline.hpp
update: src/share/vm/prims/jni.cpp
update: src/share/vm/runtime/globals.hpp
update: src/share/vm/runtime/os.hpp
update: src/share/vm/runtime/thread.cpp
update: src/share/vm/runtime/thread.hpp
update: src/share/vm/utilities/globalDefinitions.hpp

Examined files: 3944

Contents Summary:
      24   update
    3920   no action (unchanged)
                                     
2007-03-30


