JDK-6468290 : Divide and allocate out of eden on a per cpu basis
Type: Enhancement
Component: hotspot
Sub-Component: gc
Affected Version: 7
Priority: P4
Status: Resolved
Resolution: Fixed
OS: generic
CPU: generic
Submitted: 2006-09-07
Updated: 2012-10-18
Resolved: 2007-04-24
This is a potential performance improvement on NUMA architectures. The goal
is to improve thread-to-memory affinity.
Comments
SUGGESTED FIX
Event: putback-to
Parent workspace: /net/jano/export/disk05/hotspot/ws/main/gc_baseline
(jano:/export/disk05/hotspot/ws/main/gc_baseline)
Child workspace: /net/prt-web.sfbay/prt-workspaces/20061205113425.iv159533.gc_baseline.numa/workspace
(prt-web:/net/prt-web.sfbay/prt-workspaces/20061205113425.iv159533.gc_baseline.numa/workspace)
User: iv159533
Comment:
---------------------------------------------------------
Job ID: 20061205113425.iv159533.gc_baseline.numa
Original workspace: jano:/home/iv159533/gc_baseline.numa
Submitter: iv159533
Archived data: /net/prt-archiver.sfbay/data/archived_workspaces/main/gc_baseline/2006/20061205113425.iv159533.gc_baseline.numa/
Webrev: http://analemma.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/main/gc_baseline/2006/20061205113425.iv159533.gc_baseline.numa/workspace/webrevs/webrev-2006.12.05/index.html
Partial 6468290: Divide and allocate out of eden on a per cpu basis
This is the initial partial fix. It implements a NUMA-aware allocator for
eden with parallel scavenger. CPU hotplugging is not yet supported.
The allocator is essentially a modification of MutableSpace that preserves
the interfaces but implements different functionality. The space (eden) is
split into chunks, one per locality group. Each thread allocates from the
chunk corresponding to its home locality group. When any chunk fills up,
a young generation collection occurs. Eden resizing for the adaptive size
policy is also supported.
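The chunking scheme above can be sketched as follows. This is a hypothetical illustration, not HotSpot's actual MutableNUMASpace code: the `Chunk` and `NUMASpace` names, word-based sizing, and the boolean return signalling "chunk full, trigger a young GC" are all assumptions made for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative sketch of eden split into per-locality-group chunks.
struct Chunk {
    size_t top;       // current allocation offset within the chunk
    size_t capacity;  // chunk size in words
};

struct NUMASpace {
    std::vector<Chunk> chunks;  // one chunk per locality group

    NUMASpace(size_t eden_words, size_t lgroups) {
        size_t per = eden_words / lgroups;
        chunks.assign(lgroups, Chunk{0, per});
    }

    // Bump-pointer allocation in the chunk of the thread's home lgroup.
    // Returns false when that chunk is full, which in the real allocator
    // would trigger a young-generation collection.
    bool allocate(size_t lgroup, size_t words) {
        Chunk& c = chunks[lgroup];
        if (c.top + words > c.capacity) return false;  // chunk full -> GC
        c.top += words;
        return true;
    }
};
```

The key property is that one thread's allocations never touch memory homed on another thread's locality group, at the cost of triggering a collection as soon as any single chunk is exhausted.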
For more information and performance results please refer to:
http://j2se.sfbay.sun.com/web/bin/view/HotspotGC/NUMA
Webrev: http://javaweb.sfbay/~iv159533/webrev.numa/
Fix verified (y/n): y
Testing: refworkload, PRT (with -XX:+UseNUMA).
Reviewed by: Jon
Files:
update: src/cpu/amd64/vm/assembler_amd64.cpp
update: src/cpu/i486/vm/assembler_i486.cpp
update: src/cpu/sparc/vm/assembler_sparc.cpp
update: src/os/linux/vm/os_linux.cpp
update: src/os/solaris/vm/os_solaris.cpp
update: src/os/solaris/vm/os_solaris.hpp
update: src/os/win32/vm/os_win32.cpp
update: src/share/vm/gc_implementation/includeDB_gc_parallelScavenge
update: src/share/vm/gc_implementation/includeDB_gc_shared
update: src/share/vm/gc_implementation/parallelScavenge/parallelScavengeHeap.cpp
update: src/share/vm/gc_implementation/parallelScavenge/parallelScavengeHeap.hpp
update: src/share/vm/gc_implementation/parallelScavenge/psYoungGen.cpp
update: src/share/vm/gc_implementation/shared/immutableSpace.hpp
update: src/share/vm/gc_implementation/shared/mutableSpace.cpp
update: src/share/vm/gc_implementation/shared/mutableSpace.hpp
update: src/share/vm/runtime/arguments.cpp
update: src/share/vm/runtime/globals.hpp
update: src/share/vm/runtime/os.hpp
update: src/share/vm/runtime/thread.cpp
update: src/share/vm/runtime/thread.hpp
update: src/share/vm/utilities/globalDefinitions_gcc.hpp
update: src/share/vm/utilities/globalDefinitions_sparcWorks.hpp
create: src/share/vm/gc_implementation/shared/mutableNUMASpace.cpp
create: src/share/vm/gc_implementation/shared/mutableNUMASpace.hpp
Examined files: 3925
Contents Summary:
2 create
22 update
3901 no action (unchanged)
========================================================
Event: putback-to
Parent workspace: /net/jano/export/disk05/hotspot/ws/main/gc_baseline
(jano:/export/disk05/hotspot/ws/main/gc_baseline)
Child workspace: /net/prt-web.sfbay/prt-workspaces/20070320194522.iv159533.gc_baseline.numa.adaptive/workspace
(prt-web:/net/prt-web.sfbay/prt-workspaces/20070320194522.iv159533.gc_baseline.numa.adaptive/workspace)
User: iv159533
Comment:
---------------------------------------------------------
Job ID: 20070320194522.iv159533.gc_baseline.numa.adaptive
Original workspace: jano.SFBay.Sun.COM:/home/iv159533/gc_baseline.numa.adaptive
Submitter: iv159533
Archived data: /net/prt-archiver.sfbay/data/archived_workspaces/main/gc_baseline/2007/20070320194522.iv159533.gc_baseline.numa.adaptive/
Webrev: http://prt-web.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/main/gc_baseline/2007/20070320194522.iv159533.gc_baseline.numa.adaptive/workspace/webrevs/webrev-2007.03.21/index.html
Fixed 6468290: Divide and allocate out of eden on a per cpu basis
The idea behind the adaptive sizing is to reduce the loss of space in
eden due to fragmentation. The main cause of fragmentation is uneven
allocation rates across threads. Differences in allocation rates between
the locality groups may be caused either by application specifics or by
uneven LWP distribution by the OS. Moreover, an application can have fewer
threads than the number of locality groups.
To resize the chunks, we measure the allocation rate of the application
between collections and then reshape the chunks to reflect the
allocation-rate pattern. The AdaptiveWeightedAverage filter is used to
smooth the measurements. The NUMASpaceResizeSpeed parameter controls the
adaptation speed by restricting the number of bytes that can be moved
during the adaptation phase.
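A minimal sketch of this adaptation step, under stated assumptions: the `AdaptiveAverage` struct stands in for HotSpot's AdaptiveWeightedAverage, `resize_chunks` and `max_move` are illustrative names, and `max_move` plays the byte-cap role attributed to NUMASpaceResizeSpeed above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Exponentially weighted average standing in for AdaptiveWeightedAverage.
struct AdaptiveAverage {
    double avg = 0.0;
    double weight;  // fraction of each new sample blended in
    explicit AdaptiveAverage(double w) : weight(w) {}
    void sample(double v) { avg += weight * (v - avg); }
};

// Compute new chunk sizes proportional to the smoothed per-lgroup
// allocation rates, capping how far any chunk boundary may move in one
// adaptation step (the role of the NUMASpaceResizeSpeed-style limit).
std::vector<size_t> resize_chunks(const std::vector<size_t>& cur,
                                  const std::vector<AdaptiveAverage>& rates,
                                  size_t eden_bytes, size_t max_move) {
    double total = 0;
    for (const auto& r : rates) total += r.avg;
    std::vector<size_t> out(cur.size());
    for (size_t i = 0; i < cur.size(); i++) {
        double target = eden_bytes * (rates[i].avg / total);
        double delta = target - static_cast<double>(cur[i]);
        if (delta > static_cast<double>(max_move)) delta = max_move;
        if (delta < -static_cast<double>(max_move))
            delta = -static_cast<double>(max_move);
        out[i] = static_cast<size_t>(static_cast<double>(cur[i]) + delta);
    }
    return out;
}
```

Capping the per-step movement keeps the layout stable when the measured rates are noisy, at the cost of converging to the new proportions over several collections rather than one.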
The page scanner is another addition. It is designed to address the
problem of pages allocated in the wrong locality group. This typically
happens due to a shortage of pages in the target locality group. The
page scanner scans the pages right after a collection and frees remote
pages in the hope that subsequent reallocation will be more successful.
This approach proved to be useful on systems under high load, where
multiple processes compete for memory.
SPECjbb2005 improvement results, 32-bit/64-bit (compared to the baseline
allocator): 8%/9% on the x4100, 30%/40% on the x4600, and N/A/280% on the E25K.
For details and experimental results please refer to the wiki page.
Webrev: http://javaweb.sfbay/~iv159533/webrev.numa.adaptive
Wiki: http://j2se.sfbay.sun.com/web/bin/view/HotspotGC/AdaptiveNUMAChunkSizing
Fix verified (y/n): y
Testing: refworkload, PRT with -XX:+UseNUMA
Reviewed by: Jon, John, Andrey
Files:
update: src/os/linux/vm/os_linux.cpp
update: src/os/solaris/vm/os_solaris.cpp
update: src/os/solaris/vm/os_solaris.hpp
update: src/os/win32/vm/os_win32.cpp
update: src/share/vm/gc_implementation/includeDB_gc_shared
update: src/share/vm/gc_implementation/parallelScavenge/parallelScavengeHeap.cpp
update: src/share/vm/gc_implementation/parallelScavenge/parallelScavengeHeap.hpp
update: src/share/vm/gc_implementation/parallelScavenge/psScavenge.cpp
update: src/share/vm/gc_implementation/shared/mutableNUMASpace.cpp
update: src/share/vm/gc_implementation/shared/mutableNUMASpace.hpp
update: src/share/vm/gc_implementation/shared/mutableSpace.cpp
update: src/share/vm/gc_implementation/shared/mutableSpace.hpp
update: src/share/vm/gc_interface/collectedHeap.hpp
update: src/share/vm/memory/genCollectedHeap.cpp
update: src/share/vm/memory/genCollectedHeap.hpp
update: src/share/vm/memory/threadLocalAllocBuffer.cpp
update: src/share/vm/memory/threadLocalAllocBuffer.hpp
update: src/share/vm/memory/threadLocalAllocBuffer.inline.hpp
update: src/share/vm/prims/jni.cpp
update: src/share/vm/runtime/globals.hpp
update: src/share/vm/runtime/os.hpp
update: src/share/vm/runtime/thread.cpp
update: src/share/vm/runtime/thread.hpp
update: src/share/vm/utilities/globalDefinitions.hpp
Examined files: 3944
Contents Summary:
24 update
3920 no action (unchanged)
30-03-2007
EVALUATION
Implement the suggestion in the comments section and measure any improvement.