JDK-6468290 : Divide and allocate out of eden on a per cpu basis
Type: Enhancement
Component: hotspot
Sub-Component: gc
Affected Version: 7
Priority: P4
Status: Resolved
Resolution: Fixed
OS: generic
CPU: generic
Submitted: 2006-09-07
Updated: 2012-10-18
Resolved: 2007-04-24
This is a potential performance improvement on NUMA architectures. The goal
is to improve thread-to-memory affinity.
Comments
SUGGESTED FIX
Event: putback-to
Parent workspace: /net/jano/export/disk05/hotspot/ws/main/gc_baseline
(jano:/export/disk05/hotspot/ws/main/gc_baseline)
Child workspace: /net/prt-web.sfbay/prt-workspaces/20061205113425.iv159533.gc_baseline.numa/workspace
(prt-web:/net/prt-web.sfbay/prt-workspaces/20061205113425.iv159533.gc_baseline.numa/workspace)
User: iv159533
Comment:
---------------------------------------------------------
Job ID: 20061205113425.iv159533.gc_baseline.numa
Original workspace: jano:/home/iv159533/gc_baseline.numa
Submitter: iv159533
Archived data: /net/prt-archiver.sfbay/data/archived_workspaces/main/gc_baseline/2006/20061205113425.iv159533.gc_baseline.numa/
Webrev: http://analemma.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/main/gc_baseline/2006/20061205113425.iv159533.gc_baseline.numa/workspace/webrevs/webrev-2006.12.05/index.html
Partial 6468290: Divide and allocate out of eden on a per cpu basis
This is the initial partial fix. It implements a NUMA-aware allocator for
eden with parallel scavenger. CPU hotplugging is not yet supported.
The allocator is essentially a modification of MutableSpace that preserves
the interfaces but implements different functionality. The space (eden) is
split into chunks, one per locality group. Each thread allocates from the
chunk corresponding to its home locality group. When any chunk fills up,
a young generation collection occurs. Eden resizing for the adaptive size
policy is also supported.
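The chunking scheme above can be sketched as follows. This is a hypothetical illustration, not HotSpot's actual MutableNUMASpace code: the `Chunk` and `NUMASpace` names, word-based sizing, and the boolean return signalling "chunk full, trigger a young GC" are all assumptions made for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative sketch of eden split into per-locality-group chunks.
struct Chunk {
    size_t top;       // current allocation offset within the chunk
    size_t capacity;  // chunk size in words
};

struct NUMASpace {
    std::vector<Chunk> chunks;  // one chunk per locality group

    NUMASpace(size_t eden_words, size_t lgroups) {
        size_t per = eden_words / lgroups;
        chunks.assign(lgroups, Chunk{0, per});
    }

    // Bump-pointer allocation in the chunk of the thread's home lgroup.
    // Returns false when that chunk is full, which in the real allocator
    // would trigger a young-generation collection.
    bool allocate(size_t lgroup, size_t words) {
        Chunk& c = chunks[lgroup];
        if (c.top + words > c.capacity) return false;  // chunk full -> GC
        c.top += words;
        return true;
    }
};
```

The key property is that one thread's allocations never touch memory homed on another thread's locality group, at the cost of triggering a collection as soon as any single chunk is exhausted.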
For more information and performance results please refer to:
http://j2se.sfbay.sun.com/web/bin/view/HotspotGC/NUMA
Webrev: http://javaweb.sfbay/~iv159533/webrev.numa/
Fix verified (y/n): y
Testing: refworkload, PRT (with -XX:+UseNUMA).
Reviewed by: Jon
Files:
update: src/cpu/amd64/vm/assembler_amd64.cpp
update: src/cpu/i486/vm/assembler_i486.cpp
update: src/cpu/sparc/vm/assembler_sparc.cpp
update: src/os/linux/vm/os_linux.cpp
update: src/os/solaris/vm/os_solaris.cpp
update: src/os/solaris/vm/os_solaris.hpp
update: src/os/win32/vm/os_win32.cpp
update: src/share/vm/gc_implementation/includeDB_gc_parallelScavenge
update: src/share/vm/gc_implementation/includeDB_gc_shared
update: src/share/vm/gc_implementation/parallelScavenge/parallelScavengeHeap.cpp
update: src/share/vm/gc_implementation/parallelScavenge/parallelScavengeHeap.hpp
update: src/share/vm/gc_implementation/parallelScavenge/psYoungGen.cpp
update: src/share/vm/gc_implementation/shared/immutableSpace.hpp
update: src/share/vm/gc_implementation/shared/mutableSpace.cpp
update: src/share/vm/gc_implementation/shared/mutableSpace.hpp
update: src/share/vm/runtime/arguments.cpp
update: src/share/vm/runtime/globals.hpp
update: src/share/vm/runtime/os.hpp
update: src/share/vm/runtime/thread.cpp
update: src/share/vm/runtime/thread.hpp
update: src/share/vm/utilities/globalDefinitions_gcc.hpp
update: src/share/vm/utilities/globalDefinitions_sparcWorks.hpp
create: src/share/vm/gc_implementation/shared/mutableNUMASpace.cpp
create: src/share/vm/gc_implementation/shared/mutableNUMASpace.hpp
Examined files: 3925
Contents Summary:
2 create
22 update
3901 no action (unchanged)
========================================================
Event: putback-to
Parent workspace: /net/jano/export/disk05/hotspot/ws/main/gc_baseline
(jano:/export/disk05/hotspot/ws/main/gc_baseline)
Child workspace: /net/prt-web.sfbay/prt-workspaces/20070320194522.iv159533.gc_baseline.numa.adaptive/workspace
(prt-web:/net/prt-web.sfbay/prt-workspaces/20070320194522.iv159533.gc_baseline.numa.adaptive/workspace)
User: iv159533
Comment:
---------------------------------------------------------
Job ID: 20070320194522.iv159533.gc_baseline.numa.adaptive
Original workspace: jano.SFBay.Sun.COM:/home/iv159533/gc_baseline.numa.adaptive
Submitter: iv159533
Archived data: /net/prt-archiver.sfbay/data/archived_workspaces/main/gc_baseline/2007/20070320194522.iv159533.gc_baseline.numa.adaptive/
Webrev: http://prt-web.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/main/gc_baseline/2007/20070320194522.iv159533.gc_baseline.numa.adaptive/workspace/webrevs/webrev-2007.03.21/index.html
Fixed 6468290: Divide and allocate out of eden on a per cpu basis
The idea behind the adaptive sizing is to reduce the loss of space in
eden due to fragmentation. The main cause of fragmentation is uneven
allocation rates across threads. Differences in allocation rates between
the locality groups may be caused either by application specifics or by
uneven LWP distribution by the OS. Moreover, an application can have fewer
threads than the number of locality groups.
To resize the chunks, we measure the allocation rate of the application
between collections and then reshape the chunks to reflect the
allocation-rate pattern. The AdaptiveWeightedAverage filter is used to
smooth the measurements. The NUMASpaceResizeSpeed parameter controls the
adaptation speed by restricting the number of bytes that can be moved
during the adaptation phase.
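A minimal sketch of this adaptation step, under stated assumptions: the `AdaptiveAverage` struct stands in for HotSpot's AdaptiveWeightedAverage, `resize_chunks` and `max_move` are illustrative names, and `max_move` plays the byte-cap role attributed to NUMASpaceResizeSpeed above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Exponentially weighted average standing in for AdaptiveWeightedAverage.
struct AdaptiveAverage {
    double avg = 0.0;
    double weight;  // fraction of each new sample blended in
    explicit AdaptiveAverage(double w) : weight(w) {}
    void sample(double v) { avg += weight * (v - avg); }
};

// Compute new chunk sizes proportional to the smoothed per-lgroup
// allocation rates, capping how far any chunk boundary may move in one
// adaptation step (the role of the NUMASpaceResizeSpeed-style limit).
std::vector<size_t> resize_chunks(const std::vector<size_t>& cur,
                                  const std::vector<AdaptiveAverage>& rates,
                                  size_t eden_bytes, size_t max_move) {
    double total = 0;
    for (const auto& r : rates) total += r.avg;
    std::vector<size_t> out(cur.size());
    for (size_t i = 0; i < cur.size(); i++) {
        double target = eden_bytes * (rates[i].avg / total);
        double delta = target - static_cast<double>(cur[i]);
        if (delta > static_cast<double>(max_move)) delta = max_move;
        if (delta < -static_cast<double>(max_move))
            delta = -static_cast<double>(max_move);
        out[i] = static_cast<size_t>(static_cast<double>(cur[i]) + delta);
    }
    return out;
}
```

Capping the per-step movement keeps the layout stable when the measured rates are noisy, at the cost of converging to the new proportions over several collections rather than one.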
The page scanner is another addition. It is designed to address the
problem of pages allocated in the wrong locality group. This typically
happens due to a shortage of pages in the target locality group. The
page scanner scans the pages right after a collection and frees remote
pages in the hope that subsequent reallocation will be more successful.
This approach proved to be useful on systems under high load, where
multiple processes compete for memory.
SPECjbb2005 improvement results, 32-bit/64-bit (compared to the baseline
allocator): 8%/9% on the x4100, 30%/40% on the x4600, and N/A/280% on the E25K.
For details and experimental results please refer to the wiki page.
Webrev: http://javaweb.sfbay/~iv159533/webrev.numa.adaptive
Wiki: http://j2se.sfbay.sun.com/web/bin/view/HotspotGC/AdaptiveNUMAChunkSizing
Fix verified (y/n): y
Testing: refworkload, PRT with -XX:+UseNUMA
Reviewed by: Jon, John, Andrey
Files:
update: src/os/linux/vm/os_linux.cpp
update: src/os/solaris/vm/os_solaris.cpp
update: src/os/solaris/vm/os_solaris.hpp
update: src/os/win32/vm/os_win32.cpp
update: src/share/vm/gc_implementation/includeDB_gc_shared
update: src/share/vm/gc_implementation/parallelScavenge/parallelScavengeHeap.cpp
update: src/share/vm/gc_implementation/parallelScavenge/parallelScavengeHeap.hpp
update: src/share/vm/gc_implementation/parallelScavenge/psScavenge.cpp
update: src/share/vm/gc_implementation/shared/mutableNUMASpace.cpp
update: src/share/vm/gc_implementation/shared/mutableNUMASpace.hpp
update: src/share/vm/gc_implementation/shared/mutableSpace.cpp
update: src/share/vm/gc_implementation/shared/mutableSpace.hpp
update: src/share/vm/gc_interface/collectedHeap.hpp
update: src/share/vm/memory/genCollectedHeap.cpp
update: src/share/vm/memory/genCollectedHeap.hpp
update: src/share/vm/memory/threadLocalAllocBuffer.cpp
update: src/share/vm/memory/threadLocalAllocBuffer.hpp
update: src/share/vm/memory/threadLocalAllocBuffer.inline.hpp
update: src/share/vm/prims/jni.cpp
update: src/share/vm/runtime/globals.hpp
update: src/share/vm/runtime/os.hpp
update: src/share/vm/runtime/thread.cpp
update: src/share/vm/runtime/thread.hpp
update: src/share/vm/utilities/globalDefinitions.hpp
Examined files: 3944
Contents Summary:
24 update
3920 no action (unchanged)
30-03-2007
EVALUATION
Implement the suggestion in the comments section and measure any improvement.