JDK-6488532 : Support allocate prefetching of several sequential cache lines
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 7
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: solaris_10
  • CPU: x86
  • Submitted: 2006-10-31
  • Updated: 2010-04-03
  • Resolved: 2006-12-02
Release   Version   Status
JDK 6     6u2       Fixed
JDK 7     7         Fixed
Other     hs10      Fixed
Description
I see about a 3% improvement in the jbb2005 score on Core2Duo with
-XX:AllocatePrefetchLines=3 vs -XX:AllocatePrefetchLines=1 (default).

The time spent in arraycopy dropped from 96 sec to 26 sec:

-XX:AllocatePrefetchLines=1 :
96.508 100.00    96.508   4.67    96.508   4.67  *arrayof_jshort_disjoint_arraycopy

-XX:AllocatePrefetchLines=3 :
25.868 100.00    25.868   1.25    25.868   1.25  *arrayof_jshort_disjoint_arraycopy
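
To try the same flag comparison outside jbb2005, a minimal sketch of an allocate-then-copy loop is given here. It is not the benchmark from this report: the class name `AllocCopySketch`, the array length, and the iteration count are all hypothetical, and there is no guarantee that this loop exercises the same arraycopy stub or shows a similar difference.

```java
public class AllocCopySketch {
    // Allocates a fresh char[] on every iteration and copies a source
    // array into it, so each copy writes into newly allocated memory.
    // Returns a checksum so the work is not trivially dead code.
    static long run(int iterations, int length) {
        char[] src = new char[length];
        for (int i = 0; i < length; i++) {
            src[i] = (char) i;
        }
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            char[] dst = new char[length];
            System.arraycopy(src, 0, dst, 0, length);
            checksum += dst[length - 1];
        }
        return checksum;
    }

    public static void main(String[] args) {
        // Hypothetical sizes; adjust for the machine under test.
        long start = System.nanoTime();
        long checksum = run(1000000, 256);
        long elapsedMs = (System.nanoTime() - start) / 1000000;
        System.out.println("checksum=" + checksum + " elapsed=" + elapsedMs + " ms");
    }
}
```

Running it once with -XX:AllocatePrefetchLines=1 and once with -XX:AllocatePrefetchLines=3 gives a rough comparison; a single short run like this is noisy and should be treated as a smoke test only.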

Comments
SUGGESTED FIX
Webrev: http://analemma.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/main/c2_baseline/2006/20061114191212.kvn.6488532/workspace/webrevs/webrev-2006.11.14/index.html

1. Added new allocate prefetch options:
  • Number of lines to prefetch: AllocatePrefetchLines = 3 on x86 / 1 on sparc
  • Step size in bytes of sequential prefetch: AllocatePrefetchStepSize = 16 (niagara1 == 16, sparc == 32, x86 == L1 cache line size)
  • Prefetch instruction used to allocate: AllocatePrefetchInstr = 0 (selects the prefetch instruction on x86/x64: 0 - PREFETCHNTA, 1 - PREFETCHT0, 2 - PREFETCHT2, 3 - PREFETCHW)
  • Prefetch instruction used to read: ReadPrefetchInstr = 0 (selects the prefetch instruction on x86/x64: 0 - PREFETCHNTA, 1 - PREFETCHT0, 2 - PREFETCHT2, 3 - PREFETCHR)
By default PREFETCHNTA is used instead of PREFETCHW on Opteron since it shows better jbb scores.
2. Allocate prefetch of several sequential cache lines based on the allocate prefetch options.
3. Set UseSSE and prefetch option values during initialization. We don't need to specify these options in platform-specific header files since they are modified during initialization anyway.
4. Check new x86/x64 processor features. Update output for PrintMiscellaneous:

    fez% gamma -XX:AllocatePrefetchLines=3 -XX:+Verbose -XX:+PrintMiscellaneous
    VM option 'AllocatePrefetchLines=3'
    VM option '+Verbose'
    VM option '+PrintMiscellaneous'
    [SafePoint Polling address: 0xfdf40000]
    [Memory Serialize Page address: 0xfde90000]
    Logical CPUs per package: 1
    UseSSE=3
    L1 data cache line size: 64
    Allocation: PREFETCHNTA 256, 3 lines with step 64 bytes
    CPU:total 4 (2 cores per cpu, 1 threads per core) family 15 model 33 stepping 0, cmov, cx8, fxsr, mmx, sse, sse2, sse3, mmxext, 3dnow, 3dnowext
15-11-2006
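
As an illustration of how the line count and step size combine, the sketch below lists the byte offsets that would be prefetched ahead of the allocation pointer. It assumes that the "256" in the PrintMiscellaneous output is the prefetch distance in bytes and that each additional line is prefetched one step further on; the class and method names are hypothetical and are not taken from the HotSpot sources.

```java
public class PrefetchSketch {
    // Returns the byte offsets, relative to the allocation pointer,
    // of each prefetched line: distance, distance + step, ...
    // Assumption: lines are prefetched sequentially, one step apart.
    static long[] prefetchOffsets(long distance, int lines, int stepSize) {
        long[] offsets = new long[lines];
        for (int i = 0; i < lines; i++) {
            offsets[i] = distance + (long) i * stepSize;
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Values read from the output above: distance 256 (assumed),
        // 3 lines, step 64 bytes.
        for (long offset : prefetchOffsets(256, 3, 64)) {
            System.out.println("prefetch at top + " + offset);
        }
    }
}
```

Under these assumptions the three prefetches land at offsets 256, 320, and 384, covering 192 bytes of memory about to be allocated, compared with 64 bytes when AllocatePrefetchLines=1.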

EVALUATION We need a flag, AllocatePrefetchLines, to specify how many sequential cache lines should be prefetched; the default is 1, which matches the current behavior. The vm_version files need to be modified to detect the CPU type and set the cache line size in bytes.
31-10-2006