Bug ID: JDK-6488532 Support allocate prefetching of several sequential cache lines

Details
Type:
Enhancement
Submit Date:
2006-10-31
Status:
Resolved
Updated Date:
2010-04-03
Project Name:
JDK
Resolved Date:
2006-12-02
Component:
hotspot
OS:
solaris_10
Sub-Component:
compiler
CPU:
x86
Priority:
P4
Resolution:
Fixed
Affected Versions:
7
Fixed Versions:
hs10 (b03)

Description
I see about a 3% improvement in the jbb2005 score on Core2Duo with
-XX:AllocatePrefetchLines=3 compared to -XX:AllocatePrefetchLines=1 (the default).

The time spent in arraycopy dropped from about 96 sec to about 26 sec:

-XX:AllocatePrefetchLines=1 :
96.508 100.00    96.508   4.67    96.508   4.67  *arrayof_jshort_disjoint_arraycopy

-XX:AllocatePrefetchLines=3 :
25.868 100.00    25.868   1.25    25.868   1.25  *arrayof_jshort_disjoint_arraycopy
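
As a quick check of the figures above, the arithmetic below works out the relative drop in arraycopy time. The two values are copied from the profile lines; the variable names are only illustrative.

```python
# Seconds attributed to arrayof_jshort_disjoint_arraycopy in the two
# profiles above. Values are from the report; names are illustrative.
t_lines_1 = 96.508   # -XX:AllocatePrefetchLines=1
t_lines_3 = 25.868   # -XX:AllocatePrefetchLines=3

saved = t_lines_1 - t_lines_3      # seconds no longer spent in arraycopy
reduction = saved / t_lines_1      # fraction of the original time saved
ratio = t_lines_1 / t_lines_3      # old time divided by new time

print(f"saved {saved:.3f} s, {reduction:.1%} less time, ratio {ratio:.2f}")
```

This works out to roughly a 73% reduction, or about 3.7 times less time in that stub.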

                                    

Comments
EVALUATION

A flag, AllocatePrefetchLines, is needed to specify how many sequential cache lines
should be prefetched. Its default is 1, which matches the current behavior.
The vm_version files need to be modified to detect the CPU type and set the cache line size in bytes.
                                     
2006-10-31
SUGGESTED FIX

Webrev: http://analemma.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/main/c2_baseline/2006/20061114191212.kvn.6488532/workspace/webrevs/webrev-2006.11.14/index.html

1. Added new allocate prefetch options:
  Number of lines to prefetch: AllocatePrefetchLines = 3 on x86 / 1 on sparc
  Step size in bytes of sequential prefetch: AllocatePrefetchStepSize = 16
  (niagara1 == 16, sparc == 32, x86 == L1 cache line size)
  Prefetch instruction used to allocate: AllocatePrefetchInstr = 0
  ( selects the prefetch instruction on x86/x64:
    0 - PREFETCHNTA, 1 - PREFETCHT0, 2 - PREFETCHT2, 3 - PREFETCHW )
  Prefetch instruction used to read: ReadPrefetchInstr = 0
  ( selects the prefetch instruction on x86/x64:
    0 - PREFETCHNTA, 1 - PREFETCHT0, 2 - PREFETCHT2, 3 - PREFETCHR )

  By default PREFETCHNTA is used instead of PREFETCHW on Opteron since
  it shows better jbb scores.

2. Allocation prefetch now covers several sequential cache lines, as
  configured by the allocate prefetch options.

3. Set UseSSE and the prefetch option values during initialization.
  We don't need to specify these options in platform-specific header
  files since they are modified during initialization anyway.

4. Check for new x86/x64 processor features.
  Updated the output of PrintMiscellaneous:

fez% gamma -XX:AllocatePrefetchLines=3 -XX:+Verbose -XX:+PrintMiscellaneous
VM option 'AllocatePrefetchLines=3'
VM option '+Verbose'
VM option '+PrintMiscellaneous'
[SafePoint Polling address: 0xfdf40000]
[Memory Serialize  Page address: 0xfde90000]
Logical CPUs per package: 1
UseSSE=3
L1 data cache line size: 64
Allocation: PREFETCHNTA 256, 3 lines with step 64 bytes
CPU:total 4 (2 cores per cpu, 1 threads per core) family 15 model 33 stepping 0,
 cmov, cx8, fxsr, mmx, sse, sse2, sse3, mmxext, 3dnow, 3dnowext
                                     
2006-11-15
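
The options and the sample output above can be tied together with a small model. The sketch below is not HotSpot code: it is a Python illustration, and the names PrefetchOptions, default_options, prefetch_offsets and describe are hypothetical. It assumes that the 256 in the sample "Allocation:" line is the prefetch distance in bytes, that offsets are measured from the current allocation pointer, and that niagara1 uses the same line count as other sparc CPUs.

```python
from dataclasses import dataclass

# Selector values for AllocatePrefetchInstr as listed in the suggested fix
# (x86/x64 only).
ALLOCATE_PREFETCH_INSTR = {
    0: "PREFETCHNTA",
    1: "PREFETCHT0",
    2: "PREFETCHT2",
    3: "PREFETCHW",
}


@dataclass
class PrefetchOptions:
    lines: int            # AllocatePrefetchLines
    step_size: int        # AllocatePrefetchStepSize, in bytes
    instr: int = 0        # AllocatePrefetchInstr
    distance: int = 256   # assumed: the "256" shown in the sample output


def default_options(platform: str, l1_line_size: int = 64) -> PrefetchOptions:
    """Defaults taken from the option list; platform keys are hypothetical."""
    if platform == "x86":
        # On x86 the step is the detected L1 data cache line size.
        return PrefetchOptions(lines=3, step_size=l1_line_size)
    if platform == "niagara1":
        return PrefetchOptions(lines=1, step_size=16)
    if platform == "sparc":
        return PrefetchOptions(lines=1, step_size=32)
    raise ValueError(f"unknown platform: {platform}")


def prefetch_offsets(opts: PrefetchOptions) -> list[int]:
    """Byte offsets (from an assumed allocation pointer) that get prefetched."""
    return [opts.distance + i * opts.step_size for i in range(opts.lines)]


def describe(opts: PrefetchOptions) -> str:
    """Format the options like the 'Allocation:' line in the sample output."""
    name = ALLOCATE_PREFETCH_INSTR[opts.instr]
    return (f"Allocation: {name} {opts.distance}, "
            f"{opts.lines} lines with step {opts.step_size} bytes")


if __name__ == "__main__":
    opts = default_options("x86", l1_line_size=64)
    print(describe(opts))
    print(prefetch_offsets(opts))
```

With the x86 defaults and a 64-byte L1 line, this model yields offsets 256, 320 and 384, which is consistent with the "3 lines with step 64 bytes" reported in the sample output.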


