JDK-8158012 : Use SW prefetch instructions instead of BIS for allocation prefetches on SPARC Core C4
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 7
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: sparc,sparc_64
  • Submitted: 2016-05-27
  • Updated: 2019-01-14
  • Resolved: 2016-12-07
Fixed in:
  • JDK 8: 8u192 (Fixed)
  • JDK 9: 9 b150 (Fixed)
Description
Analysis by Prasad Vidhyabaskaran and Jeff Oplinger

Loads that follow BIS (Block Init Store) based allocation prefetches, which are the default on SPARC processors, suffer from partial RAW (Read After Write) hazards. Partial RAW hazards are handled more slowly than full RAW hazards on T7 (C4 core). When SW prefetches are used for allocation prefetch (-XX:AllocatePrefetchInstr=0), the partial RAW hazard is eliminated and the resulting full RAW hazards are handled more efficiently, which improves performance.
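
To confirm which allocation prefetch instruction a running JVM has selected, the flag can be read back at runtime. The following is a minimal sketch, assuming a HotSpot JVM that exposes com.sun.management.HotSpotDiagnosticMXBean; the class name PrefetchFlagCheck and the helper readFlag are hypothetical names chosen for illustration and are not part of the fix.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class, for illustration only.
final class PrefetchFlagCheck {

    // Returns the current value of a HotSpot VM option as a string,
    // or "unavailable" if the JVM does not expose it.
    static String readFlag(String name) {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        if (bean == null) {
            return "unavailable"; // not a HotSpot JVM
        }
        try {
            return bean.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            return "unavailable"; // flag not known to this JVM build
        }
    }

    public static void main(String[] args) {
        // On SPARC, 0 selects SW prefetch instructions and 1 selects BIS.
        System.out.println("AllocatePrefetchInstr = "
            + readFlag("AllocatePrefetchInstr"));
    }
}
```

Running it once with default settings and once with -XX:AllocatePrefetchInstr=0 on the command line shows whether the default on a given machine already matches the SW prefetch setting discussed here.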

Performance measurements were done using the JMH test suite (on SPARC T7), the Aurora tool (on SPARC T4), and standalone SPECjbb2005 runs (on SPARC T7).
 - 70% of JMH test cases showed improvements in the range of 1% to more than 4x, with 30% showing more than 5% gain. 28% of the tests regressed by less than 5%.
 - Results from Aurora runs can be seen at the following link:
http://aurora.se.oracle.com/performance/reporting/report/prasad.vidhyabaskaran.java_jvm_prefetch_flag_eval_solaris_sparc?mode=prasad.vidhyabaskaran.style3.instr1
These show improvements of between 1% and 5% on most of the workloads, and a small regression on only the SPECjvm2008.serial workload.
 - On SPECjbb2005, runs with a lower number of warehouse threads showed improvements of 1% to 2.7%, while the peak warehouse step showed a regression of less than 2% when memory bandwidth was exercised heavily.

The performance gains are measurably larger than the few regressions that were noted. The recommendation is to change the default allocation prefetch choice to SW prefetches (AllocatePrefetchInstr=0) on SPARC T7 processors.
Comments
FC Extension request
Justification: Minor change that adjusts the default prefetching mechanism only on Oracle SPARC Core C4 processor based systems and can offer significant performance improvements.
Risk: Low as it only impacts the default behavior of the JDK.
Due date: Code changes are currently in review process
Code review webrev: http://cr.openjdk.java.net/~kvn/8158012/webrev.00/
Code review thread: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2016-December/025089.html
22-06-2017