JDK-7059037 : Use BIS for zeroing on T4
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: hs23,7
  • Priority: P4
  • Status: Closed
  • Resolution: Fixed
  • OS: generic,solaris_10
  • CPU: sparc
  • Submitted: 2011-06-24
  • Updated: 2011-11-28
  • Resolved: 2011-11-28
Versions
  • JDK 7: 7u2 (Fixed)
  • JDK 8: 8 (Fixed)
  • Other: hs22 (Fixed)
Related Reports
Duplicate :  
Description
Hi Vladimir,

For user-level programs such as the JVM/GC, only two Block Initializing Store ASIs are of real interest; most other variants are for privileged and hyper-privileged use:

ASI_ST_BLKINIT_PRIMARY (ASI_STBI_P), ASI Value 0xE2
ASI_ST_BLKINIT_MRU_PRIMARY (ASI_STBIMRU_P), ASI Value 0xF2

The Most-Recently-Used (MRU) variant of this special store controls the L2 cache ($L2) replacement policy, and is a result of the SPARC-Java VT collaboration that we started in 2007. Citing Mark Luttrell from one of our earlier discussions on this topic:

"My understanding is that in the L2, past projects always reset the used bit (or similar state) of a line touched by a block-initializing store.  This caused it to be preferred for eviction, and helped to keep a copy operation from wiping out a large portion of the cache data.  Early on in VT, the Java group was looking at using block-init store for stuff besides copying (object allocation and initialization I believe), but the behavior of making the line preferred for eviction was undesirable there.  (If you're initializing an object you're probably going to use it soon.)  So, we added the MRU variants which install the line as *most* recently used - the same behavior as a typical cache access."

Please sync with Mark Luttrell about the current status of this implementation and the memory ordering issues that John described below. As Mark explained earlier today, we expect to see performance gains over the traditional Block Load/Store variants, so all this is good news for performance!

Best regards,

-- Zoran Radovic
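
The replacement behavior described in the quote above can be illustrated with a toy model. This is only a sketch under stated assumptions: `CacheSet` and its methods are hypothetical names, a single set with a strict recency order stands in for the real L2 (whose actual replacement state is not described here), and line addresses are plain integers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>

// Toy model of ONE cache set (hypothetical, not the T4 L2 design).
// front() is the line preferred for eviction, back() is the most recently used.
class CacheSet {
 public:
  explicit CacheSet(std::size_t ways) : ways_(ways) { assert(ways > 0); }

  bool contains(int line) const {
    return std::find(lines_.begin(), lines_.end(), line) != lines_.end();
  }

  // Ordinary load/store: the line becomes most recently used.
  void access(int line) { touch(line, /*as_mru=*/true); }

  // Block-init store as in "past projects": the line is left
  // preferred for eviction.
  void block_init(int line) { touch(line, /*as_mru=*/false); }

  // MRU variant: the line is installed like a typical cache access.
  void block_init_mru(int line) { touch(line, /*as_mru=*/true); }

 private:
  void touch(int line, bool as_mru) {
    auto it = std::find(lines_.begin(), lines_.end(), line);
    if (it != lines_.end()) {
      lines_.erase(it);       // hit: remove, then re-insert at new position
    } else if (lines_.size() == ways_) {
      lines_.pop_front();     // miss in a full set: evict the preferred victim
    }
    if (as_mru) {
      lines_.push_back(line);
    } else {
      lines_.push_front(line);
    }
  }

  std::size_t ways_;
  std::deque<int> lines_;
};
```

In this model, streaming many lines through `block_init` leaves previously cached lines in place (each new line evicts the previous new one), which suits copying. Streaming through `block_init_mru` keeps the newest lines resident at the cost of older ones, which suits initializing objects that will be used soon.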

Comments
EVALUATION See main CR
14-09-2011

EVALUATION http://hg.openjdk.java.net/hsx/hotspot-main/hotspot/rev/baf763f388e6
08-09-2011

PUBLIC COMMENTS On T4, a BIS to the beginning of a cache line always zeros that line. Use it to zero newly allocated Java objects. The main code is in MacroAssembler::bis_zeroing() and is used by C2-generated code (ClearArray), the runtime (Copy::fill_to_aligned_words()), and the template interpreter (TemplateTable::_new()). A new stub, zero_aligned_words, was added for use in the runtime. BIS is used only for objects bigger than BlkZeroingLowLimit (2 KB) since it requires a membar; 2 KB was selected based on microbenchmark results. I also added a wrasi(Reg, immI) instruction, which I used during development. VM_Version::has_mru_blk_init() is replaced with has_blk_zeroing() since the original was not used. New objects are now zapped in CollectedHeap::allocate_from_tlab_slow() instead of being zeroed, since they will be cleared later in init_obj(). Fixed call sites of check_for_bad_heap_word_value() where the klass is not initialized, to avoid a verification failure.
26-08-2011
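
The size threshold described in the comment above could be sketched roughly as follows. This is a portable model, not the HotSpot code: `kBlkZeroingLowLimit`, `kCacheLineSize`, `choose_zero_path` and `zero_words` are hypothetical names, the 64-byte line size is an assumption, and `memset` stands in for the SPARC block-initializing store and the trailing membar, which need assembly. The head/line/tail split is likewise an assumption about how unaligned ranges might be handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the BlkZeroingLowLimit flag (2 KB per the comment).
constexpr std::size_t kBlkZeroingLowLimit = 2 * 1024;
// Assumed cache line size, for illustration only.
constexpr std::size_t kCacheLineSize = 64;

enum class ZeroPath { PlainStores, BlockInit };

// BIS pays for a membar, so it is chosen only above the threshold.
inline ZeroPath choose_zero_path(std::size_t size_in_bytes) {
  return size_in_bytes > kBlkZeroingLowLimit ? ZeroPath::BlockInit
                                             : ZeroPath::PlainStores;
}

// Zeroes [base, base + size_in_bytes) and reports which path was modeled.
inline ZeroPath zero_words(void* base, std::size_t size_in_bytes) {
  assert(base != nullptr);
  const ZeroPath path = choose_zero_path(size_in_bytes);
  if (path == ZeroPath::PlainStores) {
    std::memset(base, 0, size_in_bytes);
    return path;
  }

  auto* p = static_cast<unsigned char*>(base);
  auto* end = p + size_in_bytes;

  // Head: plain stores up to the first cache-line boundary.
  const auto addr = reinterpret_cast<std::uintptr_t>(p);
  const std::size_t head =
      (kCacheLineSize - addr % kCacheLineSize) % kCacheLineSize;
  std::memset(p, 0, head);
  p += head;

  // Body: one step per whole line. On T4 this would be a single BIS to the
  // start of the line; memset models its effect here.
  while (static_cast<std::size_t>(end - p) >= kCacheLineSize) {
    std::memset(p, 0, kCacheLineSize);
    p += kCacheLineSize;
  }

  // Tail: plain stores for the remaining partial line.
  std::memset(p, 0, static_cast<std::size_t>(end - p));

  // A real implementation would issue the required membar here.
  return path;
}
```

With these assumed values, a 2048-byte object takes the plain-store path and a 2049-byte object takes the block-init path, matching the "bigger than" wording in the comment.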

EVALUATION http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/baf763f388e6
26-08-2011