Bug ID: JDK-7059037 Use BIS for zeroing on T4

Details
Type:
Enhancement
Submit Date:
2011-06-24
Status:
Closed
Updated Date:
2011-11-28
Project Name:
JDK
Resolved Date:
2011-11-28
Component:
hotspot
OS:
generic,solaris_10
Sub-Component:
compiler
CPU:
sparc
Priority:
P4
Resolution:
Fixed
Affected Versions:
hs23,7
Fixed Versions:
hs22 (b05)

Description
Hi Vladimir,

For user-level programs such as the JVM/GC, only two Block Initializing Store (BIS) ASIs are of real interest; most other variants are for privileged and hyper-privileged use:

ASI_ST_BLKINIT_PRIMARY (ASI_STBI_P), ASI Value 0xE2
ASI_ST_BLKINIT_MRU_PRIMARY (ASI_STBIMRU_P), ASI Value 0xF2

The Most-Recently-Used (MRU) variant of this special store controls the L2 cache replacement policy and is a result of the SPARC-Java VT collaboration that we started in 2007. To cite Mark Luttrell from one of our earlier discussions on this topic:

"My understanding is that in the L2, past projects always reset the used bit (or similar state) of a line touched by a block-initializing store.  This caused it to be preferred for eviction, and helped to keep a copy operation from wiping out a large portion of the cache data.  Early on in VT, the Java group was looking at using block-init store for stuff besides copying (object allocation and initialization I believe), but the behavior of making the line preferred for eviction was undesirable there.  (If you're initializing an object you're probably going to use it soon.)  So, we added the MRU variants which install the line as *most* recently used - the same behavior as a typical cache access."
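
As a rough illustration of the reasoning quoted above, the choice between the two user-level ASIs could be sketched as follows. The ASI values are the ones listed in this report; the helper choose_blkinit_asi() and its parameter are hypothetical names introduced only for this sketch and are not part of HotSpot.

```cpp
#include <cstdint>

// ASI values as listed in the description above.
enum BlockInitAsi : std::uint8_t {
    // Per the quoted discussion: the touched line becomes preferred for eviction.
    ASI_ST_BLKINIT_PRIMARY     = 0xE2,
    // Per the quoted discussion: the line is installed as most recently used.
    ASI_ST_BLKINIT_MRU_PRIMARY = 0xF2
};

// Hypothetical helper (not a HotSpot function): pick the MRU variant when the
// initialized memory is expected to be used soon (e.g. a freshly allocated
// object), and the plain variant for bulk copies that should not displace
// other cached data.
constexpr BlockInitAsi choose_blkinit_asi(bool data_reused_soon) {
    return data_reused_soon ? ASI_ST_BLKINIT_MRU_PRIMARY
                            : ASI_ST_BLKINIT_PRIMARY;
}
```

Under this sketch, object initialization would pass true and a large copy would pass false.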

Please sync with Mark Luttrell about the current status of this implementation and the memory ordering issues that John described below. As Mark explained earlier today, we expect to see performance gains over the traditional Block Load/Store variants, so all this is good news for performance!

Best regards,

-- Zoran Radovic

                                    

Comments
EVALUATION

http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/baf763f388e6
                                     
2011-08-26
PUBLIC COMMENTS

On T4, a BIS to the beginning of a cache line always zeros that line. Use it for
zeroing newly allocated Java objects. The main code is in
MacroAssembler::bis_zeroing() and is used by C2 generated code (ClearArray), the
runtime (Copy::fill_to_aligned_words()) and the template interpreter
(TemplateTable::_new()). A new stub, zero_aligned_words, was added for use in the
runtime.

BIS is used only for objects bigger than BlkZeroingLowLimit (2 Kbytes) since it
requires a membar. The 2 Kbyte limit was selected based on microbenchmark results.
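
The size cutoff described above can be sketched in portable C++ as follows. This is only an illustration under stated assumptions, not the HotSpot code: kBlkZeroingLowLimitBytes, use_block_zeroing() and zero_words() are hypothetical names, "bigger than" is read as strictly greater, and std::memset plus std::atomic_thread_fence stand in for the BIS loop and the membar, which cannot be expressed in portable C++.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for BlkZeroingLowLimit (2 Kbytes, per the comment above).
constexpr std::size_t kBlkZeroingLowLimitBytes = 2048;

// True when the block-zeroing path would be chosen for a region of `bytes` bytes.
// Assumption: "bigger than" means strictly greater than the limit.
inline bool use_block_zeroing(std::size_t bytes) {
    return bytes > kBlkZeroingLowLimitBytes;
}

// Zero `count` 64-bit words starting at `base`.
// Returns true if the (simulated) block path was taken, false otherwise.
inline bool zero_words(std::uint64_t* base, std::size_t count) {
    assert(base != nullptr || count == 0);
    const std::size_t bytes = count * sizeof(std::uint64_t);
    if (use_block_zeroing(bytes)) {
        // Placeholder for the BIS loop.
        std::memset(base, 0, bytes);
        // Placeholder for the membar that the block path requires; its fixed
        // cost is the stated reason small regions take the plain path.
        std::atomic_thread_fence(std::memory_order_seq_cst);
        return true;
    }
    // Small regions: ordinary word stores, no fence.
    for (std::size_t i = 0; i < count; ++i) {
        base[i] = 0;
    }
    return false;
}
```

In this sketch an 8-word region takes the plain store loop, while a 512-word (4 Kbyte) region takes the block path and pays for one fence.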

I also added a wrasi(Reg, immI) instruction, which I used during development.
VM_Version::has_mru_blk_init() is replaced with has_blk_zeroing() since the
original was not used.
CollectedHeap::allocate_from_tlab_slow() now zaps the new object instead of
zeroing it, since it will be cleared later in init_obj().
Fixed call sites of check_for_bad_heap_word_value() where klass is not yet
initialized, to avoid a verification failure.
                                     
2011-08-26
EVALUATION

http://hg.openjdk.java.net/hsx/hotspot-main/hotspot/rev/baf763f388e6
                                     
2011-09-08
EVALUATION

See main CR
                                     
2011-09-14


