United StatesChange Country, Oracle Worldwide Web Sites Communities I am a... I want to...
Bug ID: JDK-6544710 Performance regression in RangeCheckMicroBenchmark
JDK-6544710 : Performance regression in RangeCheckMicroBenchmark

Details
Type:
Bug
Submit Date:
2007-04-11
Status:
Resolved
Updated Date:
2010-04-03
Project Name:
JDK
Resolved Date:
2007-05-24
Component:
hotspot
OS:
solaris_9
Sub-Component:
compiler
CPU:
sparc
Priority:
P4
Resolution:
Fixed
Affected Versions:
7
Fixed Versions:
hs10 (b13)

Related Reports
Backport:
Backport:
Relates:
Relates:

Sub Tasks

Description
Martin Buchholz wrote:

I noticed that hotspot changes in jdk 7-b03 had dramatic
effects on the performance of my ArrayList microbenchmark
/net/suttles.sfbay/u/martin/ws/Iter72/test/java/util/ArrayList/RangeCheckMicroBenchmark.java


I notice that some operations are much much faster (good!)
but others are slower:

mergeBench 7-b02 7-b03 jr -dsa -da -server RangeCheckMicroBenchmark.java
==> javac -Xlint:all RangeCheckMicroBenchmark.java
==> java -dsa -da -server RangeCheckMicroBenchmark
Method                    Millis Ratio vs. 7-b02
get                          285 1.000 1.000
set                          185 0.651 0.995
get/set                      357 1.255 0.811
add/remove at end           3579 12.558 1.676
subList get                  253 0.888 0.286
subList set                  187 0.656 0.177
subList get/set              334 1.172 0.263
subList add/remove at end   4687 16.444 1.278

subList operations that are not structural changes are
vastly improved, but the structural changes add/remove
are significantly slower.  (And the factor of 10 penalty
relative to get/set seems higher than we would like)

It would be good to know whether there is something more hotspot
can do here.  If there's interest in the server compiler
team, I could file a bug for the add/remove performance regression.

(I understand there are lots of performance tradeoffs here)

Martin

                                    

Comments
EVALUATION

The regression starts after the next putback:

20061023120554.jrose.dolphin-cleanups with fixes for 6470497.

John Rose wrote:
 I removed the inline zero-length test from the arraycopy intrinsic 
 because it was complicating the code for some other cases, and the 
 out-of-line paths handle the zero-length case fine.
 I mistakenly thought zero-length arraycopy requests were rare,  because
 of Java-level guard code like Tom points out.


Also after that putback we generates a lot more spills on stack.
                                     
2007-04-11
EVALUATION

Performance regression appears to be larger on 32-bit solaris-sparc
than on other platforms.  Here are some more results
(not run under serious benchmark conditions)

(mb29450@suttles) Iter72/test/java/util/ArrayList $ uname -a; repeat 5 /home/mb29450/bin/sun/mergeBench 7-b02 7-b03 /home/mb29450/bin/sun/jr -dsa -da -server -d32 RangeCheckMicroBenchmark.java | grep add/remove
SunOS suttles 5.9 Generic_112233-05 sun4u sparc SUNW,Sun-Blade-1000
add/remove at end           3447 12.050 1.610
subList add/remove at end   4319 15.097 1.215
add/remove at end           2949 10.344 1.400
subList add/remove at end   4745 16.643 1.440
add/remove at end           3616 12.673 1.715
subList add/remove at end   4580 16.053 1.554
add/remove at end           3776 12.982 1.764
subList add/remove at end   5117 17.591 1.645
add/remove at end           3546 12.407 1.719
subList add/remove at end   4886 17.092 1.624


(mb29450@lasker) Iter72/test/java/util/ArrayList $ uname -a; repeat 5 /home/mb29450/bin/sun/mergeBench 7-b02 7-b03 /home/mb29450/bin/sun/jr -dsa -da -server -d32 RangeCheckMicroBenchmark.java | grep add/remove
SunOS lasker 5.11 snv_55 i86pc i386 i86pc Solaris
add/remove at end           1401 30.294 1.116
subList add/remove at end   1709 36.946 0.995
add/remove at end           1401 28.743 1.177
subList add/remove at end   2218 45.491 1.241
add/remove at end           1410 30.366 1.181
subList add/remove at end   2212 47.629 1.234
add/remove at end           1401 28.870 1.171
subList add/remove at end   2215 45.638 1.202
add/remove at end           1410 29.042 1.185
subList add/remove at end   2219 45.695 1.242


(martin@morphy) Iter72/test/java/util/ArrayList $ uname -a; repeat 5 mergeBench 7-b02 7-b03 jr -dsa -da -server RangeCheckMicroBenchmark.java | grep add/remove
CYGWIN_NT-5.0 morphy 1.5.12(0.116/4/2) 2004-11-10 08:34 i686 unknown unknown Cygwin
add/remove at end           2028 25.785 1.279
subList add/remove at end   2628 33.422 1.209
add/remove at end           2110 26.875 1.332
subList add/remove at end   2655 33.813 1.418
add/remove at end           2152 27.408 1.359
subList add/remove at end   3010 38.338 1.599
add/remove at end           1880 23.833 1.050
subList add/remove at end   2660 33.719 1.256
add/remove at end           1877 23.915 1.186
subList add/remove at end   2921 37.216 1.526
                                     
2007-04-12
EVALUATION

Array-backed collections are the source of many of the performance-sensitive
uses of arraycopy.  Since insertions/removals at the end of the array
will tend to be O(1) while insertions/removals at a random index will tend
to be O(N), much code will tend to operate at the end of an array,
which will result in a call to arraycopy with a length of zero.

A particular example is AbstractList.add(E), which is *specified*
to call add(size(), e)    

    public boolean add(E e) {
	add(size(), e);
	return true;
    }

which is likely to result in a zero-length call to arraycopy in some subclass.
                                     
2007-04-12
SUGGESTED FIX

Webrev:                 http://prt-web.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/main/c2_baseline/2007/20070412132611.kvn.6544710/workspace/webrevs/webrev-2007.04.12/index.html

Solution:
1. Add arraycopy 0 length check by replacing current length < 0 check
which 2 checks:
     if (length <= 0) {
       if (length < 0) {
         call slow arraycopy()
       }
    } else {
      call fast stub arraycopy()
    }

2. Add memory barrier after arraycopy call under the flag
InsertMemBarAfterArraycopy which is ON by default.
It reduces spills on stack by forcing reload values after arraycopy.

3. Move uncommon blocks which have uncommon predecessors to the end.
It moves arraycopy slow path outside of hot code.
                                     
2007-04-13



Hardware and Software, Engineered to Work Together