JDK-6544710 : Performance regression in RangeCheckMicroBenchmark
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 7
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: solaris_9
  • CPU: sparc
  • Submitted: 2007-04-11
  • Updated: 2010-04-03
  • Resolved: 2007-05-24
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 6 JDK 7 Other
6u4Fixed 7Fixed hs10Fixed
Related Reports
Relates :  
Relates :  
Description
Martin Buchholz wrote:

I noticed that hotspot changes in jdk 7-b03 had dramatic
effects on the performance of my ArrayList microbenchmark
/net/suttles.sfbay/u/martin/ws/Iter72/test/java/util/ArrayList/RangeCheckMicroBenchmark.java


I notice that some operations are much much faster (good!)
but others are slower:

mergeBench 7-b02 7-b03 jr -dsa -da -server RangeCheckMicroBenchmark.java
==> javac -Xlint:all RangeCheckMicroBenchmark.java
==> java -dsa -da -server RangeCheckMicroBenchmark
Method                    Millis Ratio vs. 7-b02
get                          285 1.000 1.000
set                          185 0.651 0.995
get/set                      357 1.255 0.811
add/remove at end           3579 12.558 1.676
subList get                  253 0.888 0.286
subList set                  187 0.656 0.177
subList get/set              334 1.172 0.263
subList add/remove at end   4687 16.444 1.278

subList operations that are not structural changes are
vastly improved, but the structural changes add/remove
are significantly slower.  (And the factor of 10 penalty
relative to get/set seems higher than we would like)

It would be good to know whether there is something more hotspot
can do here.  If there's interest in the server compiler
team, I could file a bug for the add/remove performance regression.

(I understand there are lots of performance tradeoffs here)

Martin

Comments
SUGGESTED FIX Webrev: http://prt-web.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/main/c2_baseline/2007/20070412132611.kvn.6544710/workspace/webrevs/webrev-2007.04.12/index.html Solution: 1. Add arraycopy 0 length check by replacing current length < 0 check which 2 checks: if (length <= 0) { if (length < 0) { call slow arraycopy() } } else { call fast stub arraycopy() } 2. Add memory barrier after arraycopy call under the flag InsertMemBarAfterArraycopy which is ON by default. It reduces spills on stack by forcing reload values after arraycopy. 3. Move uncommon blocks which have uncommon predecessors to the end. It moves arraycopy slow path outside of hot code.
13-04-2007

EVALUATION Performance regression appears to be larger on 32-bit solaris-sparc than on other platforms. Here are some more results (not run under serious benchmark conditions) (mb29450@suttles) Iter72/test/java/util/ArrayList $ uname -a; repeat 5 /home/mb29450/bin/sun/mergeBench 7-b02 7-b03 /home/mb29450/bin/sun/jr -dsa -da -server -d32 RangeCheckMicroBenchmark.java | grep add/remove SunOS suttles 5.9 Generic_112233-05 sun4u sparc SUNW,Sun-Blade-1000 add/remove at end 3447 12.050 1.610 subList add/remove at end 4319 15.097 1.215 add/remove at end 2949 10.344 1.400 subList add/remove at end 4745 16.643 1.440 add/remove at end 3616 12.673 1.715 subList add/remove at end 4580 16.053 1.554 add/remove at end 3776 12.982 1.764 subList add/remove at end 5117 17.591 1.645 add/remove at end 3546 12.407 1.719 subList add/remove at end 4886 17.092 1.624 (mb29450@lasker) Iter72/test/java/util/ArrayList $ uname -a; repeat 5 /home/mb29450/bin/sun/mergeBench 7-b02 7-b03 /home/mb29450/bin/sun/jr -dsa -da -server -d32 RangeCheckMicroBenchmark.java | grep add/remove SunOS lasker 5.11 snv_55 i86pc i386 i86pc Solaris add/remove at end 1401 30.294 1.116 subList add/remove at end 1709 36.946 0.995 add/remove at end 1401 28.743 1.177 subList add/remove at end 2218 45.491 1.241 add/remove at end 1410 30.366 1.181 subList add/remove at end 2212 47.629 1.234 add/remove at end 1401 28.870 1.171 subList add/remove at end 2215 45.638 1.202 add/remove at end 1410 29.042 1.185 subList add/remove at end 2219 45.695 1.242 (martin@morphy) Iter72/test/java/util/ArrayList $ uname -a; repeat 5 mergeBench 7-b02 7-b03 jr -dsa -da -server RangeCheckMicroBenchmark.java | grep add/remove CYGWIN_NT-5.0 morphy 1.5.12(0.116/4/2) 2004-11-10 08:34 i686 unknown unknown Cygwin add/remove at end 2028 25.785 1.279 subList add/remove at end 2628 33.422 1.209 add/remove at end 2110 26.875 1.332 subList add/remove at end 2655 33.813 1.418 add/remove at end 2152 27.408 1.359 subList add/remove at end 3010 38.338 1.599 add/remove at end 1880 23.833 1.050 subList add/remove at end 2660 33.719 1.256 add/remove at end 1877 23.915 1.186 subList add/remove at end 2921 37.216 1.526
12-04-2007

EVALUATION Array-backed collections are the source of many of the performance-sensitive uses of arraycopy. Since insertions/removals at the end of the array will tend to be O(1) while insertions/removals at a random index will tend to be O(N), much code will tend to operate at the end of an array, which will result in a call to arraycopy with a length of zero. A particular example is AbstractList.add(E), which is *specified* to call add(size(), e) public boolean add(E e) { add(size(), e); return true; } which is likely to result in a zero-length call to arraycopy in some subclass.
12-04-2007

EVALUATION The regression starts after the next putback: 20061023120554.jrose.dolphin-cleanups with fixes for 6470497. John Rose wrote: I removed the inline zero-length test from the arraycopy intrinsic because it was complicating the code for some other cases, and the out-of-line paths handle the zero-length case fine. I mistakenly thought zero-length arraycopy requests were rare, because of Java-level guard code like Tom points out. Also after that putback we generates a lot more spills on stack.
11-04-2007