JDK-6498658 : System.arraycopy performance lags
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 6
  • Priority: P2
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2006-11-29
  • Updated: 2013-11-01
  • Resolved: 2007-03-15
JDK 6        JDK 7      Other
6u2 Fixed    7 Fixed    hs10 Fixed
Description
System.arraycopy() of char[] types is a bottleneck for many appserver benchmarks (specifically jshort_disjoint_arraycopy). We made some optimizations to the appserver around the way in which it handles certain data; essentially we changed a number of char[] to byte[]. Because of the encoding used, the byte arrays are half as long as the char arrays, and we expected a decrease in the amount of time the appserver spends copying these arrays. Instead, our general performance regressed, and the amount of time in System.arraycopy (now in jbyte_disjoint_arraycopy) doubled.

The attached test program shows the problem: C2 appears to use the static type information yet does not optimize the System.arraycopy call for it. The test program has two kinds of methods. The first method uses a specific type:
    public void doit(byte[] b1, byte[] b2, int len) {
        System.arraycopy(b1, 0, b2, 0, len);
    }

The second method uses a generic type:
    public void doit(Object o1, Object o2, int len) {
        System.arraycopy(o1, 0, o2, 0, len);
    }
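
The attached test program is not reproduced in this report. As a rough illustration only, a comparison of the two call shapes might look like the sketch below. The class name, method names, iteration count, and warm-up strategy are all assumptions made for this example, and a loop timed with System.nanoTime() is not a rigorous benchmark.

```java
// Hypothetical sketch; not the test program attached to this bug.
final class ArrayCopyTiming {
    // Statically typed variant: the compiler knows both arguments are byte[].
    static void copyTyped(byte[] b1, byte[] b2, int len) {
        System.arraycopy(b1, 0, b2, 0, len);
    }

    // Generic variant: the compiler only sees Object, so the element type
    // has to be discovered at run time.
    static void copyGeneric(Object o1, Object o2, int len) {
        System.arraycopy(o1, 0, o2, 0, len);
    }

    // Runs `iterations` typed copies and returns the elapsed nanoseconds.
    static long timeTyped(byte[] src, byte[] dst, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            copyTyped(src, dst, src.length);
        }
        return System.nanoTime() - start;
    }

    // Runs `iterations` generic copies and returns the elapsed nanoseconds.
    static long timeGeneric(Object src, Object dst, int len, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            copyGeneric(src, dst, len);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int iterations = 1_000_000; // assumed value, chosen only for illustration
        byte[] b1 = new byte[1024];
        byte[] b2 = new byte[1024];

        // Warm-up passes so the JIT compiler has a chance to compile both paths.
        timeTyped(b1, b2, iterations);
        timeGeneric(b1, b2, b1.length, iterations);

        System.out.println("typed   1024 bytes: " + timeTyped(b1, b2, iterations) + " ns");
        System.out.println("generic 1024 bytes: " + timeGeneric(b1, b2, b1.length, iterations) + " ns");
    }
}
```

Both variants copy the same 1024 bytes, so any consistent gap between the two numbers would come from how the call is compiled rather than from the amount of data moved.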

On Solaris/SPARC, copying various arrays gives these timings:

Time to copy 1024 bytes (1024 bytes): 436
Time to copy 512 chars (1024 bytes): 255
Time to copy 1024 chars (2048 bytes): 439
Time to copy 256 ints (1024 bytes): 254
Time to copy 1024 ints (4096 bytes): 872
Time to copy (generic interface) 1024 bytes (1024 bytes): 340
Time to copy (generic interface) 512 chars (1024 bytes): 387
Time to copy (generic interface) 256 ints (1024 bytes): 387

The first two cases are copying the same amount of data (using a method with an explicit type defined) and hence should take the same amount of time. 
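
The byte counts in parentheses are the element count multiplied by the element size. A small sketch of that arithmetic follows; the class and method names are hypothetical, and the `BYTES` constants assume Java 8 or later.

```java
// Hypothetical helper showing how the byte counts in the timings are derived.
final class PayloadSize {
    // Total payload in bytes for an array of `elements` items of `elementSize` bytes each.
    static int bytes(int elements, int elementSize) {
        return elements * elementSize;
    }

    public static void main(String[] args) {
        System.out.println(bytes(1024, Byte.BYTES));      // 1024
        System.out.println(bytes(512, Character.BYTES));  // 1024
        System.out.println(bytes(1024, Character.BYTES)); // 2048
        System.out.println(bytes(256, Integer.BYTES));    // 1024
        System.out.println(bytes(1024, Integer.BYTES));   // 4096
    }
}
```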

The very odd thing is that the last three cases all also copy 1024 bytes (using the Object-type interface) and take the same amount of time regardless of the actual data type (but still take longer than the best cases where the type is known).

With C1, the times are roughly the same (in fact, they favor byte[] copying slightly).

I listed the OS/hardware as generic, but in fact I've observed this only on Solaris (both SPARC and x86) and Windows (i586). On Linux (i586) the performance was as expected (the time always depended on the total number of bytes copied).

Comments
SUGGESTED FIX Webrev: http://prt-web.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/main/c2_baseline/2007/20070214151808.kvn.6498658/workspace/webrevs/webrev-2007.02.14/index.html
15-02-2007

SUGGESTED FIX Optimize the arraycopy stubs for all types. I rewrote all of them (except for amd64, which were already well optimized). I also added a generic arraycopy stub for the case where the arrays are passed as Object. The stub performs dynamic checks and jumps to a type-specific stub if the arrays are well defined; otherwise it returns -1 so that the caller takes the slow path. C1 has been doing this for some time. I also added a stack frame for stubs on x86, since it is helpful and does not hurt performance. Appserver performance improved by 6% on N1 with this fix. I added results for different platforms to the bug report, with a modified test case.
11-01-2007
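
The dispatch logic described for the generic stub can be sketched in Java for illustration. This is not the HotSpot stub, which is generated machine code; the class name, the helper methods, the set of element types handled, and the exact conditions that fall back to the slow path are assumptions made for this example.

```java
import java.lang.reflect.Array;

// Hypothetical Java rendering of the "check dynamically, then dispatch" idea.
final class GenericArrayCopySketch {
    // Returns 0 when a type-specific path performed the copy,
    // or -1 when the caller should take the slow path instead.
    static int genericArrayCopy(Object src, int srcPos, Object dst, int dstPos, int length) {
        if (src == null || dst == null) return -1;

        // Both arguments must be arrays of exactly the same class.
        Class<?> srcClass = src.getClass();
        if (!srcClass.isArray() || srcClass != dst.getClass()) return -1;

        // This sketch only dispatches primitive arrays; everything else goes slow.
        Class<?> elem = srcClass.getComponentType();
        if (!elem.isPrimitive()) return -1;

        // Range checks, written so that the subtraction cannot overflow.
        int srcLen = Array.getLength(src);
        int dstLen = Array.getLength(dst);
        if (srcPos < 0 || dstPos < 0 || length < 0
                || srcPos > srcLen - length || dstPos > dstLen - length) {
            return -1;
        }

        // Jump to a copy routine whose element type is statically known.
        if (elem == byte.class) {
            copyBytes((byte[]) src, srcPos, (byte[]) dst, dstPos, length);
        } else if (elem == char.class) {
            copyChars((char[]) src, srcPos, (char[]) dst, dstPos, length);
        } else if (elem == int.class) {
            copyInts((int[]) src, srcPos, (int[]) dst, dstPos, length);
        } else {
            return -1; // element types not covered by this sketch
        }
        return 0;
    }

    // Stand-ins for the type-specific stubs.
    static void copyBytes(byte[] s, int sp, byte[] d, int dp, int n) { System.arraycopy(s, sp, d, dp, n); }
    static void copyChars(char[] s, int sp, char[] d, int dp, int n) { System.arraycopy(s, sp, d, dp, n); }
    static void copyInts(int[] s, int sp, int[] d, int dp, int n) { System.arraycopy(s, sp, d, dp, n); }
}
```

A caller would test the return value and fall back to the general, fully checked copy when it is -1, which mirrors the slow-path convention described in the comment above.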

EVALUATION We have optimized stubs only for char arraycopy.
11-01-2007