JDK-4463011 : (bf) View-buffer bulk get/put operations are slow
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.nio
  • Affected Version: 1.4.0
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: solaris_8,windows_nt,windows_2000
  • CPU: x86,sparc
  • Submitted: 2001-05-24
  • Updated: 2013-11-01
  • Resolved: 2002-02-08
Version
  • 1.4.0_01 (01): Fixed
  • 1.4.1: Fixed
Description
ingrid.yao@Eng 2001-05-24


J2SE Version (please include all output from java -version flag):

    java version "1.4.0-beta"
    Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.0-beta-b64)
    Java HotSpot(TM) Client VM (build 1.4.0-beta-b64, mixed mode)

Does this problem occur on J2SE 1.3?  Yes / No (pick one)

    No

Operating System Configuration Information (be specific):

      Win 2000 5.00.2195 Service pack 1

Hardware Configuration Information (be specific):

     x86 Family 6 Model 7 Stepping 3

Bug Description:

     Bulk transfers (using e.g. DoubleBuffer) are 2-3 times slower
     than element access. It should be the other way around.

Test programs: (see attachments)
    4 files - readNIOSmap.java, readNIOSmapBulk.java, writeNIOSmap.java
              and writeNIOSmapBulk.java

Test result:

  The slowdown doesn't affect writing until the size reaches ~1M elements.
  Reading is about twice as slow on 10K elements, a bit slower on 1M. It is
  only a problem when n is large, but that is exactly when you would hope
  bulk transfers speed things up.

  n=10,000:

   write NIO Stream mapped(10000) 0.094 seconds 4.9995E7
   write NIO Stream mapped Bulk(10000) 0.031 seconds 4.9995E7
   Read NIO Stream mapped (10000) takes  0.009000000000000001 seconds 4.9995E7
   Read NIO Stream mapped Bulk (10000) takes 0.021 seconds 4.999

  n=100,000:

   write NIO Stream mapped(100000) 0.188 seconds 4.99995E9
   write NIO Stream mapped Bulk(100000) 0.14100000000000001 seconds 4.99995E9
   Read NIO Stream mapped (100000) takes  0.08600000000000001 seconds 4.99995E9
   Read NIO Stream mapped Bulk (100000) takes 0.20600000000000002 seconds 4.9999

  n=1,000,000:

   write NIO Stream mapped(1000000) 1.0 seconds 4.999995E11
   write NIO Stream mapped Bulk(1000000) 1.218 seconds 4.999995E11
   Read NIO Stream mapped (1000000) takes  0.878 seconds 4.999995E11
   Read NIO Stream mapped Bulk (1000000) takes 2.101 seconds 4.999995
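
The attached test programs are not reproduced in this report. As a rough illustration of the two access patterns being compared, the sketch below reads a DoubleBuffer view once element by element and once with a single bulk get(). It is a stand-in under stated assumptions, not the submitter's code: it uses a direct ByteBuffer rather than a mapped file, the class and method names are hypothetical, and the fill pattern 0, 1, ..., n-1 is chosen only because its sum for n = 10,000 (4.9995E7) matches the checksum printed above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.DoubleBuffer;

// Hypothetical illustration class; not one of the attached test programs.
final class ViewBufferAccess {

    // Build a DoubleBuffer view over a direct ByteBuffer and fill it with 0..n-1.
    // Assumption: native byte order is used; the original tests may not have set it.
    static DoubleBuffer makeView(int n) {
        ByteBuffer bytes = ByteBuffer.allocateDirect(n * 8)
                                     .order(ByteOrder.nativeOrder());
        DoubleBuffer view = bytes.asDoubleBuffer();
        for (int i = 0; i < n; i++) {
            view.put(i, i);            // absolute put, position stays at 0
        }
        return view;
    }

    // Element access: one get(int) call per double.
    static double sumElementwise(DoubleBuffer view) {
        double sum = 0.0;
        for (int i = 0; i < view.limit(); i++) {
            sum += view.get(i);
        }
        return sum;
    }

    // Bulk access: a single get(double[]) copies the whole view into an array.
    static double sumBulk(DoubleBuffer view) {
        DoubleBuffer dup = view.duplicate(); // leave the caller's position untouched
        dup.rewind();
        double[] data = new double[dup.remaining()];
        dup.get(data);
        double sum = 0.0;
        for (double d : data) {
            sum += d;
        }
        return sum;
    }

    public static void main(String[] args) {
        DoubleBuffer view = makeView(10000);
        System.out.println("element-wise checksum: " + sumElementwise(view));
        System.out.println("bulk checksum:         " + sumBulk(view));
    }
}
```

Both methods should produce the same checksum; only the number of buffer calls differs, which is the cost this report is about.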

Comments
CONVERTED DATA

BugTraq+ Release Management Values

COMMIT TO FIX: 1.4.0_01 hopper
FIXED IN: 1.4.0_01 hopper
INTEGRATED IN: 1.4.0_01 hopper
14-06-2004

EVALUATION

This is a known performance problem that we know how to fix.
-- mr@eng 2001/5/24

Summary:

Bulk get and put operations are no longer slower than individual element access. On average, we have observed that bulk operations are faster by at least an order of magnitude over the times observed when the bug was reported. The fastest transfers occur when the byte order of the buffer is the same as the platform's byte order (i.e. Buffer.order() == ByteOrder.nativeOrder()) and "java -server" is used.

In this example, "n" is the number of doubles in the buffer. For each value of n, we provide the elapsed time to write (put()) and read (get()) for the JDK version at the time this bug was submitted (b64), a near-FCS version of JDK 1.4 (b90), and the newest implementation produced by this bug fix. The non-bulk numbers indicate the elapsed time to perform a copy of the entire buffer by accessing and copying each element individually. The "BULK" numbers indicate the amount of time using the bulk operations.

The following timings (in seconds) were generated using a Pentium III 500MHz running Red Hat Linux 6.2.

-client
                  n = 10000            n = 100000           n = 1000000
               b64    b90    new     b64    b90    new     b64    b90    new
              -------------------   -------------------   -------------------
  Write       0.040  0.026  0.029   0.155  0.047  0.046   1.275  0.255  0.206
  Write BULK  0.028  0.016  0.001   0.135  0.022  0.006   1.841  0.233  0.049
  Read        0.027  0.002  0.002   0.270  0.016  0.017   2.760  0.182  0.155
  Read BULK   0.020  0.000  0.000   0.207  0.009  0.005   2.078  0.093  0.046

-server
                  n = 10000            n = 100000           n = 1000000
               b64    b90    new     b64    b90    new     b64    b90    new
              -------------------   -------------------   -------------------
  Write       0.168  0.081  0.080   0.384  0.828  0.777   1.702  1.289  1.234
  Write BULK  0.119  0.066  0.001   0.314  0.516  0.005   2.667  0.857  0.050
  Read        0.026  0.072  0.059   0.156  0.088  0.065   1.446  0.143  0.136
  Read BULK   0.014  0.000  0.000   0.151  0.007  0.005   1.468  0.072  0.047

Implementation Details:

Several approaches were investigated, including use of existing VM intrinsics, introduction of new VM intrinsics, and use of JNI.

First, we considered using existing VM intrinsics accessible through the current implementation of sun.misc.Unsafe. Unfortunately, the base address of the array (allocated in the heap) may change (i.e. garbage collection (gc) may occur) between the time that Unsafe.arrayBaseOffset() and Unsafe.copyMemory() are called. The only possible way to handle this problem would be to design some concept of a "critical section" at the Java level during which gc is guaranteed not to occur. Control of gc is currently not exposed at this level.

Another option, which was rejected but may still be implemented at a later date, is to request that new VM intrinsics be added to Unsafe. While this may result in a reasonable performance improvement, the work required to handle all types and all necessary byte-swapping may make the benefit too small to justify the implementation effort. This change would also require synchronized (or sequenced) updates in both the VM and the library code.

The solution implemented uses critical sections within JNI (GetPrimitiveArrayCritical and ReleasePrimitiveArrayCritical), used in src/share/native/java/nio/Bits.c and accessed through src/share/classes/java/nio/Bits.java. Though we are guaranteed that no gc will occur inside the critical section, if another simultaneously running thread determines that gc is necessary, the heap will grow in unexpected ways and the entire VM execution will slow down. The VM will crash if it runs out of memory.

We attempt to work around this problem by doing the desired memory transfers in 1M blocks. The size of the block was chosen arbitrarily. The hope is that if a gc is needed, it may be performed between these transfers.

We empirically determined the point at which the average cost of a JNI call exceeds the expense of an element-by-element copy. We determined this by sampling on a number of platforms.

Why is Bits.c not in libnio.so?

Ideally, we would have simply added a call to sun.nio.ch.Util.load() in the static initializer of Bits.java; however, since this class is used by the compiler, the nio and net shared object files (or DLLs) have not yet been created at that point in the build. Rather than rearrange the build to create these files earlier, Bits.c has been added to libjava, which is guaranteed to be loaded during VM initialization.
-- iag@sfbay 2002-02-07
07-02-2002
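
The block-wise strategy described in the evaluation above can be sketched at the Java level. This is only an illustration of the control flow (a small-transfer cutoff, then fixed-size blocks), not the code in Bits.java or Bits.c: the real fix performs each block inside a JNI critical section, whereas this sketch simply calls the public bulk get(). The class name, the threshold value, and the reading of "1M" as 1 MiB of bytes are all assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.DoubleBuffer;

// Hypothetical illustration of block-wise bulk transfer; not the JDK implementation.
final class ChunkedBulkGet {

    // Assumption: "1M blocks" is interpreted here as 1 MiB of bytes per block.
    static final int BLOCK_BYTES = 1 << 20;

    // Assumption: placeholder cutoff (in elements) below which a plain loop is used.
    // The report says the real value was determined empirically; it is not stated.
    static final int SMALL_TRANSFER_ELEMS = 8;

    // Copy 'length' doubles from src (starting at its position) into dst[offset..].
    static void get(DoubleBuffer src, double[] dst, int offset, int length) {
        if (length < SMALL_TRANSFER_ELEMS) {
            // Small transfer: per-call overhead would dominate, so copy element by element.
            for (int i = 0; i < length; i++) {
                dst[offset + i] = src.get();
            }
            return;
        }
        int blockElems = BLOCK_BYTES / 8;   // doubles per block
        int done = 0;
        while (done < length) {
            int n = Math.min(blockElems, length - done);
            // One bounded transfer per iteration. In the real fix, the gap between
            // iterations is where a pending gc would get a chance to run.
            src.get(dst, offset + done, n); // relative bulk get advances src's position
            done += n;
        }
    }

    public static void main(String[] args) {
        int n = 300000;                     // 2.4 MB of doubles, so several blocks
        DoubleBuffer src = ByteBuffer.allocateDirect(n * 8)
                                     .order(ByteOrder.nativeOrder())
                                     .asDoubleBuffer();
        for (int i = 0; i < n; i++) {
            src.put(i, i);
        }
        double[] dst = new double[n];
        get(src, dst, 0, n);
        System.out.println("last element copied: " + dst[n - 1]);
    }
}
```

Bounding each transfer keeps the time spent in any single copy short, at the cost of a few extra calls for large buffers.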

SUGGESTED FIX

Files created/modified:

    src/share/classes/java/nio/Bits.java
    src/share/classes/java/nio/Direct-X-Buffer.java
    src/share/native/java/nio/Bits.c
    make/java/java/Makefile
    make/java/java/mapfile-vers
    make/minclude/java_java.cmk
    make/minclude/java_java.jmk
    test/java/nio/Buffer/Basic.java
    test/java/nio/Buffer/genCopyDirectMemory.sh
    test/java/nio/Buffer/CopyDirect-X-Memory.java
    test/java/nio/Buffer/CopyDirectMemory.java
    test/java/nio/Buffer/CopyDirectByteMemory.java
    test/java/nio/Buffer/CopyDirectLongMemory.java
    test/java/nio/Buffer/CopyDirectShortMemory.java
    test/java/nio/Buffer/CopyDirectCharMemory.java
    test/java/nio/Buffer/CopyDirectIntMemory.java
    test/java/nio/Buffer/CopyDirectDoubleMemory.java
    test/java/nio/Buffer/CopyDirectFloatMemory.java

-- iag@sfbay 2002-02-07
07-02-2002