JDK-7176485 : (bf) Allow temporary buffer cache to grow to IOV_MAX
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.nio
  • Affected Version: 6u10,7u4
  • Priority: P3
  • Status: Closed
  • Resolution: Fixed
  • OS: linux
  • CPU: x86
  • Submitted: 2012-06-12
  • Updated: 2016-07-27
  • Resolved: 2012-07-11
Fixed in:
  • JDK 7: 7u10 (Fixed)
  • JDK 8: 8 b45 (Fixed)
Description
A JMS non-persistent micro-benchmark with a 16K message size and 8 queues runs at a rate of 40K messages per second. Response time is poor because a particular lock inflates, and the cause of the lock inflation is the inefficient use of DirectByteBuffers by sun.nio.ch.IOUtil.
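For context, the hot path is a gathering socket write over an array of non-direct (heap) buffers. The following is a minimal sketch of that usage pattern, not the benchmark's actual code: the host, port, and count of 16 buffers are made up; only the 16K size matches the message size mentioned above.

import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

// Minimal sketch of the write pattern, not the benchmark code: a gathering
// write over non-direct buffers forces sun.nio.ch.IOUtil to substitute a
// temporary DirectByteBuffer for each heap buffer before calling writev().
public class GatheringWriteSketch {
    public static void main(String[] args) throws Exception {
        try (SocketChannel ch = SocketChannel.open(new InetSocketAddress("localhost", 9000))) {
            ByteBuffer[] srcs = new ByteBuffer[16];            // buffer count is illustrative
            for (int i = 0; i < srcs.length; i++) {
                srcs[i] = ByteBuffer.allocate(16 * 1024);      // 16K heap (non-direct) buffer
                srcs[i].put(new byte[16 * 1024]);
                srcs[i].flip();
            }
            // Each non-direct src is copied into a temporary direct buffer;
            // when there are more buffers than the per-thread cache can hold,
            // the overflow triggers fresh DirectByteBuffer allocations.
            ch.write(srcs, 0, srcs.length);
        }
    }
}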

java version "1.7.0-ea"
Java(TM) SE Runtime Environment (build 1.7.0-ea-b141)
Java HotSpot(TM) 64-Bit Server VM (build 21.0-b11, mixed mode)

Sun Analyzer shows the following:

java.nio.DirectByteBuffer.<init>(int)
        Exclusive User CPU Time:    3.351 (  0.1%)
        Inclusive User CPU Time:  286.137 (  7.5%)
                           Size:      146
                     PC Address: 2:0x000015aa
                    Source File: /tmp/analyzer.DirectByteBuffer.java.16752
                    Object File: java.nio.DirectByteBuffer
                    Load Object: /home/mwm_jms/JMSB/log/214/server.er/jclasses
                   Mangled Name: java.nio.DirectByteBuffer.<init>
                        Aliases:

(As a result, the JMS-specific path around the following method accounts for 19% of the CPU time.)
weblogic.socket.NIOSocketMuxer$NIOOutputStream.write(java.nio.ByteBuffer[], int, int)
        Exclusive User CPU Time:    1.070 (  0.0%)
        Inclusive User CPU Time:  737.277 ( 19.3%)
                           Size:       88
                     PC Address: 2:0x000012e9
                    Source File: weblogic/socket/NIOSocketMuxer.java
                    Object File: weblogic.socket.NIOSocketMuxer$NIOOutputStream
                    Load Object: /home/mwm_jms/JMSB/log/214/server.er/jclasses
                   Mangled Name: weblogic.socket.NIOSocketMuxer$NIOOutputStream.write
                        Aliases:

call tree looks like this:
 567.193 100.00         |  +-weblogic.socket.NIOSocketMuxer$NIOOutputStream.write(java.nio.ByteBuffer[], int, int)
 566.003  99.79         |    +-sun.nio.ch.SocketChannelImpl.write(java.nio.ByteBuffer[], int, int)
 555.391  98.13         |    |  +-sun.nio.ch.IOUtil.write(java.io.FileDescriptor, java.nio.ByteBuffer[], int, int, sun.nio.ch.NativeDispatcher)
 235.957  42.48         |    |  |  +-sun.nio.ch.SocketDispatcher.writev(java.io.FileDescriptor, long, int)
 235.917  99.98         |    |  |  |  +-sun.nio.ch.FileDispatcherImpl.writev0(java.io.FileDescriptor, long, int)
 234.587  99.44         |    |  |  |    +-Java_sun_nio_ch_FileDispatcherImpl_writev0
 233.587  99.57         |    |  |  |      +-writev
   0.370   0.16         |    |  |  |      |  +-__libc_enable_asynccancel
   0.360   0.15         |    |  |  |      |  +-__libc_disable_asynccancel
   0.470   0.20         |    |  |  |      +-jni_fast_GetIntField
   0.020   0.01         |    |  |  |      +-_init
 208.232  37.49         |    |  |  +-sun.nio.ch.Util.getTemporaryDirectBuffer(int)
 203.651  97.80         |    |  |  |  +-java.nio.ByteBuffer.allocateDirect(int)
 202.681  99.52         |    |  |  |  |  +-java.nio.DirectByteBuffer.<init>(int)
  87.287  43.07         |    |  |  |  |    +-java.nio.Bits.reserveMemory(long, int)
  64.053  31.60         |    |  |  |  |    +-sun.misc.Unsafe.setMemory(long, long, byte)
  63.993  99.91         |    |  |  |  |    |  +-sun.misc.Unsafe.setMemory(java.lang.Object, long, long, byte)



We notice that sun.nio.ch.Util.getTemporaryDirectBuffer relies on a ThreadLocal cache of DirectByteBuffers, and the profile suggests this cache is too small. When we modify the class to use a much larger cache, we gain a 1.2x throughput improvement (from 40,805.93 to 48,926.69 messages per second), primarily because the lock is no longer inflated. The CPU reduction should also have a significant impact on other cases.
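For illustration, here is a much-simplified sketch of the kind of per-thread cache involved. The class name, fields, bound, and reuse policy are assumptions made for the sketch, not the JDK's actual sun.nio.ch.Util implementation.

import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Iterator;

// Much-simplified illustration of a per-thread direct-buffer cache: requests
// that cannot be satisfied from the cache fall through to allocateDirect(),
// which is the expensive path seen in the profile above.
final class TempBufferCacheSketch {
    private static final int CACHE_SIZE = 8;   // small bound; 8 matches the evaluation below

    private static final ThreadLocal<ArrayDeque<ByteBuffer>> CACHE =
            ThreadLocal.withInitial(ArrayDeque::new);

    static ByteBuffer getTemporaryDirectBuffer(int size) {
        ArrayDeque<ByteBuffer> cache = CACHE.get();
        for (Iterator<ByteBuffer> it = cache.iterator(); it.hasNext(); ) {
            ByteBuffer buf = it.next();
            if (buf.capacity() >= size) {      // reuse a cached buffer if it is big enough
                it.remove();
                buf.clear();
                buf.limit(size);
                return buf;
            }
        }
        // Cache miss: pay for Bits.reserveMemory plus zeroing of a new direct buffer.
        return ByteBuffer.allocateDirect(size);
    }

    static void releaseTemporaryDirectBuffer(ByteBuffer buf) {
        ArrayDeque<ByteBuffer> cache = CACHE.get();
        if (cache.size() < CACHE_SIZE) {
            cache.addFirst(buf);               // keep the buffer for reuse on this thread
        }
        // Otherwise drop it; the native memory is freed when the buffer is collected.
    }
}

With the larger cache, the profile shows the following savings: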

java.nio.DirectByteBuffer.<init>(int)
        Exclusive User CPU Time:    0.100 (  0.0%)
        Inclusive User CPU Time:    4.761 (  0.1%)
                           Size:      146
                     PC Address: 2:0x000015aa
                    Source File: /tmp/analyzer.DirectByteBuffer.java.16755
                    Object File: java.nio.DirectByteBuffer
                    Load Object: /home/mwm_jms/JMSB/log/213/server.er/jclasses
                   Mangled Name: java.nio.DirectByteBuffer.<init>
                        Aliases:
weblogic.socket.NIOSocketMuxer$NIOOutputStream.write(java.nio.ByteBuffer[], int, int)
        Exclusive User CPU Time:    1.360 (  0.0%)
        Inclusive User CPU Time:  434.637 ( 11.5%)
                           Size:       88
                     PC Address: 2:0x000012e9
                    Source File: weblogic/socket/NIOSocketMuxer.java
                    Object File: weblogic.socket.NIOSocketMuxer$NIOOutputStream
                    Load Object: /home/mwm_jms/JMSB/log/213/server.er/jclasses
                   Mangled Name: weblogic.socket.NIOSocketMuxer$NIOOutputStream.write
                        Aliases:

As you can see, DirectByteBuffer allocation dropped to almost nothing, and the JMS-specific code path's share of CPU time fell by roughly 8 percentage points (from 19.3% to 11.5%), while the throughput of this code path increased 1.2x.

Comments
EVALUATION A simpler solution, which avoids the need for configuration, is to allow the temporary buffer cache to grow to IOV_MAX (sketched below).
15-06-2012
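IOV_MAX is the maximum number of iovec entries that a single writev(2) call accepts (sysconf(_SC_IOV_MAX), typically 1024 on Linux), so a cache bounded by it is large enough that even the biggest possible gathering write can return all of its temporary buffers for reuse. A hedged illustration of the idea, in terms of the simplified cache sketch in the Description; the actual change is in sun.nio.ch.Util and may differ in detail.

// Illustration only: raise the per-thread cache bound from a small constant
// to the platform's writev() limit. In the JDK the value comes from native
// code; it is hard-coded here purely for the sketch.
final class TempBufferCacheBoundSketch {
    static final int IOV_MAX = 1024;           // typical Linux value, assumed here
    static final int CACHE_SIZE = IOV_MAX;     // previously a small fixed bound
}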

EVALUATION This is a request to increase the size of the temporary buffer cache for applications that perform gathering write operations with more than 8 non-direct buffers in the array. It is not clear why the application does gathering writes with such a large number of buffers, as the common use case is 2 or 3 buffers. Also, this API is optimized for use with direct buffers; no temporary buffers are needed when direct buffers are used (see the sketch after this comment).
13-06-2012
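As the last evaluation notes, supplying direct buffers avoids the temporary-buffer copy entirely, since their native addresses can be handed straight to writev(). A hedged sketch of that alternative; as before, the host, port, and buffer count are illustrative, not from the benchmark.

import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

// Sketch of the alternative suggested in the evaluation: allocate direct
// buffers once, refill and reuse them, so the gathering write needs no
// temporary DirectByteBuffer at all.
public class DirectGatheringWriteSketch {
    public static void main(String[] args) throws Exception {
        try (SocketChannel ch = SocketChannel.open(new InetSocketAddress("localhost", 9000))) {
            ByteBuffer[] srcs = new ByteBuffer[16];                  // illustrative count
            for (int i = 0; i < srcs.length; i++) {
                srcs[i] = ByteBuffer.allocateDirect(16 * 1024);      // allocate once, reuse
            }
            for (ByteBuffer b : srcs) {
                b.clear();
                b.put(new byte[16 * 1024]);                          // fill with payload
                b.flip();
            }
            ch.write(srcs, 0, srcs.length);                          // no temporary copies
        }
    }
}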