JDK-8000303 : (bf) Allow temporary buffer cache to grow to IOV_MAX

Details
Type:
Backport
Submit Date:
2012-10-01
Status:
Closed
Updated Date:
2013-03-13
Project Name:
JDK
Resolved Date:
2012-10-02
Component:
core-libs
Sub-Component:
java.nio
OS:
CPU:
Priority:
P3
Resolution:
Fixed
Affected Versions:
6u10,7u4
Fixed Versions:
7u10 (b12)

Related Reports
Backport:

Sub Tasks

Description
A JMS non-persistent micro-benchmark with a 16K message size and 8 queues runs at a rate of 40K messages per second. The response time is poor because of a particular lock inflating, and the lock inflation is caused by the inefficient use of DirectByteBuffers in sun.nio.ch.IOUtil.
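
For context, the hot path is a gathering write of heap ByteBuffers through a SocketChannel. A minimal sketch of that pattern (the endpoint, buffer count, and sizes are illustrative, not taken from the benchmark) might look like this:

import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

public class GatheringWriteExample {
    public static void main(String[] args) throws Exception {
        // Illustrative endpoint; the benchmark drives this path through the WebLogic NIO muxer.
        try (SocketChannel ch = SocketChannel.open(new InetSocketAddress("127.0.0.1", 9000))) {
            // Heap buffers: before the native writev call, sun.nio.ch.IOUtil copies each one
            // into a temporary direct buffer obtained from Util.getTemporaryDirectBuffer.
            ByteBuffer[] srcs = new ByteBuffer[8];
            for (int i = 0; i < srcs.length; i++) {
                srcs[i] = ByteBuffer.allocate(16 * 1024);     // ~16K message payload
                srcs[i].put(new byte[16 * 1024]);
                srcs[i].flip();
            }
            long written = ch.write(srcs, 0, srcs.length);    // gathering write
            System.out.println("wrote " + written + " bytes");
        }
    }
}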

java version "1.7.0-ea"
Java(TM) SE Runtime Environment (build 1.7.0-ea-b141)
Java HotSpot(TM) 64-Bit Server VM (build 21.0-b11, mixed mode)

Sun Analyzer shows the following:

java.nio.DirectByteBuffer.<init>(int)
        Exclusive User CPU Time:    3.351 (  0.1%)
        Inclusive User CPU Time:  286.137 (  7.5%)
                           Size:      146
                     PC Address: 2:0x000015aa
                    Source File: /tmp/analyzer.DirectByteBuffer.java.16752
                    Object File: java.nio.DirectByteBuffer
                    Load Object: /home/mwm_jms/JMSB/log/214/server.er/jclasses
                   Mangled Name: java.nio.DirectByteBuffer.<init>
                        Aliases:

(As a result, the JMS-specific path through the following method accounts for 19% of the CPU time.)
weblogic.socket.NIOSocketMuxer$NIOOutputStream.write(java.nio.ByteBuffer[], int, int)
        Exclusive User CPU Time:    1.070 (  0.0%)
        Inclusive User CPU Time:  737.277 ( 19.3%)
                           Size:       88
                     PC Address: 2:0x000012e9
                    Source File: weblogic/socket/NIOSocketMuxer.java
                    Object File: weblogic.socket.NIOSocketMuxer$NIOOutputStream
                    Load Object: /home/mwm_jms/JMSB/log/214/server.er/jclasses
                   Mangled Name: weblogic.socket.NIOSocketMuxer$NIOOutputStream.write
                        Aliases:

The call tree looks like this:
 567.193 100.00         |  +-weblogic.socket.NIOSocketMuxer$NIOOutputStream.write(java.nio.ByteBuffer[], int, int)
 566.003  99.79         |    +-sun.nio.ch.SocketChannelImpl.write(java.nio.ByteBuffer[], int, int)
 555.391  98.13         |    |  +-sun.nio.ch.IOUtil.write(java.io.FileDescriptor, java.nio.ByteBuffer[], int, int, sun.nio.ch.NativeDispatcher)
 235.957  42.48         |    |  |  +-sun.nio.ch.SocketDispatcher.writev(java.io.FileDescriptor, long, int)
 235.917  99.98         |    |  |  |  +-sun.nio.ch.FileDispatcherImpl.writev0(java.io.FileDescriptor, long, int)
 234.587  99.44         |    |  |  |    +-Java_sun_nio_ch_FileDispatcherImpl_writev0
 233.587  99.57         |    |  |  |      +-writev
   0.370   0.16         |    |  |  |      |  +-__libc_enable_asynccancel
   0.360   0.15         |    |  |  |      |  +-__libc_disable_asynccancel
   0.470   0.20         |    |  |  |      +-jni_fast_GetIntField
   0.020   0.01         |    |  |  |      +-_init
 208.232  37.49         |    |  |  +-sun.nio.ch.Util.getTemporaryDirectBuffer(int)
 203.651  97.80         |    |  |  |  +-java.nio.ByteBuffer.allocateDirect(int)
 202.681  99.52         |    |  |  |  |  +-java.nio.DirectByteBuffer.<init>(int)
  87.287  43.07         |    |  |  |  |    +-java.nio.Bits.reserveMemory(long, int)
  64.053  31.60         |    |  |  |  |    +-sun.misc.Unsafe.setMemory(long, long, byte)
  63.993  99.91         |    |  |  |  |    |  +-sun.misc.Unsafe.setMemory(java.lang.Object, long, long, byte)



We notice that sun.nio.ch.Util.getTemporaryDirectBuffer relies on a ThreadLocal cache of DirectByteBuffers, and the profile suggests this cache is too small. When we modify the class to use a much larger cache, we gain a 1.2x throughput improvement (from 40805.93 to 48926.69), primarily because the lock is deflated. The CPU reduction should also have a significant impact on other workloads. The profile below shows the savings:

java.nio.DirectByteBuffer.<init>(int)
        Exclusive User CPU Time:    0.100 (  0.0%)
        Inclusive User CPU Time:    4.761 (  0.1%)
                           Size:      146
                     PC Address: 2:0x000015aa
                    Source File: /tmp/analyzer.DirectByteBuffer.java.16755
                    Object File: java.nio.DirectByteBuffer
                    Load Object: /home/mwm_jms/JMSB/log/213/server.er/jclasses
                   Mangled Name: java.nio.DirectByteBuffer.<init>
                        Aliases:
weblogic.socket.NIOSocketMuxer$NIOOutputStream.write(java.nio.ByteBuffer[], int, int)
        Exclusive User CPU Time:    1.360 (  0.0%)
        Inclusive User CPU Time:  434.637 ( 11.5%)
                           Size:       88
                     PC Address: 2:0x000012e9
                    Source File: weblogic/socket/NIOSocketMuxer.java
                    Object File: weblogic.socket.NIOSocketMuxer$NIOOutputStream
                    Load Object: /home/mwm_jms/JMSB/log/213/server.er/jclasses
                   Mangled Name: weblogic.socket.NIOSocketMuxer$NIOOutputStream.write
                        Aliases:

As you can see, the DirectByteBuffer allocation drops to practically nothing, and the JMS-specific code path uses about 8 percentage points less CPU time (and the throughput of this code path increased 1.2x).
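
For illustration only, here is a minimal sketch of the kind of per-thread cache of reusable temporary direct buffers this report is about. The class name, slot count, and eviction behaviour are assumptions for the example, not the actual sun.nio.ch.Util implementation:

import java.nio.ByteBuffer;

// Hypothetical sketch of a per-thread cache of temporary direct buffers.
// The slot count stands in for a platform limit such as IOV_MAX; it is an
// assumption for this example, not the JDK's actual value or data structure.
final class TempDirectBufferCache {
    private static final int MAX_CACHED = 16;

    private static final ThreadLocal<ByteBuffer[]> CACHE =
            ThreadLocal.withInitial(() -> new ByteBuffer[MAX_CACHED]);

    /** Return a cached direct buffer of at least {@code size} bytes, or allocate a new one. */
    static ByteBuffer get(int size) {
        ByteBuffer[] slots = CACHE.get();
        for (int i = 0; i < slots.length; i++) {
            ByteBuffer buf = slots[i];
            if (buf != null && buf.capacity() >= size) {
                slots[i] = null;              // take the buffer out of the cache
                buf.clear();
                buf.limit(size);
                return buf;
            }
        }
        // Cache miss: this is the expensive DirectByteBuffer allocation the profile shows.
        return ByteBuffer.allocateDirect(size);
    }

    /** Offer a buffer back to the cache; if every slot is taken, the buffer is dropped. */
    static void release(ByteBuffer buf) {
        ByteBuffer[] slots = CACHE.get();
        for (int i = 0; i < slots.length; i++) {
            if (slots[i] == null) {
                slots[i] = buf;
                return;
            }
        }
        // Cache full: the buffer becomes garbage, so a too-small cache forces
        // repeated allocate/collect cycles under gathering writes.
    }
}

With a gathering write of N buffers, a per-thread cache smaller than N guarantees misses on every call, which matches the allocation cost in the first profile and motivates letting the cache grow toward IOV_MAX.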


Comments
The changes cause a JCK failure: the api/java_nio/channels/AsynchronousSocketChannel/AsynchronousSocketChannel_readWrite test fails with 7u10 b12.

Log: http://aurora-ds.ru.oracle.com:9500/runs/113270.PROMOTED-2/results/workDir/api/java_nio/channels/AsynchronousSocketChannel/AsynchronousSocketChannel_readWrite.jtr

The readWrite_scattering test case should be excluded from JCK 7 if this is considered a test bug for 7.
2012-10-23
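
For reference, the failing JCK case exercises the scattering variants of AsynchronousSocketChannel read/write. A self-contained loopback sketch of a scattering read (ports, sizes, and timeout are illustrative) looks like this:

import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousServerSocketChannel;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.channels.CompletionHandler;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class ScatteringReadExample {
    public static void main(String[] args) throws Exception {
        // Loopback setup so the example is self-contained.
        AsynchronousServerSocketChannel server =
                AsynchronousServerSocketChannel.open().bind(new InetSocketAddress("127.0.0.1", 0));
        InetSocketAddress addr = (InetSocketAddress) server.getLocalAddress();

        AsynchronousSocketChannel client = AsynchronousSocketChannel.open();
        Future<Void> connect = client.connect(addr);
        AsynchronousSocketChannel peer = server.accept().get();
        connect.get();

        // The peer sends 8 KB so the client has data to scatter across two buffers.
        peer.write(ByteBuffer.wrap(new byte[8 * 1024])).get();

        // Scattering read: the ByteBuffer[] variant the readWrite_scattering case covers.
        ByteBuffer[] dsts = { ByteBuffer.allocate(4 * 1024), ByteBuffer.allocate(4 * 1024) };
        final CountDownLatch done = new CountDownLatch(1);
        client.read(dsts, 0, dsts.length, 5, TimeUnit.SECONDS, null,
                new CompletionHandler<Long, Void>() {
                    @Override
                    public void completed(Long n, Void att) {
                        System.out.println("scattering read returned " + n + " bytes");
                        done.countDown();
                    }
                    @Override
                    public void failed(Throwable t, Void att) {
                        t.printStackTrace();
                        done.countDown();
                    }
                });
        done.await();
        client.close();
        peer.close();
        server.close();
    }
}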


