JDK-4837564 : (bf) Please make DirectByteBuffer performance enhancements
  • Type: Enhancement
  • Component: core-libs
  • Sub-Component: java.nio
  • Affected Version: 1.4.1
  • Priority: P3
  • Status: Closed
  • Resolution: Fixed
  • OS: windows_2000
  • CPU: x86
  • Submitted: 2003-03-26
  • Updated: 2017-05-16
  • Resolved: 2011-05-18
JDK 7: 7 b116 (Fixed)
Description

Name: rmT116609			Date: 03/25/2003


A DESCRIPTION OF THE REQUEST :
The most important enhancement I would like to see is the removal of page alignment.
It appears that every DirectByteBuffer wastes one page of memory (4096 bytes on my platform).  It also wastes time setting this unused memory to 0.  When the memory is not returned page aligned, it then creates another buffer.
I have looked through the code and it never makes use of the page alignment.
Instead it uses an aligned property to determine whether it can read the values directly or must assemble them from individual bytes.
If the intention was to have bigger allocation blocks, then IMHO a better approach would have been to call allocateMemory with the capacity rounded up to the next multiple of the page size (e.g. 8000 bytes goes to 8192 bytes).  The unused space (192 bytes in this case) could then be handed to a later buffer that requests a capacity <= 192.
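
The rounding suggested above can be sketched as follows. This is only an illustration of the arithmetic, not JDK code: the class and method names are hypothetical, and the page size is assumed to be a power of two.

```java
// Hypothetical helper, not part of the JDK: rounds a requested capacity up
// to the next multiple of the page size (assumed to be a power of two).
final class PageRounding {
    static long roundUpToPage(long capacity, long pageSize) {
        if (capacity < 0 || pageSize <= 0 || (pageSize & (pageSize - 1)) != 0) {
            throw new IllegalArgumentException(
                "capacity must be >= 0 and pageSize must be a power of two");
        }
        // Add (pageSize - 1), then clear the low bits to land on a page boundary.
        return (capacity + pageSize - 1) & ~(pageSize - 1);
    }

    public static void main(String[] args) {
        long pageSize = 4096;
        long requested = 8000;
        long rounded = roundUpToPage(requested, pageSize);
        System.out.println(rounded);             // 8192
        System.out.println(rounded - requested); // 192 bytes of slack
    }
}
```

With this scheme the slack is at most one page minus one byte per allocation, and it sits at the end of the block where it could in principle be reused.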

If changes are being made anyway, then removing the << 0 scattered throughout the DirectByteBuffer class might also be a good idea.  I assume these were generated somehow.  For an int i, the << 0 makes absolutely no difference for any value of i, except for possibly slowing down performance.

Also,
  assert (pos <= lim);
  int rem = (pos <= lim ? lim - pos : 0);
should be simplified to
  assert (pos <= lim);
  int rem = lim - pos;

etc.

Note: Fixes are being made to DirectByteBuffer for JDK 1.4.2 for bug 4827358 and, hopefully, also for the bug corresponding to review ID 182986.

JUSTIFICATION :
I noticed an enormous amount of paging under Windows, so much that the whole system stood still.  After tracing the cause, DirectByteBuffers were once again the problem.  Every DirectByteBuffer was wasting 4096 bytes, so memory consumption was much higher than it needed to be.
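
To put that figure in perspective, here is a back-of-the-envelope sketch. It assumes, as this report claims, that each direct buffer reserves its capacity plus one extra page. The class and method names are hypothetical, and the inputs are the ones from the test program in this report (100000 one-byte buffers, 4096-byte pages).

```java
// Hypothetical estimate, assuming each direct buffer reserves
// (capacity + pageSize) bytes of native memory, as the report describes.
final class DirectBufferOverhead {
    static long reservedBytes(long bufferCount, long capacity, long pageSize) {
        return bufferCount * (capacity + pageSize);
    }

    public static void main(String[] args) {
        long count = 100_000;
        long capacity = 1;
        long pageSize = 4096;
        long reserved = reservedBytes(count, capacity, pageSize);
        long payload = count * capacity;
        System.out.println("payload bytes:  " + payload);  // 100000
        System.out.println("reserved bytes: " + reserved); // 409700000
    }
}
```

Under that assumption, roughly 410 MB of native memory backs 100 KB of usable data, which is consistent with the heavy paging described above.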

EXPECTED VERSUS ACTUAL BEHAVIOR :
Performance of direct buffers should be roughly comparable with that of heap buffers.
Most of our direct byte buffer usage is for reading from and writing to channels.
With the current behaviour, heap buffers do a better job than direct buffers even in situations ideally suited to direct buffers.  The difference ranges from almost instantaneous to seemingly stationary.
On my platform, the heap buffer test took 0 seconds and the direct buffer test took 4 minutes.
The example is exaggerated to make the point.  In our actual code, direct byte buffers are only about 4000 times slower than heap buffers, even though they are meant to have the potential to be a lot faster.

---------- BEGIN SOURCE ----------
import java.nio.*;

public class Test {
  public static void main(String[] args) {
    ByteBuffer[] buffers = new ByteBuffer[100000];
    long startTime = System.currentTimeMillis();
    for (int i = 0; i < buffers.length; i++) {
      buffers[i] = ByteBuffer.allocate(1);
    }
    for (int i = 0; i < buffers.length; i++) {
      buffers[i].put((byte)0);
    }
    System.out.println("Time Taken Heap Buffer: " + (System.currentTimeMillis() - startTime) / 1000 + " seconds");

    buffers = new ByteBuffer[100000];
    System.gc();
    System.runFinalization();

    startTime = System.currentTimeMillis();
    for (int i = 0; i < buffers.length; i++) {
      buffers[i] = ByteBuffer.allocateDirect(1);
    }
    for (int i = 0; i < buffers.length; i++) {
      buffers[i].put((byte)0);
    }
    System.out.println("Time Taken Direct Buffer: " + (System.currentTimeMillis() - startTime) / 1000 + " seconds");
  }
}

---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
Allocate your own direct buffers much bigger than they need to be.
Then use slice to return buffers of the size actually needed.
This way, 4096 bytes are wasted only once per allocation buffer instead of once per buffer.
It is a little trickier, because as long as any slice of an allocation buffer still exists, none of that allocation buffer can be freed.
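
A minimal sketch of this workaround is shown below. The DirectSlab class and its methods are hypothetical names introduced for illustration; it uses only the standard ByteBuffer calls allocateDirect, limit, position, and slice, and it is not thread-safe.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: carves small direct buffers out of one large
// direct allocation, so any per-allocation overhead is paid only once.
final class DirectSlab {
    private final ByteBuffer slab;

    DirectSlab(int slabCapacity) {
        this.slab = ByteBuffer.allocateDirect(slabCapacity);
    }

    // Bytes still available for future slices.
    int remaining() {
        return slab.remaining();
    }

    // Returns a direct buffer of exactly `size` bytes backed by the slab.
    ByteBuffer take(int size) {
        if (size < 0 || size > slab.remaining()) {
            throw new IllegalArgumentException("not enough space left in slab");
        }
        // Narrow the window to [position, position + size) and slice it.
        slab.limit(slab.position() + size);
        ByteBuffer piece = slab.slice();
        // Advance past the slice and reopen the rest of the slab.
        slab.position(slab.limit());
        slab.limit(slab.capacity());
        return piece;
    }

    public static void main(String[] args) {
        DirectSlab slab = new DirectSlab(64 * 1024);
        ByteBuffer a = slab.take(1);
        ByteBuffer b = slab.take(1);
        a.put((byte) 1);
        b.put((byte) 2);
        System.out.println(a.isDirect() + " " + slab.remaining());
    }
}
```

As noted above, the whole slab stays reachable for as long as any slice is referenced, so this trades per-buffer overhead for coarser-grained reclamation.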

Alternatively, use heap buffers instead.  They are still faster than direct buffers when all this paging needs to be done.  Bug 4411600 is still a problem with this approach, though.
(Review ID: 183021) 
======================================================================

Comments
EVALUATION We've never documented that direct buffers are page aligned, and we've never encountered an application that assumes it either. For jdk7, we would like to change the implementation so that direct buffers are not aligned by default. A new VM option can be added to force page alignment if required. This solution should reduce memory usage and also improve allocation slightly, as there is less memory to be zeroed. (On that point, there shouldn't be a need to zero the memory between address and base, because it is not accessible via a ByteBuffer.)
14-10-2010

EVALUATION This RFE suggests four different enhancements:
1. Removal of page alignment.
2. Removal of memory zero-initialization.
3. Removal of the "<< 0" scattered throughout the DirectByteBuffer class.
4. Simplification of assert statements of the form:
     assert (a <= b); int x = (a <= b ? b - a : 0);
   to:
     assert (a <= b); int x = b - a;

The first suggestion seems reasonable. I presume that page alignment was originally required for performance reasons, since it would allow us to guarantee a minimum number of pages for any memory allocation (thus reducing page faults). Presumably it would also allow for device I/O where page alignment is required. After careful consideration, we've decided to remove this alignment. We believe that it is no longer necessary because, as time has progressed, page sizes have become larger, typically exceeding the size of buffer allocations. Also, we haven't found any code that depends on or requires page alignment. The specification makes no reference to this implementation detail, so this should just be a simple matter of code removal/simplification.

Since buffers must always be initialized with some fixed value for security purposes, the second suggestion cannot be implemented. Bug 6535542 (integrated in jdk7-b15) modified the specification to require zero-initialization.

The code snippet referenced in the third suggestion is, as speculated, a result of auto-generated code. The value of $LG_BYTES_PER_VALUE$ for byte is 0. I have verified that the javac compiler optimizes this shift out of the generated byte code. I suppose that we could introduce additional complexity in the source to handle this degenerate case, but it hardly seems necessary.

The final suggestion, the simplification of assert/assignment statements, is not possible. The replacement code is only equivalent if system assertions are enabled. For performance reasons, this is not the default behaviour of the VM. Furthermore, since the buffer classes explicitly state that they expect user code to handle synchronization, it is possible that a buffer may be in an incorrect state at the time when the values for "a" or "b" are retrieved.
31-07-2007
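
The point about the final suggestion can be illustrated with a small sketch. The class and method names below are hypothetical and only mirror the two forms quoted in the RFE. The behaviour depends on whether the JVM is started with assertions enabled (-ea), so the comments describe both cases rather than a single fixed output.

```java
// Hypothetical demo of the two forms discussed in the fourth suggestion.
final class RemainingDemo {
    // Original form: the conditional still clamps to 0 when assertions are off.
    static int guarded(int pos, int lim) {
        assert (pos <= lim);
        return (pos <= lim ? lim - pos : 0);
    }

    // Proposed simplification: relies entirely on the assert.
    static int simplified(int pos, int lim) {
        assert (pos <= lim);
        return lim - pos;
    }

    public static void main(String[] args) {
        int pos = 10;
        int lim = 4; // inconsistent state: pos > lim
        try {
            // With assertions disabled (the default), these print 0 and -6.
            System.out.println("guarded:    " + guarded(pos, lim));
            System.out.println("simplified: " + simplified(pos, lim));
        } catch (AssertionError e) {
            // With -ea, both forms fail at the assert before computing a value.
            System.out.println("assertion failed: pos > lim");
        }
    }
}
```

When the state is consistent (pos <= lim) the two forms return the same value. They differ only when the invariant is violated and assertions are off, which is the case the evaluation is concerned with.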