JDK-8164001 : Improve HTTP/2 client scalability and performance (up to 3x).
  • Type: Enhancement
  • Component: core-libs
  • Sub-Component: java.net
  • Affected Version: 9
  • Priority: P3
  • Status: Resolved
  • Resolution: Duplicate
  • Submitted: 2016-08-13
  • Updated: 2016-12-12
  • Resolved: 2016-12-12
Description
The HTTP/2 client scales very poorly. The cause is over-synchronization.
There are two sources of it:
- java.net.Queue class - synchronization combined with callbacks means that only one thread can make progress at a time. In addition, Queue is a single generic implementation for all usages,
      whereas more specialized queue implementations could be more efficient in particular cases.
- Http2Connection.sendFrame synchronizes globally (for the whole connection) on "sendlock".

This produces a long nested sequence of locks with large critical sections (all synchronized actions are marked with *):

* Http2Connection.sendFrame lock, which covers:
   - headers encoding
   - stream number assignment
   - frame encoding (to byte buffers)
   * enqueuing ByteBuffers to the output queue:
     - callback invocation:
       * lock on the Connection global write lock:
          * write to SocketChannel and the subsequent internal SocketChannel locks.
          
As a result, in multithreaded HttpClient execution the utilization of each user thread (a thread on which requests to HttpClient are performed) never reaches 10%.
The SelectorManager thread reaches 30% CPU utilization.
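
The lock nesting described above can be pictured with a small model. This is only a sketch of the shape of the problem, not the actual Http2Connection code: the class name, the three lock objects and the "wire" list (standing in for the SocketChannel) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the lock nesting; names are illustrative only.
class NestedLockSketch {
    private final Object sendLock = new Object();   // stands in for the connection-wide "sendlock"
    private final Object queueLock = new Object();  // stands in for the generic queue's monitor
    private final Object writeLock = new Object();  // stands in for the connection write lock
    final List<String> wire = new ArrayList<>();    // stands in for the SocketChannel
    boolean wroteUnderSendLock;                     // records whether I/O ran inside the outer lock

    void sendFrame(String frame) {
        synchronized (sendLock) {                   // * outermost, connection-wide lock
            String encoded = "[" + frame + "]";     //   encoding work is done under the lock
            synchronized (queueLock) {              // * enqueue, which triggers the callback
                synchronized (writeLock) {          // * global write lock taken by the callback
                    wroteUnderSendLock = Thread.holdsLock(sendLock);
                    wire.add(encoded);              // * the "socket write" itself
                }
            }
        }
    }
}
```

In this model every call to sendFrame performs the write while still holding the outermost lock, so concurrent callers are fully serialized for the duration of the I/O.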

Suggested fix: move whatever actions can be moved out of the synchronized section, and enclose the remaining actions, where ordering is important, in smaller critical sections that use more efficient synchronization primitives.

As a result, performance improved by up to 3x in some scenarios and scalability improved significantly (see charts).

The webrev can be found here:
http://cr.openjdk.java.net/~skuksenko/jep110/8164001/

Comments
Performance charts
13-08-2016

Implementation details:

1. Do not use java.net.Queue. Three other queues are implemented:
   - ClosableBlockingQueue - replaces Queue for the data frames queue.
   - AsyncReadQueue - replaces Queue for the read ByteBuffers queue (in AsyncSSLDelegate).
   - AsyncWriteQueue - replaces Queue for writing in AsyncSSLDelegate and PlainHttpConnection.

2. The second key modification (in write operations) splits the single write operation into an enqueuing operation and a writing-to-SocketChannel operation. The AsyncConnection interface has new operations:
   - writeAsync(ByteBuffer[] buffers) - enqueues a sequence of ByteBuffers at the end of the send queue.
   - writeAsyncUnordered(ByteBuffer[] buffers) - enqueues a sequence of ByteBuffers at the beginning of the send queue. Allowed only for frames which may be sent before other frames according to the HTTP/2 specification and which are critical for performance. Right now it is used only for WindowUpdateFrame and PingFrame.

A single frame may be encoded into a sequence of ByteBuffers. To prevent buffers from different frames from interleaving, AsyncWriteQueue operates atomically on arrays of ByteBuffers. Buffers from the same frame should be written at once as a single array, and their order will be preserved. Different arrays may change order (if allowed by the HTTP/2 specification). The same technique is used to prevent other frames from being inserted between a HeadersFrame and the following ContinuationFrame sequence (which is forbidden by the HTTP/2 spec).

writeAsync* operations do not write data to the socket, they only enqueue it. Thus after each writeAsync* operation (or sequence of writeAsync* operations) the following operation should be invoked:
   - flushAsync() - performs the actual data write to the socket and drains the full send queue. If this operation is invoked from multiple threads, only one thread wins the race (the winner) and performs the writing; the other threads (the losers) are not blocked and continue execution. The winner drains the full queue and writes the data that was enqueued by the losers.
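
The enqueue/flush split described above can be sketched roughly as follows. This is a minimal illustration and not the AsyncWriteQueue from the webrev: the class name, the use of ConcurrentLinkedDeque and AtomicBoolean, and the Consumer that stands in for the SocketChannel write are all assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Illustrative sketch only; NOT the AsyncWriteQueue from the webrev.
class WriteQueueSketch {
    // Each element is one whole array, so buffers of one frame (or of a
    // HEADERS + CONTINUATION sequence) can never interleave with others.
    private final ConcurrentLinkedDeque<ByteBuffer[]> queue = new ConcurrentLinkedDeque<>();
    private final AtomicBoolean flushing = new AtomicBoolean();
    private final Consumer<ByteBuffer[]> sink; // hypothetical stand-in for the socket write

    WriteQueueSketch(Consumer<ByteBuffer[]> sink) { this.sink = sink; }

    void writeAsync(ByteBuffer[] buffers) { queue.addLast(buffers); }           // end of queue
    void writeAsyncUnordered(ByteBuffer[] buffers) { queue.addFirst(buffers); } // front of queue

    void flushAsync() {
        // Only the thread that wins the compareAndSet writes; losers return at once.
        // The outer loop re-checks the queue after the flag is released, so data
        // enqueued by a loser during the release is still picked up by the winner.
        while (!queue.isEmpty() && flushing.compareAndSet(false, true)) {
            try {
                ByteBuffer[] next;
                while ((next = queue.pollFirst()) != null) {
                    sink.accept(next);
                }
            } finally {
                flushing.set(false);
            }
        }
    }

    static ByteBuffer buf(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    // Small single-threaded demo: returns the order in which buffers reach the sink.
    static List<String> demo() {
        List<String> wire = new ArrayList<>();
        WriteQueueSketch q = new WriteQueueSketch(arr -> {
            for (ByteBuffer b : arr) {
                wire.add(StandardCharsets.UTF_8.decode(b.duplicate()).toString());
            }
        });
        q.writeAsync(new ByteBuffer[] { buf("HEADERS"), buf("CONTINUATION") });
        q.writeAsync(new ByteBuffer[] { buf("DATA") });
        q.writeAsyncUnordered(new ByteBuffer[] { buf("WINDOW_UPDATE") });
        q.flushAsync();
        return wire;
    }
}
```

In the demo the WINDOW_UPDATE buffer is written first because it was placed at the front of the queue, while HEADERS and CONTINUATION stay adjacent because they were enqueued as one array.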
The internal state machine of the AsyncWriteQueue class also carefully handles the situation where one thread is flushing while another thread puts data into the queue (these operations do not block each other). Besides, AsyncWriteQueue has a dedicated "DELAYED" state. The queue is explicitly transferred into this state if PlainHttpConnection registers a WriteEvent or if AsyncSSLDelegate performs handshaking.

Now the sequence and nesting of locks looks like:

* Http2Connection.sendFrame lock, which covers:
   - headers encoding
   - stream number assignment
   - frame encoding (to byte buffers)
   * enqueuing ByteBuffers to the output queue
- Http2Connection.sendFrame exits the lock and performs flushAsync.
- PingFrame and WindowUpdateFrame are not subject to the Http2Connection.sendFrame lock and have higher priority for sending.
- DataFrame and WindowUpdateFrame are not subject to the Http2Connection.sendFrame lock, but have the same priority as other frames.

Please note that all measurements were done using as a baseline the existing code + JDK-8161004 and JDK-8162497, which are critical for performance evaluation. Also, because JDK-8161004 and JDK-8162497 are still not committed, the webrev also includes the JDK-8161004 and JDK-8162497 changes.
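
One way to picture the "DELAYED" state is the sketch below. The comment above only says that such a state exists and when the queue enters it; the three-state machine, the resume() operation and the Predicate standing in for the channel are assumptions made for illustration and may differ from the actual AsyncWriteQueue. Error handling is omitted.

```java
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Predicate;

// Illustrative sketch only; the states and transitions are assumptions.
class DelayedQueueSketch {
    enum State { IDLE, FLUSHING, DELAYED }

    private final ConcurrentLinkedDeque<String> queue = new ConcurrentLinkedDeque<>();
    private final AtomicReference<State> state = new AtomicReference<>(State.IDLE);
    // Hypothetical channel: returns false when it cannot accept data right now
    // (for example, the socket buffer is full or a handshake is in progress).
    private final Predicate<String> channel;

    DelayedQueueSketch(Predicate<String> channel) { this.channel = channel; }

    State state() { return state.get(); }

    void enqueue(String frame) { queue.addLast(frame); }

    void flushAsync() {
        // While DELAYED (or while another thread is FLUSHING) the CAS fails
        // and the caller returns without blocking.
        while (!queue.isEmpty() && state.compareAndSet(State.IDLE, State.FLUSHING)) {
            String frame;
            while ((frame = queue.pollFirst()) != null) {
                if (!channel.test(frame)) {
                    queue.addFirst(frame);     // keep the frame at the head for later
                    state.set(State.DELAYED);  // park the queue until resume()
                    return;
                }
            }
            state.set(State.IDLE);
        }
    }

    // Assumed to be called once the channel is writable again.
    void resume() {
        if (state.compareAndSet(State.DELAYED, State.IDLE)) {
            flushAsync();
        }
    }
}
```

With a channel that initially refuses data, the first flushAsync() moves the queue to DELAYED, later flushAsync() calls return without attempting a write, and resume() drains everything that accumulated in the meantime, in order.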
13-08-2016