JDK-8161004 : Bulk sending WindowUpdate frame speedup HTTP/2 performance up to 18x times and improve scalability
  • Type: Enhancement
  • Component: core-libs
  • Sub-Component: java.net
  • Affected Version: 9
  • Priority: P3
  • Status: Closed
  • Resolution: Duplicate
  • Submitted: 2016-07-07
  • Updated: 2016-12-09
  • Resolved: 2016-12-09
Description
The current HTTP/2 implementation sends a WindowUpdate frame every time a data frame is processed. This leads to network saturation and excessive synchronization on "Http2Connection.sendlock".

When the received data is small (less than the max frame size), 55% of thread blocks occur in sendWindowUpdate (and only 37% in sendHeaders).
Even when the received data is large (1 Mbyte), 18% of all blocks are caused by sendWindowUpdate.

The suggested improvement provides a 2x-18x performance speedup.
Implemented optimizations:
1. Accumulate the size of received data and send a bulk WindowUpdate when a watermark is reached.
2. Don't send a stream WindowUpdate when the last data frame (the one with the END_STREAM flag set) is received.
3. The WindowUpdate frame goes directly to HttpConnection, bypassing Http2Connection.sendlock. This is allowed given proper protection for the rule "do not insert a WindowUpdate frame between Headers and Continuation frames".
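
Optimizations 1 and 2 can be illustrated with a minimal sketch. This is not the code from the webrev: the class name WindowUpdateAccumulator, its method names, and the IntConsumer callback standing in for "send a WindowUpdate frame" are all hypothetical. The sketch models a per-stream accumulator; a connection-level accumulator would presumably still have to count the bytes of the last data frame.

```java
import java.util.function.IntConsumer;

/**
 * Hypothetical helper (not the JDK's actual class): accumulates the size of
 * received DATA frames and emits one bulk WindowUpdate per watermark crossing.
 */
final class WindowUpdateAccumulator {
    private final int watermark;       // threshold in bytes, must be positive
    private final IntConsumer sender;  // stands in for "send WindowUpdate(increment)"
    private int pending;               // bytes received but not yet reported

    WindowUpdateAccumulator(int watermark, IntConsumer sender) {
        if (watermark <= 0) {
            throw new IllegalArgumentException("watermark must be positive");
        }
        this.watermark = watermark;
        this.sender = sender;
    }

    /** Called once for every processed DATA frame of the stream. */
    synchronized void onDataFrame(int payloadLength, boolean endStream) {
        if (endStream) {
            // Optimization 2: the stream is finished, so no stream-level
            // WindowUpdate is sent for the remaining bytes.
            pending = 0;
            return;
        }
        pending += payloadLength;
        if (pending >= watermark) {
            // Optimization 1: one bulk update instead of one per frame.
            int increment = pending;
            pending = 0;
            sender.accept(increment);
        }
    }
}
```

With a watermark of 32768 bytes and 16384-byte frames, this sketch would call the sender once per two data frames instead of once per frame.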

webrev: http://cr.openjdk.java.net/~skuksenko/jep110/8161004/ 
Comments
Fixed in sandbox.
09-12-2016

Some exact benchmark numbers (baseline vs optimized, in requests/sec):
* get 1 byte:
  1 thread   : 2746 vs 3207 (17% speedup)
  24 threads : 3938 vs 7282 (85% speedup)
  48 threads : 3779 vs 7064 (87% speedup)
* get 128K bytes:
  1 thread   : 1076 vs 1077 (~0% speedup)
  24 threads : 975 vs 2087 (114% speedup)
  48 threads : 990 vs 2092 (111% speedup)
* get 1M bytes:
  1 thread   : 142 vs 159 (12% speedup)
  24 threads : 19 vs 300 (16x speedup)
  48 threads : 58 vs 332 (5.7x speedup)
07-07-2016

The following bulk sending watermark was chosen:
- not less than the max frame size (so that the server is not prevented from sending large amounts of data)
- a whole number of max frame sizes, near half of the initial window size.
This heuristic was chosen after a set of experiments, where it behaved better.
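
One possible reading of this heuristic, as a sketch only: the class and method names are hypothetical, and rounding to the nearest whole multiple is an assumption, since the comment only says "near half".

```java
/** Hypothetical illustration of the watermark heuristic; not the webrev code. */
final class WindowUpdateWatermark {
    private WindowUpdateWatermark() {}

    /**
     * Returns a whole number of max-frame sizes close to half of the initial
     * window size, and never less than one max frame size.
     */
    static int compute(int initialWindowSize, int maxFrameSize) {
        if (initialWindowSize <= 0 || maxFrameSize <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        int half = initialWindowSize / 2;
        // Assumption: round half the window to the nearest multiple of the frame size.
        int frames = (half + maxFrameSize / 2) / maxFrameSize;
        // Lower bound from the heuristic: at least one max frame size.
        return Math.max(frames, 1) * maxFrameSize;
    }
}
```

For the HTTP/2 protocol defaults (initial window size 65535, max frame size 16384), compute(65535, 16384) yields 32768, that is, two max-size frames. The values actually used by the implementation may differ.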
07-07-2016

webrev http://cr.openjdk.java.net/~skuksenko/jep110/8161004/
07-07-2016

The attached charts show the performance and scaling improvement on a [2x12x2]-core IvyBridge machine for different data sizes (get 1 byte, 128K bytes and 10^6 bytes).
07-07-2016