JDK-8102696 : Optimize rendering of Controls by minimizing state changes
  • Type: Enhancement
  • Component: javafx
  • Sub-Component: graphics
  • Affected Version: 8
  • Priority: P4
  • Status: Resolved
  • Resolution: Duplicate
  • Submitted: 2013-04-19
  • Updated: 2015-06-16
  • Resolved: 2013-09-11
Description
Richard, Steve and I were discussing the fact that there is a big cost every time we change OpenGL state to switch to a new shader, because we also have to flush the buffers. Typically, when we render UI controls, it is all rounded rectangles and text, so on almost every draw call we are switching from one to the other. Steve found that in simple tests there might be a 5x or greater gain if we could render all the rectangles first and then all the text afterwards. We have investigated state sorting before and did not find an easy way to apply it to 2D graphics with lots of transparency (due to antialiasing). We then thought, hang on a minute: with Region caching, all region drawing after the first time should just be simple texture draws. All text is also in a glyph cache, so again that should be all texture calls. So in theory, once we have drawn all regions to the cache, we should only be doing texture draws.

To test this out, we need to investigate whether it is possible to draw text and region cache textures using the same shaders and setup, avoiding the state switching and flushes.

Secondly, we will need to see how many Region cache misses we get with typical applications, and look at extending the range of cases that get cached if this proves to offer big performance gains.
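
To make the cost described above concrete, here is a minimal sketch that counts how often the active shader would change for an interleaved rectangle/text draw order versus a grouped one. The class `StateSwitchSketch`, the `Kind` enum and all method names are hypothetical and exist only for illustration; none of them are Prism types. It also assumes the draws can be reordered freely, which, as noted above, is not generally true for 2D content with transparency.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class StateSwitchSketch {
    // Hypothetical draw kinds standing in for "rounded rectangle" and "text" shaders.
    public enum Kind { RECT, TEXT }

    // Counts how many times the shader must be (re)bound when drawing in list order.
    // The very first bind is counted as a switch as well.
    public static int countSwitches(List<Kind> draws) {
        int switches = 0;
        Kind current = null;
        for (Kind k : draws) {
            if (k != current) {
                switches++;
                current = k;
            }
        }
        return switches;
    }

    // Draw order typical of controls: a background rectangle followed by its label.
    public static List<Kind> interleaved(int controls) {
        List<Kind> draws = new ArrayList<>();
        for (int i = 0; i < controls; i++) {
            draws.add(Kind.RECT);
            draws.add(Kind.TEXT);
        }
        return draws;
    }

    // Groups draws by kind (all rectangles first, then all text).
    // Assumption: reordering is safe, i.e. no overlapping transparent content.
    public static List<Kind> grouped(List<Kind> draws) {
        List<Kind> sorted = new ArrayList<>(draws);
        sorted.sort(Comparator.naturalOrder()); // enums order by declaration
        return sorted;
    }

    public static void main(String[] args) {
        List<Kind> draws = interleaved(100);
        System.out.println("interleaved switches: " + countSwitches(draws));
        System.out.println("grouped switches:     " + countSwitches(grouped(draws)));
    }
}
```

For 100 controls the interleaved order needs 200 shader binds and the grouped order needs 2. This only models the number of switches, not their actual cost on a given driver.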
Comments
This has been addressed by the super shader.
11-09-2013

Using CheckBox makes a huge difference. Here are the numbers (running with -Djavafx.animation.fullspeed=true):
NULL:
  View size: 800.0 x 600.0
  Debug: false
  Toggle step: 15
  Starting warmup for 10 sec...
  Measurement phase for 20 sec...
  Score: 656.5866
  Average FPS over 20s instant pulses: 566.549, average pulses: 656.5824
ES2 Normal:
  View size: 800.0 x 600.0
  Debug: false
  Toggle step: 15
  Starting warmup for 10 sec...
  Measurement phase for 20 sec...
  Score: 156.43756
  Average FPS over 20s instant pulses: 156.80698, average pulses: 156.43642
ES2 Draw Text Last:
  View size: 800.0 x 600.0
  Debug: false
  Toggle step: 15
  Starting warmup for 10 sec...
  Measurement phase for 20 sec...
  Score: 337.7895
  Average FPS over 20s instant pulses: 341.32043, average pulses: 337.78748
25-04-2013

You can also try the CheckBox benchmark. I would also suggest making the window as big as you can, so that it is render bound rather than CSS/Layout bound. Also, for ListViewBenchmark there is an option to do one keyboard down-arrow press per pulse, which will probably give you better results; I assume there is an equivalent for TableViewBenchmark. The problem with scroll_drag or key press(100) is that they do large scrolls where none of the old row cells can be reused. In that case we are more bound by applying CSS/Layout to the new cells than by drawing the cells. If you scroll one row at a time, you will be more bound by drawing.
25-04-2013

I looked into this, specifically for TableViewBenchmark. I modified Prism to count buffer flushes and to traverse the render tree twice, once to draw everything that is not text and once more to draw the text. Of course this is not right, but I wanted to count flushes and see if performance could be improved. Unfortunately, we flush the vertex buffer for all sorts of reasons, so the number of flushes did not go down. Next I tried TableViewBenchmark using the null pipeline (running with -Djavafx.animation.fullspeed=true). Here are the results on OS X:
NULL:
  InjectionRate :20 ms
  Test mode :SCROLL_DRAG
  Total cells :1000x30
  Cell type :REGULAR
  Max Resize Delta: 100
  Starting warmup for 10 sec...
  Measurement phase for 20 sec...
  Score: 45.76682
  Average FPS over 20s instant pulses: 45.56454, average pulses: 45.76657
ES2:
  InjectionRate :20 ms
  Test mode :SCROLL_DRAG
  Total cells :1000x30
  Cell type :REGULAR
  Max Resize Delta: 100
  Starting warmup for 10 sec...
  Measurement phase for 20 sec...
  Score: 43.383633
  Average FPS over 20s instant pulses: 44.875072, average pulses: 43.383404
It seems that TableViewBenchmark is not a good candidate for Prism optimization. Did I get this wrong? Is it that my machine and graphics card are too fast? What is the result of running with the null pipeline on embedded?
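
For readers unfamiliar with the experiment, the two-pass traversal described above can be sketched roughly as follows. This is not the actual Prism change: `TwoPassTraversalSketch`, `Node`, `render` and `renderTextLast` are hypothetical names, and "drawing" is modelled as appending the node name to a list so the resulting order can be inspected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class TwoPassTraversalSketch {
    // Hypothetical stand-in for a render-tree node; not a Prism class.
    public static final class Node {
        final String name;
        final boolean isText;
        final List<Node> children = new ArrayList<>();

        public Node(String name, boolean isText) {
            this.name = name;
            this.isText = isText;
        }

        // Adds a child and returns this node so trees can be built fluently.
        public Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Depth-first walk that "draws" (records) only the nodes accepted by the filter.
    public static void render(Node node, Predicate<Node> filter, List<String> out) {
        if (filter.test(node)) {
            out.add(node.name);
        }
        for (Node child : node.children) {
            render(child, filter, out);
        }
    }

    // Pass 1 draws everything that is not text; pass 2 draws only the text.
    // As noted in the comment above, this ignores correct z-ordering.
    public static List<String> renderTextLast(Node root) {
        List<String> out = new ArrayList<>();
        render(root, n -> !n.isText, out);
        render(root, n -> n.isText, out);
        return out;
    }

    public static void main(String[] args) {
        Node root = new Node("table", false)
            .add(new Node("cell0", false).add(new Node("label0", true)))
            .add(new Node("cell1", false).add(new Node("label1", true)));
        System.out.println(renderTextLast(root)); // [table, cell0, cell1, label0, label1]
    }
}
```

The sketch only shows the draw order that results from the two passes. It says nothing about vertex buffer flushes, which in the experiment above were triggered for other reasons as well.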
25-04-2013

Change issue type to Tweak (Optimization is deprecated).
19-04-2013

Assigned to Steve as he said he would investigate.
19-04-2013