JDK-8097799 : Optimize blurs in shaders
  • Type: Enhancement
  • Component: javafx
  • Sub-Component: graphics
  • Priority: P3
  • Status: Closed
  • Resolution: Duplicate
  • Submitted: 2010-12-28
  • Updated: 2015-06-12
  • Resolved: 2014-07-15
Related Reports
Duplicate :  
Relates :  
Description
ColorfulCircles on my MacBook Pro (2.8 GHz Core 2 Duo, NVidia 9400M) has a very poor frame rate, which seems to be related to the blurs and the fact that the 9400M uses system memory instead of VRAM. Using a 1 pixel box blur is much faster, which demonstrates that the issue is related to the actual blur algorithm itself. The 9600M GT (also in this notebook) showed good performance initially but soon became very jittery (perhaps this was due to overheating the GPU?).

For many types of blurs where precision is less important (as in ColorfulCircles and a lot of other animated blur situations) we could use MIP-mapped blurring (downscale the image, blur it, and upscale it again) to get the desired effect. Perhaps what we need is a FastBlur class which uses a box, Gaussian, or image-scaled approach depending on the situation. In this way we could make sure that typical usages of animated blurs would be fast even on lame hardware (and even in software mode).
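The downscale-and-upscale idea can be illustrated on a single row of pixel values. This is only a minimal sketch under stated assumptions, not code from Prism or Decora: the class ScaledBlurSketch and all of its methods are hypothetical names, the downscale is a fixed factor of 2 rather than a real MIP chain, and the upscale uses nearest-neighbour sampling for brevity.

```java
// Hypothetical illustration only; none of these names exist in JavaFX.
final class ScaledBlurSketch {

    // Halve the resolution by averaging neighbouring pairs of samples.
    static double[] downscale2(double[] src) {
        double[] out = new double[(src.length + 1) / 2];
        for (int i = 0; i < out.length; i++) {
            int a = 2 * i;
            int b = Math.min(2 * i + 1, src.length - 1); // clamp at the right edge
            out[i] = (src[a] + src[b]) / 2.0;
        }
        return out;
    }

    // Plain box blur with edge clamping; cost per sample is 2 * radius + 1 reads.
    static double[] boxBlur(double[] src, int radius) {
        double[] out = new double[src.length];
        int taps = 2 * radius + 1;
        for (int i = 0; i < src.length; i++) {
            double sum = 0.0;
            for (int k = -radius; k <= radius; k++) {
                int j = Math.max(0, Math.min(src.length - 1, i + k));
                sum += src[j];
            }
            out[i] = sum / taps;
        }
        return out;
    }

    // Nearest-neighbour upscale back to the requested length.
    static double[] upscale2(double[] src, int length) {
        double[] out = new double[length];
        for (int i = 0; i < length; i++) {
            out[i] = src[Math.min(i / 2, src.length - 1)];
        }
        return out;
    }

    // Approximate a blur of the given radius by blurring a half-size copy
    // with half the radius: fewer samples, and fewer reads per sample.
    static double[] fastBlur(double[] src, int radius) {
        double[] small = downscale2(src);
        double[] blurred = boxBlur(small, Math.max(1, radius / 2));
        return upscale2(blurred, src.length);
    }
}
```

For a radius of 4, the half-size pass reads 5 samples per output value instead of 9, over half as many values. The result is less precise than a full-resolution blur, which is the trade-off the description accepts for animated content.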
Comments
Per the comment in RT-2892, as of the fix of RT-13275 we now perform a Gaussian (or any linear convolution) blur at the optimal size for the indicated scaling factors, implicitly using the "scale down, blur, and scale up" technique with the new implementation. Unless/until there is a concrete suggestion about a change to our choice of scaling or other shader calculations I am going to close this as a duplicate of RT-13275 and we can open new bugs as concrete suggestions arise.
15-07-2014

OK to defer.
23-08-2011

SQE: ok to defer
10-08-2011

Jim, is there a resolution on your proposal above? How serious is this issue? Is this the correct priority?
07-06-2011

Here are the comments from Jim:
On 1/24/11 2:42 PM, Jim Graham wrote:
> The hardware pipelines perform box blurs using an array of coefficients to multiply against adjacent pixels to average them into the result for a given destination pixel.
>
> As the number of pixels computed goes up, this process takes longer and longer per pixel. But the story is a little more complicated. Because the version of shaders that we are using doesn't have looping constructs, we have to manually replicate the work done on the pixels. It still takes longer for more pixels, but the difference tends to be quantized a bit. As it turns out, this doesn't seem to affect the performance much, or you'd see more stepping in the graph.
>
> Another complication is that we can only have so many instructions in a shader, so there is a limit to how much we can manually replicate the blur equations. Once we have more pixels than we can emit instructions to combine them, we overflow. The way we deal with that overflow is by scaling down the image, blurring the scaled-down version, and then scaling it back up to the original size. For large radii the difference isn't visible. As a result, at radius=N we might use a shader that does N pixel operations, but at radius=N+1 we might not have enough shader instructions to do it, so we scale down and blur the smaller image with half the radius, and we only do "(N+1)/2" operations. In those cases, it can actually be cheaper to do a larger radius. This is the effect that causes the sudden increase in performance at various cutoffs on the larger radii.
>
> The following issues are raised:
>
> - If downscaling is such a big performance win, can we get away with more downscaling and fewer pixel operations? Can we lower the threshold where we first consider scaling? Can we increase the number of scale steps we use at a given radius?
>
> - Rather than reach a threshold and suddenly scale down by a factor of 2, what if we scaled just enough to keep our pixel equations just under the maximum threshold? That way performance would be more monotonic, with no "hard to explain" jumps up in performance. This might take away the advantage of scaling somewhat, but if combined with more aggressive scaling criteria we might end up with something both more linear and generally faster than what we have now...
>
> ...jim
25-01-2011
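
The two scaling policies discussed in Jim's comment above can be compared with a small sketch. It rests on assumptions that are not in the original: the class BlurScalePlan and its methods are hypothetical, the limit of 64 pixel operations is an invented figure (the real shader limit is not stated), and the cost model follows the comment's simplification that a blur at radius N costs N pixel operations.

```java
// Hypothetical illustration only; the names and the limit of 64 are assumptions.
final class BlurScalePlan {

    private static void requirePositive(int maxOps) {
        if (maxOps < 1) {
            throw new IllegalArgumentException("maxOps must be at least 1");
        }
    }

    // Policy described in the comment: halve the image (and the radius)
    // until the unrolled blur fits within the instruction budget.
    static int halvingOps(int radius, int maxOps) {
        requirePositive(maxOps);
        int ops = radius;
        while (ops > maxOps) {
            ops = (ops + 1) / 2; // "(N+1)/2" operations after one halving
        }
        return ops;
    }

    // Scale factor applied to the image under the halving policy.
    static double halvingScale(int radius, int maxOps) {
        requirePositive(maxOps);
        double scale = 1.0;
        int ops = radius;
        while (ops > maxOps) {
            ops = (ops + 1) / 2;
            scale *= 0.5;
        }
        return scale;
    }

    // Proposed alternative: scale just enough to stay at the budget.
    static int justEnoughOps(int radius, int maxOps) {
        requirePositive(maxOps);
        return Math.min(radius, maxOps);
    }

    // Scale factor applied to the image under the "just enough" policy.
    static double justEnoughScale(int radius, int maxOps) {
        requirePositive(maxOps);
        return radius <= maxOps ? 1.0 : (double) maxOps / radius;
    }

    public static void main(String[] args) {
        int maxOps = 64; // assumed budget, for illustration
        for (int radius = 62; radius <= 67; radius++) {
            System.out.printf("radius=%d halving: ops=%d scale=%.3f | just enough: ops=%d scale=%.3f%n",
                    radius,
                    halvingOps(radius, maxOps), halvingScale(radius, maxOps),
                    justEnoughOps(radius, maxOps), justEnoughScale(radius, maxOps));
        }
    }
}
```

Under the halving policy the operation count drops sharply just past the budget, which corresponds to the sudden performance jumps described in the comment. Under the "just enough" policy the count stays flat at the budget while the scale factor shrinks gradually.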

I have created a Blur effect performance chart for fx-b12 on our mid-range Windows machine: Core 2 Duo 3.3 GHz, 3.2 GB RAM, NVIDIA Quadro FX 570.
See the "ColorfulCircle Blur Test 01" chart here: http://javaweb.sfbay.sun.com/~ep155969/Performance/JavaFX2/Experiments/ColorfulCircles/res_01.html
You can zoom the chart (double click to reset). You can also get details on a data point if you move the mouse over it.
The results were gathered using the ColorfulCircle benchmark, which was created based on the ColorfulCircle example. The original code was modified so that the circle coordinates are now deterministic (not based on random()). This allows us to compare apples to apples. The benchmark can be run with different numbers of circles, different circle radii, and different Blur values. All results were obtained using 75 circles of radius 100 with BoxBlur(w,h=w,3).
You can find the current source of the benchmark here: http://javaweb.sfbay.sun.com/~ep155969/Performance/JavaFX2/Benchmarks/ColorfulCircles/
The results seem to be interesting. There are 2 drops when running in prism-hw mode (-Xjavafx.toolkit=glass):
1) for blur widths/heights from 12 to 43, the frame rate drops from 30 fps to 9 fps
2) for blur widths/heights from 50 to 85, the frame rate drops from 30 fps to 18.5 fps
Running in prism-sw mode (-Xjavafx.toolkit=glass -Dprism.order=j2d) shows an interesting pattern. Odd results (blur w/h = {1,3,...,127}) are always better than even results. Why is that? Is it because in the first case the center of the region is an integer number?
25-01-2011

Set priority to Major because we can't ship the ColorfulCircles demo until it has reasonable performance. Also, our goal is that the shipping platform in September is able to run the JavaOne 2010 keynote demo without using special flags and such, and having this part of the demo run acceptably on target hardware would help us reach that goal.
28-12-2010