JDK-8099811 : Add optimized BoxBlur filter effect
  • Type: Enhancement
  • Component: javafx
  • Sub-Component: graphics
  • Affected Version: fx1.2
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2009-01-29
  • Updated: 2015-06-16
  • Resolved: 2009-05-01
Fix Version: fx1.2 (Fixed)
Description
The current GaussianBlur class implies a Gaussian kernel, which looks nice but can be expensive in terms of performance.  Having a BoxBlur class would allow us to have a faster blur implementation that could also potentially be used by other classes, such as DropShadow.


Comments
The final stage of this issue is to complete the configurability of the Shadow classes (DropShadow, InnerShadow, and Shadow) so that they can use either a Box or a Gaussian blur filter for the shadow. The API to specify the algorithm was added earlier, but the Decora Gaussian filter could not support the complete API required, so the API was hooked up to always use the Box filter even when Gaussian was requested. The Gaussian filters have now been updated to allow separate configuration of the horizontal and vertical sizes, and the Gaussian-based and Box-based Shadow classes have been modified so that an implementation can switch back and forth between the two algorithms by setting a property. The configurable API of the FX Shadow classes is now fully functional. Webrevs are at: http://javaweb.sfbay.sun.com/~flar/webrev/FXConfigShadow.2/
01-05-2009

Optimized versions of the BoxShadow filter based on an incremental algorithm have been written and are up to 25x faster than the general versions that rely on the LinearConvolve peers. They are also faster, by as much or more, than the software versions of the Gaussian filters that were the default Shadow implementations in Franca. The webrevs for the new peers are at: http://javaweb.sfbay.sun.com/~flar/webrev/SWBoxShadow.0/ This is perhaps the second-to-last step in completing this bug. The one remaining step is to modify the Decora GaussianBlur effect so that it takes separate width and height parameters rather than a single radius parameter. That step is minor, but it is necessary to complete the new *Shadow APIs that allow the developer to specify a Gaussian implementation.
24-04-2009

There is now a BoxBlur FX class, and the Shadow, DropShadow, and InnerShadow classes allow you to specify a Box implementation as well. All that remains to complete this task is a software-optimized implementation of the BoxShadow Decora shader. Currently the SW implementation of the shadow filters uses a generic "convolution kernel" based implementation, not an optimized incremental integer algorithm...
16-04-2009

The API suggested for the BoxBlur and BoxShadow effects is:

    BoxBlur extends Effect {
        input: Effect
        width: Number=5        /* [0,255] */
        height: Number=5       /* [0,255] */
        iterations: Integer=1  /* [0,3] */
    }

    BoxShadow extends Effect {
        input: Effect
        width: Number=5        /* [0,255] */
        height: Number=5       /* [0,255] */
        iterations: Integer=1  /* [0,3] */
        spread: Number=0       /* [0,1] */
        color: Color=Black
    }
03-04-2009
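
For illustration only, the BoxBlur parameter ranges proposed above can be sketched as a plain Java value class. The class name BoxBlurParams and the clamping behavior are assumptions made for this sketch; the proposal lists only defaults and valid ranges and does not say how out-of-range values are handled.

```java
/** Hypothetical holder for the proposed BoxBlur parameters (not the FX or Decora class). */
final class BoxBlurParams {
    public final double width;      // total horizontal size, range [0,255], default 5
    public final double height;     // total vertical size, range [0,255], default 5
    public final int iterations;    // number of blur passes, range [0,3], default 1

    public BoxBlurParams(double width, double height, int iterations) {
        // Assumption: out-of-range values are clamped rather than rejected.
        this.width = clamp(width, 0, 255);
        this.height = clamp(height, 0, 255);
        this.iterations = (int) clamp(iterations, 0, 3);
    }

    /** Defaults taken from the proposal: width=5, height=5, iterations=1. */
    public static BoxBlurParams defaults() {
        return new BoxBlurParams(5, 5, 1);
    }

    private static double clamp(double v, double lo, double hi) {
        return Math.max(lo, Math.min(hi, v));
    }
}
```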

The new BoxBlur API comes with a new shader implementation for the HW pipelines as well. The code now uses a generalized linear convolution shader which it can share with other linearly separable (or arbitrary linear) convolutions. As a result, the HW performance numbers have been affected. The new numbers are:

    Pipeline   Size                     ms/op
    D3D        s=3 (similar to r=1)     2.19
    D3D        s=21 (i.e. r=10)         3.86
    D3D        s=129 (i.e. r=64)        23.1
    D3D        s=255                    42.2
    OGL        s=3 (similar to r=1)     0.91
    OGL        s=21 (i.e. r=10)         2.28
    OGL        s=129 (i.e. r=64)        12.9
    OGL        s=255                    13.9

Some of the times for larger radii are better than the original BoxBlur and Gaussian implementations, but are worse for small radii.
13-03-2009
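
As a rough CPU-side illustration of the "generalized linear convolution" idea, the Java sketch below applies an arbitrary 1D kernel to one row of samples and builds a box blur as the special case where all weights are equal. It is not the shader code; the names LinearConvolveSketch, boxKernel, and convolveRow are hypothetical, and samples outside the row are assumed to be zero.

```java
import java.util.Arrays;

/** Hypothetical sketch: a box blur expressed as a general 1D linear convolution. */
final class LinearConvolveSketch {

    /** Box kernel of the given size: every tap has weight 1/size. */
    public static float[] boxKernel(int size) {
        float[] k = new float[size];
        Arrays.fill(k, 1.0f / size);
        return k;
    }

    /** Convolves one row with any kernel; out-of-range samples count as zero. */
    public static float[] convolveRow(float[] src, float[] kernel) {
        int half = kernel.length / 2;
        float[] dst = new float[src.length];
        for (int x = 0; x < src.length; x++) {
            float acc = 0f;
            for (int k = 0; k < kernel.length; k++) {
                int i = x + k - half;           // source index covered by tap k
                if (i >= 0 && i < src.length) {
                    acc += src[i] * kernel[k];
                }
            }
            dst[x] = acc;
        }
        return dst;
    }

    public static void main(String[] args) {
        float[] row = {0f, 0f, 9f, 0f, 0f};
        System.out.println(Arrays.toString(convolveRow(row, boxKernel(3))));
    }
}
```

The same convolveRow call would accept Gaussian weights instead of box weights, which is the kind of sharing the comment describes. Unlike an incremental loop, this direct form does work proportional to the kernel length for every output sample.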

In order to match the capabilities of other typical image filtering packages, the API of the BoxBlur class has been changed to have 2 separate controls for vertical and horizontal size, and to interpret these sizes as the total diameter of the effect rather than as the radius of blurring from the original pixel location. At the same time a "passes" attribute was added to control the number of iterations of the box blur so that higher quality blurs can be achieved with the simple box blur algorithm. These changes required some restructuring of the algorithm and elimination of the transposing algorithm, which previously achieved a 15 to 30% gain. Here are the new software performance results for the re-specified BoxBlur:

    Pipeline   Size                     ms/op
    SSE        s=3 (similar to r=1)     4.86
    SSE        s=21 (i.e. r=10)         5.41
    SSE        s=129 (i.e. r=64)        9.02
    SSE        s=255                    13.1
    Java       s=3 (similar to r=1)     5.42
    Java       s=21 (i.e. r=10)         5.64
    Java       s=129 (i.e. r=64)        9.96
    Java       s=255                    14.3

Most of these numbers have reverted to their pre-transpose-optimization values.
13-03-2009
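
To illustrate the size-as-diameter convention and the effect of multiple passes, here is a minimal single-channel Java sketch. The names MultiPassBoxBlur and blurRow are hypothetical, the sketch accepts only odd sizes (so that s = 2r + 1, matching the "s=21 (i.e. r=10)" labels above), and it treats samples outside the row as zero. None of this is taken from the Decora sources.

```java
import java.util.Arrays;

/** Hypothetical sketch: repeated box blur passes over one row, size given as a diameter. */
final class MultiPassBoxBlur {

    public static int[] blurRow(int[] src, int size, int passes) {
        if (size < 1 || size % 2 == 0) {
            throw new IllegalArgumentException("this sketch handles odd sizes only");
        }
        int radius = size / 2;                  // s = 2r + 1
        int[] cur = src.clone();
        for (int p = 0; p < passes; p++) {
            int[] next = new int[cur.length];
            int sum = 0;
            // Prime the running sum with the window for output sample 0.
            for (int i = 0; i <= radius && i < cur.length; i++) {
                sum += cur[i];
            }
            for (int x = 0; x < cur.length; x++) {
                next[x] = sum / size;           // truncating average
                int entering = x + radius + 1;
                int leaving = x - radius;
                if (entering < cur.length) sum += cur[entering];
                if (leaving >= 0) sum -= cur[leaving];
            }
            cur = next;
        }
        return cur;
    }

    public static void main(String[] args) {
        int[] impulse = {0, 0, 0, 81, 0, 0, 0};
        // One pass gives a flat box; two passes give a triangle-shaped response.
        System.out.println(Arrays.toString(blurRow(impulse, 3, 1))); // [0, 0, 27, 27, 27, 0, 0]
        System.out.println(Arrays.toString(blurRow(impulse, 3, 2))); // [0, 9, 18, 27, 18, 9, 0]
    }
}
```

Each extra pass convolves the result with the box again, so the response moves from a flat box toward a smoother, more Gaussian-like shape, which is why a passes control can raise quality without a different algorithm.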

Using a technique where the filter transposes the image as it works on each pass (2 passes == 2 transposes == returning to the original orientation) yields another 15 to 30% performance gain for the Software implementations, as follows (all numbers are ms/op, smaller is better):

    Pipeline   Radius   Before => After
    SSE        r=1      5.0 => 3.85
    SSE        r=10     5.5 => 4.4
    SSE        r=64     8.9 => 6.3
    Java       r=1      5.4 => 4.66
    Java       r=10     6.1 => 5.1
    Java       r=64     9.7 => 7.5
04-02-2009
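
A minimal Java sketch of the transposing idea, for a single-channel image stored row-major in an int array: each pass blurs along rows but writes its output transposed, so the second pass (again along rows) effectively blurs the original columns and restores the orientation. The names TransposingBoxBlur, blurAndTranspose, and blur are hypothetical, and edge samples outside the image are assumed to be zero; this is not the Decora code.

```java
import java.util.Arrays;

/** Hypothetical sketch: box blur in two row passes, each writing a transposed result. */
final class TransposingBoxBlur {

    /** Blurs each row of a w x h image and writes the result as an h x w image. */
    public static int[] blurAndTranspose(int[] src, int w, int h, int radius) {
        int size = 2 * radius + 1;
        int[] dst = new int[w * h];
        for (int y = 0; y < h; y++) {
            int row = y * w;
            int sum = 0;
            for (int i = 0; i <= radius && i < w; i++) {
                sum += src[row + i];
            }
            for (int x = 0; x < w; x++) {
                dst[x * h + y] = sum / size;    // (x, y) in src lands at (y, x) in dst
                int entering = x + radius + 1;
                int leaving = x - radius;
                if (entering < w) sum += src[row + entering];
                if (leaving >= 0) sum -= src[row + leaving];
            }
        }
        return dst;
    }

    /** Two transposing passes: horizontal blur, then vertical blur, original orientation. */
    public static int[] blur(int[] src, int w, int h, int radius) {
        int[] tmp = blurAndTranspose(src, w, h, radius);   // tmp is h wide, w tall
        return blurAndTranspose(tmp, h, w, radius);        // back to w wide, h tall
    }

    public static void main(String[] args) {
        int[] image = {0, 0, 0,
                       0, 81, 0,
                       0, 0, 0};
        System.out.println(Arrays.toString(blur(image, 3, 3, 1))); // [9, 9, 9, 9, 9, 9, 9, 9, 9]
    }
}
```

Both passes read their source sequentially along rows, which is one plausible reason for the gain reported above; the comment itself does not state the cause.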

There was an existing BoxBlur filter in Decora that was not exposed at the FX level due to poor performance. Here are some results for rewriting the BoxBlur filter to do separated (horizontal and vertical) passes, processing only 1 row or column on each pass, plus writing some hand-tuned Java and SSE loops to use an incremental algorithm. All numbers below are ms/op - smaller is better. Note that these results are measured directly at the Decora level - the BoxBlur filter is not yet exposed at the FX level to get numbers which include any overhead for the FX to Decora bridging.

radius 64 (63 for Gaussian):

            oldBox   newBox   Gaussian
    D3D     595      25.3     23.9
    OGL     680      13.0     12.9
    Java    36843    9.7      347.9
    SIMD    32266    8.9      232.9

radius 10:

            oldBox   newBox   Gaussian
    D3D     N/A      6.3      7.0
    OGL     N/A      2.9      2.5
    Java    N/A      6.1      44.6
    SIMD    N/A      5.5      32.1

radius 1:

            oldBox   newBox   Gaussian
    D3D     3.0      1.14     1.13
    OGL     1.4      0.88     0.9
    Java    35.7     5.4      10.6
    SIMD    29.      5.0      10.0
30-01-2009
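
The incremental approach mentioned above can be sketched in Java for one row of a single-channel image: a running sum is kept for the window, and moving one sample to the right adds the entering sample and subtracts the leaving one. This is a minimal sketch, not the hand-tuned Decora loop; the names IncrementalBoxBlur and blurRow are hypothetical, samples outside the row are assumed to be zero, and the output is kept the same length as the input.

```java
import java.util.Arrays;

/** Hypothetical sketch: 1D box blur using a running (sliding-window) sum. */
final class IncrementalBoxBlur {

    /** Blurs one row with a window of 2*radius+1 samples centered on each output sample. */
    public static int[] blurRow(int[] src, int radius) {
        int n = src.length;
        int size = 2 * radius + 1;
        int[] dst = new int[n];
        int sum = 0;
        // Prime the sum for output sample 0, whose window is [-radius, radius].
        for (int i = 0; i <= radius && i < n; i++) {
            sum += src[i];
        }
        for (int x = 0; x < n; x++) {
            dst[x] = sum / size;                // truncating average
            int entering = x + radius + 1;      // joins the window for sample x+1
            int leaving = x - radius;           // drops out of the window for sample x+1
            if (entering < n) sum += src[entering];
            if (leaving >= 0) sum -= src[leaving];
        }
        return dst;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(blurRow(new int[] {0, 0, 9, 0, 0}, 1))); // [0, 3, 3, 3, 0]
    }
}
```

The work per output sample is one add, one subtract, and one divide regardless of radius. That would be consistent with the newBox software times above growing only modestly between radius 1 and radius 64, though the measured loops also handle details this sketch ignores.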