The GaussianBlur filter is used in many common situations, including as a component of the Bloom and Glow effects. Right now, any device without hardware (GPU) acceleration falls back to either a natively compiled (SSE-enabled) function or a Java method, both generated by the automated JSL compiler. The generated code is surprisingly fast for code derived from a shader source language, but it is not as fast as it could be if the inner loops were rewritten by hand.
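Further, there is an interesting problem with the automatically generated SSE and Java loops: the Java loops are actually faster than the SSE loops (for example, 120-135ms for JSL Java versus 234ms for JSL SSE at radius 10 in the GaussianBlurTest results below).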
A quick hand-tuning of the inner loops of the Java and SSE backends for GaussianBlur shows speedups ranging from about 3x to about 11x.
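To illustrate what "hand-tuning the inner loop" can look like, here is a minimal Java sketch of one horizontal pass of a separable Gaussian blur. It is not the actual generated or hand-tuned code from this report. The class name BlurSketch, the single-channel float image layout, the sigma = radius / 3 choice, and the zero (transparent) edge handling are all assumptions made for the example. The point is only the shape of the loop: the kernel is precomputed once, the tap range is clamped outside the inner loop, and the inner loop itself is a plain multiply-accumulate over array indices.

```java
final class BlurSketch {
    /** Builds a normalized 1-D Gaussian kernel with 2*radius+1 taps. */
    public static float[] gaussianKernel(int radius) {
        // Assumption for this sketch: sigma is one third of the radius.
        float sigma = Math.max(radius / 3.0f, 0.1f);
        float[] k = new float[2 * radius + 1];
        float sum = 0f;
        for (int i = -radius; i <= radius; i++) {
            float w = (float) Math.exp(-(i * i) / (2.0 * sigma * sigma));
            k[i + radius] = w;
            sum += w;
        }
        // Normalize so the weights add up to 1.
        for (int i = 0; i < k.length; i++) {
            k[i] /= sum;
        }
        return k;
    }

    /**
     * Blurs one row of a single-channel image horizontally.
     * Pixels outside the row are treated as zero (transparent).
     */
    public static void blurRow(float[] src, int srcOff, int w,
                               float[] dst, int dstOff, float[] kernel) {
        int radius = kernel.length / 2;
        for (int x = 0; x < w; x++) {
            // Clamp the tap range once, so the inner loop has no branches.
            int lo = Math.max(0, x - radius);
            int hi = Math.min(w - 1, x + radius);
            int k = lo - (x - radius);   // first kernel tap that lands inside the row
            float acc = 0f;
            for (int i = srcOff + lo, end = srcOff + hi; i <= end; i++) {
                acc += src[i] * kernel[k++];
            }
            dst[dstOff + x] = acc;
        }
    }
}
```

A vertical pass would apply the same kernel down each column; running the two passes back to back gives the full 2-D blur at a cost proportional to the radius per pixel rather than the radius squared.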
Here is a quick table of the results:
GaussianBlurTest
                     radius 10   radius 63
JSL Client Java          135ms      1540ms
hand Client Java          45ms       347ms
JSL Server Java          120ms      1210ms
hand Server Java          35ms       240ms
JSL SSE                  234ms      2770ms
hand SSE                  32ms       235ms
D3D                        7ms        24ms
---------------------------
BloomTest (fixed radius = 10)
JSL Client Java      150-200ms
hand Client Java          60ms
JSL Server Java      125-200ms
hand Server Java       47-50ms
JSL SSE                  225ms
hand SSE                  40ms
D3D                        5ms
---------------------------
GlowTest (fixed radius = 10)
JSL Client Java          220ms
hand Client Java          68ms
JSL Server Java          232ms
hand Server Java          55ms
JSL SSE                  278ms
hand SSE                  48ms
D3D                        4ms
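
For anyone who wants to check the quoted speedup range against the GaussianBlurTest numbers, the ratio is simply the JSL time divided by the hand-tuned time for the same backend and radius (for example, 135ms / 45ms = 3.0 for Client Java at radius 10). The small Java sketch below does that arithmetic for each row pair. The class name SpeedupTable and its layout are made up for this example; the timings are copied from the table above.

```java
final class SpeedupTable {
    /** Speedup of the hand-tuned loop over the JSL-generated loop. */
    public static double speedup(double jslMs, double handMs) {
        return jslMs / handMs;
    }

    public static void main(String[] args) {
        String[] backend = {"Client Java", "Server Java", "SSE"};
        // GaussianBlurTest timings in ms: {radius 10, radius 63}
        double[][] jsl  = {{135, 1540}, {120, 1210}, {234, 2770}};
        double[][] hand = {{45, 347}, {35, 240}, {32, 235}};

        for (int b = 0; b < backend.length; b++) {
            System.out.printf("%-12s radius 10: %.1fx   radius 63: %.1fx%n",
                    backend[b],
                    speedup(jsl[b][0], hand[b][0]),
                    speedup(jsl[b][1], hand[b][1]));
        }
    }
}
```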