The GaussianBlur filter is used in many common situations, including as a component of the Bloom and Glow effects. Currently, any device without hardware (GPU) acceleration falls back to either a natively compiled (SSE-enabled) function or a Java method, both generated automatically by the JSL compiler. The generated code is surprisingly fast considering that it was derived from shader source, but it is not as fast as it could be if the inner loops were rewritten by hand.
There is also an interesting problem with the automatically generated loops: the generated Java loops are actually faster than the generated SSE loops.
A quick hand-tuning of the GaussianBlur inner loops in the Java and SSE backends yields speedups ranging from about 3x to nearly 12x.
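To give a feel for what a hand-written inner loop looks like, here is a minimal Java sketch of a Gaussian blur. It assumes a separable two-pass formulation (one horizontal and one vertical 1-D pass, so each pass costs 2r+1 taps per pixel instead of (2r+1)^2 for a direct 2-D kernel), packed ARGB pixels in a row-major `int[]`, clamp-to-edge sampling, and sigma = radius / 3. The class and method names (`GaussianBlurSketch`, `kernel`, `pass`, `blur`) are hypothetical. This is not the JSL-generated code or the hand-tuned code that was measured in the tables.

```java
// Hypothetical sketch of a hand-written separable Gaussian blur. This is NOT
// the JSL-generated code or the hand-tuned code measured in the tables.
final class GaussianBlurSketch {

    // Normalized 1-D Gaussian kernel with 2*radius+1 taps.
    // Assumption: sigma = radius / 3 (a common choice, not taken from the source).
    static float[] kernel(int radius) {
        double sigma = Math.max(radius / 3.0, 0.5);
        float[] k = new float[2 * radius + 1];
        double sum = 0.0;
        for (int i = -radius; i <= radius; i++) {
            double w = Math.exp(-(double) (i * i) / (2.0 * sigma * sigma));
            k[i + radius] = (float) w;
            sum += w;
        }
        for (int i = 0; i < k.length; i++) {
            k[i] = (float) (k[i] / sum);  // weights sum to (approximately) 1
        }
        return k;
    }

    // One 1-D pass over packed ARGB pixels (row-major, width*height).
    // horizontal == true blurs along rows; false blurs along columns.
    // Out-of-range taps are clamped to the nearest edge pixel.
    static void pass(int[] src, int[] dst, int width, int height,
                     float[] k, boolean horizontal) {
        int radius = k.length / 2;
        int lines = horizontal ? height : width;  // independent 1-D lines
        int len = horizontal ? width : height;    // pixels per line
        int step = horizontal ? 1 : width;        // index distance between neighbors on a line
        int lineStride = horizontal ? width : 1;  // index distance between line starts
        for (int line = 0; line < lines; line++) {
            int base = line * lineStride;
            for (int i = 0; i < len; i++) {
                // Per-channel accumulators kept in locals; no method calls per tap.
                float a = 0f, r = 0f, g = 0f, b = 0f;
                for (int t = -radius; t <= radius; t++) {
                    int j = Math.max(0, Math.min(len - 1, i + t));
                    int p = src[base + j * step];
                    float w = k[t + radius];
                    a += w * (p >>> 24);
                    r += w * ((p >> 16) & 0xff);
                    g += w * ((p >> 8) & 0xff);
                    b += w * (p & 0xff);
                }
                dst[base + i * step] =
                    (toByte(a) << 24) | (toByte(r) << 16) | (toByte(g) << 8) | toByte(b);
            }
        }
    }

    // Rounds a channel value and clamps it to 0..255.
    static int toByte(float v) {
        int i = (int) (v + 0.5f);
        return i < 0 ? 0 : (i > 255 ? 255 : i);
    }

    // Full blur: horizontal pass into a temporary buffer, then vertical pass.
    static int[] blur(int[] src, int width, int height, int radius) {
        float[] k = kernel(radius);
        int[] tmp = new int[src.length];
        int[] dst = new int[src.length];
        pass(src, tmp, width, height, k, true);
        pass(tmp, dst, width, height, k, false);
        return dst;
    }

    public static void main(String[] args) {
        int w = 9, h = 9;
        int[] img = new int[w * h];   // fully transparent black
        img[4 * w + 4] = 0xFFFFFFFF;  // one opaque white pixel in the center
        int[] out = blur(img, w, h, 2);
        for (int x = 0; x < w; x++) {
            System.out.printf("%08X ", out[4 * w + x]);  // center row after blurring
        }
        System.out.println();
    }
}
```

Running `main` prints the center row of a 9x9 image after blurring a single white pixel with radius 2; the value spreads symmetrically to its neighbors. A more aggressively tuned version could, for example, split the edge pixels into separate loops so the interior loop needs no clamping at all, but that is not shown here.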
Here is a quick table of the results:
GaussianBlurTest
                        radius 10    radius 63
JSL Client Java             135ms       1540ms
Hand-tuned Client Java       45ms        347ms
JSL Server Java             120ms       1210ms
Hand-tuned Server Java       35ms        240ms
JSL SSE                     234ms       2770ms
Hand-tuned SSE               32ms        235ms
D3D                           7ms         24ms
---------------------------
BloomTest (fixed radius = 10)
JSL Client Java          150-200ms
Hand-tuned Client Java        60ms
JSL Server Java          125-200ms
Hand-tuned Server Java     47-50ms
JSL SSE                      225ms
Hand-tuned SSE                40ms
D3D                            5ms
---------------------------
GlowTest (fixed radius = 10)
JSL Client Java              220ms
Hand-tuned Client Java        68ms
JSL Server Java              232ms
Hand-tuned Server Java        55ms
JSL SSE                      278ms
Hand-tuned SSE                48ms
D3D                            4ms
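The speedup range can be rechecked from the GaussianBlurTest rows. The snippet below only divides each JSL time by the matching hand-tuned time; the millisecond values are copied from that table (nothing is re-measured), and the class name `BlurSpeedups` is illustrative.

```java
import java.util.Locale;

// Recomputes JSL vs hand-tuned speedups from the GaussianBlurTest table.
final class BlurSpeedups {

    static final String[] LABELS = {"Client Java", "Server Java", "SSE"};

    // Rows follow LABELS. Columns: JSL r10, hand r10, JSL r63, hand r63 (ms).
    static final double[][] MS = {
        {135, 45, 1540, 347},
        {120, 35, 1210, 240},
        {234, 32, 2770, 235},
    };

    static double speedup(double jslMs, double handMs) {
        return jslMs / handMs;
    }

    public static void main(String[] args) {
        double min = Double.MAX_VALUE, max = 0;
        for (int i = 0; i < LABELS.length; i++) {
            double s10 = speedup(MS[i][0], MS[i][1]);
            double s63 = speedup(MS[i][2], MS[i][3]);
            System.out.printf(Locale.ROOT, "%-12s radius 10: %5.2fx   radius 63: %5.2fx%n",
                              LABELS[i], s10, s63);
            min = Math.min(min, Math.min(s10, s63));
            max = Math.max(max, Math.max(s10, s63));
        }
        System.out.printf(Locale.ROOT, "range: %.2fx to %.2fx%n", min, max);
    }
}
```

The last line it prints is `range: 3.00x to 11.79x`, with the low end from Client Java at radius 10 (135/45) and the high end from SSE at radius 63 (2770/235).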