In 6514990, we added GPU acceleration for ConvolveOp for kernel sizes of 3x3 and 5x5.
Things work great on all shader-level Nvidia hardware from GeForce FX 5600 on up,
and on ATI hardware from R5xx on up. But on R300 boards, such as Radeon 9600 and 9800,
performance of 5x5 ConvolveOp is unacceptably slow. Enabling native tracing with
J2D_TRACE_LEVEL=2 shows the following on Radeon 9800 with Catalyst 7.2 (the same can
be seen with earlier Catalyst drivers on Windows, or on Radeon 9600 with 8.34 on Linux):
[W] OGLContext_CreateFragmentProgram: linker msg (106):
Link successful. The GLSL fragment shader will run in software - available number
of constants exceeded.
Clearly the 5x5 shader exceeds the number of constants available on this hardware,
so the driver falls back to a software path. As a result, even a simple convolution
operation can take seconds to complete, instead of a few milliseconds.
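
A rough way to compare the 3x3 and 5x5 cases is a small timing harness like the sketch below. It is only an illustration, not the test case used for this report: the class name ConvolveTiming and its helper methods are hypothetical, and the box-blur kernel is an arbitrary choice. Note that it draws into a BufferedImage so that it runs anywhere; to exercise the accelerated path described above, the destination would presumably need to be an accelerated surface (for example a VolatileImage or an on-screen window) with the OpenGL pipeline enabled (-Dsun.java2d.opengl=true).

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.awt.image.ConvolveOp;
import java.awt.image.Kernel;
import java.util.Arrays;

// Hypothetical harness: times drawImage() through a ConvolveOp.
public class ConvolveTiming {
    // Builds a size x size box-blur kernel whose weights sum to 1.
    static Kernel boxKernel(int size) {
        float[] data = new float[size * size];
        Arrays.fill(data, 1.0f / (size * size));
        return new Kernel(size, size, data);
    }

    // Draws src into dst through a ConvolveOp and returns elapsed milliseconds.
    // Assumption: dst is a BufferedImage here, so this measures the software loops.
    static long timeConvolve(BufferedImage src, BufferedImage dst, int size) {
        ConvolveOp op = new ConvolveOp(boxKernel(size), ConvolveOp.EDGE_NO_OP, null);
        Graphics2D g = dst.createGraphics();
        long start = System.nanoTime();
        g.drawImage(src, op, 0, 0);
        g.dispose();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        BufferedImage src = new BufferedImage(256, 256, BufferedImage.TYPE_INT_RGB);
        BufferedImage dst = new BufferedImage(256, 256, BufferedImage.TYPE_INT_RGB);
        System.out.println("3x3: " + timeConvolve(src, dst, 3) + " ms");
        System.out.println("5x5: " + timeConvolve(src, dst, 5) + " ms");
    }
}
```

On an affected board, running an accelerated variant of this with J2D_TRACE_LEVEL=2 set should make it easy to correlate the slow 5x5 timing with the linker message above.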