JDK-6255545 : OGL: avoid redundant texture state-setting calls
  • Type: Bug
  • Component: client-libs
  • Sub-Component: 2d
  • Affected Version: 5.0
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: solaris_9
  • CPU: generic
  • Submitted: 2005-04-14
  • Updated: 2008-02-06
  • Resolved: 2005-06-27
  • Fixed Version: 6 b43 (JDK 6)
Description
The OGL pipeline is currently not very efficient when it comes to managing
texture state.  For example, we set the texture filter every time we copy a
texture to an OGL surface, even though this rarely changes for a given
texture.  Likewise, we set the texture function (e.g. GL_MODULATE) every time
we copy from a texture, render from the glyph cache, etc., even though 99%
of the time the function hasn't changed since the last call.
###@###.### 2005-04-14 20:13:42 GMT

Comments
EVALUATION

The texture filtering mode (e.g. GL_NEAREST) should be set once when the
texture is created (this state is per-texture object).  We should update this
state only when we need to use a different filtering mode for that texture.
It appears that this change improves performance of small (20x20) texture
copies by about 3% on most hardware.

The texture function (e.g. GL_MODULATE) should be set once the first time a
texture is rendered (this state is per-context).  We should update this state
only when we need to use a different texture function (e.g. GL_REPLACE, in
the case of OGLMaskBlit).  Surprisingly, this single change improves
performance of small (20x20) texture copies by as much as 35%, depending on
the hardware.  It also has a modest benefit for other texture-based
operations, like text rendering (drawString() with 8 characters improves by
10% with this change).

These changes have little impact on larger image copies since the calls
involved are dwarfed by the texture mapping operation itself.  For smaller
operations (20x20 drawImage()), the overall improvements from the above
changes vary by hardware:

  OS     CPU            Graphics board   Improvement
  Sol9   900MHz USIII   XVR-1200         +11%
  WinXP  2x2.6GHz P4    NV GF FX 5600    +37%
                        NV GF2 MX400     +26%
                        ATI R9500 Pro    +42%
  JDS    2x2.6GHz P4    NV GF FX 5600    +12%

This change has a modest improvement on SwingMark scores (e.g. on the last
configuration, performance is improved from 8360 to 8130).

Note that there are other potential optimizations in this area.  For example,
we could avoid calling glBindTexture() when the texture is the same as last
time.  This change is fairly complex because it requires invalidating the
"lastTexture" field on all contexts when a texture is deleted (texture object
IDs are frequently reused by drivers).  Also, there doesn't appear to be a
big win from this change (maybe 1-2% improvement at best), so it wasn't worth
exploring further.
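
A minimal sketch of the state caching described in the evaluation above: the
texture function is remembered per context and the filter per texture, and the
GL call is issued only when the requested value differs from the cached one.
This is not the actual fix.  The SketchContext and SketchTexture structs, the
setTextureFunction() and setTextureFilter() helpers, and the sk_* counting
stubs are all hypothetical names introduced for illustration; the stubs stand
in for glTexEnvi() and glTexParameteri() so the sketch compiles without an
OpenGL context.

```c
/* Stand-ins for the GL enums; in real code these come from <GL/gl.h>. */
#define SK_GL_MODULATE 0x2100
#define SK_GL_REPLACE  0x1E01
#define SK_GL_NEAREST  0x2600
#define SK_GL_LINEAR   0x2601

/* Counters so the number of simulated GL calls can be observed. */
static int texEnvCalls = 0;
static int texParamCalls = 0;

/* Hypothetical stub for glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, mode). */
static void sk_glTexEnvi(int mode) { (void)mode; texEnvCalls++; }

/* Hypothetical stub for glTexParameteri() on the currently bound texture
 * (min and mag filters are folded into one call here for brevity). */
static void sk_glTexParameteri(int filter) { (void)filter; texParamCalls++; }

/* The texture function is per-context state, so it is cached on the context. */
typedef struct {
    int textureFunction;   /* 0 means "never set" */
} SketchContext;

/* The filter is per-texture-object state, so it is cached on the texture. */
typedef struct {
    int filter;            /* 0 means "never set" */
} SketchTexture;

/* Issue the GL call only if the function differs from the cached value. */
static void setTextureFunction(SketchContext *ctx, int func)
{
    if (ctx->textureFunction != func) {
        sk_glTexEnvi(func);
        ctx->textureFunction = func;
    }
}

/* Issue the GL call only if the filter differs from the cached value.
 * Assumes the caller has already bound the texture. */
static void setTextureFilter(SketchTexture *tex, int filter)
{
    if (tex->filter != filter) {
        sk_glTexParameteri(filter);
        tex->filter = filter;
    }
}
```

With this shape, a loop of repeated small blits that all use GL_MODULATE and
GL_NEAREST triggers each underlying call once; a later switch to GL_REPLACE
(as in the OGLMaskBlit case) triggers exactly one more glTexEnvi().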
Another optimization would be to avoid calling glEnable(GL_TEXTURE_2D) and
glDisable(GL_TEXTURE_2D) around every texturing operation.  This change is
also complex because it requires tracking each operation and determining when
it is safe to leave texturing enabled.  I think I experimented with this a
while back, and it might buy us some more gains, but that will require more
investigation (outside the scope of this fix).
###@###.### 2005-04-14 20:13:42 GMT

I found another simple optimization.  In OGLBlitSurfaceToSurface(), we always
call glPixelZoom() before and after the glCopyPixels() call, even if the
scale factors are both 1.0f (the default value).  Avoiding these calls
results in the following gains (on JDS, 2x2.6GHz P4, Nvidia GF FX 5600):

  20x20 drawImage() from VI to screen:  + 2%
  20x20 drawImage() from VI to VI:      +20%
  20x20 copyArea() (onscreen):          +22%
  20x20 copyArea() (VI):                +22%

The first case doesn't show much improvement because we flush the RQ after
every pbuffer->screen copy, which requires a thread switch per copy.  That
immediate flush is necessary to keep Swing responsive, but it keeps the
benchmarks from showing their full potential.  For example, if we removed the
rq.flushNow() call after every pbuffer->screen copy, SwingMark would improve
by approximately 10%.  (This discussion is getting outside the scope of this
bug report, so I'll leave that investigation for another day.)
###@###.### 2005-04-14 22:55:47 GMT
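
The glPixelZoom() optimization described above could look roughly like the
following sketch.  It is not the actual OGLBlitSurfaceToSurface() code:
copySurfaceRegion() and its parameters are hypothetical, and the sk_* stubs
stand in for glPixelZoom() and glCopyPixels() so the example compiles without
an OpenGL context.

```c
/* Counters so the number of simulated GL calls can be observed. */
static int pixelZoomCalls = 0;
static int copyPixelsCalls = 0;

/* Hypothetical stub for glPixelZoom(xfactor, yfactor). */
static void sk_glPixelZoom(float sx, float sy)
{
    (void)sx; (void)sy;
    pixelZoomCalls++;
}

/* Hypothetical stub for glCopyPixels(x, y, width, height, GL_COLOR). */
static void sk_glCopyPixels(int x, int y, int w, int h)
{
    (void)x; (void)y; (void)w; (void)h;
    copyPixelsCalls++;
}

/* Copy a region, touching the pixel zoom state only when scaling is needed.
 * The zoom factors default to 1.0f, so an unscaled copy needs no zoom calls. */
static void copySurfaceRegion(int sx, int sy, int w, int h,
                              float scalex, float scaley)
{
    int needZoom = (scalex != 1.0f || scaley != 1.0f);

    if (needZoom) {
        sk_glPixelZoom(scalex, scaley);
    }
    sk_glCopyPixels(sx, sy, w, h);
    if (needZoom) {
        /* Restore the default so later unscaled copies can skip the call. */
        sk_glPixelZoom(1.0f, 1.0f);
    }
}
```

The sketch relies on the zoom always being restored to 1.0f after a scaled
copy; if other code could leave a non-default zoom in place, the skip would
not be safe.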
14-04-2005