Bug ID: JDK-6255545 OGL: avoid redundant texture state-setting calls

Details
Type:
Bug
Submit Date:
2005-04-14
Status:
Resolved
Updated Date:
2008-02-06
Project Name:
JDK
Resolved Date:
2005-06-27
Component:
client-libs
OS:
solaris_9
Sub-Component:
2d
CPU:
generic
Priority:
P4
Resolution:
Fixed
Affected Versions:
5.0
Fixed Versions:

Related Reports
Relates:

Sub Tasks

Description
The OGL pipeline is currently not very efficient when it comes to managing
texture state.  For example, we set the texture filter every time we copy a
texture to an OGL surface, even though this rarely changes for a given
texture.  Likewise, we set the texture function (e.g. GL_MODULATE) every time
we copy from a texture, render from the glyph cache, etc., even though 99%
of the time the function hasn't changed since last time.
###@###.### 2005-04-14 20:13:42 GMT

                                    

Comments
EVALUATION

The texture filtering mode (e.g. GL_NEAREST) should be set once when the
texture is created (this state is per-texture object).  We should update
this state only when we need to use a different filtering mode for that
texture.  It appears that this change improves performance of small (20x20)
texture copies by about 3% on most hardware.
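
To make the per-texture idea concrete, here is a minimal sketch.  It is not
the actual fix: TextureInfo, textureInit(), textureUpdateFilter() and the
stubSetFilter() counter are hypothetical names made up for illustration, and
the stub stands in for the real glTexParameteri() filter calls so the example
compiles without a GL context.

```c
/* Hypothetical stand-ins for illustration only; not the real JDK or GL code. */
#define FILTER_NEAREST 0x2600  /* same value as GL_NEAREST */
#define FILTER_LINEAR  0x2601  /* same value as GL_LINEAR */

typedef struct {
    unsigned int id;   /* texture object ID */
    int filter;        /* filter mode last applied to this texture */
} TextureInfo;

static int filterCalls = 0;  /* number of simulated state-setting calls */

/* Stub standing in for the glTexParameteri() min/mag filter calls. */
static void stubSetFilter(unsigned int texId, int filter) {
    (void)texId;
    (void)filter;
    filterCalls++;
}

/* Called once when the texture is created: sets the initial filter. */
static void textureInit(TextureInfo *tex, unsigned int id) {
    tex->id = id;
    tex->filter = FILTER_NEAREST;
    stubSetFilter(id, FILTER_NEAREST);
}

/* Called before each copy: touches state only if the filter differs. */
static void textureUpdateFilter(TextureInfo *tex, int filter) {
    if (tex->filter != filter) {
        stubSetFilter(tex->id, filter);
        tex->filter = filter;
    }
}
```

With this shape, repeated copies that keep asking for FILTER_NEAREST cost one
integer compare each instead of a state-setting call.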

The texture function (e.g. GL_MODULATE) should be set once the first time
a texture is rendered (this state is per-context).  We should update this
state only when we need to use a different texture function (e.g. GL_REPLACE,
in the case of OGLMaskBlit).  Surprisingly, this single change improves
performance of small (20x20) texture copies by as much as 35%, depending
on the hardware.  It also has a modest benefit for other texture-based
operations, like text rendering (drawString() with 8 characters improves
by 10% with this change).
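
The per-context version of the same idea could look roughly like the sketch
below.  ContextInfo, contextInit(), contextUpdateTexFunc() and stubTexEnv()
are hypothetical names for illustration; the stub stands in for the real
glTexEnvi() call, and TEXFUNC_UNSET is an assumed sentinel meaning "nothing
set on this context yet".

```c
/* Hypothetical stand-ins for illustration only; not the real JDK or GL code. */
#define TEXFUNC_UNSET    0       /* assumed sentinel: nothing set yet */
#define TEXFUNC_MODULATE 0x2100  /* same value as GL_MODULATE */
#define TEXFUNC_REPLACE  0x1E01  /* same value as GL_REPLACE */

typedef struct {
    int texFunc;  /* texture function last set on this context */
} ContextInfo;

static int texEnvCalls = 0;  /* number of simulated glTexEnvi() calls */

/* Stub standing in for glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, func). */
static void stubTexEnv(int func) {
    (void)func;
    texEnvCalls++;
}

/* Called when the context is created; no texture function is set yet. */
static void contextInit(ContextInfo *ctx) {
    ctx->texFunc = TEXFUNC_UNSET;
}

/* Called before each texturing operation: sets state only on a change. */
static void contextUpdateTexFunc(ContextInfo *ctx, int func) {
    if (ctx->texFunc != func) {
        stubTexEnv(func);
        ctx->texFunc = func;
    }
}
```

The first textured operation pays for one call; after that only a switch
(say, to TEXFUNC_REPLACE for a mask blit and back again) costs anything.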

These changes have little impact on larger image copies since the calls
involved are dwarfed by the texture mapping operation itself.  For smaller
operations (20x20 drawImage()), the overall improvements from the above
changes vary by hardware:
OS      CPU            GPU            Gain
Sol9    900MHz USIII   XVR-1200       +11%
WinXP   2x2.6GHz P4    NV GF FX 5600  +37%
                       NV GF2 MX400   +26%
                       ATI R9500 Pro  +42%
JDS     2x2.6GHz P4    NV GF FX 5600  +12%

This change yields a modest improvement in SwingMark scores (e.g. on the last
configuration, the score goes from 8360 to 8130).

Note that there are other potential optimizations in this area.  For example,
we could avoid calling glBindTexture() when the texture is the same as last
time.  This change is fairly complex because it requires invalidating the
"lastTexture" field on all contexts when a texture is deleted (texture object
IDs are frequently reused by drivers).  Also, there doesn't appear to be a
big win from this change (maybe 1-2% improvement at best), so it wasn't
worth exploring further.
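
For the record, the invalidation problem can be sketched as follows.  This is
only an illustration of why the bookkeeping is needed, not something that was
implemented: BindContext, contexts[], bindIfNeeded(), textureDeleted() and
stubBindTexture() are hypothetical names, the stub stands in for
glBindTexture(), and the sketch assumes ID 0 means "nothing cached" and is
never bound through this path.

```c
/* Hypothetical stand-ins for illustration only; not the real JDK or GL code. */
#define MAX_CONTEXTS 4

typedef struct {
    unsigned int lastTexture;  /* last bound texture ID; 0 = nothing cached */
} BindContext;

static BindContext contexts[MAX_CONTEXTS];  /* zero-initialized */
static int bindCalls = 0;                   /* simulated glBindTexture() calls */

/* Stub standing in for glBindTexture(GL_TEXTURE_2D, id). */
static void stubBindTexture(unsigned int id) {
    (void)id;
    bindCalls++;
}

/* Binds only when the requested texture differs from the cached one. */
static void bindIfNeeded(BindContext *ctx, unsigned int id) {
    if (ctx->lastTexture != id) {
        stubBindTexture(id);
        ctx->lastTexture = id;
    }
}

/*
 * Must run whenever a texture is deleted.  The driver may hand the same
 * ID to a new texture, so a stale cache entry would wrongly skip the bind.
 */
static void textureDeleted(unsigned int id) {
    for (int i = 0; i < MAX_CONTEXTS; i++) {
        if (contexts[i].lastTexture == id) {
            contexts[i].lastTexture = 0;
        }
    }
}
```

Without the textureDeleted() sweep over every context, a recycled ID would
look "already bound" and the new texture would never actually be bound.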

Another optimization would be to avoid calling glEnable(GL_TEXTURE_2D)
and glDisable(GL_TEXTURE_2D) around every texturing operation.  This change
is also complex because it requires tracking each operation and determining
when it is safe to leave texturing enabled.  I think I experimented with this
a while back, and it might buy us some more gains, but that will require
more investigation (outside the scope of this fix).
###@###.### 2005-04-14 20:13:42 GMT

I found another simple optimization.  In OGLBlitSurfaceToSurface(), we
always call glPixelZoom() before and after the glCopyPixels() call,
even if the scale factors are both 1.0f (the default value).  Avoiding
these calls results in the following gains (on JDS, 2x2.6GHz P4,
Nvidia GF FX 5600):
  20x20 drawImage() from VI to screen: + 2%
  20x20 drawImage() from VI to VI:     +20%
  20x20 copyArea() (onscreen):         +22%
  20x20 copyArea() (VI):               +22%

The first case doesn't show much improvement because we flush the RQ after
every pbuffer->screen copy, which requires a thread switch per copy.  That
immediate flush is necessary to keep Swing responsive, but it keeps the
benchmarks from showing their full potential.  For example, if we removed
the rq.flushNow() call after every pbuffer->screen copy, SwingMark would
improve by approximately 10%.  (This discussion is getting outside the scope
of this bug report, so I'll leave that investigation for another day.)
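
Going back to the glPixelZoom() change described above, the check is roughly
the following.  This is a sketch only: blitSurfaceToSurface() here is a
simplified, hypothetical stand-in for OGLBlitSurfaceToSurface() with an
assumed parameter list, and the two stubs stand in for glPixelZoom() and
glCopyPixels() so the example compiles without a GL context.

```c
/* Hypothetical stand-ins for illustration only; not the real JDK or GL code. */
static int zoomCalls = 0;  /* simulated glPixelZoom() calls */
static int copyCalls = 0;  /* simulated glCopyPixels() calls */

/* Stub standing in for glPixelZoom(xfactor, yfactor). */
static void stubPixelZoom(float xfactor, float yfactor) {
    (void)xfactor;
    (void)yfactor;
    zoomCalls++;
}

/* Stub standing in for glCopyPixels(x, y, w, h, GL_COLOR). */
static void stubCopyPixels(int x, int y, int w, int h) {
    (void)x; (void)y; (void)w; (void)h;
    copyCalls++;
}

/* Sets and restores the zoom only when a scale factor is not 1.0f. */
static void blitSurfaceToSurface(int sx, int sy, int w, int h,
                                 float scalex, float scaley) {
    int needZoom = (scalex != 1.0f || scaley != 1.0f);

    if (needZoom) {
        stubPixelZoom(scalex, scaley);
    }
    stubCopyPixels(sx, sy, w, h);
    if (needZoom) {
        stubPixelZoom(1.0f, 1.0f);  /* restore the default zoom */
    }
}
```

An unscaled copy then issues only the copy call, while a scaled copy still
sets the zoom and restores the default afterwards.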
###@###.### 2005-04-14 22:55:47 GMT
                                     


