JDK-6739267 : D3D/OGL: add missing ThreeByteBgr to texture upload blit loop
  • Type: Bug
  • Component: client-libs
  • Sub-Component: 2d
  • Affected Version: 6u10
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: windows_xp
  • CPU: x86
  • Submitted: 2008-08-20
  • Updated: 2010-04-02
  • Resolved: 2009-01-09
Fixed in: 6u10, 7 b43
D3D is missing a loop for uploading data from a ThreeByteBgr image
into a texture, so the upload for these images happens through a 
generic loop which is slow and uses an intermediate image
(so there's one extra copy and some GC activity if the intermediate
image is collected).

ThreeByteBgr is often used by video decoders (DirectShow specifically),
so the upload happens on every frame.
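
To make the per-frame scenario concrete, here is a minimal sketch of a decoder-style frame stored as a ThreeByteBgr image (BufferedImage.TYPE_3BYTE_BGR) and drawn scaled. The class and method names are hypothetical, and the destination is a plain BufferedImage standing in for an accelerated surface, so this only shows the image layout and the draw call, not the D3D/OGL upload path. Direct access to the backing byte array, as done here, is generally understood to leave the image unmanaged.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferByte;

// Hypothetical illustration only; not code from the fix.
class ThreeByteBgrFrame {

    /** Builds a TYPE_3BYTE_BGR frame by writing raw bytes in B, G, R order. */
    static BufferedImage makeFrame(int w, int h, int r, int g, int b) {
        BufferedImage frame = new BufferedImage(w, h, BufferedImage.TYPE_3BYTE_BGR);
        byte[] data = ((DataBufferByte) frame.getRaster().getDataBuffer()).getData();
        for (int i = 0; i < data.length; i += 3) {
            data[i]     = (byte) b; // blue comes first in memory
            data[i + 1] = (byte) g;
            data[i + 2] = (byte) r;
        }
        return frame;
    }

    /** Draws the frame scaled into a new image (software stand-in for a screen surface). */
    static BufferedImage drawScaled(BufferedImage frame, int dstW, int dstH) {
        BufferedImage dst = new BufferedImage(dstW, dstH, BufferedImage.TYPE_INT_RGB);
        Graphics2D g2 = dst.createGraphics();
        try {
            g2.drawImage(frame, 0, 0, dstW, dstH, null);
        } finally {
            g2.dispose();
        }
        return dst;
    }

    public static void main(String[] args) {
        BufferedImage frame = makeFrame(320, 240, 0x10, 0x20, 0x30);
        BufferedImage shown = drawScaled(frame, 640, 480);
        System.out.printf("pixel = %08X%n", shown.getRGB(100, 100));
    }
}
```

In a real video player a new frame like this would be produced and drawn on every repaint, which is why the cost of the upload path matters.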

SUGGESTED FIX http://hg.openjdk.java.net/jdk7/jdk7/rev/cd88b4ad7f25, http://sa.sfbay.sun.com/projects/java2d_data/7/6739267.2

EVALUATION OK, here's the data with the latest version of the fix (full J2DBench results files attached).

I have fixed J2DBench to add a set of drawImage+touch tests. But currently one would have to specify accthreshold=0 to properly test texture uploads (SwToTexture), because otherwise we'd be testing the 'unmanaged image' case (SwToSurface).

So I ran the benchmarks, and now most variants show improvement, and the results are consistent between ogl and d3d.

I tested:
  • unmanaged scale/blit/tx
  • 'managed, touched on each iteration' scale/blit/tx

d3d_3byte_noopt:
  Number of tests: 32
  Overall average: 1371683.1740006404
  Best spread: 0.04% variance
  Worst spread: 8.94% variance
  (Basis for results comparison)

d3d_3byte_opt:
  Number of tests: 32
  Overall average: 1434948.8126830158
  Best spread: 0.0% variance
  Worst spread: 1.58% variance
  Comparison to basis:
    Best result: 20532.49% of basis
    Worst result: 99.58% of basis
    Number of wins: 24
    Number of ties: 8
    Number of losses: 0

ogl_3byte_noopt:
  Number of tests: 32
  Overall average: 557509.3068086591
  Best spread: 0.04% variance
  Worst spread: 1.31% variance
  (Basis for results comparison)

ogl_3byte_opt:
  Number of tests: 32
  Overall average: 659663.2264295145
  Best spread: 0.04% variance
  Worst spread: 4.64% variance
  Comparison to basis:
    Best result: 11709.7% of basis
    Worst result: 99.96% of basis
    Number of wins: 24
    Number of ties: 8
    Number of losses: 0

The ties are "managed, untouched" cases, where we only pay the penalty of missing loops once when uploading to the texture for the first time.
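
As a rough illustration of what a "touched on each iteration" draw loop looks like, the sketch below modifies one source pixel before every drawImage call, so that any cached copy of the source would be out of date each time. This is an assumption about the general shape of such a test, not J2DBench's actual code; the class and method names are hypothetical, the destination is a plain BufferedImage, and the accthreshold setting is not modelled.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Hypothetical sketch of a drawImage+touch loop; not taken from J2DBench.
class TouchedBlit {

    /** Touches the source, then blits it, once per iteration; returns dst pixel (0,0). */
    static int blitWithTouch(BufferedImage src, BufferedImage dst, int iterations) {
        Graphics2D srcG = src.createGraphics();
        Graphics2D dstG = dst.createGraphics();
        try {
            for (int i = 0; i < iterations; i++) {
                // "Touch": change a single source pixel so the source content differs
                // from what was drawn on the previous iteration.
                srcG.setColor(new Color(i & 0xFF, 0, 0));
                srcG.fillRect(0, 0, 1, 1);
                dstG.drawImage(src, 0, 0, null);
            }
        } finally {
            srcG.dispose();
            dstG.dispose();
        }
        return dst.getRGB(0, 0);
    }

    public static void main(String[] args) {
        BufferedImage src = new BufferedImage(250, 250, BufferedImage.TYPE_3BYTE_BGR);
        BufferedImage dst = new BufferedImage(250, 250, BufferedImage.TYPE_INT_RGB);
        System.out.printf("%08X%n", blitWithTouch(src, dst, 100));
    }
}
```

Without the touch, the source would only need to be uploaded once, which matches the explanation of the ties above.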

EVALUATION J2DBench results for d3d, optimized vs non-optimized:

graphics.imaging.tests.drawimage,graphics.imaging.src=unmanaged3ByteBgr opaque,graphics.opts.sizes=1000:
  d3d_3byte_noopt: 86677.36757 (var=0.56%) (100.0%)
  d3d_3byte_opt: 104417.67068 (var=0.77%) (120.47%)
graphics.imaging.tests.drawimage,graphics.imaging.src=unmanaged3ByteBgr opaque,graphics.opts.sizes=250:
  d3d_3byte_noopt: 84648.67617 (var=1.55%) (100.0%)
  d3d_3byte_opt: 103302.07501 (var=0.92%) (122.04%)
graphics.imaging.tests.drawimagescaledown,graphics.imaging.src=unmanaged3ByteBgr opaque,graphics.opts.sizes=1000:
  d3d_3byte_noopt: 1433.121019 (var=1.24%) (100.0%)
  d3d_3byte_opt: 58772.01761 (var=0.6%) (4100.98%)
graphics.imaging.tests.drawimagescaledown,graphics.imaging.src=unmanaged3ByteBgr opaque,graphics.opts.sizes=250:
  d3d_3byte_noopt: 995.8624898 (var=0.45%) (100.0%)
  d3d_3byte_opt: 57882.63793 (var=1.13%) (5812.31%)
graphics.imaging.tests.drawimagescaleup,graphics.imaging.src=unmanaged3ByteBgr opaque,graphics.opts.sizes=1000:
  d3d_3byte_noopt: 1571.229050 (var=2.45%) (100.0%)
  d3d_3byte_opt: 234860.55776 (var=0.4%) (14947.57%)
graphics.imaging.tests.drawimagescaleup,graphics.imaging.src=unmanaged3ByteBgr opaque,graphics.opts.sizes=250:
  d3d_3byte_noopt: 1092.344644 (var=1.19%) (100.0%)
  d3d_3byte_opt: 231106.37876 (var=0.64%) (21156.91%)
graphics.imaging.tests.drawimagetxform,graphics.imaging.src=unmanaged3ByteBgr opaque,graphics.opts.sizes=1000:
  d3d_3byte_noopt: 1718.75 (var=1.19%) (100.0%)
  d3d_3byte_opt: 126001.58982 (var=0.92%) (7331.0%)
graphics.imaging.tests.drawimagetxform,graphics.imaging.src=unmanaged3ByteBgr opaque,graphics.opts.sizes=250:
  d3d_3byte_noopt: 1418.912175 (var=1.17%) (100.0%)
  d3d_3byte_opt: 123388.15789 (var=0.89%) (8695.97%)

Results for OGL, including loops which we know are slower and won't be included in the fix:

graphics.imaging.tests.drawimage,graphics.imaging.src=unmanaged3ByteBgr opaque,graphics.opts.sizes=1000:
  ogl_3byte_noopt: 86021.50537 (var=0.56%) (100.0%)
  ogl_3byte_opt: 7416.563658 (var=0.33%) (8.62%)
graphics.imaging.tests.drawimage,graphics.imaging.src=unmanaged3ByteBgr opaque,graphics.opts.sizes=250:
  ogl_3byte_noopt: 80210.42084 (var=0.81%) (100.0%)
  ogl_3byte_opt: 5773.092369 (var=1.17%) (7.2%)
graphics.imaging.tests.drawimagescaledown,graphics.imaging.src=unmanaged3ByteBgr opaque,graphics.opts.sizes=1000:
  ogl_3byte_noopt: 3155.048076 (var=0.64%) (100.0%)
  ogl_3byte_opt: 2708.667736 (var=0.89%) (85.85%)
graphics.imaging.tests.drawimagescaledown,graphics.imaging.src=unmanaged3ByteBgr opaque,graphics.opts.sizes=250:
  ogl_3byte_noopt: 1621.264588 (var=1.91%) (100.0%)
  ogl_3byte_opt: 2673.937004 (var=1.0%) (164.93%)
graphics.imaging.tests.drawimagescaleup,graphics.imaging.src=unmanaged3ByteBgr opaque,graphics.opts.sizes=1000:
  ogl_3byte_noopt: 5601.659751 (var=1.25%) (100.0%)
  ogl_3byte_opt: 10856.45355 (var=0.89%) (193.81%)
graphics.imaging.tests.drawimagescaleup,graphics.imaging.src=unmanaged3ByteBgr opaque,graphics.opts.sizes=250:
  ogl_3byte_noopt: 2662.923045 (var=0.6%) (100.0%)
  ogl_3byte_opt: 10749.11347 (var=1.56%) (403.66%)
graphics.imaging.tests.drawimagetxform,graphics.imaging.src=unmanaged3ByteBgr opaque,graphics.opts.sizes=1000:
  ogl_3byte_noopt: 3942.973523 (var=0.41%) (100.0%)
  ogl_3byte_opt: 5826.645264 (var=1.38%) (147.77%)
graphics.imaging.tests.drawimagetxform,graphics.imaging.src=unmanaged3ByteBgr opaque,graphics.opts.sizes=250:
  ogl_3byte_noopt: 2360.369609 (var=1.32%) (100.0%)
  ogl_3byte_opt: 5833.668139 (var=0.84%) (247.15%)
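
The last figure on each result line is the score expressed as a percentage of the basis run (the noopt result, shown as 100.0%). A minimal sketch of that arithmetic follows, using two pairs of scores from this report; the class and helper names are hypothetical and this is not J2DBench's own code.

```java
// Hypothetical helper reproducing the "% of basis" arithmetic from the result lines.
class PercentOfBasis {

    /** Returns score as a percentage of the basis score. */
    static double percentOfBasis(double score, double basis) {
        return score / basis * 100.0;
    }

    /** Rounds to two decimal places for display. */
    static double round2(double value) {
        return Math.round(value * 100.0) / 100.0;
    }

    public static void main(String[] args) {
        // d3d drawimage, sizes=1000: opt vs noopt
        System.out.println(round2(percentOfBasis(104417.67068, 86677.36757))); // 120.47
        // ogl drawimage, sizes=1000: opt vs noopt
        System.out.println(round2(percentOfBasis(7416.563658, 86021.50537)));  // 8.62
    }
}
```

Values above 100% are wins for the optimized build and values below 100% are losses, which is how the OGL straight-blit regression shows up as 8.62%.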

EVALUATION The same applies to the OpenGL pipeline, but with a twist: adding these loops only helps in the case of scaling. Straight blits are much slower in OGL because we need to upload the data scan line by scan line due to possible alignment issues (see bug 6207877).

Here's some performance data for the non-optimized vs optimized (with the loops added) cases. The benchmark tests blit/scale of an unmanaged 3bytebgr image.

d3d:

graphics.imaging.tests.drawimage,graphics.opts.sizes=1000:
  3byte_noopt: 86301.92230 (var=0.49%) (100.0%)
  3byte_opt: 104693.14079 (var=0.73%) (121.31%)
graphics.imaging.tests.drawimage,graphics.opts.sizes=250:
  3byte_noopt: 83373.93846 (var=1.73%) (100.0%)
  3byte_opt: 103826.17728 (var=0.79%) (124.53%)
graphics.imaging.tests.drawimagescaleup,graphics.opts.sizes=1000:
  3byte_noopt: 1527.494908 (var=2.59%) (100.0%)
  3byte_opt: 235402.19134 (var=0.56%) (15411.0%)
graphics.imaging.tests.drawimagescaleup,graphics.opts.sizes=250:
  3byte_noopt: 1087.896986 (var=1.48%) (100.0%)
  3byte_opt: 233458.12958 (var=0.82%) (21459.58%)

Summary:
3byte_noopt:
  Number of tests: 4
  Overall average: 43072.81316563192
  Best spread: 0.49% variance
  Worst spread: 2.59% variance
  (Basis for results comparison)
3byte_opt:
  Number of tests: 4
  Overall average: 169344.90975135576
  Best spread: 0.56% variance
  Worst spread: 0.82% variance
  Comparison to basis:
    Best result: 21459.58% of basis
    Worst result: 121.31% of basis
    Number of wins: 4
    Number of ties: 0
    Number of losses: 0

ogl:

graphics.imaging.tests.drawimage,graphics.opts.sizes=1000:
  3byte_noopt: 85970.39250 (var=0.78%) (100.0%)
  3byte_opt: 7409.440175 (var=0.95%) (8.62%)
graphics.imaging.tests.drawimage,graphics.opts.sizes=250:
  3byte_noopt: 80193.14868 (var=5.45%) (100.0%)
  3byte_opt: 5703.422053 (var=21.79%) (7.11%)
graphics.imaging.tests.drawimagescaleup,graphics.opts.sizes=1000:
  3byte_noopt: 5492.270138 (var=1.89%) (100.0%)
  3byte_opt: 319366.47955 (var=0.89%) (5814.84%)
graphics.imaging.tests.drawimagescaleup,graphics.opts.sizes=250:
  3byte_noopt: 2676.494431 (var=6.58%) (100.0%)
  3byte_opt: 313342.59059 (var=4.39%) (11707.2%)

Summary:
3byte_noopt:
  Number of tests: 4
  Overall average: 43583.076440138895
  Best spread: 0.78% variance
  Worst spread: 6.58% variance
  (Basis for results comparison)
3byte_opt:
  Number of tests: 4
  Overall average: 161455.48309378154
  Best spread: 0.89% variance
  Worst spread: 21.79% variance
  Comparison to basis:
    Best result: 11707.2% of basis
    Worst result: 7.11% of basis
    Number of wins: 2
    Number of ties: 0
    Number of losses: 2
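
To illustrate why 3-byte pixels raise alignment questions at all: a tightly packed row of w pixels occupies 3*w bytes, which is a multiple of 4 only when w is. The sketch below assumes a 4-byte row alignment requirement (OpenGL's default unpack alignment) and handles the data one scan line at a time. It is only an assumption about the kind of issue involved; the specifics of bug 6207877 are not reproduced here, and the class and method names are hypothetical.

```java
// Hypothetical sketch: per-scan-line handling of tightly packed 3-byte pixel rows.
class ScanlineRepack {

    /** Rounds a row length in bytes up to the next multiple of alignment. */
    static int alignedStride(int rowBytes, int alignment) {
        return ((rowBytes + alignment - 1) / alignment) * alignment;
    }

    /** Copies packed 3-byte rows, one scan line at a time, into rows that start on aligned offsets. */
    static byte[] repack(byte[] src, int width, int height, int alignment) {
        int rowBytes = width * 3;
        int stride = alignedStride(rowBytes, alignment);
        byte[] dst = new byte[stride * height];
        for (int y = 0; y < height; y++) {
            // Each scan line is copied separately because source and destination strides differ.
            System.arraycopy(src, y * rowBytes, dst, y * stride, rowBytes);
        }
        return dst;
    }

    public static void main(String[] args) {
        // A 250-pixel row is 750 bytes, which is not a multiple of 4.
        System.out.println(alignedStride(250 * 3, 4));  // 752
        // A 1000-pixel row is 3000 bytes, which already is.
        System.out.println(alignedStride(1000 * 3, 4)); // 3000
    }
}
```

Working row by row like this trades one large transfer for many small ones, which is consistent with straight blits getting slower in the OGL numbers above.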

EVALUATION We need to add the loop.