JDK-8096359 : [Canvas] Significant performance issue with extra clip and temp buffer validations
  • Type: Bug
  • Component: javafx
  • Sub-Component: graphics
  • Affected Version: 8
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2014-07-02
  • Updated: 2015-06-12
  • Resolved: 2014-07-10
JDK 8: 8u40 (Fixed)
Description
While implementing a solution for RT-30107 (use the new 1-step clip support in the Canvas code), a bottleneck was discovered in the way we manage the clip (and temp) buffers: we validate and lock them, and then unlock them, on every operation that uses these buffers. The Bitmap test with a large number of monsters renders hundreds or thousands of images per frame, and each one requires us to lock a clip texture if clipping is turned on (and, after the fix for RT-37300, only if the clip is non-rectangular).

As a result, the method of performing the clip, as mentioned by RT-30107, has little effect on performance. However, a simple change to lock the clip only the first time it is used in rendering a buffer, and to unlock it only as the method exits, increased performance by 6-8x.
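
The two locking patterns described above can be sketched as follows. This is a minimal illustration, not the actual Prism/Canvas code: `CountingBuffer`, `ClipLockSketch`, and both method names are hypothetical, and the buffer only counts calls instead of touching a real texture.

```java
/** Hypothetical stand-in for a lockable clip texture; it only counts calls. */
final class CountingBuffer {
    int lockCalls = 0;
    int unlockCalls = 0;

    void lock() { lockCalls++; }
    void unlock() { unlockCalls++; }
}

/** Hypothetical sketch of the two patterns; not the real Canvas renderer. */
final class ClipLockSketch {

    /** Per-operation pattern: lock and unlock the clip around every draw. */
    static void renderPerOperation(CountingBuffer clip, int drawCount) {
        for (int i = 0; i < drawCount; i++) {
            clip.lock();
            try {
                // one clipped draw would happen here
            } finally {
                clip.unlock();
            }
        }
    }

    /** Lock-once pattern: lock on first use, unlock as the method exits. */
    static void renderLockOnce(CountingBuffer clip, int drawCount) {
        boolean lockedHere = false;
        try {
            for (int i = 0; i < drawCount; i++) {
                if (!lockedHere) {
                    clip.lock();       // first use within this render pass
                    lockedHere = true;
                }
                // one clipped draw would happen here
            }
        } finally {
            if (lockedHere) {
                clip.unlock();         // single unlock on exit
            }
        }
    }
}
```

For 1000 draws, the first method performs 1000 lock/unlock pairs and the second performs one pair. If nothing is drawn, the second never locks the buffer at all.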
Comments
I also get 13.5 fps with fullspeed (though I never saw the frequent jumps to 14 fps that I saw without fullspeed).
10-07-2014

I tried it at a non-retina resolution (post-fix only, since I had already pushed and updated my "pre-fix" workspace). I got 14-15 fps on non-retina and 13.5-14 fps on retina, up from 2.7 fps before the fix.
10-07-2014

I'm using the same test case, but I just use the "ant run-bitmap" target, which does not set fullspeed=true. I'm not sure why that would make mine run faster without fullspeed (and at 2-5 fps, fullspeed should not be affecting you in any case). My machine is identical except for a 2.6GHz i7. Maybe it is because I am running with retina, so there are more pixels to push?
10-07-2014

Fixed in the 8u-dev repo for 8u40 with the following changeset:
changeset: 7470:37119e0b1c5d
date: Thu Jul 10 16:02:57 2014 -0700
summary: Fix RT-37793: Performance issues on Canvas clearing temp and clip buffers
http://hg.openjdk.java.net/openjfx/8u-dev/rt/rev/37119e0b1c5d
10-07-2014

We must not be testing the same case. I'm running rt-closed/toys/CanvasTest/src/guimark/BitmapTest.java with -Djavafx.animation.fullspeed=true. I have a MBP:
Processor: 2.4 GHz Intel Core i7
Memory: 16 GB 1600 MHz DDR3
Graphics: NVIDIA GeForce GT 650M 1024 MB
10-07-2014

How powerful a Mac? On my retina MBP with nVidia 650m it went from 2.7 to 14 FPS. On Windows (same machine) it went from 9 to 15 FPS.
10-07-2014

The code looks good to me. +1. On my Mac, BitmapTest went from 2.6 FPS to 5.7 FPS.
10-07-2014

Yes, oval clips in run-bitmap and run-vector, but the vector test didn't have much clipping overhead in the first place. The bitmap test is the one with the biggest gains. The gains are also bigger on Mac than D3D. I didn't check if ES2 on Windows saw similar gains, though...
10-07-2014

Is Monster (rt-closed/toys/CanvasTest/src/guimark/BitmapTest.java) a good toy to test this?
10-07-2014

Fix for faster clear methods for the temp and clip buffers: http://cr.openjdk.java.net/~flar/RT-37793/webrev.00/
10-07-2014

It looks like this was partially a red herring. The improvements I saw with a quick fix turn out to have nothing to do with extra validations. They came from the fact that I was inadvertently skipping the clear of the temp buffer that happens on any effect, blend mode, or clipped rendering operation. In looking into why the clears were costing us so much, I discovered a fast clear method that we weren't using, which gives nearly the same performance improvement (surprisingly, almost as much as the fix that skipped the clear entirely, but not quite). It works out to more than 5x on Mac/ES2. I'm still getting my Windows build environment back up to speed to see how much it helps on Windows/D3D.
09-07-2014