JDK-8093950 : [D3D] Re-enable DirectX 9Ex to avoid losing textures when surface lost
  • Type: Bug
  • Component: javafx
  • Sub-Component: graphics
  • Affected Version: 8u20
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2014-04-07
  • Updated: 2015-06-12
  • Resolved: 2014-05-06
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 8
8u20Fixed
Related Reports
Blocks :  
Duplicate :  
Duplicate :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Description
Using D3D9Ex was originally done for RT-23704 but was subsequently disabled for RT-30352.

Benefits of doing this are:

1) Having persistent textures will help solve the Canvas OOM problem when the surface is lost. See RT-23312.

2) We can avoid the per-frame overhead of reading back the texture for each Canvas node. See RT-23074.

3) We can possibly use this to accelerate transparent rendering in the future (although this is unconfirmed). See RT-17510.


The following problems will need to be resolved before it can be re-enabled:

1) There were rendering problems when the surface was lost. See RT-27121.

2) There was a significant performance hit for certain cases of updating textures. See RT-27508.

3) It was never fully debugged for 3D (no bug ID).
Comments
Changeset: f08c40c607dd Author: vadim Date: 2014-05-06 11:32 +0400 URL: http://hg.openjdk.java.net/openjfx/8u-dev/rt/rev/f08c40c607dd
06-05-2014

+1. Verified that 3D apps work as before.
05-05-2014

Sounds great. Let's get this in for M4 after testing for today's build is done as long as Chien has no concerns. +1
05-05-2014

Vertex/index buffers behave the same as textures, they are in the same pool, so I don't see any concerns here.
03-05-2014

I don't see what I thought I remembered from before with the texture code, but the code has changed enough since then that even if it was an issue, I don't see one now. @Vadim: is there any concern about screen lock causing the vertex buffer data to go invalid, or does this only happen with textures? We use persistent vertex buffers for 3D meshes.
02-05-2014

So my only remaining concern is to check the 3D texture uploading code, which looked suspicious to me earlier (before I disabled D3D9Ex for FX 8), so I want to make sure that there are no lingering issues.
02-05-2014

I ran several tests, did a screen lock, came out of screen lock and all looks good. I ran Zoomy, did the same thing, and it works fine with no OOM bugs, meaning that in conjunction with the fix for RT-24903, which Jim already pushed, it will likely resolve RT-23312.
02-05-2014

Good. I will verify other 3D test programs when I get back to Windows today.
02-05-2014

Thanks. The code itself looks fine. I'll apply the patch and try it shortly.
02-05-2014

Yes, RT-27121 is fixed by the checking that the Present result is !FAILED instead of == D3D_OK. In the D3D9Ex it returns S_PRESENT_MODE_CHANGED when the screen is locked so we need to treat it as successful result. I've tested with Ensemble8 and 3DViewer.
02-05-2014

I'm just starting to take a look at this now. Have you verified that there are no regressions as a result of turning on D3D9Ex? We had some rendering issues in the past (unrelated to the performance issues) that were part of the reason we disabled D3D9Ex for FX 8, and I don't see any specific fixes for them. See: RT-27121 for one example. Also, my recollection is that the 3D code was never debugged for D3D9Ex. I'll look further.
02-05-2014

I've prepared D3D9Ex fix without any performance improvements, it's basically stripped down first version. http://cr.openjdk.java.net/~vadim/RT-36571/webrev.01/ I will create another JIRA for the SpiralText texture update caching in addition to RT-36770 for the effects performance improvements.
25-04-2014

From the comment that Vadim just added to RT-36473 it looks like this is a huge win for large canvas animations, to answer or at least add some data for the performance tradeoffs we've been discussing here. Animating text vs. animating canvases are both performance-oriented test cases, though I have no data on which is more common. Still the performance gain for large canvases is over 10x.
23-04-2014

Check the performance of RT-36743 before and after conversion to D3D9ex. You have to modify the submitted test case to use a smaller size (no larger than 4k x 4k, but the submitter was complaining about 3k x 3k) because they randomly set the size to 10k x 10k before they attached it to the bug report without realizing that that size would exceed our internal limits on Canvas size, but I'm guessing that their frame rate is suffering most because of the screen readback and this fix would help with that issue and I could believe that 9mpixels of readback per frame would cause quite a bit of lag.
22-04-2014

Hi Vadim, Setting implicit hard limit in code (esp. library) is generally not a good idea. We might be able to see a better approach if we can separate this caching work out on it own.
22-04-2014

The cache seems to be mapping one native cookie to another native cookie. Can these things become stale? In any case, let's move the discussion and the code changes to RT-36770.
22-04-2014

Another option would be to simply clear the cache if it has grown up to some reasonable value, say, 16. if (adaptersCache.size() > 16) adaptersCache.clear(); Would that be OK?
22-04-2014

I'm withdrawing the D3DPipeline changes which implements this cache and created RT-36770 to track this.
22-04-2014

I see the free'ing now in the C code. With respect to the Java code, displays can be added or remove. In general, we should not have code that caches with no code that free's.
21-04-2014

Steve, the code is optional as it was previously. The Java side cache probably can be cleaned after the screen configuration change but I don't see the need for it. C side cache is cleared in the release method.
21-04-2014

Thanks. Looking quickly at the code, I see some caches in both Java and C but I don't see any code that clears them or releases the objects that they cache. Further, if possible, using 9Ex should be optional (NOTE: "if possible"). It seems that the code is mostly written this way but I though I saw some hard requirements in the C code (could be mistaken). Having 9Ex be optional (default turned on) would allow us to easily compare old and new behaviour. However, if this causes too much hardship in the code, then we need to revisit this potential design goal.
21-04-2014

Here is first version: http://cr.openjdk.java.net/~vadim/RT-36571/webrev.00/ I've included all performance fixes here as well, just for completeness, probably will split them as Chien suggested.
21-04-2014

Do you want to do this performance work as a separate JIRA? It might be cleaner to separate this out and I can help to review it.
18-04-2014

Yes, of course, at least it will eliminate JNI call to the d3d9. There is a little gain on d3d9 too, about 15% on the slow machine (in HelloEffects) On d3d9ex, as I said, the performance with this cache is equal to d3d9.
18-04-2014

Hi Vadim, This is a good finding. Should this caching also potentially benefit d3d9 too? I think it is worth a quick prototype to see the potential speedup on d3d9 and d3d9ex.
18-04-2014

I found another performance hit in d3d9ex, which is D3DPipeline.nGetAdapterOrdinal method. What it does is ask d3d to enumerate its adapters and retrieve a HMONITOR handle for each adapter. Surprisingly, in d3d9ex this method is significantly slower. I noticed this in HelloEffects on the low-end machine, where live resizing became much slower. I did quick test on faster machine, and results for the 1000 Scene.snapshots of HelloEffects: d3d9: 8 sec d3d9ex - 20 sec This method is heavily used in the effects rendering, the basic pattern is: GraphicsPipeline.getPipeline().getResourceFactory(screen); This is actually very closely related to the RT-33475, where I first noticed that this native function is called repeatedly for no particular reason. Quick workaround for this is to store HMONITOR->adapter relations in the HashMap on the Java side, which brings the performance on par with d3d9.
18-04-2014

We can queue texture updates to the system memory texture and upload it to the video memory later and that will match the performance of usual d3d9 but it will require some probably non-trivial changes. Basically d3d9ex allows uploading pixels to texture only for system memory textures, which can't be used directly for rendering, but only for blitting to the default pool textures. I think that if one wants to animate not only position, but basically the size of several hundreds glyphs per frame, he should think of something more clever than that, I'm sure FX can provide several possible optimization options here.
17-04-2014

Is there a different mechanism in D3D9ex that we can use to update the glyph texture that will perform better? Is the way you are describing above "the only way it can be done" or "the way that is closest to how it works now"? All in all, though, the glyph texture is not usually updated that often, but when it sees frequent updates like it does in SpiralText then it is usually because it is being used to animate text so that is probably the one case where you do care about its performance, even if it also happens to be fairly rare in an absolute sense.
17-04-2014

Steve, SpiralText is basically this benchmark, it uploads ~1400 very small (about 10x10) glyph images into 1024x1024 glyph cache texture per frame. The difference between d3d9 and d3d9ex is that in d3d9 the textures in the managed pool can be locked, so we just lock, upload, unlock and the d3d uploads the texture into video memory at render time. While in d3d9ex one need to create a texture in the sysmemory pool, lock/upload/unlock it and then call UpdateSurface to blit the data to the texture in the default pool.
17-04-2014

More SpiralText benchmarks, this time on i5: reference - 35 fps 9ex w/o cache - 18 fps (-48%) 9ex with cache - 24 fps (-30%) Canvas bitmap test shows this: 300 monsters - 132 fps 600 monsters - 86 fps 3000 - 30 15000 - 6.5 9ex: 300 monsters - 600 fps 600 monsters - 165 fps 3000 - 35 15000 - 7.5 So Canvas will greatly benefit from the 9ex in more realistic scenarios. I don't have an NVIDIA card at hand, and I wonder why the benchmark can be slower there, there are only 1 texture upload per frame in the bitmap test. Vector: reference: 35 fps 9ex: 43 fps Again, a little faster. There are a couple of minor issues remaining and I hope to post a webrev very soon.
17-04-2014

If we have a choice between correct behaviour and slow behaviour, unless the slow down is stupid, we should go for the correct behaviour. In both 9 and 9Ex, texture data is uploaded. Is it just slower in 9Ex? Do we have a micro benchmark or something that focuses on this to prove that this is the case? It might be nice to know, but I'm not sure we could do much about it.
16-04-2014

Note that on NVIDIA the GUIMark2-bitmap canvas test is actually slower with D3D9Ex, presumably because the overhead of uploading texture data is more than the savings from not having to read back the buffer each frame.
16-04-2014

Correct, Vadim. D3D9Ex benefits canvas by avoiding the per-frame "save the texture". It also hurts other areas like SpiralText. I am asking "how much benefit does D3D9Ex provide to Canvas to balance against the lost performance in SpiralText?". The only Canvas Benchmarks I know of right now are in rt-closed/toys/CanvasTest. "ant run-bitmap" and "ant run-vector" are the basic GUIMark2-based bencharks. Zoomy ("ant run-zoomy") could provide a benchmark, but it generally runs at vsync rates so I don't think it would show up any improvement. You can run "ant -p" in that directory for other Canvas demos, but I don't think any of the others are benchmark-related. There is a buffer-filling benchmark-ish toy there too, but that is stressing "the number of rectangle requests we can enqueue" rather than rendering overhead and since it increases the number of rectangles on each frame I'm not sure how you can measure overhead other than a complicated observation about how many rectangles it "gets up to" before a particular wall clock time? It would also depend on the size of the rectangles. It's more of a stress test than a benchmark.
16-04-2014

Jim, do you have any decent Canvas benchmarks so I can test D3D9Ex? The overhead is basically "per-texture update", I'm not sure what this would mean for Canvas, I'm not familiar enough with it. Also we will not save the texture every frame if D3D9Ex is on, there is an if which checks if an RTTexture is volatile, and it's not volatile if D3D9Ex is on.
16-04-2014

What is the gain for Canvas? I would think it would depend on how many rendering commands are in each buffer since this is "per-frame" overhead. Also, while working on the buffer flushing mechanism I noticed that we save the texture on every paint even if nothing changed on it - which adds unnecessary overhead in the current system for Canvas objects that aren't being rendered to...
15-04-2014

Agree.
14-04-2014

Thanks for the update. If SpiralText is the only thing that gets slower it might be an OK tradeoff if we can fix the other rendering issues, fix the Canvas performance on NVIDIA (which also seemed to suffer from texture update slow-down), and allow canvas to run when the surface is lost.
14-04-2014

Unfortunately, it's not easy to achieve the same performance with D3D9Ex in SpiralText demo. Main issue here is huge amount (~1400) of small texture updates per frame when updating glyph cache texture, there are 800 constantly updated glyphs. Previously this texture was allocated in the managed pool, where it's backed in the system memory and can be updated quite efficiently, because actual upload to the video memory was occuring "as needed". In D3D9Ex managed pool is unavailable, so we need to update the texture in the system memory pool and copy this texture to the default pool. I implemented native texture update cache to minimize the overhead but the copying to the video memory is still occuring per texture update. I think it's possible to implement a Java side cache so the resulting texture will be copied to the video memory only before render, but this will require a lot more work and API change. Also currently this performance hit is masked by the default DirectWrite implementation which is very slow. My measurements on the low-end machine are: d3d9 - 17.55 fps d3d9ex - 6.52 fps (-62%) with cache - 7.66 fps (-56%) This is with t2k rasterizer, with default DW frame rate is almost the same 1.2 fps and is limited by the rasterizer. I will test it on faster machine later.
14-04-2014

Awesome. Jim, please test the latest version of you draining code using -Dprism.disableD3D9Ex=false if possible to see whether this can fix the screen lock issue for you.
11-04-2014

Sorry, I wasn't clear, enabling D3D9Ex indeed will allow to render in cases where previously device would be lost. Actually D3D9Ex code is there but disabled by default, so it can be tested with -Dprism.disableD3D9Ex=false
11-04-2014

Vadim, please confirm that the use 9Ex will allow us to continue to drain the buffer when the screen is locked (ie. Ctrl+Alt+Del). This is the major reason we are considering this work for 8u20. It wasn't clear to me from your previous comments that you confirmed this. This would allow Jim's code to drain the buffers and fix RT-23312 which cannot be fixed any other way (without adding new API).
10-04-2014

I am currently working on ways to make sure that the Canvas buffer is drained in any case where we can render to the destination. So, if this enables us to render to the destination in more cases, then it will help us keep the buffer drained better. You can see the webrev in RT-24903 for the latest "keep buffer drained" code if you want to test with it. It's still incomplete, but it solves a number of cases where we let the buffer fester when it was perfectly renderable...
09-04-2014

I've reproduced both main problems, performance regression and rendering problems. Performance is down due to the special path for updating textures which can't be locked, right now new texture is created for each update, so more effective algorithm will need to be implemented. Rendering problems occur because the Present method returns new HRESULT which break current implementation. The part of code which handles device lost/reset needs to be accurately revised. All in all this probably can be done for 8u20, although I'm not sure this is the best solution for Canvas problems. I think we still need to properly detect these situations (there are more situations where the Canvas's buffer is not drained)
09-04-2014

Yep. It is targeted for 8u20, but that might change depending on whether it can help and how invasive / dangerous the changes happen to be.
09-04-2014

Just a clarification - this mechanism will be useful in any case, but the main focus for looking at it now for 8u20 is whether or not it can help with "rendering while contexts would normally not be available" such as screen lock. If that isn't the case, then the issue is still "open" for other reasons, just not prioritized for near term releases...
09-04-2014

Important - the main reason for looking into this bug is so that we can continue to render to a canvas after the screen is locked. If we cannot do this due to a D3D limitation, then we need to know that right away. Vadim, please write sample code that proves or disproves this. It can be C++ or better, code that uses Prism directly. Jim can help with this code if calling Prism directly.
08-04-2014

Vadim: can you evaluate this for 8u20? The main reason for doing it would be to enable a solution for RT-23312, so please work with Jim on this.
07-04-2014