JDK-8097430 : [Glass, Mac] Extremely poor performance with ColorfulCirclesApp on Early 2013 retina MacBook Pro
  • Type: Bug
  • Component: javafx
  • Sub-Component: graphics
  • Affected Version: 8
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2014-04-05
  • Updated: 2015-06-12
  • Resolved: 2014-05-21
JDK 8
8u20: Fixed
Related Reports
Relates :  
Description
Animation/FPS with JavaFX 8 on my "early 2013" MacBook Pro is really, really bad. Running ColorfulCircles from Ensemble brings the system crawling to a halt and the entire system FPS drops dramatically.

This problem seems to show up in other ways: I can't get smooth animations from a maximized window no matter how simple I make it.
Comments
I have a Mac-Mini with an Intel graphics card, so I can help verify this later today.
21-05-2014

http://hg.openjdk.java.net/openjfx/8u-dev/rt/rev/0e6a232fa8c4 BTW: I've just realized that you do not necessarily need an Intel card to reproduce it. Without the fix we have horrible rendering even on my iMac if the ColorfulCircles app is bigger than 1000x1000.
21-05-2014

Looks good. Let's get this code in soon so that it can get maximum testing for 8u20. Steve
20-05-2014

The fix looks good to me. You might want to consider converting jboolean to BOOL in initIDs() implementation explicitly using the ternary operator so as to avoid any potential compiler warnings.
20-05-2014

The updated version of the fix is available here: http://cr.openjdk.java.net/~pchelko/fx/36541/webrev.01/ Passing the boolean to initIDs doesn't look perfect, but I didn't find a better place for it.
20-05-2014

I agree with the first concern; I'll make a new version.
19-05-2014

I'm not an expert in Prism or OpenGL, so I defer to Chien and Kevin to comment on the location of the glFinish() call. I've got two concerns regarding the fix:
1. It seems expensive and unnecessary to read the property when creating every GlassFrameBufferObject instance. I think it would be better to read the property once upon toolkit initialization and then simply use the cached value. This could be done in Java code.
2. I think more error checks are needed in the getJavaProperty method (if it remains in the code after #1 is addressed). For instance, you don't check the result of the ClassForName call. Also, you simply return if the cid or mid is null, but there's no GLASS_EXCEPTION_CHECK(). And finally, after each CHECK() is performed, you proceed as if nothing has happened. I suppose there's no reason to proceed if a previous call has failed.
19-05-2014
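
The first review concern above (read the property once and reuse the cached value) could look roughly like this on the Java side. This is only a sketch: the class name and the property name `example.glass.disableGlFinish` are made up for illustration and are not the names used in the webrev.

```java
// Hypothetical sketch: class and property names are illustrative only.
final class GlFinishSetting {
    // Made-up property name; the real fix may use a different one.
    static final String PROPERTY = "example.glass.disableGlFinish";

    // Read once when the class is initialized; later lookups are a plain field read.
    private static final boolean DISABLED = Boolean.getBoolean(PROPERTY);

    private GlFinishSetting() {}

    static boolean isGlFinishEnabled() {
        return !DISABLED;
    }

    public static void main(String[] args) {
        System.out.println("glFinish enabled: " + isGlFinishEnabled());
    }
}
```

Because the field is initialized when the class loads, later changes to the system property are ignored, which is the usual trade-off of caching a startup flag.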

Webrev: http://cr.openjdk.java.net/~pchelko/fx/36541/webrev/ To be reviewed by Steve and Anthony. Compared with the previous revision, I've moved the glFinish to the unbind call, because in this place only the render thread is stopped, not the AppKit thread, and the effect is the same. I've also added a system property to disable the glFinish call. Testing: run the ColorfulCircles app on an Intel card. With the fix it renders smoothly; if the disable property is set, it's choppy and basically not rendering at all.
19-05-2014

Yes, sure, sorry for the delay. It's been a busy time on the JDK recently. I'll finish this up by Monday. I want to move the glFinish call to a different place.
16-05-2014

Petr? Are you going to release the fix?
15-05-2014

Ok, the consensus is that we develop the fix based on glFinish(). We could consider a property to turn it on or off.
02-05-2014

The glFinish fix does affect FPS as reported by JavaFX runtime. For example, for GUIMark2.Bitmap, 18000 monsters, it goes down from 11.5 to 9.9. Not surprisingly, it's due to a new delay in glFinish (shown as MacView._end() in Java). FX thread can't run during that delay. I'm linking in RT-26702 as double buffering was supposed to solve (and it apparently did) a similar problem with a different app.
29-04-2014
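
For scale, the FPS drop quoted above for GUIMark2.Bitmap can be converted to per-frame time. A quick back-of-the-envelope sketch (only the 11.5 and 9.9 FPS figures come from the comment; the class and method names are illustrative):

```java
// Converts the FPS figures quoted in the comment into milliseconds per frame.
final class FrameTime {
    private FrameTime() {}

    static double frameMillis(double fps) {
        return 1000.0 / fps;
    }

    public static void main(String[] args) {
        double before = frameMillis(11.5); // without glFinish
        double after = frameMillis(9.9);   // with glFinish
        System.out.printf("before: %.1f ms, after: %.1f ms, added: %.1f ms%n",
                before, after, after - before);
    }
}
```

That works out to roughly 87 ms per frame before and 101 ms after, so about 14 ms of added wait per frame in that benchmark.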

It fixes the problem for me. The question is, does it slow down FPS? It would be good to get Oleg to take a look.
25-04-2014

Give me a second to try it out. If the glFlush() solution fixes the problem, then that is the answer and we close the JIRA. If there is further fallout, we can address it in another JIRA.
25-04-2014

What do you think if I push the "glFlush" solution, as it's completely safe and makes us look OK instead of "completely broken"? Then I'll continue to investigate other options.
25-04-2014

I think we need to keep our minds open to solutions that fix the problem; however, rendering a different way is much riskier than flushing a buffer.
25-04-2014

I don't think that updating on the render thread is a robust solution. I think that the glFinish() solution might be a good tradeoff. As Jim says, if this costs a few FPS, then the numbers may have been inflated anyway.
24-04-2014

If it's a few FPS of performance vs. a big visual improvement, the visual improvement wins. Technically, our job isn't done until the pixels are visible to the user so the old FPS measurements were not measuring the proper amount of work that was being specified by the benchmark, right?
23-04-2014

Attached the patch_render_thread. It's a quick hack: it reverts the double-buffering and makes us render on the Prism thread. Only the first time do we render on the AppKit thread, but this doesn't matter. For me this patch does not resolve the problem.
23-04-2014

I remember you doing this by inserting C code deep in glass? We were evaluating solutions for the lost frame problem a while back and there was the "draw from the render thread" approach and the double/triple buffering approach. You mentioned you had a hack of the "draw from the render thread" idea around, can I try it?
23-04-2014

Steve, the problem here is that our benchmarks do not show the real situation. In the ColorfulCircles example PerformanceTracker shows 30-40 FPS while visually the performance is 2-3 FPS. This is because Prism renders asynchronously, so it can generate rendering commands for 10 frames while OpenGL really renders only one. If this call is added, Prism synchronizes with OpenGL and the real visual performance should be more or less the same as measured in Prism. So when I'm running benchmarks, the FPS reported by Prism is a bit lower with the fix, but visually things get significantly better on a slower card and nothing noticeable happens on a fast card. We need to find a way to measure the real FPS delivered to the screen.
23-04-2014

Kevin, what do you think of the glFinish()? Can we make the change and run some benchmarks to see what we get?
23-04-2014

Mike, this is the question I'm trying to answer now, because I don't yet know all the consequences of adding this call. You could use the OpenGL profiler from Apple's Graphics Tools, but I don't think you would be able to get any useful info out of it to optimize the JavaFX application. It's too low-level. Also, the results are hard to interpret. You could try, but don't expect much.
23-04-2014

I forgot to ask - are there cases where this patch would hurt performance? Does it improve things on really slow cards and make things worse on better cards? Also you mentioned GL profilers. I'm wondering if there's any guidance anywhere on how to profile my app so I can maybe push the frame rate even higher.
23-04-2014

I compiled OpenJFX and tried with/without the glFinish call. The improvement is noticeable. It's still not completely fluid but I guess I have to accept that my nice expensive Early 2013 laptop actually has a crappy GPU :/ Still, that one line makes the difference between "looks terrible" and "looks ok" so I'll take it! Thanks guys!
22-04-2014

I've noticed visual issues with benchmarks on Mac for a while. It will report a decent frame rate (and a similar frame rate to D3D), but visually it is *not* updating at that rate. This fix could be important for that case as well, and I think visual stuttering is a much worse problem than optimizing every possible nanosecond of an asynchronous scheduling model.
15-04-2014

Looks like I've got this one. Here's the patch:

diff -r 372f09649262 modules/graphics/src/main/native-glass/mac/GlassFrameBufferObject.m
--- a/modules/graphics/src/main/native-glass/mac/GlassFrameBufferObject.m Tue Apr 15 12:11:13 2014 +0400
+++ b/modules/graphics/src/main/native-glass/mac/GlassFrameBufferObject.m Tue Apr 15 13:51:55 2014 +0400
@@ -313,6 +313,7 @@
     glBlitFramebuffer(0,0, other_fbo->_width, other_fbo->_height, 0,0, self->_width, self->_height, GL_COLOR_BUFFER_BIT, GL_LINEAR);
     glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, self->_fboToRestore);
+    glFinish();
 }

And here's my theory about what's happening: when we are using double-buffered GL contexts, the glFlush, CGLFlushDrawable and [NSOpenGLContext flushBuffers] functions do not really wait for all OpenGL commands to finish execution. They only notify OpenGL that it has some work to do, so our rendering is completely asynchronous. The OpenGL profiler shows that we are getting stuck in some internal Quartz thread, which is related to vsync, on the glFinishFenceAPPLE command. It looks like that's where all our commands are really synchronized. So our Prism and AppKit threads are running too fast and the Intel card is very slow, so we are flooding GL with commands, all of which are executed and synchronized during vsync. Adding glFinish will make the Prism thread wait until its job is done before starting to post new GL commands. I don't think that the proposed patch is good, because I'm just killing the idea of the Prism thread and asynchronous rendering. But the visual performance got really better.
15-04-2014
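
The theory above can be illustrated outside OpenGL with a plain Java analogy. In this sketch a single-threaded executor stands in for the slow GPU, and `Future.get()` stands in for glFinish(); the class name, frame count, and timings are all made up, and nothing here reflects how Prism or Glass is actually implemented.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Analogy only: a producer posting work to a slow consumer, with and without waiting.
final class BackpressureSketch {
    private BackpressureSketch() {}

    /** Submits slow "frames" to one worker and returns the largest backlog seen at submit time. */
    static int maxBacklog(int frames, long workMillis, boolean waitForCompletion) throws Exception {
        ExecutorService gpu = Executors.newSingleThreadExecutor(); // stands in for the slow GPU
        AtomicInteger completed = new AtomicInteger();
        int maxBacklog = 0;
        try {
            for (int submitted = 1; submitted <= frames; submitted++) {
                Future<?> frame = gpu.submit(() -> {
                    try {
                        Thread.sleep(workMillis); // pretend to rasterize one frame
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    completed.incrementAndGet();
                });
                // Frames posted so far that the worker has not finished yet.
                maxBacklog = Math.max(maxBacklog, submitted - completed.get());
                if (waitForCompletion) {
                    frame.get(); // analogue of glFinish(): block until the work is done
                }
            }
        } finally {
            gpu.shutdown();
            gpu.awaitTermination(1, TimeUnit.MINUTES);
        }
        return maxBacklog;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("no wait:   max backlog " + maxBacklog(10, 20, false));
        System.out.println("with wait: max backlog " + maxBacklog(10, 20, true));
    }
}
```

Without the wait the producer typically runs far ahead of the worker; with the wait the backlog never exceeds one frame, at the cost of the producer idling while the worker runs, which is the trade-off discussed in this thread.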

After more debugging I've found a very interesting thing: attaching the OpenGL profiler to the application makes it run smoothly and all the performance problems go away. I've ensured that we are not switching to the discrete card in this case.
15-04-2014

Thanks, Petr. This is all I can think of trying. It is also possible that the event queue is getting flooded. SWT Glass has vastly different code for runLater() versus regular Glass.
10-04-2014

Ok, I'll try to implement drawing to the view. It would just be interesting to know where we are losing the time.
10-04-2014

The SWT port is doing two things differently. It is drawing directly and not drawing to a layer. This has to be faster than what we are doing in Glass. The reason we moved to layers was to support running in the browser where they are required. They are not required to run on the desktop. I realize that having two code paths is bad, but if there is a performance consideration, we might consider it. Can you prove that running without a layer would get back the performance (versus drawing directly)?
10-04-2014

When the double-buffering was implemented there was another approach: to render to the screen on the Prism thread. I've implemented a quick fix to enable this, and performance increased a little bit, but it's still far from good enough. I'll try to do this more carefully. Also, I've noticed that although the demo runs very smoothly on the SWT Glass port, the OS is lagging a bit too. It's mostly noticeable when I slide between screens with Ctrl+Arrow. The OS jerkiness is not as bad as in the normal Glass port, but I still see it.
10-04-2014

OpenGL profiler shows that the glass GL context which we pass to Cocoa works extremely fast and well, consuming 0.7% of the app time. But the second context which we pass to Prism consumes 63% of the app time on CGLFlushDrawable command. This command also consumes 92% of the whole GL time of the application.
10-04-2014

I have attached a reverse patch of the Mac double-buffering fix. Unfortunately it doesn't help for me: I still see extremely bad performance when the ColorfulCircles app is bigger than 600x600 pixels. And I have a non-retina Mac, so it's definitely not retina-specific. [~snorthov] Yes, I can now reproduce the OS lags both with and without applying the reversed patch. It's reproducible only with an Intel card, not with a GeForce card.
10-04-2014

I'm able to reproduce it on my non-retina MacBook if I force it to run on the Intel card. Normally it runs on the GeForce card and performance seems good. However, I'm doing something different from what Steve does. For me, typing or mouse wiggling in another app does not have any effect, but making the Ensemble8 window bigger significantly hurts the performance. It looks like there is some cutoff size: if I make the circle demo bigger than that, the FPS drops to less than 5, and if I make the demo smaller again, the FPS rapidly increases to the normal value. BTW, I'm using this to switch between cards: http://www.heise.de/download/gfxcardstatus-1174053.html
08-04-2014

I have attached a ColorfulCirclesApp that is resizable and prints FPS information. On my retina system, I get about 50 FPS regardless of how big the ColorfulCirclesApp window is; however, typing in Spotlight or anywhere else is glacial when the ColorfulCirclesApp is resized to be big and fast when it is resized to be small. With -Dglass.platform=swt I get around 57 FPS and resizing has no effect on typing. In both cases, animation is pretty smooth for me. There are two problems: 1) jerky animations and 2) the operating system crawling. It seems that running non-retina "fixes" the original problem that Mike is seeing in his app ("I can do my full screen blur") but it does not fix ColorfulCircles (still stuttery in the regular window, OS tanks when window is maximized). @Mike, does the operating system crawl when your app does the full screen blur? @Petr, are you unable to get the operating system to tank running either graphics card? My current thinking is that the problem is still in Glass, but becomes worse with retina enabled.
08-04-2014
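
The attached app's FPS printout is not shown here, so the following is only a guess at the general shape of such a counter, not the attachment's code. It counts callbacks per one-second window from nanosecond timestamps; in a JavaFX app it could be driven from `AnimationTimer.handle(long now)`. Note that a counter like this measures pulses on the Java side, which, as discussed elsewhere in this report, is not necessarily what reaches the screen.

```java
// Hypothetical FPS counter; the class and method names are illustrative only.
final class FpsCounter {
    private static final long WINDOW_NANOS = 1_000_000_000L; // report about once per second

    private long windowStart = -1; // -1 means no frame has been seen yet
    private int frames;

    /**
     * Call once per frame with a monotonic timestamp in nanoseconds.
     * Returns the average FPS when a window closes, otherwise -1.
     */
    double onFrame(long nowNanos) {
        if (windowStart < 0) {
            windowStart = nowNanos; // the first frame only starts the clock
            return -1;
        }
        frames++;
        long elapsed = nowNanos - windowStart;
        if (elapsed < WINDOW_NANOS) {
            return -1;
        }
        double fps = frames * 1e9 / elapsed;
        windowStart = nowNanos; // start the next window
        frames = 0;
        return fps;
    }

    public static void main(String[] args) {
        FpsCounter counter = new FpsCounter();
        // Simulate evenly spaced pulses, one every 16,666,667 ns, for about two seconds.
        for (int i = 0; i <= 120; i++) {
            double fps = counter.onFrame(i * 16_666_667L);
            if (fps >= 0) {
                System.out.printf("%.1f FPS%n", fps);
            }
        }
    }
}
```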

OK, I just paid the fair price and bought QuickRes. What I see with retina disabled is: 1) it makes everything a bit blurry, but it also makes animations a ton smoother and now I can do my full screen blur and so on, and it's hitting great FPS. So it seems like the extra pixels may be to blame? 2) With retina disabled ColourfulCircles is sort of a bit stuttery when in the regular non-maximised window, but the rest of the OS is still smooth and unaffected. 3) With retina disabled and Ensemble/ColourfulCircles maximised, the OS tanks once again and things like opening Mission Control slow right down.
08-04-2014

Interesting, it was free a while back. This site mentions alternatives (and the fact that QuickRes used to be free): http://www.everymac.com/systems/apple/macbook_pro/macbook-pro-retina-display-faq/macbook-pro-retina-display-hack-to-run-native-resolution.html I think you can still get QuickRes for free from various software mirror sites like cnet. The only reason I mentioned it is that it gives a quick menu bar toggle for various resolutions (configurable) so I use it to 1-click my way between equivalent retina and non-retina resolutions for testing.
07-04-2014

Is there a way to disable retina without QuickRes? It seems it wants me to buy it. Given that I'll probably not use it again after this ....
07-04-2014

Right. The SWT stuff just provides the surface and Prism draws whatever pixels to it, so it should be fine. The SWT stuff in question is not the interop code (it's an experimental Glass port). Mike, can you confirm that running QuickRes or another retina-disabling tool does not fix the problem? We'd like to rule out retina. Thanks,
07-04-2014

The reason I asked was that the SWT code might not be retina-aware (but that was just a shot in the dark) and so switching to the SWT back end may be inadvertently turning off our retina support and using 4x less vram. On a demo like ColorfulCircles that is already blurred (manually), you might not notice the resolution loss...
07-04-2014

I was thinking about what Jim asked, in particular whether it was using retina resolutions. I think this is worth confirming.
07-04-2014

I'm pretty sure that the problem is in the double/triple buffering code in Glass (but I could be wrong of course). The smokin' gun is that the experimental SWT port that draws directly to the underlying NSOpenGLView does not have the problem. In both cases, the Prism code is the same. Of course, it could be a resource problem and that the normal Mac glass code and Prism compete for OS resources. It's a wild world.
07-04-2014

If you use a utility to disable the "retina resolutions" (I use quickres), does the problem go away? Is it affected by -Dprism.maxvram=###m and/or -Dprism.targetvram=###m (m for megabytes)?
07-04-2014

@Anthony, at one point what you said was true, but on Mac the Glass team is responsible for this code. @Petr, you don't need to make the ColorfulCirclesApp window bigger to see the problem. Just run it and type in another application (I am using Eclipse).
07-04-2014

I have a MBP mid 2012 running 10.9.2. I can see the problem with Intel card when I make a window close to the screen size. I'll check if disabling my buffering would help.
07-04-2014

While that code resides in Glass source code, it logically belongs to the Graphics code because it deals with rendering, and Glass is rendering-agnostic.
07-04-2014

Petr, can you confirm this behaviour?
07-04-2014

This is a Glass issue, likely way down deep in the double/triple buffering code that Petr wrote. If you run with -Dglass.platform=swt, the problem goes away. To be clear, ColorfulCirclesApp is still slow to update when you drag the window around, but the entire desktop and typing in other applications are unaffected.
07-04-2014

Glass itself doesn't perform any rendering. Since FPS count is good, it means that Glass is able to deliver Runnables to the FX User Thread at reasonable speed. So this indeed seems to be a Graphics bug. Assigning to Chien.
07-04-2014

No, just regular JDK8.
07-04-2014

Assigning to Anthony to evaluate if this is glass or graphics bug. I don't have a Mac, so I can't check or try whether this is still reproducible. @Mike - have you tried this with 8u20?
07-04-2014

Sorry, I didn't realize it's an Ensemble app from the title.
07-04-2014

I already gave the sample that causes this: ColorfulCircles from Ensemble. Stephen Northover says he can reproduce my issue as well.
07-04-2014

Please provide some sample you use to reproduce the problem as described at https://wiki.openjdk.java.net/display/OpenJFX/Submitting+a+Bug+Report
07-04-2014

This is likely a graphics bug rather than a Glass bug, but we will need to profile it.
07-04-2014