JDK-8262254 : Crash on graphics card switch when dialog is open with Metal API validation
  • Type: Bug
  • Component: client-libs
  • Sub-Component: 2d
  • Affected Version: internal
  • Priority: P2
  • Status: Closed
  • Resolution: Duplicate
  • Submitted: 2021-02-23
  • Updated: 2021-03-04
  • Resolved: 2021-03-04
Other
internal : Resolved
Related Reports
Duplicate :  
Relates :  
Relates :  
Relates :  
Relates :  
Description
I initially ran into this on NetBeans, and was able to then reproduce it with SwingSet2.

To reproduce this:

1. Run SwingSet2 as follows:

$ export METAL_DEVICE_WRAPPER_TYPE=1
$ export MTL_SHADER_VALIDATION=1
$ java -Dsun.java2d.metal=True -jar SwingSet2.jar

2. Switch to the JOptionPane demo tab

3. Open any dialog, such as "Show input dialog"

4. Then do something that causes a graphics card switch. It will crash when switching from the integrated graphics card to the discrete graphics card or vice versa (it is only necessary to switch in one direction).

BUG: it will crash as soon as the graphics card switch happens.

2021-02-23 09:35:48.956 java[93635:412999] Metal API Validation Enabled
2021-02-23 09:35:49.253 system_profiler[93637:413103] Metal API Validation Enabled
Metal pipeline enabled on screen 2077748985
Metal pipeline enabled on screen 69734662
-[MTLDebugRenderCommandEncoder setFragmentSamplerState:atIndex:]:1842: failed assertion `sampler is associated with a different device'
Abort trap: 6

See the attached crash report.
Comments
I filed JDK-8262882 for the new issue noted above. I filed it as P1, since it is a serious regression. I'll try to narrow down the fix that caused the regression.
02-03-2021

Update: I think that the hard crash (the one that produces an hs_err file and happens even without Metal API validation) only happens when switching from single screen to dual screen or back, irrespective of whether that also caused a graphics card switch. If I force discrete graphics to always be used, I can still provoke the crash by plugging and unplugging a second monitor, even though there is no graphics card switch. Conversely, if I stay on a single screen (the built-in Retina display) and switch back and forth between integrated and discrete graphics, I don't get a crash (this is without Metal API validation, so it isn't hitting the assertion).
02-03-2021

I can even get it to crash without Metal API validation, so I will file this as a new bug.
02-03-2021

I can still see frequent crashes with NetBeans when switching from integrated to discrete and back. It doesn't crash all the time, but when it does, this is usually the error I see:
-[MTLDebugRenderCommandEncoder setFragmentTexture:atIndex:]:1774: failed assertion `texture is associated with a different device'
At least once, it crashed and produced an hs_err file. I have attached that along with the crash report.
02-03-2021

Tested on a 16" MacBook (2019) running macOS 11.1, with the latest sources. The crash is not reproducible with SwingSet2 dialogs or with NetBeans (the dialog shown on startup to resolve problems). It looks like this bug is fixed by JDK-8262496. I could test only with the default Retina screen, by toggling the GPU multiple times (using the "Automatic Graphics Switching" setting) while the MacBook was on battery. This still needs to be tested by plugging in and unplugging an external monitor.
01-03-2021

I've done some hacking and can no longer reproduce the crash with J2DDemo and the gfxCardStatus tool. Please have a look at my patch (device_switch.patch).
26-02-2021

I've reproduced this crash with the gfxCardStatus tool as well. It can also be reproduced in the default mode (where the graphics card can switch dynamically) by launching any Java app using the OpenGL pipeline after first starting NetBeans with the Metal pipeline, since the OpenGL pipeline will cause a switch to the discrete graphics chip. By the way, the reason we were mentioning dual-screen above is that the first, preliminary fix for this issue caused a dual-screen regression.
26-02-2021

I was also able to reproduce this issue, even without any external monitors, using the following utility: https://gfx.io
26-02-2021

Thanks [~kcr] for testing this thoroughly. So it looks like this is not the correct fix. I tested the patch with SwingSet2 dialogs on the same Retina screen. NetBeans was tested for GPU switches without the 'resolve problems' dialog, also on the same Retina screen. I do have an external monitor, but I can't plug it into my new 16" MacBook Pro as there is no HDMI port on it. Hence I am relying only on the same Retina screen for GPU switch testing, by toggling the system preference.
25-02-2021

The patch doesn't work for me when I have a NetBeans dialog up (in my case it was the "Project Problems" dialog that comes up when one of the open projects has a missing resource). It crashes on a graphics card switch in either direction. The patch also causes a serious rendering bug in dual screen mode. When running on two screens, all windows on the primary (retina) display are rendered as a solid color (no text, no images, etc). Only when I drag the window to the external display can I see anything. See the attached screenshot: SwingSet2-blank.png
25-02-2021

I have attached a patch which fixes this particular bug. It will need some more tweaking, as it avoids a call in common code (LWWindowPeer.java). It makes non-animating UI (all tabs of SwingSet2 except the JColorChooser demo) more stable across multiple GPU switch events. NetBeans also withstood multiple GPU switch events. Please try it out and provide feedback. (Animating UI, such as J2DDemo, still fails even with this patch.)
25-02-2021

I get a crash with a different assertion: failed assertion `texture is associated with a different device'. Both the reported assertion and this one indicate that objects allocated on the old device are still being accessed after the switch.
24-02-2021
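
To illustrate the failure mode described in the comment above (a sampler or texture created on one device being used after a switch to another), here is a minimal Java sketch of a cache that discards its contents whenever the active device changes. It is only an illustration of the general idea and is not the JDK's Metal pipeline code or the attached patch; the class name DeviceScopedCache, the numeric device ids, and the factory callback are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

/**
 * Hypothetical sketch: a cache of device-bound objects (for example samplers
 * or textures) that remembers which device created its contents.
 * None of these names come from the JDK sources.
 */
final class DeviceScopedCache<R> {
    private final Map<String, R> resources = new HashMap<>();
    private long ownerDeviceId;

    DeviceScopedCache(long initialDeviceId) {
        this.ownerDeviceId = initialDeviceId;
    }

    /**
     * Returns the resource stored under key, creating it on the current
     * device if it is missing. If the current device differs from the one
     * that created the cached objects, the whole cache is dropped first so
     * that nothing from the old device can be handed out.
     */
    R get(String key, long currentDeviceId, LongFunction<R> factory) {
        if (currentDeviceId != ownerDeviceId) {
            resources.clear();              // stale: bound to the old device
            ownerDeviceId = currentDeviceId;
        }
        return resources.computeIfAbsent(key, k -> factory.apply(currentDeviceId));
    }

    /** Number of resources currently cached for the owning device. */
    int size() {
        return resources.size();
    }

    public static void main(String[] args) {
        DeviceScopedCache<String> cache = new DeviceScopedCache<>(1L);
        // Created on device 1 and reused while device 1 stays active.
        System.out.println(cache.get("sampler", 1L, id -> "sampler@" + id));
        // After a switch to device 2 the old entry is discarded and rebuilt.
        System.out.println(cache.get("sampler", 2L, id -> "sampler@" + id));
    }
}
```

In this sketch the check happens lazily on each lookup. A real pipeline would also have to deal with objects that are already bound to in-flight rendering work when the switch happens, which this example does not attempt to model.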

I was running a build that includes the fix for JDK-8262115.
23-02-2021