JDK-8240654 : Windows GDI functions can fail and cause severe UI application repaint issues
  • Type: Bug
  • Component: client-libs
  • Affected Version: 14,15
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • OS: windows_10
  • CPU: x86_64
  • Submitted: 2020-03-02
  • Updated: 2020-11-05
  • Resolved: 2020-06-12
  • Fixed in JDK 15: 15 b28
  • Fixed in JDK 16: 16
Sub Tasks
JDK-8252735 :  
JDK-8252738 :  
JDK-8252939 :  
Description
ADDITIONAL SYSTEM INFORMATION :
Occurs with ZGC activated under Windows 10  Enterprise version 1909 with Java 14 release candidate Build 36 (2020/2/6) when running the J text editor v 0.23.0 - see http://armedbear-j.sourceforge.net/ - (also happens with v 0.21.0)

A DESCRIPTION OF THE PROBLEM :
I tried the Java 14 release candidate with ZGC enabled. One text editor showed serious repaint problems: only the line in the text window with the cursor position is repainted, the remainder and other parts (filelist, navigation bar) remain invisible. Repaint of a line is triggered only by placing the cursor on the line.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Download the binary archive version 0.23.0 from https://sourceforge.net/projects/armedbear-j/files/j/0.23.0/, unpack it, and launch the main class of j.jar as follows (note: the manifest classpath of j.jar refers to the lib folder, so do not remove that folder):
start <path-to-JDK14>\bin\javaw.exe -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -jar j.jar
Result:
Only the title bar and the text line with the cursor become visible. Without ZGC it works fine (tried Java 14 RC with G1 and Java 13 with Shenandoah). J is the only program where I found this to happen (but I did not try many, so there could be more). With J, it is always reproducible. So, if there is something hiding in ZGC, this would be a starting point for a search.

Note: J is old and not actively developed. But it is open source, so I was able to change the main method to run the entire initialization on the EDT (since, NOT doing so, MIGHT have caused threading troubles). Result: the same behavior as the original version, so "EDT or not EDT" was not the question. And other than that, I could not think of any bug or inconsistency in J that would make J responsible for its strange interaction with ZGC.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Normal display of the entire editor window as can be seen when running with a different garbage collector.
ACTUAL -
Only the title bar and the text line with the cursor become visible. Moving the cursor to another line repaints that line. Switching between files always updates just the line that contains the cursor.

---------- BEGIN SOURCE ----------
The J editor as described under "Steps to reproduce" (the source is downloadable from the same website as the binary: https://sourceforge.net/projects/armedbear-j/files/j/0.23.0/)
---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
No workaround is currently known.

FREQUENCY : always



Comments
Tests used for verification: j.jar
Tested on Windows 10 x64 (AMRSAHU-IN.oradev.oraclecorp.com)
JDK15 b22: test FAILED. Test results: only the title bar and the text line with the cursor become visible.
JDK15 b29: test PASSED. Test results: the whole J application was visible and the controls were working properly.
Resolution: The fix succeeded.
31-07-2020

Changeset: 7b988b31 Author: Phil Race <prr@openjdk.org> Date: 2020-06-12 09:31:08 +0000 URL: https://git.openjdk.java.net/lanai/commit/7b988b31
02-07-2020

Changeset: 7b988b31 Author: Phil Race <prr@openjdk.org> Date: 2020-06-12 09:31:08 +0000 URL: https://git.openjdk.java.net/panama-foreign/commit/7b988b31
02-07-2020

Changeset: 7b988b31 Author: Phil Race <prr@openjdk.org> Date: 2020-06-12 09:31:08 +0000 URL: https://git.openjdk.java.net/amber/commit/7b988b31
02-07-2020

Microsoft KB article acknowledging the problem is here :- https://support.microsoft.com/en-us/help/4567569/gdi-apis-may-fail-when-large-pages-or-vad-spanning-is-used
23-06-2020

URL: https://hg.openjdk.java.net/jdk/jdk15/rev/015533451f4c User: prr Date: 2020-06-12 16:34:32 +0000
12-06-2020

Assigning this bug over to Phil from the client team.
10-06-2020

This is my PoC workaround for the GDI issue uncovered by ZGC:
https://cr.openjdk.java.net/~stefank/8240654/webrev.gdi_bug.05.workaround_nativeBlit/

Visual inspection of the code shows that printing might be affected as well. An (untested) idea on how to solve that can be found here:
https://cr.openjdk.java.net/~stefank/8240654/webrev.gdi_bug.06.workaround_awt_PrintJob/
10-06-2020

Created separate bugs for:
JDK-8245000: Window GDI functions don't support large pages
JDK-8245002: Window GDI functions don't support NUMA interleaving
14-05-2020

The same problem occurs with UseNUMAInterleaving.
29-04-2020

A workaround could be to use d3d or opengl, if that works on the machine: -Dsun.java2d.d3d=True or -Dsun.java2d.opengl=True
21-04-2020

This seems to be a problem that others have encountered as well:
https://stackoverflow.com/questions/58042263/setdibitstodevice-fails-when-using-large-page
http://forums.codeguru.com/showthread.php?440714-Problem-with-SetDIBitsToDevice()-and-GetDIBits()
https://pastebin.com/L8rrC4mQ
25-03-2020

Somewhat simplified, the following reproduces the problem:

  addr = <some unused virtual memory address that we guess will be free>

  // Reserve (and commit) first segment
  void* const res = VirtualAlloc2(
    GetCurrentProcess(),           // Process
    (void*)addr,                   // BaseAddress
    page,                          // Size
    MEM_RESERVE|MEM_COMMIT,        // AllocationType
    PAGE_READWRITE,                // PageProtection
    NULL,                          // ExtendedParameters
    0                              // ParameterCount
    );
  assert(res == addr, "Failed to commit");

  // Reserve (and commit) second, contiguous segment
  void* const res2 = VirtualAlloc2(
    GetCurrentProcess(),           // Process
    (void*)(addr + segment_size),  // BaseAddress
    page,                          // Size
    MEM_RESERVE|MEM_COMMIT,        // AllocationType
    PAGE_READWRITE,                // PageProtection
    NULL,                          // ExtendedParameters
    0                              // ParameterCount
    );
  assert(res2 == (addr + segment_size), "Failed to commit");

  // Copy original array so that it starts in the first segment,
  // and extends into the second segment.
  char* array = addr + segment_size - (array_size - epsilon);
  memcpy(array, rasBase, array_size);

  // Use the copied array
  int ret = SetDIBitsToDevice(hDC, dstx, dsty, width, height,
                              0, 0, 0, height, array /* rasBase */,
                              (BITMAPINFO*)&bmi, DIB_RGB_COLORS);
  assert(ret != 0, "It fails here!");
25-03-2020

I've managed to track down this bug.

The short version: The Windows SetDIBitsToDevice function fails when the argument lpvBits points to an array that spans memory that was reserved with multiple contiguous reservations. When it fails, the function returns 0 and we get the repaint issues.

Longer version: I found that the problems disappeared when I ran the program with -XX:+CheckJNICalls. This flag guards many paths in the JVM, but the following code is enough to "fix" the problem:

  struct JNINativeInterface_* jni_functions() {
  #if INCLUDE_JNI_CHECK
    if (CheckJNICalls) return jni_functions_check();
  #endif // INCLUDE_JNI_CHECK
    return &jni_NativeInterface;
  }

This code replaces all JNI functions with versions that do more checking. I narrowed this down to the replacement of jni_GetPrimitiveArrayCritical with checked_jni_GetPrimitiveArrayCritical (and the Release counterpart). The important difference that caused the code to start working is that the checked_ version creates a malloced copy of the Java heap array, and lets the native code work with that copy instead of the Java heap array directly.
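The effect of the checked variant described above can be sketched in portable C: copy the pixel data into one malloc'ed block and hand that copy to the consumer instead of the original array. This is only an illustration under stated assumptions, not the JDK change. The names blit_fn, blit_via_copy and sum_bytes are hypothetical, and sum_bytes merely stands in for a call such as SetDIBitsToDevice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical consumer signature: returns non-zero on success. */
typedef int (*blit_fn)(const void *bits, size_t size, void *ctx);

/*
 * Copy `size` bytes from `src` into a single malloc'ed block and pass the
 * copy to `blit`. Returns the consumer's result, or 0 if no copy could be
 * made (size == 0 or allocation failure).
 */
static int blit_via_copy(const void *src, size_t size, blit_fn blit, void *ctx) {
    assert(src != NULL && blit != NULL);
    if (size == 0) {
        return 0;
    }
    void *copy = malloc(size);
    if (copy == NULL) {
        return 0;
    }
    memcpy(copy, src, size);
    int ret = blit(copy, size, ctx);
    free(copy);
    return ret;
}

/* Stand-in consumer (assumption): sums the bytes into *(unsigned long *)ctx. */
static int sum_bytes(const void *bits, size_t size, void *ctx) {
    const unsigned char *p = bits;
    unsigned long total = 0;
    for (size_t i = 0; i < size; i++) {
        total += p[i];
    }
    *(unsigned long *)ctx = total;
    return 1;
}
```

The consumer only ever sees the temporary copy, so where the original array lives no longer matters to it. The cost is one extra allocation and memcpy per call.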
I traced all GetPrimitiveArrayCritical calls and started (temporarily) replacing them with GetIntArrayElements (all arrays are ints), and found that changing this single path "fixed" the problem:

  1 - sun.java2d.windows.GDIBlitLoops.nativeBlit(Native Method)
  2 - sun.java2d.windows.GDIBlitLoops.Blit(GDIBlitLoops.java:141)
  3 - sun.java2d.pipe.DrawImage.blitSurfaceData(DrawImage.java:972)
  4 - sun.java2d.pipe.DrawImage.renderImageCopy(DrawImage.java:583)
  5 - sun.java2d.pipe.DrawImage.copyImage(DrawImage.java:67)
  6 - sun.java2d.pipe.DrawImage.copyImage(DrawImage.java:1027)
  7 - sun.java2d.pipe.ValidatePipe.copyImage(ValidatePipe.java:186)
  8 - sun.java2d.SunGraphics2D.drawImage(SunGraphics2D.java:3417)
  9 - sun.java2d.SunGraphics2D.drawImage(SunGraphics2D.java:3393)
  10 - javax.swing.RepaintManager$PaintManager.paintDoubleBufferedImpl(RepaintManager.java:1653)
  11 - javax.swing.RepaintManager$PaintManager.paintDoubleBuffered(RepaintManager.java:1618)
  12 - javax.swing.RepaintManager$PaintManager.paint(RepaintManager.java:1556)
  13 - javax.swing.RepaintManager.paint(RepaintManager.java:1323)
  14 - javax.swing.JComponent.paint(JComponent.java:1060)
  15 - java.awt.GraphicsCallback$PaintCallback.run(GraphicsCallback.java:39)
  16 - sun.awt.SunGraphicsCallback.runOneComponent(SunGraphicsCallback.java:75)
  17 - sun.awt.SunGraphicsCallback.runComponents(SunGraphicsCallback.java:112)
  18 - java.awt.Container.paint(Container.java:2002)
  19 - java.awt.Window.paint(Window.java:3928)
  20 - javax.swing.RepaintManager$4.run(RepaintManager.java:876)
  21 - javax.swing.RepaintManager$4.run(RepaintManager.java:848)
  22 - java.security.AccessController.executePrivileged(AccessController.java:753)
  23 - java.security.AccessController.doPrivileged(AccessController.java:391)
  24 - java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:85)
  25 - javax.swing.RepaintManager.paintDirtyRegions(RepaintManager.java:848)
  26 - javax.swing.RepaintManager.paintDirtyRegions(RepaintManager.java:823)
  27 - javax.swing.RepaintManager.prePaintDirtyRegions(RepaintManager.java:772)
  28 - javax.swing.RepaintManager$ProcessingRunnable.run(RepaintManager.java:1884)
  29 - java.awt.event.InvocationEvent.dispatch(InvocationEvent.java:316)
  30 - java.awt.EventQueue.dispatchEventImpl(EventQueue.java:770)
  31 - java.awt.EventQueue$4.run(EventQueue.java:721)
  32 - java.awt.EventQueue$4.run(EventQueue.java:715)
  33 - java.security.AccessController.executePrivileged(AccessController.java:753)
  34 - java.security.AccessController.doPrivileged(AccessController.java:391)
  35 - java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:85)
  36 - java.awt.EventQueue.dispatchEvent(EventQueue.java:740)
  37 - java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:203)
  38 - java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:124)
  39 - java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:113)
  40 - java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:109)
  41 - java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
  42 - java.awt.EventDispatchThread.run(EventDispatchThread.java:90)

and in nativeBlit this call:

  srcOps->GetRasInfo(env, srcOps, &srcInfo);

leads to BufImg_GetRasInfo, with the following call that creates the "problematic" array:

  bipriv->base = (*env)->GetPrimitiveArrayCritical(env, bisdo->array, NULL);

This array is then used (as rasBase) in nativeBlit here:

  if (fastBlt) {
      // Window could go away at any time, leaving bits on the screen
      // from this GDI call, so make sure window still exists
      if (::IsWindowVisible(dstOps->window)) {
          // Could also call StretchDIBits. Testing showed slight
          // performance advantage of SetDIBits instead, so since we
          // have no need of scaling, might as well use SetDIBits.
          SetDIBitsToDevice(hDC, dstx, dsty, width, height,
                            0, 0, 0, height, rasBase,
                            (BITMAPINFO*)&bmi, DIB_RGB_COLORS);
      }
  }

See: https://docs.microsoft.com/en-us/windows/win32/api/wingdi/nf-wingdi-setdibitstodevice

I changed the (deeply nested) ::malloc call in checked_jni_GetPrimitiveArrayCritical with my own allocator. The first failing call to SetDIBitsToDevice happened with the address 0000100000C7F018 and byte size 181406. That is, it crosses two heap segments (2MB). I tweaked the allocator so that the array-copy ending up in rasBase had the same offset into a 2MB segment, and then tested a few different allocation strategies:

1) ZGC style heap allocation
 - Committing multiple physical memory in 2MB page files.
 - Reserving memory in 2MB segments as placeholders.
 - Committing memory by mapping views over the 2MB placeholders.

2) With placeholders and mapped views, but one large segment
 - Committing one single large page file.
 - Reserving one large chunk
 - Committing one large chunk

3) Like (1) above but without page files, placeholders, or mapped views, but with plain VirtualAlloc2 calls
 - Reserve multiple 2MB segments
 - Commit multiple 2MB segments

4) Reserve and commit at the same time
 - Reserve with commit multiple 2MB segments

5) One large reservation
 - Reserve one large segment
 - Commit multiple 2MB segments

Strategies 1, 3 and 4 cause this bug; 2 and 5 remove the failure.

This shows that SetDIBitsToDevice doesn't fail because of multiple contiguous committed memory segments, but because of multiple contiguous reserved memory segments.
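The distinction between reservations and commits can be made concrete with a small model. The sketch below does not call any Windows API and does not reproduce GDI behavior. It only counts how many reservation records a buffer overlaps, so that a layout with two adjacent 2MB reservations can be compared with a single 4MB reservation. The reservation struct, the reservations_touched function and the example addresses are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One address-space reservation: [base, base + size). Hypothetical model type. */
typedef struct {
    uint64_t base;
    uint64_t size;
} reservation;

/*
 * Count how many of the `n` reservations overlap the buffer
 * [buf, buf + len). A result above 1 means the buffer spans more than
 * one reservation, which is the condition the experiments point at.
 */
static size_t reservations_touched(const reservation *r, size_t n,
                                   uint64_t buf, uint64_t len) {
    assert(r != NULL || n == 0);
    size_t count = 0;
    if (len == 0) {
        return 0;
    }
    for (size_t i = 0; i < n; i++) {
        uint64_t r_end = r[i].base + r[i].size;
        /* Half-open interval overlap test. */
        if (buf < r_end && r[i].base < buf + len) {
            count++;
        }
    }
    return count;
}
```

In this model, an 8KB buffer that starts 4KB before the 2MB mark touches two records when the range was reserved as two 2MB segments, and one record when it was reserved as a single 4MB segment, regardless of how the memory is later committed.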
I've traced the values to and from SetDIBitsToDevice from a good run:

  [I] BufImg_GetRasInfo fastBlt dstx: 8
  [I] BufImg_GetRasInfo fastBlt dsty: 31
  [I] BufImg_GetRasInfo fastBlt width: 649
  [I] BufImg_GetRasInfo fastBlt height: 699
  [I] BufImg_GetRasInfo fastBlt rasBase: 0000200011D7F018
  [I] BufImg_GetRasInfo fastBlt ret: 699

and a bad run:

  [I] BufImg_GetRasInfo fastBlt dstx: 8
  [I] BufImg_GetRasInfo fastBlt dsty: 31
  [I] BufImg_GetRasInfo fastBlt width: 649
  [I] BufImg_GetRasInfo fastBlt height: 699
  [I] BufImg_GetRasInfo fastBlt rasBase: 0000200011D7F018
  [I] BufImg_GetRasInfo fastBlt ret: 0

This happens with ZGC because of the way we reserve memory. Other GCs fail in a similar way if we run them with -XX:+UseLargePages, because this turns on -XX:+UseLargePagesIndividualAllocation, which also causes memory to be reserved in 2MB segments. (Note: you have to enable page locking privileges and log out and back in for this to take effect; look at the output from a run to see that it succeeded.)

I don't know why SetDIBitsToDevice fails when arrays span multiple reservations. Is this a Windows bug?
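As a worked check of the traced values, the following sketch tests whether a buffer starting at the logged rasBase crosses a 2MB boundary. It rests on two assumptions that the log does not state: that segments are aligned to 2MB, and that the raster is width x height pixels of 4 bytes each with no row padding (the arrays are ints). The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed reservation granularity: 2MB segments aligned to 2MB. */
#define SEGMENT_SIZE ((uint64_t)2 * 1024 * 1024)

/* Index of the 2MB-aligned segment that contains `addr`. */
static uint64_t segment_index(uint64_t addr) {
    return addr / SEGMENT_SIZE;
}

/* True when [base, base + size) touches more than one segment. */
static bool spans_segments(uint64_t base, uint64_t size) {
    assert(base <= UINT64_MAX - size);  /* no address wrap-around */
    if (size == 0) {
        return false;
    }
    return segment_index(base) != segment_index(base + size - 1);
}

/* Bytes in a width x height raster of 4-byte pixels, assuming no padding. */
static uint64_t int_raster_bytes(uint64_t width, uint64_t height) {
    return width * height * 4;
}
```

With width 649 and height 699 the raster is 1814604 bytes. The logged rasBase 0000200011D7F018 lies 1568792 bytes into its 2MB segment, so under these assumptions the buffer extends past the segment end and spans_segments returns true.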
24-03-2020

This issue is easily reproducible on JDK 15 as well. It seems to have existed since ZGC was implemented on Windows (JDK-8233299).
14 ea b26 - Fail
14 GA - Fail
15 ea b13 - Fail
This issue is not reproducible on Linux.
06-03-2020