JDK-8015730 : PIT: On Linux, OGL=true and fbobject=false leads to deadlock or infinite loop
  • Type: Bug
  • Component: client-libs
  • Sub-Component: 2d
  • Affected Version: 8
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • OS: linux,solaris
  • CPU: generic
  • Submitted: 2013-05-31
  • Updated: 2013-11-06
  • Resolved: 2013-07-04
JDK 8
8 b100 Fixed
Description
During PIT of JDK 8 b93 we encountered the following problem:
on Linux machines with OGL enabled and fbobject explicitly set to false, no test ever finishes.

Please run the attached test on any Linux machine with confirmed OpenGL support, using either the current state of the awt team repository or the PIT build for client components:

b93pit/bin/java -Dsun.java2d.opengl=True -Dsun.java2d.opengl.fbobject=false XformVolatile

For convenience, I'm also attaching a Ctrl+Shift+\ thread dump.

This is possibly a regression caused by JDK-8005607, or perhaps not a regression but a change that reveals an existing flaw; I don't know. Please decide what to do with this build.

RULE 2D_JavaOGLBAT/standalone/CopyAreaPerf_PBUFFER Timeout any
RULE 2D_JavaOGLBAT/standalone/CrashOnExit_ClientVM_PBUFFER Timeout any
RULE 2D_JavaOGLBAT/standalone/CrashOnExit_ServerVM_PBUFFER Timeout any
RULE 2D_JavaOGLBAT/standalone/JFrameResizeTest_PBUFFER Timeout any
RULE 2D_JavaOGLBAT/standalone/VolatileToScreen_PBUFFER Timeout any
Comments
Verified on Ubuntu 12.04.
06-11-2013

RULE 2D_JavaOGLBAT/standalone/CopyAreaGarbage_PBUFFER Timeout any
RULE 2D_JavaOGLBAT/standalone/JPanelTest_PBUFFER Timeout any
08-07-2013

A possible solution for this bug is to not take the AWT lock in the "sun.awt.X11.XErrorHandlerUtil.XSync()" method when it is called as a result of invoking the native function "Java_sun_java2d_opengl_GLXSurfaceData_initPbuffer" from the file "jdk/src/solaris/native/sun/java2d/opengl/GLXSurfaceData.c". Currently, all calls to the method "sun.java2d.opengl.OGLRenderQueue.flushAndInvokeNow(Runnable r)" are made with the AWT lock held, which means that any attempt to take the AWT lock inside that "Runnable r" while it runs on the Java2D Queue Flusher thread will lead to a deadlock.
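
The idea above can be illustrated with a minimal sketch. This is not JDK code: the class, the "sync" method, and the single-thread executor standing in for the Java2D Queue Flusher are all hypothetical, and a timed tryLock is used so that the sketch reports the blocked case instead of hanging.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class SkipLockSketch {
    // Hypothetical stand-in for sun.awt.SunToolkit.AWT_LOCK.
    static final ReentrantLock AWT_LOCK = new ReentrantLock();

    // Hypothetical stand-in for XSync(): takes the lock only when asked to.
    static String sync(boolean takeLock) {
        if (!takeLock) {
            return "synced without locking";
        }
        try {
            // A timed attempt replaces lock() so the sketch cannot hang.
            if (!AWT_LOCK.tryLock(200, TimeUnit.MILLISECONDS)) {
                return "blocked";
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
        try {
            return "synced under lock";
        } finally {
            AWT_LOCK.unlock();
        }
    }

    // The caller holds the lock while waiting for the task to finish on
    // another thread, as the comment describes for flushAndInvokeNow().
    static String invokeOnFlusher(boolean takeLockInTask) {
        ExecutorService flusher = Executors.newSingleThreadExecutor();
        AWT_LOCK.lock();
        try {
            Callable<String> task = () -> sync(takeLockInTask);
            return flusher.submit(task).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            AWT_LOCK.unlock();
            flusher.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("task takes lock: " + invokeOnFlusher(true));
        System.out.println("task skips lock: " + invokeOnFlusher(false));
    }
}
```

Skipping the lock is only safe in this sketch because the waiting caller already holds it for the whole duration of the task; whether that holds for every real call path is an assumption that would need checking.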
28-06-2013

The cause of the observed deadlock was determined by analyzing full thread dumps of the Java HotSpot Server VM taken from the hanging test cases "XformVolatile.java" and "AWTDeadlock.java". Both test cases hang for the same reason: the threads "AWT-EventQueue-0" and "Java2D Queue Flusher" contend for the "sun.awt.SunToolkit.AWT_LOCK" and "sun.java2d.opengl.OGLRenderQueue.flusher" locks. A simplified call stack of the deadlock case is the following:

[AWT-EventQueue-0]:
    at java.lang.Object.wait(Native Method)
    - waiting on <sun.java2d.opengl.OGLRenderQueue.flusher> (a sun.java2d.opengl.OGLRenderQueue$QueueFlusher)
    at java.lang.Object.wait(Object.java:502)
    at sun.java2d.opengl.OGLRenderQueue$QueueFlusher.flushNow(OGLRenderQueue.java:181)
    - locked <sun.java2d.opengl.OGLRenderQueue.flusher> (a sun.java2d.opengl.OGLRenderQueue$QueueFlusher)
    at sun.java2d.opengl.OGLRenderQueue$QueueFlusher.flushAndInvokeNow(OGLRenderQueue.java:194)
    - locked <sun.java2d.opengl.OGLRenderQueue.flusher> (a sun.java2d.opengl.OGLRenderQueue$QueueFlusher)
    at sun.java2d.opengl.OGLRenderQueue.flushAndInvokeNow(OGLRenderQueue.java:139)
    at sun.java2d.opengl.OGLSurfaceData.initSurface(OGLSurfaceData.java:322)
    ...

[Java2D Queue Flusher]:
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <sun.awt.SunToolkit.AWT_LOCK> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    ...
    at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:287)
    at sun.awt.SunToolkit.awtLock(SunToolkit.java:242)
    at sun.awt.X11.XErrorHandlerUtil.XSync(XErrorHandlerUtil.java:155)
    at sun.awt.X11.XErrorHandlerUtil.RESTORE_XERROR_HANDLER(XErrorHandlerUtil.java:110)
    at sun.java2d.opengl.GLXSurfaceData.initPbuffer(Native Method)
    at sun.java2d.opengl.OGLSurfaceData.initSurfaceNow(OGLSurfaceData.java:269)
    at sun.java2d.opengl.OGLSurfaceData$1.run(OGLSurfaceData.java:324)
    at sun.java2d.opengl.OGLRenderQueue$QueueFlusher.run(OGLRenderQueue.java:234)
    - locked <sun.java2d.opengl.OGLRenderQueue.flusher> (a sun.java2d.opengl.OGLRenderQueue$QueueFlusher)

These call stacks show that the "AWT-EventQueue-0" thread acquires "AWT_LOCK" and then waits until the "flusher" lock is released, while the "Java2D Queue Flusher" thread acquires the "flusher" lock and waits until "AWT_LOCK" is released. The code that waits for "AWT_LOCK" in the "Java2D Queue Flusher" thread was introduced by the fix for JDK-8005607.
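
The lock ordering described above can be modelled with a small standalone sketch. It is an illustration under assumptions, not the JDK code path: a single-thread executor stands in for the Java2D Queue Flusher, Future.get() stands in for the wait on the "flusher" monitor, the class and method names are hypothetical, and the task uses a timed tryLock so that the program reports the blocked state instead of hanging.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderSketch {
    // Hypothetical stand-in for sun.awt.SunToolkit.AWT_LOCK.
    static final ReentrantLock AWT_LOCK = new ReentrantLock();

    // Returns true if a task running on the "flusher" thread managed to take
    // AWT_LOCK while the calling thread was waiting for that task to finish.
    static boolean flusherTaskCanTakeLock(boolean callerHoldsLock) {
        ExecutorService flusher = Executors.newSingleThreadExecutor();
        if (callerHoldsLock) {
            AWT_LOCK.lock();
        }
        try {
            Callable<Boolean> task = () -> {
                // With a plain lock() here and the caller holding AWT_LOCK,
                // neither thread could make progress.
                if (AWT_LOCK.tryLock(200, TimeUnit.MILLISECONDS)) {
                    AWT_LOCK.unlock();
                    return true;
                }
                return false;
            };
            // The caller blocks here until the flusher task completes.
            return flusher.submit(task).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            if (callerHoldsLock) {
                AWT_LOCK.unlock();
            }
            flusher.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("caller holds AWT_LOCK: " + flusherTaskCanTakeLock(true));
        System.out.println("caller does not hold AWT_LOCK: " + flusherTaskCanTakeLock(false));
    }
}
```

In the first call the task times out because the waiting caller owns the lock, which is the situation that becomes a permanent hang when the task uses an untimed lock().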
20-06-2013

The bug was reproduced on 32-bit Linux with both test cases (XformVolatile.java, AWTDeadlock.java), using a JDK 8 built from the JDK 8 b89 source code plus the fix for JDK-8005607. This shows that the fix for JDK-8005607 alone is enough to trigger the bug.
19-06-2013

A simplified version of the reproducer is attached as AWTDeadlock.java. Backing out the fix for JDK-8005607 resolves the deadlock.
18-06-2013

This also needs to be considered in the 7 backport of 8005607.
04-06-2013

The fix for 8005607 is definitely related to this bug, but I think that it only revealed the problem rather than introduced it. Both before and after the fix, GLXSurfaceData.initPbuffer() called XSync(). The difference is that after the fix XSync() is wrapped in awtLock()/awtUnlock(). This wrapping is correct, as we shouldn't operate on the X display object without synchronization. However, it leads to the deadlock described in this bug. So in addition to 8005607 we need to fix the synchronization in GLXSurfaceData.
31-05-2013