JDK-8065709 : Deadlock in awt/logging apparently introduced by 8019623
  • Type: Bug
  • Component: client-libs
  • Sub-Component: java.awt
  • Affected Version: 6u85,7u72,8,9
  • Priority: P2
  • Status: Resolved
  • Resolution: Fixed
  • OS: linux
  • CPU: x86
  • Submitted: 2014-11-21
  • Updated: 2015-09-29
  • Resolved: 2015-01-22
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 7 JDK 8 JDK 9
7u79Fixed 8u45Fixed 9 b52Fixed
Related Reports
Blocks :  
Duplicate :  
Relates :  
Relates :  
Description
Here at Google we are seeing a new deadlock in awt/logging, introduced apparently in the changes in jdk7u between jdk7u71 and jdk7u72.

Our top suspect is:

# User mcherkas
# Date 1396282280 -14400
#      Mon Mar 31 20:11:20 2014 +0400
# Node ID 40baa37fb20ca0411bb87f6e391a53253d4bb5e6
# Parent  2fa3608c1da848f5394a6dda775826fb988ea75e
8019623: Lack of synchronization in AppContext.getAppContext()
Reviewed-by: anthony, art

(although we don't have a reproducible test case yet)

Here's a snippet from a deadlock stacktrace:

"Replication Worker-3" Id=309 BLOCKED on sun.awt.AppContext$GetAppContextLock@22170aa9 owned by "GSE worker #18" Id=200

sun.awt.AppContext$3.run(AppContext.java:315)
 sun.awt.AppContext$3.run(AppContext.java:301)
 java.security.AccessController.doPrivileged(Native Method)
 sun.awt.AppContext.getAppContext(AppContext.java:300)
 sun.awt.AppContext$6.getAppletContext(AppContext.java:904)
 java.util.logging.LogManager.getUserContext(LogManager.java:389) <= Acquires lock
 java.util.logging.LogManager.getLogger(LogManager.java:1067)
 java.util.logging.LogManager.demandLogger(LogManager.java:430)
 java.util.logging.Logger.demandLogger(Logger.java:346)
 java.util.logging.Logger.getLogger(Logger.java:393)

"GSE worker #18 BLOCKED on sun.awt.AppContext$6@5911d676 owned by "Replication Worker-3" Id=309
java.util.logging.LogManager.getUserContext(LogManager.java:389)
 java.util.logging.LogManager.addLogger(LogManager.java:1004)
 java.util.logging.LogManager.demandSystemLogger(LogManager.java:471)
 java.util.logging.Logger.getPlatformLogger(Logger.java:473)
 java.util.logging.LoggingProxyImpl.getLogger(LoggingProxyImpl.java:41)
 sun.util.logging.LoggingSupport.getLogger(LoggingSupport.java:100)
 sun.util.logging.PlatformLogger$JavaLoggerProxy.<init>(PlatformLogger.java:639)
 sun.util.logging.PlatformLogger$JavaLoggerProxy.<init>(PlatformLogger.java:634)
 sun.util.logging.PlatformLogger.<init>(PlatformLogger.java:243)
 sun.util.logging.PlatformLogger.getLogger(PlatformLogger.java:202)
 java.awt.EventQueue.<clinit>(EventQueue.java:184)
 java.lang.Class.forName0(Native Method)
 java.lang.Class.forName(Class.java:187)
 sun.awt.SunToolkit.initEQ(SunToolkit.java:116)
 sun.awt.SunToolkit.createNewAppContext(SunToolkit.java:295)
 sun.awt.AppContext$2.run(AppContext.java:274)
 sun.awt.AppContext$2.run(AppContext.java:263)
 java.security.AccessController.doPrivileged(Native Method)
 sun.awt.AppContext.initMainAppContext(AppContext.java:263)
 sun.awt.AppContext.access$400(AppContext.java:133)
 sun.awt.AppContext$3.run(AppContext.java:318)  <= Acquires lock
 sun.awt.AppContext$3.run(AppContext.java:301)
 java.security.AccessController.doPrivileged(Native Method)
 sun.awt.AppContext.getAppContext(AppContext.java:300)
 javax.imageio.spi.IIORegistry.getDefaultInstance(IIORegistry.java:154)
 javax.imageio.ImageIO.<clinit>(ImageIO.java:65)

Comments
No plans to address for 6u.
29-01-2015

We need critical request lable for JDK-8065991 too to get it integrated into 7u79.
23-01-2015

Critical request label added to cover both fixes (991 and 709)...
22-01-2015

I made and tested the fix only with applied JDK-8065991 fix, JDK-8065991 is one line fix. So we need to push JDK-8065709 with JDK-8065991 only in this case we can be sure that we get rid of the deadlock.
22-01-2015

- Justification : Simple changes needed to resolve key deadlock - Risk Analysis : Low risk - Webrev : http://cr.openjdk.java.net/~mcherkas/8065709/webrev.01/ - Testing (done/to-be-done) : the issue can be reproduced by testcase provided by Martin( see above comments), but the testcase depends on timing and doesn't reproduce the problem permanently. So you may need to run it several times before you get a deadlock. - Back ports (done/to-be-done) : backports were done for jdk8 and 7 - FX Impact : none
22-01-2015

If I read it correctly. this deadlock hits the race when the main AppContext object has been created and the numAppContexts counter has been incremented but mainAppContext is not set. Another possible option to explore is to have JavaAWTAccess.getAppletContext simply return null if it determines the main AppContext is being initialized.
02-12-2014

Thanks Daniel et al for working on this. Google's experience is that this is one of the most severe jdk bugs in recent years, because it tends not to fail during testing and then sporadically lock up vital processes in production.
02-12-2014

Hi Martin - I believe it's the same deadlock than the one you will get if you remove the call to Logger.getLogger(...) at line 202 in the test I proposed for JDK-8065991 - and which my proposed patch does not fix. http://cr.openjdk.java.net/~dfuchs/webrev_8065991/webrev.00/test/java/util/logging/LogManagerAppContextDeadlock.java.html
28-11-2014

I stumbled upon a self-contained reproduction with 7u72 (although highly timing-dependent YMMV). import java.util.*; import java.util.concurrent.*; import java.lang.reflect.*; import java.util.logging.*; import java.lang.management.*; public class AwtLoggingDeadlock { static boolean holdsLock(Thread thread) throws Throwable { ThreadMXBean mbean = ManagementFactory.getThreadMXBean(); long[] threadIds = { thread.getId() }; ThreadInfo threadInfo = mbean.getThreadInfo(threadIds, true, true)[0]; MonitorInfo[] monitorInfo = threadInfo.getLockedMonitors(); //System.err.println(Arrays.toString(monitorInfo)); return monitorInfo.length > 0; } public static void main(String[] args) throws Throwable { System.setSecurityManager(new PermissiveSecurityManger()); Class<?> k = Class.forName("sun.awt.AppContext"); //System.err.println(Arrays.asList(k.getDeclaredFields())); Field f = k.getDeclaredField("getAppContextLock"); f.setAccessible(true); Runnable loadImageIO = new Runnable() { public void run() { try { Class.forName("javax.imageio.ImageIO"); } catch (Throwable t) { throw new Error(t); }}}; Thread t = new Thread(loadImageIO); t.start(); do {} while (!holdsLock(t)); while (t.isAlive()) LogManager.getLogManager().getLoggerNames(); System.out.println("Bye"); } static class PermissiveSecurityManger extends SecurityManager { @Override public void checkPermission(java.security.Permission p) {} } } Giving deadlock (from jstack): Found one Java-level deadlock: ============================= "Thread-0": waiting to lock monitor 0x00007f97740060b8 (object 0x0000000758cf03f0, a java.lang.Class), which is held by "main" "main": waiting to lock monitor 0x00007f9774004c18 (object 0x0000000758cf2810, a sun.awt.AppContext$GetAppContextLock), which is held by "Thread-0" Java stack information for the threads listed above: =================================================== "Thread-0": at sun.util.logging.PlatformLogger.getLogger(PlatformLogger.java:196) - waiting to lock <0x0000000758cf03f0> (a java.lang.Class for sun.util.logging.PlatformLogger) at java.awt.EventQueue.<clinit>(EventQueue.java:184) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:191) at sun.awt.SunToolkit.initEQ(SunToolkit.java:116) at sun.awt.SunToolkit.createNewAppContext(SunToolkit.java:295) at sun.awt.AppContext$2.run(AppContext.java:274) at sun.awt.AppContext$2.run(AppContext.java:263) at java.security.AccessController.doPrivileged(Native Method) at sun.awt.AppContext.initMainAppContext(AppContext.java:263) at sun.awt.AppContext.access$400(AppContext.java:133) at sun.awt.AppContext$3.run(AppContext.java:318) - locked <0x0000000758cf2810> (a sun.awt.AppContext$GetAppContextLock) at sun.awt.AppContext$3.run(AppContext.java:301) at java.security.AccessController.doPrivileged(Native Method) at sun.awt.AppContext.getAppContext(AppContext.java:300) at javax.imageio.spi.IIORegistry.getDefaultInstance(IIORegistry.java:154) at javax.imageio.ImageIO.<clinit>(ImageIO.java:65) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:191) at AwtLoggingDeadlock$1.run(AwtLoggingDeadlock.java:25) at java.lang.Thread.run(Thread.java:745) "main": at sun.awt.AppContext$3.run(AppContext.java:315) - waiting to lock <0x0000000758cf2810> (a sun.awt.AppContext$GetAppContextLock) at sun.awt.AppContext$3.run(AppContext.java:301) at java.security.AccessController.doPrivileged(Native Method) at sun.awt.AppContext.getAppContext(AppContext.java:300) at sun.awt.AppContext$6.getAppletContext(AppContext.java:904) at java.util.logging.LogManager.getUserContext(LogManager.java:389) - locked <0x0000000758cf28f0> (a sun.awt.AppContext$6) at java.util.logging.LogManager.addLogger(LogManager.java:1004) at java.util.logging.LogManager.demandSystemLogger(LogManager.java:471) at java.util.logging.Logger.getPlatformLogger(Logger.java:473) at java.util.logging.LoggingProxyImpl.getLogger(LoggingProxyImpl.java:41) at sun.util.logging.LoggingSupport.getLogger(LoggingSupport.java:100) at sun.util.logging.PlatformLogger$JavaLoggerProxy.<init>(PlatformLogger.java:639) at sun.util.logging.PlatformLogger.redirectToJavaLoggerProxy(PlatformLogger.java:230) at sun.util.logging.PlatformLogger.redirectPlatformLoggers(PlatformLogger.java:220) - locked <0x0000000758cf03f0> (a java.lang.Class for sun.util.logging.PlatformLogger) at java.util.logging.LogManager$2.run(LogManager.java:323) at java.util.logging.LogManager$2.run(LogManager.java:318) at java.security.AccessController.doPrivileged(Native Method) at java.util.logging.LogManager.readPrimordialConfiguration(LogManager.java:318) - locked <0x0000000758d10560> (a java.util.logging.LogManager) at java.util.logging.LogManager.getLogManager(LogManager.java:300) at java.util.logging.Logger.<init>(Logger.java:265) at java.util.logging.LogManager$RootLogger.<init>(LogManager.java:1468) at java.util.logging.LogManager$RootLogger.<init>(LogManager.java:1466) at java.util.logging.LogManager$1.run(LogManager.java:204) at java.security.AccessController.doPrivileged(Native Method) at java.util.logging.LogManager.<clinit>(LogManager.java:181) at AwtLoggingDeadlock.main(AwtLoggingDeadlock.java:32) Found 1 deadlock.
27-11-2014

Mikhail, the test case does not call AppContext.getAppContext() - instead it calls FakeAppContext.getAppContext() which I hand crafted to reproduce a deadlock similar to what the traces are showing. The test case can be seen there: http://cr.openjdk.java.net/~dfuchs/webrev_8065991/webrev.00/
26-11-2014

Daniel, could you please upload a test case that you wrote?
26-11-2014

8019623 fix was introduced in 8.
26-11-2014

I have one good news and one bad news: good news: I have managed to write a reproducer for the deadlock - by emulating what AppContext.getAppContext() does in a FakeAppContext class - and by making LogManager use that (through JavaAWTAccess). bad news: the fix I was proposing only works if LogManager is already initialized. Otherwise getAppContext() ends up being called from within a lock through LogManager.ensureLogManagerInitialized() - and that lock cannot be removed easily. In other words - without my proposed change, the deadlock can happen whether LogManager is initialized or not. When LogManager is already initialized, it happens in: "Thread-2": at LogManagerAppContextDeadlock$FakeAppContext.getAppContext(LogManagerAppContextDeadlock.java:55) - waiting to lock <0x000000076aee3328> (a LogManagerAppContextDeadlock$FakeAppContext$FakeAppContextLock) at LogManagerAppContextDeadlock$FakeAppContext$1.getAppletContext(LogManagerAppContextDeadlock.java:68) at java.util.logging.LogManager.getUserContext(LogManager.java:424) - locked <0x000000076aee5630> (a LogManagerAppContextDeadlock$FakeAppContext$1) at java.util.logging.LogManager.getLogger(LogManager.java:1141) at java.util.logging.LogManager.demandLogger(LogManager.java:467) at java.util.logging.Logger.demandLogger(Logger.java:455) at java.util.logging.Logger.getLogger(Logger.java:502) at LogManagerAppContextDeadlock$2.run(LogManagerAppContextDeadlock.java:105) "Thread-1": at java.util.logging.LogManager.getUserContext(LogManager.java:424) - waiting to lock <0x000000076aee5630> (a LogManagerAppContextDeadlock$FakeAppContext$1) at java.util.logging.LogManager.getLogger(LogManager.java:1141) at java.util.logging.LogManager.demandLogger(LogManager.java:467) at java.util.logging.Logger.demandLogger(Logger.java:455) at java.util.logging.Logger.getLogger(Logger.java:502) at LogManagerAppContextDeadlock$FakeEventQueue.<clinit>(LogManagerAppContextDeadlock.java:29) at LogManagerAppContextDeadlock$FakeAppContext.<init>(LogManagerAppContextDeadlock.java:50) at LogManagerAppContextDeadlock$FakeAppContext.getAppContext(LogManagerAppContextDeadlock.java:56) - locked <0x000000076aee3328> (a LogManagerAppContextDeadlock$FakeAppContext$FakeAppContextLock) at LogManagerAppContextDeadlock$1.run(LogManagerAppContextDeadlock.java:91) After my proposed changed, it only happens when LogManager has not yet been initialized: "Thread-1": at LogManagerAppContextDeadlock$FakeAppContext.getAppContext(LogManagerAppContextDeadlock.java:55) - waiting to lock <0x000000076ad9cf60> (a LogManagerAppContextDeadlock$FakeAppContext$FakeAppContextLock) at LogManagerAppContextDeadlock$FakeAppContext$1.getAppletContext(LogManagerAppContextDeadlock.java:68) at java.util.logging.LogManager.getUserContext(LogManager.java:461) at java.util.logging.LogManager.contexts(LogManager.java:490) at java.util.logging.LogManager.reset(LogManager.java:1298) at java.util.logging.LogManager.readConfiguration(LogManager.java:1379) at java.util.logging.LogManager.readConfiguration(LogManager.java:1268) at java.util.logging.LogManager$3.run(LogManager.java:434) at java.util.logging.LogManager$3.run(LogManager.java:431) at java.security.AccessController.doPrivileged(Native Method) at java.util.logging.LogManager.readPrimordialConfiguration(LogManager.java:431) - locked <0x000000076aee0130> (a java.util.logging.LogManager) at java.util.logging.LogManager.access$800(LogManager.java:155) at java.util.logging.LogManager$2.run(LogManager.java:380) at java.security.AccessController.doPrivileged(Native Method) at java.util.logging.LogManager.ensureLogManagerInitialized(LogManager.java:373) - locked <0x000000076aee0130> (a java.util.logging.LogManager) at java.util.logging.LogManager.getLogManager(LogManager.java:413) at java.util.logging.Logger.demandLogger(Logger.java:448) at java.util.logging.Logger.getLogger(Logger.java:502) at LogManagerAppContextDeadlock$2.run(LogManagerAppContextDeadlock.java:105) "Thread-0": at java.util.logging.LogManager.ensureLogManagerInitialized(LogManager.java:352) - waiting to lock <0x000000076aee0130> (a java.util.logging.LogManager) at java.util.logging.LogManager.getLogManager(LogManager.java:413) at java.util.logging.Logger.demandLogger(Logger.java:448) at java.util.logging.Logger.getLogger(Logger.java:502) at LogManagerAppContextDeadlock$FakeEventQueue.<clinit>(LogManagerAppContextDeadlock.java:29) at LogManagerAppContextDeadlock$FakeAppContext.<init>(LogManagerAppContextDeadlock.java:50) at LogManagerAppContextDeadlock$FakeAppContext.getAppContext(LogManagerAppContextDeadlock.java:56) - locked <0x000000076ad9cf60> (a LogManagerAppContextDeadlock$FakeAppContext$FakeAppContextLock) at LogManagerAppContextDeadlock$1.run(LogManagerAppContextDeadlock.java:91) So it seems that fixing the particular deadlock that this bug reports cannot be achieved by modifying LogManager only. My proposed modifications of LogManager should make it less likely to happen - since the traces shown in the description let me believe that the LogManager is already initialized. I have logged JDK-8065991 - and I'm proposing to fix JDK-8065991 in 9 & 8u (I will be looking for a volunteer to backport it on 7u). This might limit the occurrences of JDK-8065709 - but is not guaranteed to fix it completely - as per the analysis above.
26-11-2014

We have not seen reports of this bug on jdk8, but our jdk8 migration isn't far enough that this is evidence that the bug is not there. I agree with Daniel that moving lock acquisitions outside of locks is a good idea, whenever possible. "never call lock-acquiring code while holding a lock" is of course too strong, but a good strategy. We don't want class pre-loading to do unneeded initalization, but in case we are sure to need a class, pre-loading it before acquiring a lock seems reasonable.
25-11-2014

I was mistaken that this bug is present in jdk7u71 - but it is in jdk7u72 (which is a "half"-released version)
25-11-2014

@Petr your suggestion to delay the creation of logger might be worth implementing too - it should be enough to transform eventLog.<some-call> into eventLog().<some-call>, make eventLog static volatile, and initialize it in eventLog() if it's found to be null. No need for deeper synchronization, it's not an issue if it's initialized twice.
25-11-2014

Sorry for not reacting earlier. @Jeremy - you're right - and it's a very good observation. AFAICT, the problem might still be present in 8 and 9. @Martin - did you get any report of this deadlock on 8? Concerning your suggestion of preloading the EventQueue class: I remember that I had to perform some surgery in AppContext.java to guarantee that trying to get an AppContext from the LogManager did not trigger the loading/initialization of AWT. So special care must be taken to ensure that it remains so. I had a look at LogManager - and my first idea would be to see whether we could simply call getAppContext() outside of the lock: Instead of: synchronized (javaAwtAccess) { // find the AppContext of the applet code // will be null if we are in the main app context. final Object ecx = javaAwtAccess.getAppletContext(); .... maybe we could do: // find the AppContext of the applet code // will be null if we are in the main app context. final Object ecx = javaAwtAccess.getAppletContext(); synchronized (javaAwtAccess) { .... I'll have to analyze the code a bit deeper to ensure that doing so would be safe - but my first quick reading of the code lets me think that it probably is (though I need to make sure). Then we'd need to hope that this would not just shift the deadlock elsewhere :-( Comments welcome.
25-11-2014

We have a similar bug: JDK-8049585 I will close it as a duplicate of this one. 1. It's not 9-na and 8-na. I've removed these labels and updated the affected versions. Also the original fix was back ported to 6, so it it also affected. 2. There's a simple way to solver the problem: remove logging from the EventQueue as nobody use it anyway. Or delay logger initialization until it's actually used. It's simple, safe and will definitely solve the issue. 3. The problem is only with a logger inside the EventQueue class, because EventQueue is used in the AppContext initialization. So it's not a general problem. 4. There might be a 'correct' way to solve this by changing the synchronization, but it will be way more risky than removing logger from EQ.
25-11-2014

Martin, do you have such stacktrace reproducible on 8u and 9?
25-11-2014

Victor, why did you mark this bug 8-na and 9-na ? Although the deadlock has only been observed in jdk7, the likely culprit change is in all of jdk7u jdk8u jdk9. Most likely the bug affects all the releases.
25-11-2014

it might be caused by some regression bug fixes which were not back ported from 8u to 7u
25-11-2014

Presumably, it should be *relatively* possible for a keen developer to tell from the stack trace whether the problem still exists at head. The stack traces even say which frames grab the errant locks. As long as the frames still grab those locks, the deadlock is still possible.
25-11-2014

@Stephen: I am in the same situation as you - I have only ever seen a stacktrace, not a reproduction recipe, but we have multiple reports from independent sources, all with jdk7. Hopefully someone will come up with a clean recipe.
25-11-2014

Some more information... Here at Google we are seeing this problem in multiple applications, serious enough that some users resist upgrading. I'm sorry that we don't yet have a clean reproduction recipe. If you look at the provided stacktrace, we are class-loading while holding a lock, which seems like an inherent risk when calling foreign code while holding a lock. One possible fix/workaround is: public final static AppContext getAppContext() { // Avoid deadlock in java.awt.EventQueue.<clinit> (which calls this // method recursively) by eagerly pre-loading the class before // acquiring any locks. try { Class.forName("java.awt.EventQueue"); } catch (Exception e) { throw new Error(e); } AppContext.java is a scary class, using 5 different locks! How can humans get this kind of code right? Especially when combined with logging?
25-11-2014

A fairly basic troubleshooting question for the filer: q. Has this issue been seen with JDK 8 (8u25) ? As I know you fully understand, a reproducer would be very valuable, in the meantime we continue to study/evaluate this issue as a matter of some urgency.
24-11-2014

P1 is too strong for a problem with such limited information, changed to P2. We may reconsider the priority (in both ways) in the nearest future.
24-11-2014