JDK-6627356 : Try to preserve Lock invariants even when faced with Thread.stop
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util.concurrent
  • Affected Version: 6u5,6u10,6u22,7
  • Priority: P4
  • Status: Closed
  • Resolution: Not an Issue
  • OS: generic,solaris_10,windows_xp
  • CPU: generic,x86,sparc
  • Submitted: 2007-11-08
  • Updated: 2015-09-21
  • Resolved: 2015-09-21
Related Reports
Duplicate :  
Duplicate :  
Duplicate :  
Relates :  
Relates :  
Relates :  
Relates :  
Description
If you launch the following applet and then click on the browsers reload button, you will get the exception below.  However, if you open a 2nd browser tab and launch the applet there as well, you will not get that exception when you click on the browsers reload button on either the 1st or 2nd tab containing the applets.  This happens on both IE and FireFox.  I tested this on WindowsXP using 6u5-b07.

Applet:  http://swinglabs.java.sun.com/iris/

Exception in thread "pool-6-thread-1" java.lang.IllegalMonitorStateException
   at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(Unknown Source)
   at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(Unknown Source)
   at java.util.concurrent.locks.ReentrantLock.unlock(Unknown Source)
   at java.util.concurrent.LinkedBlockingQueue.take(Unknown Source)
   at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   at java.lang.Thread.run(Unknown Source)
Editor.rebuildScaledImages

Comments
7 years ago I invited others to try to take this on, but no one was foolish enough to do so. In the meantime Thread.stop has gone away, so this has mostly become a non-issue.
21-09-2015

EVALUATION I've accepted this bug as a "classes_util_concurrent" bug and changed synopsis to match the possible plan of action this discussion has converged upon. I'm still doubtful about the wisdom of modifying the very sensitive code in AbstractQueuedSynchronizer, so anyone else who feels strongly about changing this is invited to take ownership.
26-01-2008

EVALUATION The code Martin sketched out may be an oversimplication. And it would only help if the ThreadDeath was thrown while actually parked; if that happened just after then we'd have the same problem. But philosophically, having Condition.await() guarantee to not return unless it holds the lock, seems the right solution. In practice it is difficult to implement. Further, given the current situation issues a Thread.interrupt followed by a Thread.stop, there's no guarantee that the thread will be in the Condition code - that all depends on the number of threads involved, the load on the system and the number of CPU's (given there is a pause between the interrupt and the stop). I am totally opposed to intercepting ThreadDeath and presuming that it means "shutdown the executor". stop() could trigger numerous different failure modes that convert the ThreadDeath to a different exception. And applications might still use Thread.stop for their own purposes (deprecated or not). The idea solution requires more sophisticated lifecycle management API's for objects whose lifecycle is constrained by an "enclosing" object i.e. an AppContext. When the AppContext is destroyed then the objects it contains should be "destroyed" too. We can't rely on Applets to take care of this themselves, in the general case.
06-12-2007

EVALUATION I think that trying to be more robust about re-acquiring the lock is going in the wrong direction. I think instead the code should "fail fast" and propagate the ThreadDeath out. I agree with the statement to make the ThreadPool code detect ThreadDeath and force a termination of the ThreadPool owning the current thread.
06-12-2007

EVALUATION I agree with David's analysis, except for where one could apply a bandaid in the core library code. I don't think anything can be done in LinkedBlockingQueue, but I do think one could experiment in AbstractQueuedSynchronizer. Of course, asynchronous exceptions can occur anywhere, but it is most likely that they occur during a call to LockSupport.park(). We could modify the code by imagining that instead of park() always simply returning, it might occasionally throw a spurious ThreadDeath. A method like condition await is currently written in the "optimistic" style int savedState = fullyRelease(node); while (!isOnSyncQueue(node)) { LockSupport.park(this); } acquireQueued(node, savedState) whereas we could write in a more paranoid style int savedState = fullyRelease(node); try { while (!isOnSyncQueue(node)) { LockSupport.park(this); } } finally { acquireQueued(node, savedState); } but this is difficult code with many subtle details to get right. Possible loss of performance is also definitely a consideration. Anyways, this would almost guarantee that a lock is reacquired after returning from await(). I don't know exactly what the impact of such a change would be on thread pool shutdown. Perhaps thread pool code would also have to be taught about possible ThreadDeath.
06-12-2007

EVALUATION In summary I see three different actions needed, as outlined by Ken: 1. AWT/Swing has to ensure it cleanly shuts down any thread-pools or timers used within the framework when disposing of a particular AppContext. 2. The Applets concerned need to ensure they shutdown any thread-pool they use internally as part of Applet.stop() or Applet.destroy() 3. Investigate whether the code in LinkedBlockingQueue needs to be, and can be, more robust in the face of asynchronous exceptions. I think at most we would use the try-unlock-finally to mask the IllegalMonitorStateException. But rethrowing the ThreadDeath will only change the stacktrace information not the actual behaviour. I think it unreasonable to expect this group of utility classes to handle async exceptions. Aside: if all the threads are part of a ThreadGroup defined by the AppContext, then the uncaughtException method could note that the context was being disposed and so ignore all exceptions during this teardown phase.
05-12-2007

EVALUATION The suggested finally clause should work in 99% of all situations and would have better behavior than currently. Another possible solution would involve catching all remaining Throwables aside from InterruptedException, storing off the Throwable, and using sun.misc.Unsafe.throwException() in the finally block to rethrow the Throwable. However since this would introduce Sun-specific code I would personally opt for the previous suggestion despite its limitations.
13-11-2007

EVALUATION } finally { if (unexpectedExceptionOccurred) { ... } else { lock.unlock(); } } But this finally clause can not tell whether the unexpected exception prevented reacquisition of the lock or not and might not unlock a lock the current thread owns. At best it should do this: } finally { try { lock.unlock(); } catch(IllegalMonitorStateException imse) { if (!unexpectedException) throw imse; else { // ??? } } } That leaves the ??? as to how to report to a higher-level something really bad has happened. But I don't see it as feasible for this code, in a collection class, to (a) try to accommdate async exceptions (especially as the lock/condition can be left in a corrupt state - as could the queue itself) (b) assume responsibility for trying to inform callers - other than by letting the original exception propagate
13-11-2007

EVALUATION I don't have a complete solution in mind, but here is a suggested modification to LinkedBlockingQueue to at least detect the ThreadDeath and not cause the IMSE to be thrown. Higher-level code could then catch ThreadDeath and cope with it. ReentrantLock lock = ...; boolean unexpectedExceptionOccurred = false; lock.lockInterruptibly(); try { try { unexpectedExceptionOccurred = true; lock.await(); unexpectedExceptionOccurred = false; } catch (InterruptedException e) { // handle InterruptedException unexpectedExceptionOccurred = false; } } finally { if (unexpectedExceptionOccurred) { // Signal via thread-local storage or similar // to higher-level pieces of code that // catastrophic failure occurred? // Explicitly throw InterruptedException? // (Would probably require catching ThreadDeath -- // not desirable) // Consider public API changes for this case // in future releases? } else { lock.unlock(); } }
13-11-2007

EVALUATION Please give an example of where in the code you would expect the ThreadDeath to be converted to InterruptedException as I am unclear what happens to the lock.unlock() inside take() in your envisaged scenario. The key issue here is whether or not the lock was reacquired as part of await(). If we don't know the answer to that then we can't make an informed decision as to whether or not to unlock the lock. Hence my suggestion that we could hide the IMSE, but that we'd still be leaving the lock/condition in an indeterminate and possibly corrupt state which might lead to further exceptions in other threads. Also any code that presumes that ThreadDeath implies "shutdown the TPE" would have to be part of the TPE. The blocking queue code is not part of the TPE and the queue has no knowledge that it is being used in a TPE or by a TPE worker thread. So again I'm unclear exactly what code you want to catch the ThreadDeath.
13-11-2007

EVALUATION Note that the problem manifests beyond LinkedBlockingQueue.take(). Sister bug 6572867 has essentially the same problem, but with LinkedBlockingQueue.take() replaced by DelayQueue.take(). If we are to do anything here, it would be at a lower layer, either in ReentrantLock or in AbstractQueuedSynchronizer.
13-11-2007

EVALUATION No. I would suggest that code be added to translate the ThreadDeath into an InterruptedException (which should be a very small code addition and which doesn't have to handle multiple ThreadDeaths) and to simultaneously signal termination of the TPE which owns the thread.
13-11-2007

EVALUATION You could hide the IllegalMonitorStateException but the ReentrantLock and its Condition object could be in an indeterminate/invalid state and subsequent use by other threads could lead to a range of possible exceptions. Is hiding the IMSE what is desired?
13-11-2007

EVALUATION The common point in all of the stack traces has been in LinkedBlockingQueue.take() -- so I think better robustness in the implementation of this one API is probably warranted and is hardly excessive effort.
12-11-2007

EVALUATION I'm unclear how appContext.dispose knows the set of threads to try and stop but I'll assume they are in the same ThreadGroup. If that is the case and the TPE threads are also in that group then the interruption of the TPE worker threads will just cause them to be replaced by new worker threads. This is as designed and desired. The TPE must be explicitly shutdown and if the most prompt shutdown is required then shutdownNow() must be used. I do not believe it is feasible to write ThreadPoolExecutor to be aware of asynchronous exceptions and handle them. The async exception could hit at absolutely any point in the code. It might be possible to do slightly better than today - if we assume there are particular code sections that are more worthy of attempted protection than others - but this seems an excessive effort for a minimal gain. Nor should TPE make assumptions about what encountering a ThreadDeath exception means. Note that practicaly no code is written to be async exception safe so it is very easy for a ThreadDeath exception to be converted to a different exception due to execution of finally blocks where invariants are found to be violated - ref the case in point. Thread.stop is a "hammer" and when you use a hammer you should expect breakage - caveat emptor. As for the specification of Lock.lock() it mirrors the normal/expected non-interruptibility of monitor acquisition. Only having interruptible locking would be problematic for application code that needs to guarantee certain actions occur (not in the face of async exceptions, but in the face of sync interrupt requests) and would cause API pollution due to the proliferation of "throws InterruptedException"; or else would result in interrupts continually being swallowed by code that doesn't know how to handle them.
12-11-2007

EVALUATION Given the above, it sounds like one or more of the following should be done if they aren't already: - AWT / Swing should cooperatively shut down the ThreadPoolExecutor associated with Swing timers during AppContext destruction. (This might only apply to JDK 7.) - The Iris application should be changed to cooperatively shut down its ThreadPoolExecutors during Applet.stop() / Applet.destroy(). - The ThreadPoolExecutor code should be changed to detect asynchronous exceptions (in particular ThreadDeath) and consider that a signal to shut down the ThreadPoolExecutor in lieu of the application stopping the ThreadPoolExecutor explicitly. It sounds like the reason for the current behavior is that the ThreadPoolExecutor swallows the Thread.interrupt() calls made by the AppContext.dispose() code and spawns new worker threads, which cause AppContext.dispose() to think it isn't making progress and to start throwing asynchronous exceptions via Thread.stop().
12-11-2007

EVALUATION A thread pool must be shutdown to allow its threads to properly terminate. Probably the application should be calling shutdownNow when the application main thread is interrupted. Any class that creates an internal thread pool must provide its own shutdown method that calls shutdown or shutdownNow on the internal pool, and documents that this higher level shutdown method is required to be called at application exit time. Tasks running in a thread pool must respond to interrupt in order for them to be cancellable. LinkedBlockingQueue.take() will throw an InterruptedException if interrupted. If a thread running in a thread pool receives an asynchronous interrupt from an unrelated thread, it is not obvious if the interrupt is meant as a signal for the pool itself to shutdown or for a task running in that pool to terminate. It is possible to rewrite the condition await methods so that in the case of an asynchronous exception thrown while parked, the lock is reacquired. But... All of the lock and thread pool code in java.util.concurrent is extremely tricky, and we are wary of changing it in any way now.
12-11-2007

EVALUATION The AppContext shutdown sequence first calls Thread.interrupt() on all threads running in that AppContext. Only if threads fail to respond to the interrupt in a timely fashion does the AppContext disposal code use Thread.stop(), which is the only "heavy hammer" available in the current Java platform for stopping threads (and even it is not guaranteed reliable). One outstanding question is why the ThreadPoolExecutor does not respond to calls to interrupt(). This might be an application issue (where the Iris app does not properly shut down certain of its background ThreadPoolExecutors), but at least one occurrence of this exception, related to the termination of the per-AppContext Swing Timer thread, has no workaround at the application level. As a related aside, the specification of Lock.lock() is problematic. Lock.lockInterruptibly() is probably the only variant that should ever have been specified. More investigation is needed at the java.util.concurrent and ThreadPoolExecutor level to see what is needed in the face of AppContext termination. This should not simply be recategorized as an AWT bug.
12-11-2007

EVALUATION I followed the path of investigation proposed by Ken, and it indeed proved fruitful. Here's what's going on: AWT tries to shutdown an applet thread group, eventually calling Thread.stop and causing ThreadDeath to be thrown in each blocked applet thread, with this stack trace: java.lang.ThreadDeath at java.lang.Thread.stop(Thread.java:770) at java.lang.ThreadGroup.stopOrSuspend(ThreadGroup.java:687) at java.lang.ThreadGroup.stop(ThreadGroup.java:601) at sun.awt.AppContext.dispose(AppContext.java:456) at sun.applet.AppletClassLoader.release(AppletClassLoader.java:756) at sun.plugin.security.PluginClassLoader.release(PluginClassLoader.java:439) at sun.applet.AppletPanel.release(AppletPanel.java:204) at sun.applet.AppletPanel.sendEvent(AppletPanel.java:302) at sun.plugin.AppletViewer.onPrivateClose(AppletViewer.java:1071) at sun.plugin.AppletViewer$3.run(AppletViewer.java:1020) at java.lang.Thread.run(Thread.java:674) The thread receiving ThreadDeath is blocked in LinkedBlockingQueue.take(). More specifically, looking at the body of LinkedBlockingQueue.take(): it acquired LinkedBlockingQueue.takeLock, then waits on a condition for takeLock, which causes the lock to be released, then a ThreadDeath is magically thrown, which naturally leads to the lock not being reacquired, but the finally clause tries to unlock the already unlocked lock, leading to IllegalMonitorStateException. import java.util.concurrent.*; As in this reduced test: public class Bug { public static void main(String[] args) throws Throwable { final BlockingQueue q = new LinkedBlockingQueue(); Thread thread = new Thread() {public void run() { try { q.take(); } catch (Throwable t) { Thread.interrupted(); t.printStackTrace(); }}}; thread.start(); Thread.sleep(1000); thread.stop(); thread.join(); } } Should code in java.util.concurrent try to protect against ThreadDeath? Probably not. ThreadDeath can occur ANYWHERE. Explained further here: http://java.sun.com/javase/6/docs/technotes/guides/concurrency/threadPrimitiveDeprecation.html So this seems like a problem in the applet code in classes_awt. I don't have a good solution. It seems like a very hard problem. But setting an UncaughtExceptionHandler for the threads in question is worth investigating, although this might help to sweep problems deeper under the rug. Redispatching to classes_awt.
10-11-2007

EVALUATION I can reproduce this with Suse-Linux-mozilla-1.7.8-jdk-6u4. (Thanks to Ken for his patience tutoring this newbie.) It's curious that this bug and bug 6572867 with exactly the same symptom are applets, with both generating the same "impossible" stacktrace. There are enough uses of java.util.concurrent that serious bugs like this should have been reported elsewhere... still a deep mystery. The presence of a security manager while running an applet shouldn't matter, should it?
09-11-2007

EVALUATION The application uses only core JDK APIs. It does not use any Sun-private APIs. You should be able to add logging to ThreadPoolExecutor and replace or update rt.jar in an installed JRE to track down the problem. For this reason I'm unmarking this bug as incomplete. Please investigate it further with the test case available in the form of the app. The Iris app requires JDK 6, so that's the earliest version where we've seen this IllegalMonitorStateException. We did also see it on JDK 7 builds a couple of months ago, but have since stopped active testing on JDK 7.
08-11-2007

EVALUATION We are surprised by the submitted stacktrace. It should "never happen". Someone familiar with the application's source code needs to create a reduced test case using only core jdk apis, so that the underlying cause can be determined. It would aldo help to know which jdk versions can be used to reproduce this bug.
08-11-2007

EVALUATION This is not a bug in the Java Plug-In. It is probably a bug in the Iris applet as I don't think we cooperatively terminate our ThreadPoolExecutors. The reason for the IllegalMonitorStateException is probably that AppContext.dispose() is throwing ThreadDeath at the ThreadPoolExecutors' worker threads. On the other hand, I think at least one ThreadPoolExecutor might be used internally in some of the libraries we use, and those APIs don't have an option to shut down those internal ThreadPoolExecutors cooperatively, so an argument might be made that ThreadPoolExecutor should deal more gracefully with this termination scenario. I'm reassigning this bug to the Core Libraries team to see whether there is anything more we should be doing in this area. The submitter should file an Issue against the Iris project at http://iris.dev.java.net/ .
08-11-2007