JDK-8074773 : Rare lost unpark when very first LockSupport.park triggers class loading
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.lang:class_loading
  • Affected Version: 7
  • Priority: P3
  • Status: Closed
  • Resolution: Won't Fix
  • OS: generic
  • CPU: generic
  • Submitted: 2015-03-09
  • Updated: 2025-04-29
  • Resolved: 2022-11-15
Related Reports
Causes :  
Relates :  
Description
At Google, we've been seeing threads waiting in LockSupport.park for some event and apparently never getting woken from the expected call to unpark.  After much effort, we seem to have a reproduction that reduces to simple calls to park and unpark, strongly suggesting a lost unpark bug in hotspot.

It's very hard to reproduce - we've typically had to spend a couple of days of CPU time per repro.  Nevertheless, we have seen enough of these (10 so far) that we can be fairly confident of being able to create the conditions again.

We've been able to reproduce this with openjdk7 and Oracle jdk1.7.0_80-b05, but not any flavor of jdk8, on Ubuntu Linux 14.04.

The repro is checked into jsr166 CVS
http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/test/jtreg/util/concurrent/locks/LockSupport/ParkLoops.java?view=markup
We've been running it with a stupid loop like:
repeat 300000 (date; time ~/jdk/jdk1.7.0_80-b05/bin/java ParkLoops)
for days until it fails/hangs.

It is easier to reproduce when run only for a few seconds (but more java process invocations) suggesting that a one-time event during startup (e.g. c2 compilation with OSR?) is the problem.

Here's a jstack output of a process not making progress:


2015-03-08 16:59:10
Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.80-b07 mixed mode):

"Attach Listener" daemon prio=10 tid=0x00007fee4c001000 nid=0x3f42 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"pool-1-thread-8" prio=10 tid=0x00007feea40d5800 nid=0x22f2 runnable [0x00007fee8478e000]
   java.lang.Thread.State: RUNNABLE
	at ParkLoops$2.run(ParkLoops.java:50)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

"pool-1-thread-6" prio=10 tid=0x00007feea40d1800 nid=0x22f0 runnable [0x00007fee84990000]
   java.lang.Thread.State: RUNNABLE
	at ParkLoops$2.run(ParkLoops.java:50)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

"pool-1-thread-5" prio=10 tid=0x00007feea40cf800 nid=0x22ef waiting on condition [0x00007fee84a91000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:315)
	at ParkLoops$1.run(ParkLoops.java:36)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

"pool-1-thread-4" prio=10 tid=0x00007feea40cd800 nid=0x22ee runnable [0x00007fee84b92000]
   java.lang.Thread.State: RUNNABLE
	at ParkLoops$2.run(ParkLoops.java:50)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

"pool-1-thread-2" prio=10 tid=0x00007feea40c9000 nid=0x22ec runnable [0x00007fee84d94000]
   java.lang.Thread.State: RUNNABLE
	at ParkLoops$2.run(ParkLoops.java:50)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

"Service Thread" daemon prio=10 tid=0x00007feea40a5800 nid=0x22e9 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" daemon prio=10 tid=0x00007feea40a3000 nid=0x22e8 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" daemon prio=10 tid=0x00007feea40a0000 nid=0x22e7 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x00007feea409e000 nid=0x22e6 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x00007feea407c800 nid=0x22e5 in Object.wait() [0x00007fee8631c000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x0000000758c84858> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
	- locked <0x0000000758c84858> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
	at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)

"Reference Handler" daemon prio=10 tid=0x00007feea407a800 nid=0x22e4 in Object.wait() [0x00007fee8641d000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x0000000758c84470> (a java.lang.ref.Reference$Lock)
	at java.lang.Object.wait(Object.java:503)
	at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
	- locked <0x0000000758c84470> (a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0x00007feea4009000 nid=0x22d6 waiting on condition [0x00007feead99b000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000758cd3338> (a java.util.concurrent.CountDownLatch$Sync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
	at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:236)
	at ParkLoops.main(ParkLoops.java:56)

"VM Thread" prio=10 tid=0x00007feea4076000 nid=0x22e3 runnable 

"GC task thread#0 (ParallelGC)" prio=10 tid=0x00007feea401f000 nid=0x22d7 runnable 

"GC task thread#1 (ParallelGC)" prio=10 tid=0x00007feea4021000 nid=0x22d8 runnable 

"GC task thread#2 (ParallelGC)" prio=10 tid=0x00007feea4022800 nid=0x22d9 runnable 

"GC task thread#3 (ParallelGC)" prio=10 tid=0x00007feea4024800 nid=0x22da runnable 

"GC task thread#4 (ParallelGC)" prio=10 tid=0x00007feea4026800 nid=0x22db runnable 

"GC task thread#5 (ParallelGC)" prio=10 tid=0x00007feea4028000 nid=0x22dc runnable 

"GC task thread#6 (ParallelGC)" prio=10 tid=0x00007feea402a000 nid=0x22de runnable 

"GC task thread#7 (ParallelGC)" prio=10 tid=0x00007feea402c000 nid=0x22df runnable 

"GC task thread#8 (ParallelGC)" prio=10 tid=0x00007feea402e000 nid=0x22e0 runnable 

"GC task thread#9 (ParallelGC)" prio=10 tid=0x00007feea402f800 nid=0x22e2 runnable 

"VM Periodic Task Thread" prio=10 tid=0x00007feea40b0000 nid=0x22ea waiting on condition 

JNI global references: 120

Comments
David wrote: """ I don't think classloading in a separate thread is either feasible or solves the problem. You would need eager resolution to avoid the "has this class been loaded yet" check - and if you have that you don't need a separate thread. And the mechanism by which you synchronize with that other thread can not itself use park/unpark. """ Yes, whatever mechanism the user thread uses to communicate with the classloading thread can use neither park nor interrupt (I imagine yet another variant of park/unpark just for use by class loading), and you would want the "has this class been loaded yet" check to be lock-free and relatively cheap. Difficult, but IMO not impossibly so.
24-03-2015

Here's a proposed clarification to the LockSupport javadoc: diff -u -r1.45 LockSupport.java --- src/main/java/util/concurrent/locks/LockSupport.java 23 Feb 2015 19:49:04 -0000 1.45 +++ src/main/java/util/concurrent/locks/LockSupport.java 24 Mar 2015 18:57:19 -0000 @@ -52,12 +52,14 @@ * method is designed for use only in constructions of the form: * * <pre> {@code + * publishCurrentThreadForUnparkers(); * while (!canProceed()) { ... LockSupport.park(this); }}</pre> * * where neither {@code canProceed} nor any other actions prior to the * call to {@code park} entail locking or blocking. Because only one * permit is associated with each thread, any intermediary uses of - * {@code park} could interfere with its intended effects. + * {@code park}, including implicitly invoked class loading, could + * interfere with its intended effects. * * <p><b>Sample Usage.</b> Here is a sketch of a first-in-first-out * non-reentrant lock class: @@ -67,21 +69,27 @@ * private final Queue<Thread> waiters * = new ConcurrentLinkedQueue<>(); * + * static { + * Class<?> ensureLoaded = LockSupport.class; + * } + * * public void lock() { * boolean wasInterrupted = false; * Thread current = Thread.currentThread(); - * waiters.add(current); + * waiters.add(current); // publish for unparkers * * // Block while not first in queue or cannot acquire lock * while (waiters.peek() != current || * !locked.compareAndSet(false, true)) { * LockSupport.park(this); - * if (Thread.interrupted()) // ignore interrupts while waiting + * // ignore interrupts while waiting + * if (Thread.interrupted()) * wasInterrupted = true; * } * * waiters.remove(); - * if (wasInterrupted) // reassert interrupt status on exit + * // maintain correct interrupt status on exit + * if (wasInterrupted) * current.interrupt(); * } *
24-03-2015

David: scanAndLockForPut() may call lock() in case it tries tryLock() more than MAX_SCAN_RETRIES (64 or 1) and doesn't get the lock. While trying to obtain the lock it also scans the entries to eventually find the one with the key which it returns, but that is not the method's goal. It only ensures the lock is held when it returns and it may return an entry too if lucky. The above change with get() pre-screening putIfAbsent() does not include the priming of the map, no. It would have to include, yes. The later change does not need priming. This only makes sense for JDK7, but as you say, it has already reached the end of life. So maybe users/comunity could use this hint here to patch their JDKs on their own...
24-03-2015

scanAndLockForPut has a tryLock loop that eventually calls lock() and so park(). But I couldn't quite grok exactly what conditions would ensure we hit the lock(). The above change doesn't actually seem to be priming the parallelLockMap ??
24-03-2015

David: CHM in JDK7 uses per-segment lock (Segment is-a ReentrantLock). Unfortunately, putIfAbsent() delegates to Segment.put(...) which has the following lines at the beggining: final V put(K key, int hash, V value, boolean onlyIfAbsent) { HashEntry<K,V> node = tryLock() ? null : scanAndLockForPut(key, hash, value); So in case of contention on the Segment, the lock is obtained which may involve parking. But CHM.get() appears to be lock-free and so if the ClassLoader.getClassLoadingLock() is modified to pre-screen putIfAbsent() with get(): protected Object getClassLoadingLock(String className) { Object lock = this; if (parallelLockMap != null) { lock = parallelLockMap.get(className); if (lock == null) { Object newLock = new Object(); lock = parallelLockMap.putIfAbsent(className, newLock); if (lock == null) { lock = newLock; } } } return lock; } ...then priming the parallelLockMap with an entry for LockSupport would guarantee lock-free operation when loading/resolving the LockSupport class. But that would have some performance impact, I believe. Which could be minimized by hard-coding the conditional on the sole LockSupport key: private static final String lockSupportName = "java.util.concurrent.locks.LockSupport"; private final Object lockSupportLock = new Object(); protected Object getClassLoadingLock(String className) { Object lock = this; if (parallelLockMap != null) { if (lockSupportName.hashCode() == className.hashCode() && lockSupportName.equals(className)) { lock = lockSupportLock; } else { Object newLock = new Object(); lock = parallelLockMap.putIfAbsent(className, newLock); if (lock == null) { lock = newLock; } } } return lock; }
24-03-2015

I need time to digest what Peter wrote and to verify exactly how implicit classloading gets triggered. Otherwise I can't be sure that a pre-load anywhere but the current method will avoid the call into getClassLoadingLock() and the CHM logic. As I said above if we can be certain that putIfAbsent for a key that is definitely present will never trigger any synchronization involving the ReentrantLock then preloading will work. At present the preloading may just work because it becomes much harder to hit contention in the CHM. Note that we already document this potential problem in the LockSupport class docs: * The {@code park} * method is designed for use only in constructions of the form: * <pre>while (!canProceed()) { ... LockSupport.park(this); }</pre> * where neither {@code canProceed} nor any other actions prior to the * call to {@code park} entail locking or blocking. Because only one * permit is associated with each thread, any intermediary uses of * {@code park} could interfere with its intended effects. However I think we should call out the implicit interference that might occur due to classloading and/or initialization. I don't think classloading in a separate thread is either feasible or solves the problem. You would need eager resolution to avoid the "has this class been loaded yet" check - and if you have that you don't need a separate thread. And the mechanism by which you synchronize with that other thread can not itself use park/unpark. Built-in monitors do not suffer from this because there are no hook points for user-defined code between the point where the thread "publishes" itself for unparking, and the park call (it also uses a distinct "parker" object from that used by LockSupport). Of course we must take steps to ensure that is true and not allow JVM TI events to be posted (for example) at inopportune times.
24-03-2015

As for what we should do in openjdk itself: We've taken action to make this already very rare bug much more difficult to reproduce in future, but it's still there. I support the idea of re-architechting class loading onto dedicated threads, but that's a daunting task likely to result in failure, and I'm not going to try it myself. We could eagerly pre-load LockSupport in AppLoader. That would probably be easy and cheap and would make future occurrences of this bug less likely. Also, note that class loading runs user-written <clinit> methods, and there is a risk that those will call park. Amusingly, someone trying to avoid this very bug might add LockSupport.unpark(Thread.currentThread()); LockSupport.park(); ... only to end up triggering this very bug by doing so!
23-03-2015

The next guava release will contain this snippet in AbstractFuture: // Prevent rare disastrous classloading in first call to LockSupport.park. // See: https://bugs.openjdk.java.net/browse/JDK-8074773 Class<?> ensureLoaded = LockSupport.class; We believe this alone will largely address Google's production reliability problem that started this bug report. jsr166 CVS has also been updated to contain similar snippets, which "doesn't fix any bug" but might help prevent a similar bug in the future.
23-03-2015

Peter wrote: > Would it help if ClassLoader called unpark(Thread.currentThread()) somewhere at exit path from loading each class? The general principle is that the implicit class-loading is supposed to be completely undetectable, and unconditionally unparking the current thread violates that principle, and only works because spurious wakeups are allowed. But consider ... that there are other ways that class-loading can interfere with activity by the current thread - interrupts are another way that threads can "signal" each other, and there's no such thing as a spurious interrupt. If it is possible for user code to get an interrupt at the point where class loading is triggered, then the interrupt is likely to not be delivered to the right recipient, messing up the class loading or the user code or both. I don't know if other JDK mechanisms also have potential for subtle bugs; builtin monitors use a very similar mechanism to park/unpark; class loading could conceivably modify thread locals while running and there's no mechanism to save/restore all thread locals. Which is why I say that a complete solution seems to involve invoking the class loading in a separate dedicated thread. I would prefer not to introduce complexification of class loading that only adds a partial fix for LockSupport.unpark. The easy fix is for the small community of synchronizer implementers that make use of LockSupport.park to add preloads to <clinit>. We can help educate and help those who make third party copies by adding such preloads to the jsr166 classes.
23-03-2015

Moved to core-libs/class_loading, even though hotspot is still involved.
23-03-2015

David: You're right. Preloading (or pre-resolving) a class in one ClassLoader only helps code defined by that ClassLoader. So LockSupport would have to be pre-resolved in each ClassLoader as soon as it is created and fully functional. Perhaps lazily, as part of 1st class-loading request. But that would mean we have to pre-resolve LockSupport in each and every ClassLoader even though LockSupport isn't actually needed by code defined by some ClassLoaders. Here's a test that demonstrates lazy class resolving behaviour (I also instumented ClassLoader.loadClass() method to print when it is called for selected classes): import java.io.IOException; import java.io.InputStream; import java.net.URL; public class Test implements Runnable { public static class A { public static void m() { } } public static class B { public static void m() { } } public static class C { public static void m() { A.m(); B.m(); } } @Override public void run() { System.out.println(); System.out.println("Running " + this + " loaded by " + getClass().getClassLoader() + " ..."); System.out.println(); System.out.println("<<< 1 >>>"); B.class.getName(); // Class c = B.class; is not enough System.out.println("<<< 2 >>>"); C.m(); System.out.println("<<< 3 >>>"); A.m(); B.m(); System.out.println("<<< 4 >>>"); System.out.println(); } public static void main(String[] args) throws Exception { Runnable t1 = new Test(); t1.run(); Runnable t2 = (Runnable) new MyClassLoader(Test.class.getClassLoader()) .loadClass(Test.class.getName()) .newInstance(); t2.run(); Runnable t3 = (Runnable) new MyClassLoader(Test.class.getClassLoader()) .loadClass(Test.class.getName()) .newInstance(); t3.run(); } } class MyClassLoader extends ClassLoader { MyClassLoader(ClassLoader parent) { super(parent); } private boolean isOurName(String name) { return name.equals("Test"); } @Override protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException { if (isOurName(name)) { System.out.println(this.getClass().getName() + ".loadClass(\"" + name + "\", " + resolve + ") called"); synchronized (getClassLoadingLock(name)) { Class<?> c = findLoadedClass(name); if (c == null) { c = findClass(name); } if (resolve) { resolveClass(c); } return c; } } else { return super.loadClass(name, resolve); } } @Override protected Class<?> findClass(String name) throws ClassNotFoundException { if (!isOurName(name)) { throw new ClassCastException(name); } URL url = getResource(name.replace('.', '/') + ".class"); if (url == null) { throw new ClassCastException(name); } byte[] bytes = new byte[65536]; int nread; try (InputStream in = url.openStream()) { nread = in.read(bytes); } catch (IOException e) { throw (ClassCastException) new ClassCastException("name").initCause(e); } return defineClass(name, bytes, 0, nread); } } ... which prints: sun.misc.Launcher$AppClassLoader.loadClass("Test", false) called sun.misc.Launcher$ExtClassLoader.loadClass("Test", false) called Running Test@677327b6 loaded by sun.misc.Launcher$AppClassLoader@6bc7c054 ... <<< 1 >>> sun.misc.Launcher$AppClassLoader.loadClass("Test$B", false) called sun.misc.Launcher$ExtClassLoader.loadClass("Test$B", false) called <<< 2 >>> sun.misc.Launcher$AppClassLoader.loadClass("Test$C", false) called sun.misc.Launcher$ExtClassLoader.loadClass("Test$C", false) called sun.misc.Launcher$AppClassLoader.loadClass("Test$A", false) called sun.misc.Launcher$ExtClassLoader.loadClass("Test$A", false) called <<< 3 >>> <<< 4 >>> MyClassLoader.loadClass("Test", false) called Running Test@135fbaa4 loaded by MyClassLoader@7f31245a ... <<< 1 >>> MyClassLoader.loadClass("Test$B", false) called sun.misc.Launcher$AppClassLoader.loadClass("Test$B", false) called <<< 2 >>> MyClassLoader.loadClass("Test$C", false) called sun.misc.Launcher$AppClassLoader.loadClass("Test$C", false) called <<< 3 >>> MyClassLoader.loadClass("Test$A", false) called sun.misc.Launcher$AppClassLoader.loadClass("Test$A", false) called <<< 4 >>> MyClassLoader.loadClass("Test", false) called Running Test@2503dbd3 loaded by MyClassLoader@45ee12a7 ... <<< 1 >>> MyClassLoader.loadClass("Test$B", false) called sun.misc.Launcher$AppClassLoader.loadClass("Test$B", false) called <<< 2 >>> MyClassLoader.loadClass("Test$C", false) called sun.misc.Launcher$AppClassLoader.loadClass("Test$C", false) called <<< 3 >>> MyClassLoader.loadClass("Test$A", false) called sun.misc.Launcher$AppClassLoader.loadClass("Test$A", false) called <<< 4 >>>
23-03-2015

Thanks for chasing that down Peter - it was on my todo list. I forgot that parallel capable loaders changed the entry point. I'm finding your approach somewhat complex and disruptive though. That said I'm not sure preloading LockSupport will necessarily work. I need to check how bytecode resolution is performed in detail. The basic approach with bytecodes like invokeStatic is check if the target class is loaded and if not load it, then do the invokeStatic. But once we've ensured it is loaded we can skip the check for other static bytecodes for that class, and for the existing static bytecodes if executed by another thread. Preloading the class in the same method with the call to park() should certainly avoid the load check on the park() call. But preloading elsewhere? I think we will still do the loading check and so still call into getClassLoadingLock() and into CHM.putIfAbsent and so potentially block using park if there is internal contention (not necessarily related to the LockSupport class). It depends on the detailed operation of CHM and exactly how it will behave if putIfAbsent is called for something already present (and consider a different class load attempt may also be growing/mutating the map). If it turns out that is safe then it may suffice to simply prime the parallelLocksMap by calling getClassLoadingLock() in the ClassLoader constructor (checking it hasn't been overridden of course). That would fix the extensions loader, App loader and any parallel-capable custom loader that uses the default implementation of getClassLoading Lock().
22-03-2015

Something like the following, for example: http://cr.openjdk.java.net/~plevart/jdk9-dev/ClassLoader.park/webrev.jdk.02/ http://cr.openjdk.java.net/~plevart/jdk9-dev/ClassLoader.park/webrev.hotspot.02/ (If someone would like to try this without patching hotspot, he can use webrev.01 from above and specify -XX:+MustCallLoadClassInternal VM option)
22-03-2015

Well, the following snippet from systemDictionary.cpp explains why loadClass is called directly sometimes: // Call public unsynchronized loadClass(String) directly for all class loaders // for parallelCapable class loaders. JDK >=7, loadClass(String, boolean) will // acquire a class-name based lock rather than the class loader object lock. // JDK < 7 already acquire the class loader lock in loadClass(String, boolean), // so the call to loadClassInternal() was not required. // // UnsyncloadClass flag means both call loadClass(String) and do // not acquire the class loader lock even for class loaders that are // not parallelCapable. This was a risky transitional // flag for diagnostic purposes only. It is risky to call // custom class loaders without synchronization. // WARNING If a custom class loader does NOT synchronizer findClass, or callers of // findClass, the UnsyncloadClass flag risks unexpected timing bugs in the field. // Do NOT assume this will be supported in future releases. // // Added MustCallLoadClassInternal in case we discover in the field // a customer that counts on this call if (MustCallLoadClassInternal && has_loadClassInternal()) { JavaCalls::call_special(&result, class_loader, spec_klass, vmSymbols::loadClassInternal_name(), vmSymbols::string_class_signature(), string, CHECK_(nh)); } else { JavaCalls::call_virtual(&result, class_loader, spec_klass, vmSymbols::loadClass_name(), vmSymbols::string_class_signature(), string, CHECK_(nh)); } ...in spite of this synchronizaed/unsynchronized option play, there could be (an additional) private method in ClassLoader that would be invoked from VM instead of non-final public loadClass(String). This would enable tricks in Javaland.
22-03-2015

David: the stack trace that Martin produced was probably taken after the nested call to load the LockSupport class had already been over. We see the consequence, but not the "act" itself in this stack trace. Edit: sorry, I missed the one that shows the act: at java.util.concurrent.locks.LockSupport.aboutToPark(LockSupport.java:194) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:187) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197) at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214) at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290) at java.util.concurrent.ConcurrentHashMap$Segment.scanAndLockForPut(ConcurrentHashMap.java:572) at java.util.concurrent.ConcurrentHashMap$Segment.put(ConcurrentHashMap.java:434) at java.util.concurrent.ConcurrentHashMap.putIfAbsent(ConcurrentHashMap.java:1152) at java.lang.ClassLoader.getClassLoadingLock(ClassLoader.java:467) at java.lang.ClassLoader.loadClass(ClassLoader.java:408) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:309) at java.lang.ClassLoader.loadClass(ClassLoader.java:361) at ParkLoops$1.run(ParkLoops.java:58) at java.lang.Thread.run(Thread.java:745) ...hm, so in JDK7, the VM calls java.lang.ClassLoader.loadClass(ClassLoader.java:361) ?
22-03-2015

I think I understand David now. Use of LockSupport from j.u.c. classes is not problematic, since they are loaded by bootstrap class loader and any lazy loading of LockSupport class happens entirely in the VM which doesn't use park. Right, so I think that Doug's plan to replace remaining direct usages of Unsafe.park/unpark with LockSupport is even more desirable. If LockSupport is the sole entry to park/unpark functionality, then such tweaks as presented are possible in Java code before calling raw Unsafe native methods.
22-03-2015

Peter: I don't think direct use of Unsafe makes any difference at all. It is the secondary parking that causes the problem not the secondary use of the LockSupport class. There are a number of potential spot-fixes for the current case involving the AppLoader in JDK 7: 1. Preload LockSupport.park in the AppLoader 2. Add an unpark(Thread.currentThread()) at a "suitable place" in the classloader logic Neither of these necessarily help with a similar problem induced in a custom classloader. And #1 is much more efficient As I said there are other mechanisms that can implicitly lead to an unexpected use of park, but this existing AppLoader issue is by far the most likely. I'm not averse to applying fix #1 in 7u but 7u has a high bar for fixes at this stage. It may have to wait until the community takes over 7u maintenance. I don't see any fix applicable to 8/9 other than to caution anyone directly using LockSupport.park to be aware of the potential problem of recursive use.
22-03-2015

I think we agree what the issue is. Martin nailed it perfectly. Perhaps it could be formulated even more precisely: "The risk is any use of park() between thread publication and thread actually parking/attempting to park." ...as the user-perceived call to park() may not be directly wired to thread actually parking/attempting to park but may involve recursive use of park() in case class loading is triggered and the caller (initiating) ClassLoader or any of it's ancestors uses synchronization based on park/unpark. So besides AppClassLoader, any custom ClassLoader is susceptible too. I think Doug's suggestion to use Unsafe directly in juc classes may help in situations when those classes are used by user code. But the same is achieved by pre-loading LockSupport (as Martin suggests: in juc classes and 3rd party libraries to cover for old JVMs, in JVM to cover for other 3rd party libraries that don't do that). I think the issue can only manifest if code uses a non-preloaded wrapper class for invoking park() (like LockSupport) or some kind of holder class for caching Unsafe instance that is loaded lazily, for example in hypothetical patterns like this: class X { ... private static class Holder { statc final Unsafe unsafe = Unsafe.theUnsafe(); ... } // then somewhere outside Holder ... // publish current thread Holder.unsafe.park(); // ouch! So is there a way to cover for those unfortunate hypothetical uses too, besides what Martin suggested handling the class-loading in a way that guarantees not using park/unpark in the thread triggering the class-loading? And is it worth it to cover for those cases? Do you agree that invoking unpark(Thread.currentThread()) somewhere at exit path from ClassLoader.loadClass() (perhaps as the last act of ClassLoader.loadClassInternal()) would logically solve the issue? It's just not very efficient as it would trigger some spurious wake-ups. Is there a way to make this more efficient by some additional logic in LockSupport and is it worth it?
22-03-2015

Reading more carefully what Martin wrote: "The risk is ANY class load (not just LockSupport) between thread publication and call to park." I see now what you are saying. More generally the risk would be any incidental call to park between the thread publication (ie the point at which application logic would issue an unpark) and the call to park. There's no external fix for that situation in general as you must ensure that you never call park before your intended park, once another thread could identify you as the thread needing an unpark. Normally this would be facilitated by the design of the code using park/unpark - for example, in a lock implementation the thread would CAS itself onto a list of waiters and then call park. In that case there must not be any other calls to park between the CAS and the intended park, and there wouldn't be. Again the issue here is the implicit code that can appear on the way to the intended park().
22-03-2015

The issue is not the loading of LockSupport per-se, but the loading of it via the AppClassLoader. That is what gets us into the CHM code that in turn uses LockSupport (in JDK7) - via the getClassLoadingLock() call. Even if the class is actually loaded already by the bootloader, class resolution will cause the current loader to try and load it. So the problem can be seen with any class loader that could potentially usepark/unpark as part of the class loading process. It would not matter if LockSupport is already loaded, if that classloader hasn't seen that yet (and won't see it without first using park/unpark). It also won't matter if Unsafe.park/unpark is used instead. We could fix the AppLoader by patching the Launcher to load LockSupport using the AppLoader instance. I don't agree with Martin that the issue can occur with any classload. The problem is only caused when a call to LockSupport.park can recursively lead to code that also uses park, and the only implicit recursion path here is the classloading associated with LockSupport in the current loader. Doug's suggestion to remove use of LockSupport from within j.u.c would not fix the problem as far as I can see as it is the actual low-level park/unpark usage that causes the problem, not the use of it via LockSupport. To clarify: the user code calls LockSupport.park which results in a call to currentClassloader.load(LockSupport). Let the current classloader use ReentrantLock to synchronize multiple load attempts, and let that use Unsafe.park directly instead of LockSupport.park. The thread parks because of contention in the classloader. Meanwhile the application code issues LockSupport.unpark(thread). The classloading use of park wakes up sees this as a spurious wakeup and re-parks. Eventually the classloading logic unparks the thread and completes the classloading process and the thread parks. The application level unpark has been lost. The issue here is the recursive parking when calling park(). The classloading is the mechanism by which the potential for recursion has been introduced. Other mechanisms, like bytecode instrumentation, could also introduce recursion. So we could fix the current AppLoader issue, but there is no general solution to prevent recursion from being introduced.
22-03-2015

Also note that loadClassInternal is _not_ in the stacktrace that Martin produced. That is in itself somewhat strange as I thought class resolution would lead to such a call.
22-03-2015

Direct use of Unsafe by _application_ code might avoid the classloading (but then again Unsafe itself also has to be loaded). But direct use of Unsafe by the j.u.c classes makes no difference because classloading is not involved at that stage - the classloading is happening in the application code. Bytecode instrumentation could also lead to introduced code paths that might result in a park after the "publishing of the thread for unpark" and the original intended park() call.
22-03-2015

David: Direct use of Unsafe makes a difference if it doesn't trigger class-loading in situations when using LockSupport.park() would. Preloading LockSupport fixes that and direct use of Unsafe is not needed in this case. If there's no class loading triggered in critical section between publishing the thread and thread actually parking/consuming permit, there's no possibility for secondary recursive parking. Or is there? What other mechanisms do you have in mind that can implicitly lead to unexpected use of park besides lazy class loading? If there's only class loading that can trigger use of nested park(), then I think #2 would take care of that completely. Preloading LockSupport should be performed anyway to simplify possible initializaion traces, but for #2, here's an attempt at optimizing it a little: http://cr.openjdk.java.net/~plevart/jdk9-dev/ClassLoader.park/webrev.01/ This is still not very optimal in cases when a custom ClassLoader does use park/unpark based synchronization and park() is actually invoked as part of loadClass() (not a situation any more for JDK8/9 internal class loaders as they use CHM v8 which uses monitor locks).
22-03-2015

Another possible band-aid internal to j.u.c is to directly invoke the underlying Unsafe.{park,unpark} in classes including AQS. A few other j.u.c classes already do this to manage the Thread.parkBlocker field otherwise set by LockSupport methods. I was planning to somehow rework these cases in JDK9 to help with the effort to remove direct calls to Unsafe. But instead increasing usages to a few more cases might be paranoically defensible.
20-03-2015

About where and how to fix this: Third party libraries like AbstractFuture should add the blessed snippet to load LockSupport. Existing JDKs should also be fixed to pre-load LockSupport in e.g. System.java as Jeremy suggests. But ... Peter is right that there is no 100% clean simple fix that either libraries or JDK can perform. The risk is ANY class load (not just LockSupport) between thread publication and call to park. Like "don't call foreign code while holding a lock", we now have """don't cause a class-load while in an "unpark notification region" """. But that can be impossibly difficult for either libraries or the JDK to completely prevent. The only reliable solution is to re-engineer class loading - hand off the class loading to another thread, and be very careful about the code doing the handoff (it would have to wait somehow for classloading to complete, without of course using park/unpark!) But that would be a huge amount of work for little benefit...
20-03-2015

Would it help if ClassLoader called unpark(Thread.currentThread()) somewhere at exit path from loading each class? That would account for many spurious wake-ups. I guess pre-loading LockSupport is a better option... But LockSupport is just a wrapper for Unsafe.park/unpark. What if someone creates it's own wrapper (wrapping LockSupport for example - I know this is just hypothetical, but a kind of synchronization primitive is more realistical) and exposes park via it (or a method calling park). And then creates a custom ClassLoader that uses this wrapper/primitive. If very 1st call to the method (calling park) triggers class-loading of this wrapper class, the same could happen as with LockSupport.
20-03-2015

Is it worth doing something more aggressively at VM load time, rather than spraying a snippet over all the classes that the JDK controls that call LockSupport? You could load it in ThreadLocal or AtomicInteger or sun.misc.VM or some other class that always gets loaded at startup. Also, is it worth having a belt-and-suspenders approach where this gets fixed in JDK8/9, so that it can't be reintroduced via minor changes to things like CHM?
20-03-2015

Summary of root cause for our readers: If the very first call to LockSupport.park causes a class load of LockSupport.class, and if the classloader (running in the same thread) itself calls LockSupport.park (probably because of contention with another thread busy class loading) and if another thread calls LockSupport.unpark while the class loader is parked, the classloader will interpret this as a spurious wakeup from park, but the subsequent "real" park will never see its matching unpark.
20-03-2015

I thought I read somewhere that class literals do not trigger classloading, but I tested jdk7-jdk9 - you're right. My proposed classloading snippet for <clinit> now looks like this: // Prevent rare disastrous classloading in first call to LockSupport.park. // See: https://bugs.openjdk.java.net/browse/JDK-8074773 Class<?> ensureLoaded = LockSupport.class;
20-03-2015

You are right LockSupport probably won't be loaded - that would only happen if there was some contention when using the ReentrantLock within the CHM within the ClassLoader. (I keep thinking loading is more eager than it actually is.) But this is redundant: Class.forName(LockSupport.class.getName(), true, LockSupport.class.getClassLoader()); Referring to LockSupport.class already forces the class to be loaded if not already, and that will happen in the current Classloader, which is the AppClassLoader. Regarding JDK 8+, yes a custom classloader that utilizes LockSupport directly or indirectly, could trigger the same problem.
20-03-2015

> This also explains why it doesn't reproduce in 8 and 9 - because the CHM was modified to use synchronized-based locking instead of ReentrantLock because of the StackOverflowError problem (see CHMv8 JDK-8005704) Thanks. But note that in theory jdk8+ is still affected because classloaders are user-controlled and can do anything at all. Any real fix would involve something difficult like segregating any classloader activity to a separate thread.
20-03-2015

> Note that the LockSupport class will already actually be loaded (by the bootloader) because of its use in other bootstrap classes Not sure, but I don't think that's true. Evidence: - -XX:+TraceClassLoading does not list a load of LockSupport before main() - It is sufficient to do Class.forName(LockSupport.class.getName(), true, LockSupport.class.getClassLoader()); to suppress the failure (as shown by testing); you don't need to load LockSupport via the AppClassLoader > the workaround is to "preload" this class via the AppLoader before starting any threads that will be using LockSupport directly. My current plan is to add the snippet below to classes that call LockSupport.park, including https://github.com/google/guava/blob/master/guava/src/com/google/common/util/concurrent/AbstractFuture.java // Prevent rare disastrous classloading in first call to LockSupport.park. // See: https://bugs.openjdk.java.net/browse/JDK-8074773 try { Class.forName(LockSupport.class.getName(), true, LockSupport.class.getClassLoader()); } catch (ClassNotFoundException e) { throw new Error(e); }
20-03-2015

Well done Martin! Note that the LockSupport class will already actually be loaded (by the bootloader) because of its use in other bootstrap classes, but when it is first referenced in ParkLoops.run it has to go through the lookup process involving the AppLoader. This is the first time the AppLoader has been asked for this class so the classloading lock object has to be created - and it is the contention in the CHM when two threads (or more) try to do this that leads to the internal use of park, which in turn can swallow the application level use of unpark. This also explains why it doesn't reproduce in 8 and 9 - because the CHM was modified to use synchronized-based locking instead of ReentrantLock because of the StackOverflowError problem (see CHMv8 JDK-8005704) Another classic example of unintended recursive use - worthy candidate for "concurrency puzzlers" :) And yes the workaround is to "preload" this class via the AppLoader before starting any threads that will be using LockSupport directly.
20-03-2015

My mistake about calling a park-based synchronizer when about to call park got me thinking again about class loading. I instrumented all calls to LockSupport park thus: @@ -132,6 +132,7 @@ } private static void setBlocker(Thread t, Object arg) { + if (parkBlockerOffset == 0) new Throwable().printStackTrace(); // Even though volatile, hotspot doesn't need a write barrier here. unsafe.putObject(t, parkBlockerOffset, arg); } @@ -183,10 +184,16 @@ public static void park(Object blocker) { Thread t = Thread.currentThread(); setBlocker(t, blocker); + aboutToPark(t); unsafe.park(false, 0L); setBlocker(t, null); } + static void aboutToPark(Thread current) { + if (((current.getId() - 8) & 3) == 0) + new Throwable().printStackTrace(); + } + /** * Disables the current thread for thread scheduling purposes, for up to * the specified waiting time, unless the permit is available. @@ -223,6 +230,7 @@ if (nanos > 0) { Thread t = Thread.currentThread(); setBlocker(t, blocker); + aboutToPark(t); unsafe.park(false, nanos); setBlocker(t, null); } @@ -264,6 +272,7 @@ public static void parkUntil(Object blocker, long deadline) { Thread t = Thread.currentThread(); setBlocker(t, blocker); + aboutToPark(t); unsafe.park(true, deadline); setBlocker(t, null); } @@ -312,6 +321,7 @@ * for example, the interrupt status of the thread upon return. */ public static void park() { + aboutToPark(Thread.currentThread()); unsafe.park(false, 0L); } @@ -345,8 +355,10 @@ * @param nanos the maximum number of nanoseconds to wait */ public static void parkNanos(long nanos) { - if (nanos > 0) + if (nanos > 0) { + aboutToPark(Thread.currentThread()); unsafe.park(false, nanos); + } } /** @@ -380,6 +392,7 @@ * to wait until */ public static void parkUntil(long deadline) { + aboutToPark(Thread.currentThread()); unsafe.park(true, deadline); } } and then got the very interesting stack trace: at java.util.concurrent.locks.LockSupport.aboutToPark(LockSupport.java:194) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:187) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197) at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214) at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290) at java.util.concurrent.ConcurrentHashMap$Segment.scanAndLockForPut(ConcurrentHashMap.java:572) at java.util.concurrent.ConcurrentHashMap$Segment.put(ConcurrentHashMap.java:434) at java.util.concurrent.ConcurrentHashMap.putIfAbsent(ConcurrentHashMap.java:1152) at java.lang.ClassLoader.getClassLoadingLock(ClassLoader.java:467) at java.lang.ClassLoader.loadClass(ClassLoader.java:408) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:309) at java.lang.ClassLoader.loadClass(ClassLoader.java:361) at ParkLoops$1.run(ParkLoops.java:58) at java.lang.Thread.run(Thread.java:745) On the very first call to LockSupport.park, jdk tries to class-load LockSupport using the AppClassLoader. In the rare case that there's contention, it needs to block, recursively using LockSupport.park! Probably on the second try to class-load LockSupport it loaded immediately by being loaded directly by the bootstrap class loader, which explains why you can use LockSupport to class-load LockSupport! The recursive call to park via ReentrantLock eats any unpark that may arrive in the meantime! Explaining what we've seen perfectly! In the absence of any major surgery on the class-loading mechanism itself (move it to separate threads, like finalizers?!), a solution/workaround is clear. Any class that calls LockSupport.park needs to ensure LockSupport is eagerly classloaded, most obviously from clinit. Which we have already seen to work in previous experiments in this bug.
19-03-2015

David verified that this affects 7 fcs as well, changing affects version to 7
18-03-2015

If you think the sync might show something you could use a wait/notify based synchronizer - they don't use the same "park" object that j.u.c does.
18-03-2015

Dumb idea deleted...
18-03-2015

Oops. David - Of course you are right - I can't use a synchronizer that might call park!
18-03-2015

Not at all clear what you are doing with your phase sync but you now have races between the park/unparks related to the CDL usage and the stand-along park/unpark usage. A CyclicBarrier, or a Phaser, would at least perform the signal-and-wait action atomically. No wait, they wouldn't as they are also lock-free.
18-03-2015

Yeah, it's confusing, because there are three places we've tried to put in loops, and two are still left. Your results are in the same region as mine - about 1 CPU-hour to get one repro.
17-03-2015

Sorry Martin, I conflated this comment in the code: // Experiments show this is always the first iteration! with this output: iterations=36007 time=50322 but it should have been the unparks=0 output. So seems to be first iteration. The explicit loading/initialization experiments are interesting - though still baffling. I have no idea what to suggest next.
17-03-2015

As indicated in earlier comment, we discovered some new ways of suppressing the reproduction (killing my theory of a call to unpark modifying some global data structure) // LockSupport.unpark(Thread.currentThread()); // LockSupport.unpark(null); // Class.forName("java.util.concurrent.locks.LockSupport"); and other ways that certainly did not: // Class.forName("java.util.concurrent.locks.LockSupport", false, null); // Class.forName("sun.misc.Unsafe", false, null); // Class.forName("sun.misc.Unsafe"); // Even inlining LockSupport.<clinit> here has no effect. // try { // getUnsafe().objectFieldOffset // (java.lang.Thread.class.getDeclaredField("parkBlocker")); // } catch (Exception ex) { throw new Error(ex); } I'm having a hard time finding any static initialization that could be at fault here, so I currently think this is due to simple timing jitter, given how hard this is to reproduce. I'm also stuck for what to try next. At Google we can test a JDK for reproduction in a few minutes, so we can run experiments for you if you can come up with promising experiments to run.
17-03-2015

Latest version of repro program, containing results of latest experiments in comments: import static java.util.concurrent.TimeUnit.NANOSECONDS; import static java.util.concurrent.TimeUnit.SECONDS; import java.lang.management.ManagementFactory; import java.lang.management.ThreadInfo; import java.util.ArrayList; import java.util.concurrent.CountDownLatch; import java.util.concurrent.atomic.AtomicBoolean; import java.util.concurrent.atomic.AtomicLong; import java.util.concurrent.atomic.AtomicReferenceArray; import java.util.concurrent.locks.LockSupport; /** * Repro for lost unpark bug. * Try fiddling with nParkers, nUnparkers, iters and order of thread creation. * Repro seems easier on unloaded machine with multiple free CPUs. */ public final class ParkLoops { /** * Number of parker threads. Must be power of two. 4 seems most * effective for repro purposes, but failures with nParkers = 1 * have been seen. */ static final int nParkers = 4; /** Number of unparker threads. */ static final int nUnparkers = 4; /** * How many times to park/unpark. iters = 1 works well for repro * purposes; experiments with iters > 1 demonstrate (see unparks= * below) that it's only the first iteration that ever fails. */ static final int iters = 1; /** How long to wait before we are "sure" we're stuck, in seconds */ static final int timeout = 100; public static void main(String[] args) throws Exception { // True if we detected a lost unpark final AtomicBoolean lostUnpark = new AtomicBoolean(false); final ArrayList<Thread> threads = new ArrayList<>(); final AtomicReferenceArray<Thread> parkers = new AtomicReferenceArray<>(nParkers); final CountDownLatch done = new CountDownLatch(nParkers); final Runnable parker = new Runnable() { public void run() { final SimpleRandom rng = new SimpleRandom(); final Thread current = Thread.currentThread(); for (int k = iters, j; k > 0; k--) { do { j = rng.next() & (nParkers - 1); } while (!parkers.compareAndSet(j, null, current)); do { // handle spurious wakeups LockSupport.park(); } while (parkers.get(j) == current); if (lostUnpark.get()) { // Experiments show this is always the first iteration! System.err.println("unparks=" + (iters - k)); break; } } done.countDown(); }}; final Runnable unparker = new Runnable() { public void run() { final SimpleRandom rng = new SimpleRandom(); for (int n = 0; (n++ & 0xff) != 0 || done.getCount() > 0;) { int j = rng.next() & (nParkers - 1); Thread parker = parkers.get(j); if (parker != null && parkers.compareAndSet(j, parker, null)) { LockSupport.unpark(parker); } } }}; // Any of these early classloads of LockSupport kills the repro: // LockSupport.unpark(Thread.currentThread()); // LockSupport.unpark(null); // Class.forName("java.util.concurrent.locks.LockSupport"); // ... BUT ... the following have no effect: // (in fact, Unsafe is already loaded and used at this point!) // Class.forName("java.util.concurrent.locks.LockSupport", false, null); // Class.forName("sun.misc.Unsafe", false, null); // Class.forName("sun.misc.Unsafe"); // Even inlining LockSupport.<clinit> here has no effect. // try { // getUnsafe().objectFieldOffset // (java.lang.Thread.class.getDeclaredField("parkBlocker")); // } catch (Exception ex) { throw new Error(ex); } // Try varying order of thread creation. for (int i = 0; i < nParkers; i++) threads.add(startNewThread(parker)); // Adding Thread.sleep(10) here kills the repro. for (int i = 0; i < nUnparkers; i++) threads.add(startNewThread(unparker)); if (!done.await(timeout, SECONDS)) { dumpAllStacks(); System.err.printf("JvmArgs=%s\n", ManagementFactory.getRuntimeMXBean() .getInputArguments()); lostUnpark.set(true); for (Thread thread : threads) LockSupport.unpark(thread); for (Thread thread : threads) thread.join(); throw new AssertionError("lost unpark"); } } static Thread startNewThread(Runnable r) { Thread t = new Thread(r); t.setDaemon(true); t.start(); return t; } static void dumpAllStacks() { for (ThreadInfo threadInfo : ManagementFactory.getThreadMXBean() .dumpAllThreads(true, true)) { System.err.print(threadInfo); } } static sun.misc.Unsafe getUnsafe() { try { return sun.misc.Unsafe.getUnsafe(); } catch (SecurityException tryReflectionInstead) {} try { return java.security.AccessController.doPrivileged (new java.security.PrivilegedExceptionAction<sun.misc.Unsafe>() { public sun.misc.Unsafe run() throws Exception { Class<sun.misc.Unsafe> k = sun.misc.Unsafe.class; for (java.lang.reflect.Field f : k.getDeclaredFields()) { f.setAccessible(true); Object x = f.get(null); if (k.isInstance(x)) return k.cast(x); } throw new NoSuchFieldError("the Unsafe"); }}); } catch (java.security.PrivilegedActionException e) { throw new RuntimeException("Could not initialize intrinsics", e.getCause()); } } /** * An actually useful random number generator, but unsynchronized. * Basically same as java.util.Random. */ public static class SimpleRandom { private static final long multiplier = 0x5DEECE66DL; private static final long addend = 0xBL; private static final long mask = (1L << 48) - 1; static final AtomicLong seq = new AtomicLong(1); private long seed = System.nanoTime() + seq.getAndIncrement(); public int next() { long nextseed = (seed * multiplier + addend) & mask; seed = nextseed; return ((int)(nextseed >>> 17)) & 0x7FFFFFFF; } } }
17-03-2015

David, you say: "but it doesn't support the "first park" theory", which I don't understand. The only value we've ever seen for unparks is 0, and we've only ever seen it with the first set of threads started within one VM, both of which support the "first park" theory.
17-03-2015

I finally got this to reproduce on 7u80 but it doesn't support the "first park" theory: > bash runParkLoops.sh "Thread-7" Id=15 RUNNABLE at ParkLoops$2.run(ParkLoops.java:71) at java.lang.Thread.run(Thread.java:745) "Thread-6" Id=14 RUNNABLE at ParkLoops$2.run(ParkLoops.java:71) at java.lang.Thread.run(Thread.java:745) "Thread-5" Id=13 RUNNABLE at ParkLoops$2.run(ParkLoops.java:71) at java.lang.Thread.run(Thread.java:745) "Thread-4" Id=12 RUNNABLE at ParkLoops$2.run(ParkLoops.java:71) at java.lang.Thread.run(Thread.java:745) "Thread-0" Id=8 WAITING at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:315) at ParkLoops$1.run(ParkLoops.java:52) at java.lang.Thread.run(Thread.java:745) "Signal Dispatcher" Id=4 RUNNABLE "Finalizer" Id=3 WAITING on java.lang.ref.ReferenceQueue$Lock@13cca0e at java.lang.Object.wait(Native Method) - waiting on java.lang.ref.ReferenceQueue$Lock@13cca0e at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151) at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209) "Reference Handler" Id=2 WAITING on java.lang.ref.Reference$Lock@1f5910e at java.lang.Object.wait(Native Method) - waiting on java.lang.ref.Reference$Lock@1f5910e at java.lang.Object.wait(Object.java:503) at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133) "main" Id=1 RUNNABLE at sun.management.ThreadImpl.dumpThreads0(Native Method) at sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:446) at ParkLoops.dumpAllStacks(ParkLoops.java:106) at ParkLoops.main(ParkLoops.java:86) JvmArgs=[] unparks=0 Exception in thread "main" java.lang.AssertionError: lost unpark at ParkLoops.main(ParkLoops.java:93) iterations=36007 time=50322
16-03-2015

Can Christian confirm that this problem has been seen with 7u80?
16-03-2015

There is no lazy initialization in this area that would cause the first unpark/park to be different. When a Thread is started and its JavaThread created (and started) the Parker object is also allocated. Both park/unpark involve heavy synchronization (pthread_mutex locking). If there is any lazy initialization it may be in the pthreads library rather than in the VM. ??
16-03-2015

More notes: fastdebug jdk7 fails in the same way as product jdk7 Even with our improved repro recipe, I cannot repro with jdk8. We do see cases commonly where TWO threads are stuck in park(), i.e. there are two lost unpark()s. I imagine we have some kind of initial setup that causes a lost unpark, and that may cause lost unparks for more than one thread.
13-03-2015

Here's a shell script to drive the Java program without any Google test machinery. I succeeded in getting pretty fast reproductions with Oracle jdk7u76. David, I'm hoping you can get the same result on an ordinary modern Linux box. Exception in thread "main" java.lang.AssertionError: lost unpark at ParkLoops.main(ParkLoops.java:93) iterations=10033 time=668 ./ParkLoopsLoops 939.46s user 199.24s system 148% cpu 12:48.94 total Exception in thread "main" java.lang.AssertionError: lost unpark at ParkLoops.main(ParkLoops.java:93) iterations=15251 time=1049 ./ParkLoopsLoops 1207.12s user 349.35s system 135% cpu 19:10.14 total #!/bin/bash set -eu javac ParkLoops.java printf -v start "%(%s)T" -1 # Grab elapsed seconds; a bash-4-ism howlong=700000 # Experiment duration, in seconds iterations=0 while printf -v current "%(%s)T" -1 && ((++iterations)) && ((current - start < howlong)); do java ParkLoops || { echo iterations=$iterations time=$((current - start)); exit $?; } done
13-03-2015

import static java.util.concurrent.TimeUnit.NANOSECONDS; import static java.util.concurrent.TimeUnit.SECONDS; import java.lang.management.ManagementFactory; import java.lang.management.ThreadInfo; import java.util.ArrayList; import java.util.concurrent.CountDownLatch; import java.util.concurrent.atomic.AtomicBoolean; import java.util.concurrent.atomic.AtomicLong; import java.util.concurrent.atomic.AtomicReferenceArray; import java.util.concurrent.locks.LockSupport; /** * Repro for lost unpark bug. * Try fiddling with nParkers, nUnparkers, iters and order of thread creation. * Repro seems easier on unloaded machine with multiple free CPUs. */ public final class ParkLoops { public static void main(String[] args) throws Exception { // Number of parker threads. Must be power of two. 4 seems // most effective for repro purposes, but failures with // nParkers = 1 have been seen. final int nParkers = 4; // Number of unparker threads. final int nUnparkers = 4; // How many times to park/unpark. iters = 1 works well for // repro purposes; experiments demonstrate (see unparks= // below) that it's only the first iteration that ever fails. final int iters = 1; // How long to wait before we are "sure" we're stuck, in seconds final int timeout = 100; // True if we detected a lost unpark final AtomicBoolean lostUnpark = new AtomicBoolean(false); final ArrayList<Thread> threads = new ArrayList<>(); final AtomicReferenceArray<Thread> parkers = new AtomicReferenceArray<>(nParkers); final CountDownLatch done = new CountDownLatch(nParkers); final Runnable parker = new Runnable() { public void run() { final SimpleRandom rng = new SimpleRandom(); final Thread current = Thread.currentThread(); for (int k = iters, j; k > 0; k--) { do { j = rng.next() & (nParkers - 1); } while (!parkers.compareAndSet(j, null, current)); do { // handle spurious wakeups LockSupport.park(); } while (parkers.get(j) == current); if (lostUnpark.get()) { // Experiments show this is always the first iteration! System.err.println("unparks=" + (iters - k)); break; } } done.countDown(); }}; final Runnable unparker = new Runnable() { public void run() { final SimpleRandom rng = new SimpleRandom(); for (int n = 0; (n++ & 0xff) != 0 || done.getCount() > 0;) { int j = rng.next() & (nParkers - 1); Thread parker = parkers.get(j); if (parker != null && parkers.compareAndSet(j, parker, null)) { LockSupport.unpark(parker); } } }}; // Does an initial unpark/park kill the repro? // Uncomment next 2 lines to show that "Why yes, yes it does." // LockSupport.unpark(Thread.currentThread()); // LockSupport.park(); // Try varying order of thread creation. for (int i = 0; i < nParkers; i++) threads.add(startNewThread(parker)); // Adding Thread.sleep(10) here kills the repro. for (int i = 0; i < nUnparkers; i++) threads.add(startNewThread(unparker)); if (!done.await(timeout, SECONDS)) { dumpAllStacks(); System.err.printf("JvmArgs=%s\n", ManagementFactory.getRuntimeMXBean() .getInputArguments()); lostUnpark.set(true); for (Thread thread : threads) LockSupport.unpark(thread); for (Thread thread : threads) thread.join(); throw new AssertionError("lost unpark"); } } static Thread startNewThread(Runnable r) { Thread t = new Thread(r); t.setDaemon(true); t.start(); return t; } static void dumpAllStacks() { for (ThreadInfo threadInfo : ManagementFactory.getThreadMXBean() .dumpAllThreads(true, true)) { System.err.print(threadInfo); } } /** * An actually useful random number generator, but unsynchronized. * Basically same as java.util.Random. */ public static class SimpleRandom { private static final long multiplier = 0x5DEECE66DL; private static final long addend = 0xBL; private static final long mask = (1L << 48) - 1; static final AtomicLong seq = new AtomicLong(1); private long seed = System.nanoTime() + seq.getAndIncrement(); public int next() { long nextseed = (seed * multiplier + addend) & mask; seed = nextseed; return ((int)(nextseed >>> 17)) & 0x7FFFFFFF; } } }
13-03-2015

Here at Google we are getting better at repro-ing this, and we have a workaround. If you do: LockSupport.unpark(Thread.currentThread()); LockSupport.park(); as the very first thing in main before starting any threads, the problem goes away, consistent with my theory of the "first unpark". Instrumenting the repro shows that it's only ever the first park in a thread that misses its matching unpark. (Perhaps the JVM should be doing something unconditionally on startup instead of lazily on the first park or unpark??) We have observed the problem with -Xint and -Xint -XX:+UseMembar and with no jvm flags at all. Rates of repro is highly dependent on system load (probably need a few free cpus to tickle the race) but we see around 1 repro/CPU-hour.
13-03-2015

I checked the Parker code in 7u and 8/9 and it is the same. I can't think of any startup issue that might cause this.
13-03-2015

Progress is being made. I'm finding it easier to reproduce this with smaller iters and more process invocations. I have a new hypothesis - this only happens during startup, perhaps with the very first park/unpark events. That would explain why -Xint makes no difference - nothing has been compiled yet! Repeating the test code in-process does not help, but starting many small java processes does. This successfully reproduces the problem, using "only" half a day of CPU per repro on average. In the variant below, I start 4 threads, collected 75 failures, but observed distributions of stuck-in-park of Thread-0: 30 Thread-2: 31 Thread-4: 13 Thread-6: 1 which is consistent with the theory of only the very first park or unpark being problematic. import static java.util.concurrent.TimeUnit.NANOSECONDS; import static java.util.concurrent.TimeUnit.SECONDS; import java.lang.management.ManagementFactory; import java.lang.management.ThreadInfo; import java.util.concurrent.CountDownLatch; import java.util.concurrent.atomic.AtomicLong; import java.util.concurrent.atomic.AtomicReferenceArray; import java.util.concurrent.locks.LockSupport; public final class ParkLoops { public static void main(String[] args) throws Exception { final int nParkers = 4; // must be power of two final int iters = 10_000; final int timeout = 3500; // in seconds final AtomicReferenceArray<Thread> parkers = new AtomicReferenceArray<>(nParkers); //final CountDownLatch unparkersStarted = new CountDownLatch(nParkers); final CountDownLatch done = new CountDownLatch(nParkers); final Runnable parker = new Runnable() { public void run() { final SimpleRandom rng = new SimpleRandom(); final Thread current = Thread.currentThread(); for (int k = iters, j; k > 0; k--) { do { j = rng.next() & (nParkers - 1); } while (!parkers.compareAndSet(j, null, current)); do { // handle spurious wakeups LockSupport.park(); } while (parkers.get(j) == current); } done.countDown(); }}; final Runnable unparker = new Runnable() { public void run() { //unparkersStarted.countDown(); final SimpleRandom rng = new SimpleRandom(); for (int n = 0; (n++ & 0xff) != 0 || done.getCount() > 0;) { int j = rng.next() & (nParkers - 1); Thread parker = parkers.get(j); if (parker != null && parkers.compareAndSet(j, parker, null)) { LockSupport.unpark(parker); } } }}; for (int i = 0; i < nParkers; i++) { startNewThread(parker); startNewThread(unparker); } // for (int i = 0; i < nParkers; i++) // new Thread(unparker).start(); // if (!unparkersStarted.await(10, SECONDS)) // throw new AssertionError("unparkers!"); // for (int i = 0; i < nParkers; i++) // new Thread(parker).start(); if (!done.await(timeout, SECONDS)) { dumpAllStacks(); throw new AssertionError("lost unpark"); } } static Thread startNewThread(Runnable r) { Thread t = new Thread(r); t.setDaemon(true); t.start(); return t; } static void dumpAllStacks() { for (ThreadInfo threadInfo : ManagementFactory.getThreadMXBean() .dumpAllThreads(true, true)) { System.err.print(threadInfo); } } /** * An actually useful random number generator, but unsynchronized. * Basically same as java.util.Random. */ public static class SimpleRandom { private static final long multiplier = 0x5DEECE66DL; private static final long addend = 0xBL; private static final long mask = (1L << 48) - 1; static final AtomicLong seq = new AtomicLong(1); private long seed = System.nanoTime() + seq.getAndIncrement(); public int next() { long nextseed = (seed * multiplier + addend) & mask; seed = nextseed; return ((int)(nextseed >>> 17)) & 0x7FFFFFFF; } } }
12-03-2015

I'm continuing to run various experiments. It is indeed not obvious that the test does not have a race, but at a low level this is how all blocking synchronizers work - publish their thread object to a known location, call park, then wait for another thread to claim the thread object and call unpark. Even though we have repro via -Xint, I only observe failures near JVM startup. I still don't have a reliable repro that doesn't consume days of CPU.
11-03-2015

Ergonomics will likely select server by default. The -Xint result is surprising and does let C1/C2 off the hook. This will be incredibly difficult to track down. I can't convince myself that the test itself doesn't have any races. Can you try -Xint with -XX:+UseMembar ? The result is only interesting if we still see failures.
11-03-2015

haven't tried -server or -client yet; probably defaults to -server on 32-bit New surprising news: Successfully reproduced with openjdk7 64-bit -Xint with the frequency 4/100000 I was suspicious my -Xint flag got lost in the execution maze somewhere, but timings are noticeably slower. Does this really mean both c1 and c2 compilers are innocent?
10-03-2015

With -server or -client?
10-03-2015

Successfully reproduced with openjdk7 32-bit with the usual frequency 3/100000
10-03-2015

I agree with your plan of trying different variations. Which is more or less what we've been doing. I'll throw some more testing resources at this problem here at Google, but progress will be slow because of off-peak usage politeness etc. So far, only tested 64-bit, so no -client.
09-03-2015

I agree this seems likely to be a compiler issue, but switching out the libraries might help narrow down which code is problematic. The low-level park/unpark code is static so not affected by startup/jit etc. More likely some OSR loop replacement or something like that. So normal strategy for things like this: - does it happen with -Xint - does it happen with -client - does it happen with -server - does it happen with 32-bit and 64-bit? Then when running with compilers try disabling compiler features eg -XX:-UseOnStackReplacement I'll try to get some cycles to test locally
09-03-2015

Switching the j.u.c. implementations doesn't seem promising because the code here is so low-level. We're just CAS-ing array elements and parking/unparking, not even setting a "blocker object". jdk8 seems likely to be unaffected, but it's very uncertain because this bug is so very hard to repro. Any timing change could make it non-reproducible. Also, the fact that something during the first few seconds of execution is tickling this bug points the finger at hotspot, not the libraries.
09-03-2015

Martin: have you tried running the JDK8 j.u.c classes on the 7u80 VM ? If this only affects 7u80 then I'll need to flag it for 7u80, not for 8u or 9. Thanks
09-03-2015