FULL PRODUCT VERSION :
openjdk version "1.8.0_102"
ADDITIONAL OS VERSION INFORMATION :
Linux 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux
A DESCRIPTION OF THE PROBLEM :
UNIXProcess$ProcessPipeOutputStream.processExited and BufferedInputStream.read are both synchronized. This means a read which is blocking when a process exits causes the information that it has exited to not get through, which can lead to deadlocks. This doesn't normally happen because after most processes exit, the pipes for their stderr and stdout have no writers so the read returns immediately, but sometimes processes exit while other processes still have those pipes open for writing.
http://mail.openjdk.java.net/pipermail/core-libs-dev/2016-March/039813.html reports the same issue, and I can't find an existing bug filed as a result of that discussion.
https://github.com/bazelbuild/bazel/issues/1250 is a bug in an actual program which might be caused by this.
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run the example program.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
read does something consistent after the process exits with another writer still running on the pipe: either returns -1 immediately or continues reading from the pipe.
ACTUAL -
read sometimes returns (some) data from the escaped grandchild process (or never returns if that grandchild never writes anything), and sometimes returns -1, depending on the exact timing involved.
ERROR MESSAGES/STACK TRACES THAT OCCUR :
Traces from the relevant threads in a full application:
Command-Accumulator-Thread-81:
[1] java.io.FileInputStream.readBytes (native method)
[2] java.io.FileInputStream.read (FileInputStream.java:255)
[3] java.io.BufferedInputStream.fill (BufferedInputStream.java:246)
[4] java.io.BufferedInputStream.read1 (BufferedInputStream.java:286)
[5] java.io.BufferedInputStream.read (BufferedInputStream.java:345)
[6] java.io.FilterInputStream.read (FilterInputStream.java:107)
[7] com.google.devtools.build.lib.shell.InputStreamSink$CopySink.run (InputStreamSink.java:98)
[8] com.google.devtools.build.lib.shell.Consumers$ClosingSink.run (Consumers.java:316)
[9] java.util.concurrent.Executors$RunnableAdapter.call (Executors.java:511)
[10] java.util.concurrent.FutureTask.run (FutureTask.java:266)
[11] java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1,142)
[12] java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:617)
[13] java.lang.Thread.run (Thread.java:745)
process reaper:
[1] java.lang.UNIXProcess$ProcessPipeInputStream.processExited (UNIXProcess.java:528)
[2] java.lang.UNIXProcess.lambda$initStreams$3 (UNIXProcess.java:299)
[3] java.lang.UNIXProcess$$Lambda$8.run (null)
[4] java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1,142)
[5] java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:617)
[6] java.lang.Thread.run (Thread.java:745)
Traces from my simplified test case (included):
^\2016-11-10 02:22:03
Full thread dump OpenJDK 64-Bit Server VM (25.102-b14 mixed mode):
"process reaper" #9 daemon prio=10 os_prio=0 tid=0x00007ffa84258800 nid=0x7e34 waiting for monitor entry [0x00007ffa6de5d000]
java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.UNIXProcess$ProcessPipeInputStream.processExited(UNIXProcess.java:528)
- waiting to lock <0x000000076e2131e0> (a java.lang.UNIXProcess$ProcessPipeInputStream)
at java.lang.UNIXProcess.lambda$initStreams$3(UNIXProcess.java:299)
at java.lang.UNIXProcess$$Lambda$7/381259350.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
"Service Thread" #8 daemon prio=9 os_prio=0 tid=0x00007ffa840c1800 nid=0x7e31 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C1 CompilerThread2" #7 daemon prio=9 os_prio=0 tid=0x00007ffa840b2000 nid=0x7e30 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread1" #6 daemon prio=9 os_prio=0 tid=0x00007ffa840b0000 nid=0x7e2f waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x00007ffa840ad000 nid=0x7e2e waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x00007ffa840ab000 nid=0x7e2d waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00007ffa84083800 nid=0x7e2c in Object.wait() [0x00007ffa6e6ed000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x000000076df08ee0> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
- locked <0x000000076df08ee0> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)
"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00007ffa8407f000 nid=0x7e2b in Object.wait() [0x00007ffa6e7ee000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x000000076df06b50> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:502)
at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
- locked <0x000000076df06b50> (a java.lang.ref.Reference$Lock)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)
"main" #1 prio=5 os_prio=0 tid=0x00007ffa84009800 nid=0x7e25 runnable [0x00007ffa8c5f0000]
java.lang.Thread.State: RUNNABLE
at java.io.FileInputStream.readBytes(Native Method)
at java.io.FileInputStream.read(FileInputStream.java:255)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
- locked <0x000000076e2131e0> (a java.lang.UNIXProcess$ProcessPipeInputStream)
at Deadlock.main(Deadlock.java:5)
"VM Thread" os_prio=0 tid=0x00007ffa84077000 nid=0x7e2a runnable
"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007ffa8401f000 nid=0x7e26 runnable
"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007ffa84020800 nid=0x7e27 runnable
"GC task thread#2 (ParallelGC)" os_prio=0 tid=0x00007ffa84022800 nid=0x7e28 runnable
"GC task thread#3 (ParallelGC)" os_prio=0 tid=0x00007ffa84024000 nid=0x7e29 runnable
"VM Periodic Task Thread" os_prio=0 tid=0x00007ffa840c3800 nid=0x7e32 waiting on condition
JNI global references: 325
Heap
PSYoungGen total 74240K, used 3840K [0x000000076df00000, 0x0000000773180000, 0x00000007c0000000)
eden space 64000K, 6% used [0x000000076df00000,0x000000076e2c0058,0x0000000771d80000)
from space 10240K, 0% used [0x0000000772780000,0x0000000772780000,0x0000000773180000)
to space 10240K, 0% used [0x0000000771d80000,0x0000000771d80000,0x0000000772780000)
ParOldGen total 169472K, used 0K [0x00000006c9c00000, 0x00000006d4180000, 0x000000076df00000)
object space 169472K, 0% used [0x00000006c9c00000,0x00000006c9c00000,0x00000006d4180000)
Metaspace used 3689K, capacity 4670K, committed 4864K, reserved 1056768K
class space used 412K, capacity 468K, committed 512K, reserved 1048576K
REPRODUCIBILITY :
This bug can be reproduced occasionally.
---------- BEGIN SOURCE ----------
Simple test case. Run with an argument to resolve the race one way, and with no arguments to get it the other way:
class Deadlock {
public static void main(String[] args) throws java.io.IOException, InterruptedException {
Process process = Runtime.getRuntime().exec(new String[]{"/tmp/double_fork"});
if (args.length != 0) Thread.currentThread().sleep(2000);
System.out.println("read returns " + process.getInputStream().read());
}
}
C code for it to run (the Java equivalent would be way longer and I don't know how to write it off the top of my head). A binary compiled from this goes at /tmp/fork_off_child (or edit the string literal in the Java code to match):
#include <unistd.h>
#include <sys/select.h>
int main() {
// Give the Java code time to start a blocking read.
usleep(1000000);
// Now fork and then wait around forever in the child.
if (fork() == 0) {
select(0, NULL, NULL, NULL, NULL);
}
}
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
* Only call read after available() returns non-0 to avoid blocking. Have to do this in some kind of a loop with sleeps, and rely on Process.waitFor for immediate notification of the process exiting. That's way more complicated though.
* Don't start processes which let children with the stdout/stderr pipes escape without writing to them.