FULL PRODUCT VERSION :
JDK 7
Java(TM) SE Runtime Environment (build pap6470sr9-20150417_01(SR9))
IBM J9 VM (build 2.6, JRE 1.7.0 AIX ppc64-64 Compressed References 20150406_242981 (JIT enabled, AOT enabled)
J9VM - R26_Java726_SR9_20150406_1443_B242981
JIT - tr.r11_20150401_88894
GC - R26_Java726_SR9_20150406_1443_B242981_CMPRSS
J9CL - 20150406_242981)
JCL - 20150414_02 based on Oracle 7u79-b14
JDK 8
java version "1.8.0"
Java(TM) SE Runtime Environment (build pap6480sr1-20150417_01(SR1))
IBM J9 VM (build 2.8, JRE 1.8.0 AIX ppc64-64 Compressed References 20150410_243669 (JIT enabled, AOT enabled)
J9VM - R28_Java8_SR1_20150410_1531_B243669
JIT - tr.r14.java_20150402_88976.03
GC - R28_Java8_SR1_20150410_1531_B243669_CMPRSS
J9CL - 20150410_243669)
JCL - 20150413_01 based on Oracle jdk8u45-b13
ADDITIONAL OS VERSION INFORMATION :
All unix / linux based OS
Tested on AIX aix1vb11 1 6 00CE35064C00
but surely true for other *nix as well
EXTRA RELEVANT SYSTEM CONFIGURATION :
Although we test on AIX with IBM J9 for *nix, we develop on Windows 7 with the standard Oracle JDK 7 and 8, and we can see that the source of the nio package there is identical with respect to this bug.
A DESCRIPTION OF THE PROBLEM :
The problem occurs in
sun.nio.ch.NativeThreadSet.signalAndWait
which uses NativeThread.signal(th) to send a SIG*** to the thread doing I/O with the NativeDispatcher read or write functions.
FileChannelImpl read and write operations are not (completely) synchronized with NativeThreadSet.signalAndWait.
As a result, one thread can still be "before" the native read/write step while the interrupt signal (triggered via close() or Thread.interrupt() in another thread) has already been sent. The SIG*** then reaches the native thread before the read/write starts and never interrupts the I/O, because signals are not "stored".
The NativeThreadSet.signalAndWait call then advances to its while ... wait() section, where it gets stuck: it never receives the notifyAll from NativeThreadSet.remove(), which is only called once the read/write has finished.
Even more problematic: because the loop uses while ... wait() and catches InterruptedException, the hanging thread cannot itself be interrupted! Further close/interrupt attempts also fail, because AbstractInterruptibleChannel.closeLock is already held.
For I/O reads that may block indefinitely (waiting on sockets, pipes etc. for data that might never arrive), this deadlock is really bad!
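The "cannot be interrupted" part can be illustrated in isolation. The following is a minimal sketch, not the JDK source: the class SwallowedInterruptDemo, its field busy (a stand-in for "used > 0") and its methods are hypothetical names chosen for illustration. It shows that a thread sitting in a while ... wait() loop that catches InterruptedException stays blocked after Thread.interrupt() and only leaves the loop once the condition changes.

```java
// Hypothetical model of the wait loop; none of these names are JDK internals.
class SwallowedInterruptDemo {
    private boolean busy = true;      // stand-in for "used > 0"
    private boolean flagRestored;     // interrupt status seen after the loop

    /** Same shape as the loop described above: wait() inside while, interrupt only recorded. */
    synchronized void waitUntilIdle() {
        boolean interrupted = false;
        while (busy) {
            try {
                wait();
            } catch (InterruptedException e) {
                interrupted = true;   // remembered, but the loop keeps waiting
            }
        }
        if (interrupted) {
            Thread.currentThread().interrupt();
        }
        flagRestored = Thread.currentThread().isInterrupted();
    }

    /** Stand-in for the remove() call that would normally end the wait. */
    synchronized void markIdle() {
        busy = false;
        notifyAll();
    }

    /** Returns { waiter still blocked after interrupt, interrupt flag restored at the end }. */
    static boolean[] run() throws InterruptedException {
        SwallowedInterruptDemo demo = new SwallowedInterruptDemo();
        Thread waiter = new Thread(demo::waitUntilIdle, "waiter");
        waiter.start();
        waiter.interrupt();           // does not get the waiter out of the loop
        waiter.join(200);
        boolean stillBlocked = waiter.isAlive();
        demo.markIdle();              // only a state change releases it
        waiter.join();
        return new boolean[] { stillBlocked, demo.flagRestored };
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] result = run();
        System.out.println("still blocked after interrupt: " + result[0]);
        System.out.println("interrupt flag restored afterwards: " + result[1]);
    }
}
```

In the reported scenario nothing plays the role of markIdle(), because remove() is only reached after the read/write returns, so the waiting thread never leaves the loop.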
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1. Create n pipes on a Unix OS
2. Create n threads, each using new RandomAccessFile(pipePath-i, "rw").getChannel() to obtain a FileChannel and read lines from its pipe in a loop
3. Interrupt the pipe-reading threads or close the channels immediately before a new read step (e.g. immediately after thread.start() or immediately after the "final" successful readLine; do not send more data to this pipe afterwards)
If necessary, repeat steps 2 and 3 in a loop until the deadlock occurs (several tens of thousands of iterations might be necessary)
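The steps above could be driven by a harness along the following lines. This is a hedged sketch, not the test program used for this report: the class PipeInterruptRepro, its helper methods, the 5-second watchdog timeout and the command-line arguments are all assumptions made for illustration. It expects the path of an existing FIFO (created beforehand, for example with mkfifo), uses a single pipe and raw channel reads rather than n pipes and line reads, and performs the interrupt on a helper thread so that a stuck Thread.interrupt() call can be detected instead of hanging the harness.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical reproduction harness; names and timeouts are illustrative only.
class PipeInterruptRepro {

    /** Calls target.interrupt() on a helper thread; false means the call did not return in time. */
    static boolean interruptReturnsWithin(Thread target, long timeoutMillis)
            throws InterruptedException {
        Thread interrupter = new Thread(target::interrupt, "interrupter");
        interrupter.setDaemon(true);  // a stuck interrupter must not keep the JVM alive
        interrupter.start();
        interrupter.join(timeoutMillis);
        return !interrupter.isAlive();
    }

    /** Starts a thread that opens the FIFO in "rw" mode and reads until interrupted. */
    static Thread startReader(String fifoPath) {
        Thread reader = new Thread(() -> {
            try (RandomAccessFile raf = new RandomAccessFile(fifoPath, "rw");
                 FileChannel channel = raf.getChannel()) {
                ByteBuffer buffer = ByteBuffer.allocate(4096);
                while (channel.read(buffer) >= 0) {
                    buffer.clear();
                }
            } catch (IOException expected) {
                // ClosedByInterruptException is the normal outcome of a successful interrupt
            }
        }, "pipe-reader");
        reader.setDaemon(true);
        reader.start();
        return reader;
    }

    public static void main(String[] args) throws Exception {
        String fifoPath = args[0];    // path of an existing FIFO
        int iterations = args.length > 1 ? Integer.parseInt(args[1]) : 100_000;
        for (int i = 0; i < iterations; i++) {
            Thread reader = startReader(fifoPath);
            // Interrupt right after start(), i.e. close to the next read step.
            if (!interruptReturnsWithin(reader, 5_000)) {
                System.err.println("interrupt() did not return at iteration " + i);
                return;
            }
            reader.join(5_000);
        }
        System.out.println("no hang observed in " + iterations + " iterations");
    }
}
```

Because the race window is small, a run without a hang does not show that the problem is absent.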
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
All threads should have been successfully interrupted and/or channels should have been closed
ACTUAL -
While thread.interrupt() works most of the time, some calls fail and never return. The interrupting thread is then stuck, and the pipe thread continues to wait for new data even though interrupt() or channel.close() has been called.
REPRODUCIBILITY :
This bug can be reproduced occasionally.
CUSTOMER SUBMITTED WORKAROUND :
Note: this is not a production / long-term workaround, but I cannot synchronize NativeDispatcher.read/write with NativeThreadSet myself.
Change NativeThreadSet.signalAndWait as follows (-Xbootclasspath/p ...):
void signalAndWait() {
    synchronized (this) {
        int u = used;
        int n = elts.length;
        for (int i = 0; i < n; i++) {
            long th = elts[i];
            if (th == 0)
                continue;
            NativeThread.signal(th);
            if (--u == 0)
                break;
        }
        waitingToEmpty = true;
        boolean interrupted = false;
        while (used > 0) {
            try {
                wait(1000);
                if (used > 0) {
                    // System.err.println("Have to resend NativeThread signals ...");
                    u = used;
                    n = elts.length;
                    for (int i = 0; i < n; i++) {
                        long th = elts[i];
                        if (th == 0)
                            continue;
                        NativeThread.signal(th);
                        if (--u == 0)
                            break;
                    }
                }
            } catch (InterruptedException e) {
                interrupted = true;
            }
        }
        if (interrupted)
            Thread.currentThread().interrupt();
    }
}
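The idea behind this workaround (wait with a timeout and resend the signal while threads are still registered) can be tried outside the JDK with a small model. The following is a sketch under stated assumptions, not the patched JDK class: ResendSignalDemo, ModelThreadSet, the Runnable used in place of NativeThread.signal(th) and the 20 ms retry interval are all hypothetical. Latches force the first signal to arrive before the worker enters its simulated blocking call, so that first signal is lost, as in the race described above.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of the retry idea; none of these names are JDK internals.
class ResendSignalDemo {

    /** Stand-in for NativeThreadSet, reduced to a counter of registered threads. */
    static class ModelThreadSet {
        private final Runnable signaller;  // stand-in for NativeThread.signal(th)
        private final long retryMillis;
        private int used;

        ModelThreadSet(Runnable signaller, long retryMillis) {
            this.signaller = signaller;
            this.retryMillis = retryMillis;
        }

        synchronized void add() {
            used++;
        }

        synchronized void remove() {
            if (--used == 0) {
                notifyAll();
            }
        }

        /** Signals, waits with a timeout, and signals again until the set is empty. */
        synchronized int signalAndWait() {
            boolean interrupted = false;
            int rounds = 0;
            while (used > 0) {
                signaller.run();
                rounds++;
                try {
                    wait(retryMillis);
                } catch (InterruptedException e) {
                    interrupted = true;
                }
            }
            if (interrupted) {
                Thread.currentThread().interrupt();
            }
            return rounds;
        }
    }

    /** Returns the number of signal rounds needed before the worker deregistered. */
    static int runDemo() throws InterruptedException {
        AtomicBoolean inBlockingCall = new AtomicBoolean(false);
        AtomicBoolean delivered = new AtomicBoolean(false);
        CountDownLatch registered = new CountDownLatch(1);
        CountDownLatch firstSignalSent = new CountDownLatch(1);

        // A signal only has an effect if the worker is already in its blocking call.
        ModelThreadSet set = new ModelThreadSet(() -> {
            if (inBlockingCall.get()) {
                delivered.set(true);
            }
            firstSignalSent.countDown();
        }, 20);

        Thread worker = new Thread(() -> {
            set.add();
            registered.countDown();
            try {
                firstSignalSent.await();   // the first signal arrives too early and is lost
                inBlockingCall.set(true);
                while (!delivered.get()) { // simulated blocking read
                    Thread.sleep(1);
                }
            } catch (InterruptedException ignored) {
                // not expected in this model
            } finally {
                set.remove();
            }
        }, "worker");
        worker.start();
        registered.await();
        int rounds = set.signalAndWait();
        worker.join();
        return rounds;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("signal rounds needed: " + runDemo());
    }
}
```

In this model a single signal followed by an untimed wait() would never return, while the resending loop completes after at least two rounds. Like the workaround above, it narrows the problem by retrying rather than removing the underlying race.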