JDK-8132693 : (fc) FileChannel close may block waiting for I/O operation that never completes
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.nio
  • Affected Version: 8u45
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • OS: linux
  • CPU: x86
  • Submitted: 2015-06-12
  • Updated: 2018-09-11
Target release: Other / tbd (Unresolved)
Description
FULL PRODUCT VERSION :
JDK 7
Java(TM) SE Runtime Environment (build pap6470sr9-20150417_01(SR9))
IBM J9 VM (build 2.6, JRE 1.7.0 AIX ppc64-64 Compressed References 20150406_242981 (JIT enabled, AOT enabled)
J9VM - R26_Java726_SR9_20150406_1443_B242981
JIT  - tr.r11_20150401_88894
GC   - R26_Java726_SR9_20150406_1443_B242981_CMPRSS
J9CL - 20150406_242981)
JCL - 20150414_02 based on Oracle 7u79-b14

JDK 8
java version "1.8.0"
Java(TM) SE Runtime Environment (build pap6480sr1-20150417_01(SR1))
IBM J9 VM (build 2.8, JRE 1.8.0 AIX ppc64-64 Compressed References 20150410_243669 (JIT enabled, AOT enabled)
J9VM - R28_Java8_SR1_20150410_1531_B243669
JIT  - tr.r14.java_20150402_88976.03
GC   - R28_Java8_SR1_20150410_1531_B243669_CMPRSS
J9CL - 20150410_243669)
JCL - 20150413_01 based on Oracle jdk8u45-b13


ADDITIONAL OS VERSION INFORMATION :
All Unix/Linux-based operating systems

Tested on AIX aix1vb11 1 6 00CE35064C00
but very likely true for other *nix systems as well

EXTRA RELEVANT SYSTEM CONFIGURATION :
Even though we use AIX and IBM J9 for testing on *nix, we develop on Windows 7 with the standard Oracle JDK 7 and 8, and we can see that the source of the nio package is identical with respect to this bug.

A DESCRIPTION OF THE PROBLEM :
The problem occurs in

sun.nio.ch.NativeThreadSet.signalAndWait

which uses NativeThread.signal(th) to send a SIG*** to the thread performing I/O through the NativeDispatcher read or write functions.

FileChannelImpl read and write operations are not (completely) synchronized with NativeThreadSet.signalAndWait.

Thus one thread can still be "before" the native read/write step while another thread (via close or Thread.interrupt) has already sent the interrupt signal. The SIG*** reaches the native thread before the read/write starts and therefore never interrupts the I/O (signals are not "stored").

The NativeThreadSet.signalAndWait call then advances to the while ... wait() loop, where it gets stuck because it never receives the notifyAll from NativeThreadSet.remove() (which is only called once the read/write has finished).

Even worse: because the while ... wait() loop catches InterruptedException and keeps waiting, the hanging thread cannot itself be interrupted! Further close/interrupt attempts also fail because AbstractInterruptibleChannel.closeLock is already held.
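To make the race described above concrete, here is a minimal pure-Java model. It is a sketch under stated assumptions, not the JDK code: the class LostSignalModel and all of its methods are hypothetical, a "signal" is modeled as a flag that only takes effect while the reader is already inside its simulated blocking call (mimicking a SIG*** that arrives before the native read starts), and latches force the unlucky interleaving deterministically. With an untimed wait, as in the signalAndWait loop described here, the closing thread never returns; with a timed wait that re-sends the signal (the idea behind the customer-submitted workaround) it does.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical model of the lost-signal race; NOT the sun.nio.ch code.
public class LostSignalModel {
    private final Object lock = new Object();
    private int used;               // registered I/O threads, cf. NativeThreadSet.used
    private boolean inBlockingCall; // reader is "inside read()"
    private boolean woken;          // an effective signal has been delivered

    // A "signal" only takes effect if the reader is already blocked;
    // otherwise it is lost, like a signal sent before read(2) starts.
    void signal() {
        synchronized (lock) {
            if (inBlockingCall) {
                woken = true;
                lock.notifyAll();
            }
        }
    }

    // Reader: register, pass through the race window, block, unregister.
    void readerBody(CountDownLatch registered, CountDownLatch firstSignalSent)
            throws InterruptedException {
        synchronized (lock) {
            used++;                  // cf. NativeThreadSet.add()
        }
        registered.countDown();
        firstSignalSent.await();     // race window: the first signal lands here
        synchronized (lock) {
            inBlockingCall = true;   // now "inside read()"
            while (!woken) {
                lock.wait();         // only an effective signal ends the "read"
            }
            inBlockingCall = false;
            used--;                  // cf. NativeThreadSet.remove()
            lock.notifyAll();
        }
    }

    // Closer: resendMillis <= 0 models an untimed wait loop,
    // resendMillis > 0 models a timed wait that re-sends the signal.
    void signalAndWait(long resendMillis, CountDownLatch firstSignalSent)
            throws InterruptedException {
        signal();                    // always lost: the reader is not blocked yet
        firstSignalSent.countDown();
        synchronized (lock) {
            while (used > 0) {
                if (resendMillis <= 0) {
                    lock.wait();     // nobody will ever notify: hangs
                } else {
                    lock.wait(resendMillis);
                    if (used > 0) {
                        signal();    // re-send; effective once the reader blocks
                    }
                }
            }
        }
    }

    // Runs one scenario; returns true if the closer finished within timeoutMillis.
    static boolean closerFinishes(long resendMillis, long timeoutMillis)
            throws InterruptedException {
        LostSignalModel m = new LostSignalModel();
        CountDownLatch registered = new CountDownLatch(1);
        CountDownLatch firstSignalSent = new CountDownLatch(1);
        Thread reader = new Thread(() -> {
            try {
                m.readerBody(registered, firstSignalSent);
            } catch (InterruptedException ignored) {
                // not expected in this model
            }
        });
        Thread closer = new Thread(() -> {
            try {
                registered.await();
                m.signalAndWait(resendMillis, firstSignalSent);
            } catch (InterruptedException ignored) {
                // not expected in this model
            }
        });
        reader.setDaemon(true);      // a hung model must not keep the JVM alive
        closer.setDaemon(true);
        reader.start();
        closer.start();
        closer.join(timeoutMillis);
        return !closer.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("untimed wait: finished = " + closerFinishes(0, 500));          // false
        System.out.println("timed wait + resend: finished = " + closerFinishes(50, 5000)); // true
    }
}
```

The model deliberately leaves out the second effect described above: in the real code the stuck closer also holds AbstractInterruptibleChannel.closeLock, so further close/interrupt calls block as well.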

For potentially indefinite I/O read operations (waiting for data on sockets, pipes, etc. that may never arrive), this deadlock is really bad!

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1. Create n named pipes (FIFOs) on a Unix OS.
2. Create n threads, each of which opens a FileChannel via new RandomAccessFile(pipePath-i, "rw").getChannel() and reads lines from its pipe in a loop.
3. Interrupt the pipe-reading threads, or close their channels, immediately before a new read step (e.g. immediately after thread.start() or immediately after the "final" successful line read; do not send any more data to that pipe afterwards).

If necessary, repeat steps 2 and 3 in a loop until the deadlock occurs (several tens of thousands of iterations may be needed).
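The steps above could be scripted roughly as follows. This is a hypothetical sketch, not the submitter's test program: the class name FifoCloseRepro, the iteration count and the timeouts are made up, it only exercises the Thread.interrupt() path (not channel.close()), and it assumes a system such as Linux, because it shells out to mkfifo and relies on the OS allowing a FIFO to be opened for reading and writing (POSIX leaves that undefined). A nonzero count means that, within the timeout, either the interrupting thread or the reader did not finish.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical reproducer sketch (not the submitter's program). Assumes
// Linux: uses the external mkfifo command and opens the FIFO "rw".
public class FifoCloseRepro {

    // Returns the number of iterations in which the interrupter or the
    // reader was still running after timeoutMillis.
    static int run(int iterations, long timeoutMillis) throws Exception {
        Path dir = Files.createTempDirectory("fifo-repro");
        int hung = 0;
        for (int i = 0; i < iterations; i++) {
            // Step 1: create a named pipe.
            Path fifo = dir.resolve("pipe-" + i);
            Process mk = new ProcessBuilder("mkfifo", fifo.toString()).inheritIO().start();
            if (mk.waitFor() != 0) {
                throw new IOException("mkfifo failed for " + fifo);
            }

            // Step 2: a thread reads from the pipe through a FileChannel.
            Thread reader = new Thread(() -> {
                try (RandomAccessFile raf = new RandomAccessFile(fifo.toFile(), "rw");
                     FileChannel ch = raf.getChannel()) {
                    ByteBuffer buf = ByteBuffer.allocate(128);
                    // Nothing is written and this thread holds the write end
                    // itself, so read() blocks until it is interrupted.
                    while (ch.read(buf) >= 0) {
                        buf.clear();
                    }
                } catch (IOException expected) {
                    // e.g. ClosedByInterruptException: the normal outcome
                }
            });
            reader.setDaemon(true);  // a stuck thread must not keep the JVM alive
            reader.start();

            // Step 3: interrupt right after start(), from another thread.
            Thread interrupter = new Thread(reader::interrupt);
            interrupter.setDaemon(true);
            interrupter.start();

            interrupter.join(timeoutMillis);
            reader.join(timeoutMillis);
            if (interrupter.isAlive() || reader.isAlive()) {
                hung++;
            }
            Files.deleteIfExists(fifo);
        }
        Files.deleteIfExists(dir);
        return hung;
    }

    public static void main(String[] args) throws Exception {
        int iterations = args.length > 0 ? Integer.parseInt(args[0]) : 10_000;
        System.out.println("iterations that did not finish: " + run(iterations, 2000));
    }
}
```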

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
All threads should have been successfully interrupted and/or the channels should have been closed.
ACTUAL -
While thread.interrupt() works most of the time, it fails (does not return) for some calls. The interrupting thread is then stuck, and the pipe thread continues to wait for new data (even though interrupt or channel.close has been called).

REPRODUCIBILITY :
This bug can be reproduced occasionally.

CUSTOMER SUBMITTED WORKAROUND :
Note: this is not a production/long-term workaround, but I cannot synchronize NativeDispatcher.read/write with NativeThreadSet myself.

Change NativeThreadSet.signalAndWait as follows (patched in via -Xbootclasspath/p ...):

    void signalAndWait() {
        synchronized (this) {
            // Signal every registered native thread once (original behavior).
            int u = used;
            int n = elts.length;
            for (int i = 0; i < n; i++) {
                long th = elts[i];
                if (th == 0)
                    continue;
                NativeThread.signal(th);
                if (--u == 0)
                    break;
            }
            waitingToEmpty = true;
            boolean interrupted = false;
            while (used > 0) {
                try {
                    // Timed wait instead of wait(): if a signal was lost because
                    // it arrived before the native read/write started, re-send it.
                    wait(1000);
                    if (used > 0) {
                        // System.err.println("Have to resend NativeThread signals ...");
                        u = used;
                        n = elts.length;
                        for (int i = 0; i < n; i++) {
                            long th = elts[i];
                            if (th == 0)
                                continue;
                            NativeThread.signal(th);
                            if (--u == 0)
                                break;
                        }
                    }
                } catch (InterruptedException e) {
                    interrupted = true;
                }
            }
            // Restore the interrupt status swallowed by the wait loop.
            if (interrupted)
                Thread.currentThread().interrupt();
        }
    }


Comments
I've changed the bug summary to reflect what this issue is really about. Unlike the network channels, FileChannel does not do a preClose/dup2, and so implCloseChannel may block waiting for I/O operations to complete. We are overhauling the async close/interrupt implementation in JDK 11 and it may be a good time to look at this one (at least for Linux and macOS, maybe not Windows as there doesn't seem to be a good solution for that platform).
28-02-2018

Received the following update from the submitter, including confirmation of the fix with JDK 8u45:
==================================================================
Closing this issue based upon the submitter's response:
===============================================
On 6/22/2015 12:53 PM, ........... wrote:
> Dear ..........,
>
> I can confirm that the issue seems fixed with 8u45!
>
> XXXXXXX:/usr/java8/jre/bin >java -version
> java version "1.8.0"
> Java(TM) SE Runtime Environment (build pap3280sr1-20150417_01(SR1))
> IBM J9 VM (build 2.8, JRE 1.8.0 AIX ppc-32 20150410_243669 (JIT enabled, AOT enabled)
> J9VM - R28_Java8_SR1_20150410_1531_B243669
> JIT - tr.r14.java_20150402_88976.03
> GC - R28_Java8_SR1_20150410_1531_B243669
> J9CL - 20150410_243669)
> JCL - 20150413_01 based on Oracle jdk8u45-b13
>
> If I inspect the source code of 8u45 (or 8u40 on grepcode), I can see that the NativeThread signal is resent in NativeThreadSet.signalAndWait every 50 ms in JDK 8 (similar to what I proposed in my bug report for 7uXX, every 1000 ms). I just had the impression that I had seen the same NativeThreadSet.java code from 7uXX in 8uXX as well. Maybe in an earlier version of 8.
>
> Concerning Java 7: we "only" have 7u79 installed on our *nix machines (and I may not have the rights to upgrade; it is centrally administered), but the problem still occurs there. Looking at a decompiled 7u80, I see that the infinite wait is still present in NativeThreadSet.signalAndWait().
>
> So I would propose a backport of the 8u45 version (NativeThreadSet.signalAndWait) to JDK 7. This should fix the problem IMHO.
>
> Thanks and best regards
> ................
=================================================
30-07-2015