JDK-7101658 : Backout 7082769 changes

Details
Type:
Bug
Submit Date:
2011-10-17
Status:
Closed
Updated Date:
2012-03-12
Project Name:
JDK
Resolved Date:
2011-11-09
Component:
core-libs
OS:
generic
Sub-Component:
java.io
CPU:
generic
Priority:
P3
Resolution:
Fixed
Affected Versions:
7
Fixed Versions:
7u2 (b10)

Description
The changes for 7082769 can lead to file descriptor exhaustion in applications. Closing an I/O stream that references a shared native file descriptor should also close the other streams that reference the same descriptor; this mirrors the underlying OS behaviour.

One such report comes from the Hadoop HDFS project, which has code that creates RandomAccessFiles but never closes them. On JREs without the 7082769 fix this is not a problem, since the first call to close the input/output stream associated with the file descriptor closes the underlying FD. With the 7082769 fix, the maximum file descriptor count can be reached, because each RandomAccessFile keeps a reference to the underlying FD and keeps it open.

Some code from the Hadoop project:

@Override // FSDatasetInterface
public synchronized InputStream getBlockInputStream(ExtendedBlock b,
    long seekOffset) throws IOException {
    File blockFile = getBlockFile(b);
    RandomAccessFile blockInFile = new RandomAccessFile(blockFile, "r");
    if (seekOffset > 0) {
        blockInFile.seek(seekOffset);
    }
    return new FileInputStream(blockInFile.getFD());
}
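The leak pattern above can be reduced to a minimal sketch (the class name and temp file are illustrative, not from the Hadoop source): two objects share one native file descriptor, and under the 7082769 behaviour each must be closed explicitly or the FD leaks.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;

public class FdShareDemo {
    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("fdshare", ".tmp");
        try (FileOutputStream out = new FileOutputStream(f)) {
            out.write('h');
        }

        // Both objects below reference the same native file descriptor.
        RandomAccessFile raf = new RandomAccessFile(f, "r");
        FileInputStream in = new FileInputStream(raf.getFD());

        int first = in.read();
        in.close();   // before 7082769, this also closed the shared FD
        raf.close();  // with 7082769, the RandomAccessFile holds the FD open
                      // until closed itself; omitting this line leaks an FD

        f.delete();
        System.out.println(first == 'h' ? "OK" : "unexpected: " + first);
    }
}
```

Code such as the Hadoop method above, which drops the RandomAccessFile reference after handing its FD to a FileInputStream, can never perform that second close.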


Due to this behavioural change, the fix should be backed out, and we should then see whether the underlying issue can be addressed in a different way that avoids breaking applications that have worked in the past.


Comments
EVALUATION

Back out the 7082769 fix for the time being.
                                     
2011-10-17
PUBLIC COMMENTS

From Eric Caspole at AMD:

Stack traces:

From hadoop 0.20.3:

2011-10-12 11:23:17,136 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in BlockReceiver constructor. Cause is
2011-10-12 11:23:17,137 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_6830850544126306450_1285 received exception java.io.FileNotFoundException: /disk5/0_20/hadoop/blocksBeingWritten/blk_6830850544126306450_1285.meta (Too many open files)
2011-10-12 11:23:17,139 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-577984473-10.234.222.239-50010-1317746110138, infoPort=50075, ipcPort=50020):DataXceiver
java.io.FileNotFoundException: /disk5/0_20/hadoop/blocksBeingWritten/blk_6830850544126306450_1285.meta (Too many open files)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.createBlockWriteStreams(FSDataset.java:979)
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.writeToBlock(FSDataset.java:1314)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:99)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:259)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
        at java.lang.Thread.run(Thread.java:722)



From hadoop 0.21:

2011-10-12 11:31:43,223 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-9134785909187168406_1208 src: /127.0.0.1:50238 dest: /127.0.0.1:50010
2011-10-12 11:31:43,225 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in BlockReceiver constructor. Cause is
java.io.IOException: Too many open files
        at java.io.UnixFileSystem.createFileExclusively(Native Method)
        at java.io.File.createNewFile(File.java:947)
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.createTmpFile(FSDataset.java:825)
        at org.apache.hadoop.hdfs.server.datanode.FSDataset$FSVolume.createRbwFile(FSDataset.java:403)
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.createRbw(FSDataset.java:1283)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:104)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.opWriteBlock(DataXceiver.java:258)
        at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.opWriteBlock(DataTransferProtocol.java:390)
        at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.processOp(DataTransferProtocol.java:331)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:111)
        at java.lang.Thread.run(Thread.java:722)
2011-10-12 11:31:43,231 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: checkDiskError: exception:
java.io.IOException: Too many open files
        at java.io.UnixFileSystem.createFileExclusively(Native Method)
        at java.io.File.createNewFile(File.java:947)
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.createTmpFile(FSDataset.java:825)
        at org.apache.hadoop.hdfs.server.datanode.FSDataset$FSVolume.createRbwFile(FSDataset.java:403)
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.createRbw(FSDataset.java:1283)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:104)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.opWriteBlock(DataXceiver.java:258)
        at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.opWriteBlock(DataTransferProtocol.java:390)
        at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.processOp(DataTransferProtocol.java:331)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:111)
        at java.lang.Thread.run(Thread.java:722)
2011-10-12 11:31:43,360 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-9134785909187168406_1208 received exception java.io.IOException: Too many open files
                                     
2011-10-27
PUBLIC COMMENTS

From Eric Caspole at AMD:

You can reproduce this problem by setting up one machine as a Hadoop pseudo-cluster as shown at

 http://hadoop.apache.org/common/docs/stable/single_node_setup.html#PseudoDistributed

Using Hadoop v0.20.3, run teragen/terasort, which is an included example/demo:

First do
 time ./bin/hadoop jar hadoop-examples-0.20.3-SNAPSHOT.jar teragen 175304788 tera-input-16g

then do
 time ./bin/hadoop jar hadoop-examples-0.20.3-SNAPSHOT.jar terasort tera-input-16g tera-out-16g

On my system the datanode errors out with too many open FDs when the terasort is about 70% complete. It is possible there is a bug in the Hadoop code, but 0.20 is the standard version on which most of the commercial distros are based.
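While the terasort runs, the descriptor growth can be watched from the shell (a sketch assuming a Linux /proc filesystem; the pgrep-based PID lookup is illustrative, and the fallback to the current shell is only so the commands run standalone):

```shell
# Find the DataNode JVM's PID (illustrative; any JVM PID works)
pid=$(pgrep -f DataNode | head -n1)
pid=${pid:-$$}                 # fall back to this shell for a standalone demo

# Number of file descriptors currently open in that process
ls /proc/"$pid"/fd | wc -l

# The per-process limit that "Too many open files" runs into
ulimit -n
```

With the 7082769 changes in place, the first count climbs toward the `ulimit -n` value as the job progresses.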
                                     
2011-10-27


