JDK-7101658 : Backout 7082769 changes
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.io
  • Affected Version: 7
  • Priority: P3
  • Status: Closed
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2011-10-17
  • Updated: 2012-03-12
  • Resolved: 2011-11-09
The Version table provides details related to the release in which this issue/RFE will be addressed.

Unresolved: Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed: Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

JDK 6: 6u30 (Fixed)
JDK 7: 7u2 b10 (Fixed)
Related Reports
Relates :  
Relates :  
Description
The changes for 7082769 can lead to file descriptor exhaustion in applications. Closing one of several I/O streams that reference the same native file descriptor should also close out the other streams that reference it; this mirrors the behaviour of the underlying OS.

One such report is seen with the Hadoop HDFS project. It has code which creates RandomAccessFiles but never closes them. On JREs without the 7082769 fix this is not an issue, since the first call to close the input/output stream associated with the file descriptor will close the underlying FD. With the 7082769 fix, the maximum file descriptor count can be reached, because the RandomAccessFile keeps a reference to the underlying FD and keeps it open.

Some code from the Hadoop project:

@Override // FSDatasetInterface
public synchronized InputStream getBlockInputStream(ExtendedBlock b,
    long seekOffset) throws IOException {
    File blockFile = getBlockFile(b);
    RandomAccessFile blockInFile = new RandomAccessFile(blockFile, "r");
    if (seekOffset > 0) {
        blockInFile.seek(seekOffset);
    }
    // blockInFile is never closed; the caller only ever closes the returned stream
    return new FileInputStream(blockInFile.getFD());
}
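The pattern above works only because the stream returned from getFD() shares the RandomAccessFile's native descriptor, including its file offset. The following standalone sketch (hypothetical demo class, not from the report) shows that sharing: a read through the wrapping FileInputStream advances the RandomAccessFile's own file pointer, confirming there is one descriptor underneath.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SharedFdDemo {
    // Reads one byte through a FileInputStream wrapped around the
    // RandomAccessFile's FileDescriptor, then returns the RAF's file
    // pointer. Because both objects share one native descriptor, the
    // read advances the RAF's position too.
    public static long readOneByteViaStream(Path p) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(p.toFile(), "r");
        try {
            raf.seek(1);                                  // shared offset -> 1
            FileInputStream in = new FileInputStream(raf.getFD());
            in.read();                                    // reads the byte at offset 1
            return raf.getFilePointer();                  // now 2: one descriptor, one offset
        } finally {
            raf.close();                                  // releases the shared descriptor
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("fd-demo", ".txt");
        Files.write(p, "hello".getBytes(StandardCharsets.US_ASCII));
        System.out.println(readOneByteViaStream(p));      // prints 2
        Files.delete(p);
    }
}
```

This shared-offset behaviour is the same on all the JDK versions involved; what 7082769 changed is only which close() call actually releases the native descriptor.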


Due to this behavioural change, the fix should be backed out, and we should investigate whether the underlying issue can be addressed in a different way that avoids breaking applications that have worked in the past.
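Independently of the JDK-side decision, an application can make the close semantics explicit rather than relying on either JDK behaviour. A hypothetical variant of the Hadoop method above (not part of this report or of Hadoop itself) ties the returned stream's close() to the owning RandomAccessFile, so the descriptor is released no matter which JDK is running:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;

public class ClosingBlockStream {
    // Opens a file at the given offset and returns a stream whose close()
    // also closes the owning RandomAccessFile. This releases the native
    // descriptor regardless of whether the JDK ties the two objects together.
    public static InputStream openAt(File f, long seekOffset) throws IOException {
        final RandomAccessFile raf = new RandomAccessFile(f, "r");
        if (seekOffset > 0) {
            raf.seek(seekOffset);
        }
        return new FileInputStream(raf.getFD()) {
            @Override
            public void close() throws IOException {
                try {
                    super.close();
                } finally {
                    raf.close();   // release the descriptor even if super.close() throws
                }
            }
        };
    }
}
```

This is only a sketch of one defensive pattern; whether the double close is a harmless no-op on a given JRE depends on how that JRE tracks descriptor ownership, which is exactly the behaviour at issue in this bug.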

Comments
PUBLIC COMMENTS From Eric Caspole at AMD:

Stack traces. From hadoop 0.20.3:

2011-10-12 11:23:17,136 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in BlockReceiver constructor. Cause is
2011-10-12 11:23:17,137 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_6830850544126306450_1285 received exception java.io.FileNotFoundException: /disk5/0_20/hadoop/blocksBeingWritten/blk_6830850544126306450_1285.meta (Too many open files)
2011-10-12 11:23:17,139 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-577984473-10.234.222.239-50010-1317746110138, infoPort=50075, ipcPort=50020):DataXceiver
java.io.FileNotFoundException: /disk5/0_20/hadoop/blocksBeingWritten/blk_6830850544126306450_1285.meta (Too many open files)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
    at org.apache.hadoop.hdfs.server.datanode.FSDataset.createBlockWriteStreams(FSDataset.java:979)
    at org.apache.hadoop.hdfs.server.datanode.FSDataset.writeToBlock(FSDataset.java:1314)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:99)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:259)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
    at java.lang.Thread.run(Thread.java:722)

From hadoop 0.21:

2011-10-12 11:31:43,223 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-9134785909187168406_1208 src: /127.0.0.1:50238 dest: /127.0.0.1:50010
2011-10-12 11:31:43,225 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in BlockReceiver constructor. Cause is
java.io.IOException: Too many open files
    at java.io.UnixFileSystem.createFileExclusively(Native Method)
    at java.io.File.createNewFile(File.java:947)
    at org.apache.hadoop.hdfs.server.datanode.FSDataset.createTmpFile(FSDataset.java:825)
    at org.apache.hadoop.hdfs.server.datanode.FSDataset$FSVolume.createRbwFile(FSDataset.java:403)
    at org.apache.hadoop.hdfs.server.datanode.FSDataset.createRbw(FSDataset.java:1283)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:104)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.opWriteBlock(DataXceiver.java:258)
    at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.opWriteBlock(DataTransferProtocol.java:390)
    at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.processOp(DataTransferProtocol.java:331)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:111)
    at java.lang.Thread.run(Thread.java:722)
2011-10-12 11:31:43,231 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: checkDiskError: exception:
java.io.IOException: Too many open files
    at java.io.UnixFileSystem.createFileExclusively(Native Method)
    at java.io.File.createNewFile(File.java:947)
    at org.apache.hadoop.hdfs.server.datanode.FSDataset.createTmpFile(FSDataset.java:825)
    at org.apache.hadoop.hdfs.server.datanode.FSDataset$FSVolume.createRbwFile(FSDataset.java:403)
    at org.apache.hadoop.hdfs.server.datanode.FSDataset.createRbw(FSDataset.java:1283)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:104)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.opWriteBlock(DataXceiver.java:258)
    at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.opWriteBlock(DataTransferProtocol.java:390)
    at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.processOp(DataTransferProtocol.java:331)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:111)
    at java.lang.Thread.run(Thread.java:722)
2011-10-12 11:31:43,360 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-9134785909187168406_1208 received exception java.io.IOException: Too many open files
2011-10-27

PUBLIC COMMENTS From Eric Caspole at AMD:

You can reproduce this problem by setting up one machine as a hadoop pseudo-cluster as shown at http://hadoop.apache.org/common/docs/stable/single_node_setup.html#PseudoDistributed

Using hadoop v0.20.3 and running teragen/terasort, which is an included example/demo. First do:

time ./bin/hadoop jar hadoop-examples-0.20.3-SNAPSHOT.jar teragen 175304788 tera-input-16g

then do:

time ./bin/hadoop jar hadoop-examples-0.20.3-SNAPSHOT.jar terasort tera-input-16g tera-out-16g

On my system the datanode will error out with too many open FDs when the terasort is about 70% complete. I guess it is possible there is a bug in the hadoop code, but 0.20 is the standard version that most of the commercial distros are based on.
2011-10-27

EVALUATION Back out the 7082769 fix for the time being.
2011-10-17