FULL PRODUCT VERSION :
openjdk version "1.7.0-jdk7u4-b21"
OpenJDK Runtime Environment (build 1.7.0-jdk7u4-b21-20120427)
OpenJDK 64-Bit Server VM (build 23.0-b21, mixed mode)
For reference, the following was used:
java version "1.6.0_31"
Java(TM) SE Runtime Environment (build 1.6.0_31-b04-415-11M3635)
Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01-415, mixed mode)
ADDITIONAL OS VERSION INFORMATION :
Darwin Mistral-MacOSX.local 11.4.0 Darwin Kernel Version 11.4.0: Mon Apr 9 19:32:15 PDT 2012; root:xnu-1699.26.8~1/RELEASE_X86_64 x86_64
A DESCRIPTION OF THE PROBLEM :
I have some code that performs I/O by means of file channels to compute MD5
fingerprints of files. The code performs very well on Apple JDK 6 (90% of the
theoretical disk speed) and very poorly on OpenJDK 7 (roughly ten times slower).
So far I've narrowed the problem down to the behaviour of MappedByteBuffer.load().
While on Apple JDK 6 it actually loads the file contents into memory (and it's fast),
it seems to do nothing on OpenJDK 7. I suppose that the data are then loaded only
on demand, while the MD5 is being computed, and that this is done inefficiently.
The self-contained attached code creates 20 files whose sizes range from 10 MB
to 100 MB and then loads them by means of MappedByteBuffer. On Apple JDK 6 it
reports
Read 1020 MB, speed 61 MB/sec
while on OpenJDK 7u4 it reports
Read 1020 MB, speed 1861 MB/sec
But the latter is not a real measurement: a system monitor reports no disk read
activity, and the figure is unrealistically high.
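One way to check the supposition above is to time load() separately from a pass that touches every page of the mapping. The following is a minimal sketch, not part of the attached test case: the class name MappedLoadProbe, the helper touchPages and the 4096-byte page size are assumptions made for illustration. Note that isLoaded() is documented as a hint only, so the timings are the more useful signal.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

class MappedLoadProbe
{
    // Assumed page size; the real value is platform-dependent.
    private static final int ASSUMED_PAGE_SIZE = 4096;

    /** Reads one byte from every page, so each page must be brought into memory. */
    static long touchPages (final ByteBuffer buffer, final int pageSize)
    {
        long sum = 0;
        final int limit = buffer.limit();
        for (long i = 0; i < limit; i += pageSize)
        {
            sum += buffer.get((int)i);
        }
        return sum;
    }

    public static void main (final String ... args)
        throws IOException
    {
        if (args.length != 1)
        {
            System.err.println("Usage: java MappedLoadProbe <file>");
            return;
        }
        final File file = new File(args[0]);
        final FileInputStream fis = new FileInputStream(file);
        try
        {
            final long start = System.nanoTime();
            final MappedByteBuffer buffer =
                fis.getChannel().map(FileChannel.MapMode.READ_ONLY, 0, file.length());
            buffer.load();
            final long afterLoad = System.nanoTime();
            final boolean loaded = buffer.isLoaded(); // a hint, not a guarantee
            final long sum = touchPages(buffer, ASSUMED_PAGE_SIZE);
            final long afterTouch = System.nanoTime();
            System.err.printf("isLoaded=%b, load() took %d ms, touching pages took %d ms (checksum %d)%n",
                              loaded, (afterLoad - start) / 1000000L,
                              (afterTouch - afterLoad) / 1000000L, sum);
        }
        finally
        {
            fis.close();
        }
    }
}
```

If load() really is a no-op, one would expect it to return almost immediately and the page-touching pass to account for nearly all of the elapsed time on a file that is not already in the OS cache.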
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Execute the attached test case.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Similar performance in JDK 6 and OpenJDK 7 is expected.
ACTUAL -
OpenJDK 7 behaves differently: load() does not appear to read the data into memory.
REPRODUCIBILITY :
This bug can always be reproduced.
---------- BEGIN SOURCE ----------
package it.tidalwave.integritychecker.impl;

import java.util.Random;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import static java.nio.channels.FileChannel.MapMode.READ_ONLY;

public class IoPerformanceTest
{
    private final static double MEGA = 1024 * 1024;
    private final static int MIN_FILE_SIZE = 10 * 1000 * 1000;
    private final static int MAX_FILE_SIZE = 100 * 1000 * 1000;
    private File testFileFolder;

    public static void main (final String ... args)
        throws Exception
    {
        final IoPerformanceTest test = new IoPerformanceTest();
        test.createTestFiles();
        test.test();
    }

    // Creates 20 files of pseudo-random size and content in the temp directory.
    private void createTestFiles()
        throws IOException
    {
        System.err.println("Creating test files...");
        testFileFolder = new File(System.getProperty("java.io.tmpdir"));
        testFileFolder.mkdirs();
        final Random random = new Random(342345426536L);
        for (int f = 0; f < 20; f++)
        {
            final File file = new File(testFileFolder, "testfile" + f);
            System.err.println(">>>> creating " + file.getAbsolutePath());
            int size = MIN_FILE_SIZE + random.nextInt(MAX_FILE_SIZE - MIN_FILE_SIZE);
            final byte[] buffer = new byte[size];
            random.nextBytes(buffer);
            final FileOutputStream fos = new FileOutputStream(file);
            fos.write(buffer);
            fos.close();
        }
    }

    // Loads every test file through nioRead() and reports the overall throughput.
    public void test()
        throws Exception
    {
        final long startTime = System.currentTimeMillis();
        long size = 0;
        for (int f = 0; f < 20; f++)
        {
            final File file = new File(testFileFolder, "testfile" + f).getAbsoluteFile();
            final FileInputStream fis = new FileInputStream(file);
            final ByteBuffer byteBuffer = nioRead(fis, (int)file.length());
            fis.close();
            size += file.length();
        }
        final long time = System.currentTimeMillis() - startTime;
        System.err.printf("Read %d MB, speed %d MB/sec\n", (int)(size / MEGA), (int)(((size / MEGA) / (time / 1000.0))));
    }

    // Maps the file and asks for its contents to be loaded into memory.
    private ByteBuffer nioRead (final FileInputStream fis, final int length)
        throws IOException
    {
        return fis.getChannel().map(READ_ONLY, 0, length).load();
    }

    // Regular I/O alternative (the workaround); not called by test().
    private ByteBuffer ioRead (final FileInputStream fis, final int length)
        throws IOException
    {
        final byte[] bytes = new byte[length];
        int offset = 0;
        // A single read() may return fewer bytes than requested, so loop until full.
        while (offset < length)
        {
            final int n = fis.read(bytes, offset, length - offset);
            if (n < 0)
            {
                throw new IOException("Unexpected end of file");
            }
            offset += n;
        }
        return ByteBuffer.wrap(bytes);
    }
}
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
Regular IO can be used in place of NIO, but at the expense of allocating memory in the heap, which could be a problem for processing multiple large files at the same time.
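A possible variant of this workaround, sketched here under assumptions and not taken from the submitted code, is to feed the digest through a small fixed-size buffer, so that the heap cost stays constant regardless of file size. The class name StreamingMd5, the method md5Hex and the 64 KB buffer size are hypothetical choices for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class StreamingMd5
{
    // Arbitrary buffer size, chosen for illustration only.
    private static final int BUFFER_SIZE = 64 * 1024;

    /** Computes the MD5 of a stream incrementally and returns it as lowercase hex. */
    static String md5Hex (final InputStream in)
        throws IOException
    {
        final MessageDigest digest;
        try
        {
            digest = MessageDigest.getInstance("MD5");
        }
        catch (NoSuchAlgorithmException e)
        {
            throw new IllegalStateException("MD5 not available", e);
        }
        final byte[] buffer = new byte[BUFFER_SIZE];
        int n;
        // read() may return fewer bytes than the buffer holds; update with what was read.
        while ((n = in.read(buffer)) != -1)
        {
            digest.update(buffer, 0, n);
        }
        final StringBuilder hex = new StringBuilder();
        for (final byte b : digest.digest())
        {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }
}
```

A caller would pass something like new FileInputStream(file) and close the stream afterwards. Whether this approaches the throughput seen with a working load() has not been measured here.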