Date: Fri, 27 Feb 1998 15:09:41 -0800
From: bill massena <###@###.###>
Our geometry application reads sizable (multi-megabyte) files of binary geometry
data consisting of integers and floats or doubles. I am finding that the Java
streams I am using are much, much slower than C or Fortran doing the same job.
I have a test file of 500K floats. When I read it using a C program on my SGI
R10000 machine, it takes 0.03 to 0.04 seconds. In Java, the code below
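    // Needs: import java.io.*;  (data is a float array of 500000 elements, declared elsewhere)
    FileInputStream fin;
    BufferedInputStream bin;
    FastInputStream din = null;
    fin = new FileInputStream("fil.bin");
    bin = new BufferedInputStream(fin);
    din = new FastInputStream(bin);
    for (int i=0; i<500000; i++) {
        data[i] = din.readFloat();   // one call per 4-byte float
    }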
takes from 5 to 6 seconds. So the speed ratio is about 150-to-1!!!
Looking at the DataInputStream source code, I find that in.read() is being used
to get each byte, so every readFloat() costs four separate read calls. To avoid
that, I read the whole file into a byte array with a single call and used the
conversion logic from the DataInputStream source to get:
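    final int NSIZ = 500000;        // number of floats in the test file
    byte b[] = new byte[NSIZ*4];    // 4 bytes per float
    din.readFully(b);               // one bulk read instead of 2 million single-byte reads
    int ib = 0;
    for (int i=0; i<NSIZ; i++) {
        // assemble four big-endian bytes into an int, then reinterpret it as a float
        data[i] = Float.intBitsToFloat(
            ((b[ib]&0xff) << 24) + ((b[ib+1]&0xff) << 16) +
            ((b[ib+2]&0xff) << 8) + (b[ib+3]&0xff));
        ib += 4;
    }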
This process takes about 0.5 seconds, which is still about 16 times slower than
the C program.
I don't want to put file-reading logic like this in our code. Also, this is
just a test case. It looks like time spent reading and writing binary files may
go from a few seconds to many minutes for us. This is a big deal, because the
first thing a prospective user of the application will do is read historical
geometry data to begin evaluation.
I would like to believe that file reading can be made much faster; that there is
nothing inherent in Java which prevents this.
I was surprised to find that there is a native method for reading an array of
bytes, but nothing similar for ints, floats, and doubles. With methods like
these, file reading would be much faster. Could you folks consider coming up
with changes like this?
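To make the request concrete, here is a minimal sketch of what such a bulk method could look like when written in pure Java, built on the readFully() approach above. The class name BulkFloatReader, the method name readFloats, and the chunk size are all hypothetical and not part of any Java library. A native version could presumably avoid the per-element conversion loop altogether.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical helper class; not part of the JDK.
class BulkFloatReader {

    // Floats converted per bulk read; an arbitrary choice for this sketch.
    private static final int CHUNK = 8192;

    // Fills dst[off .. off+len-1] with big-endian IEEE 754 floats read from in.
    // Bytes are fetched in large blocks with readFully() and converted in memory,
    // instead of calling readFloat() once per value.
    static void readFloats(DataInputStream in, float[] dst, int off, int len)
            throws IOException {
        byte[] buf = new byte[Math.min(len, CHUNK) * 4];
        while (len > 0) {
            int n = Math.min(len, CHUNK);
            in.readFully(buf, 0, n * 4);          // one bulk read per chunk
            for (int i = 0, ib = 0; i < n; i++, ib += 4) {
                dst[off + i] = Float.intBitsToFloat(
                    ((buf[ib] & 0xff) << 24) | ((buf[ib + 1] & 0xff) << 16) |
                    ((buf[ib + 2] & 0xff) << 8) | (buf[ib + 3] & 0xff));
            }
            off += n;
            len -= n;
        }
    }

    // Small round trip through an in-memory stream, standing in for a data file.
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        float[] original = { 1.5f, -2.25f, 1.0e10f };
        for (int i = 0; i < original.length; i++) {
            out.writeFloat(original[i]);
        }
        out.flush();

        float[] data = new float[original.length];
        DataInputStream in = new DataInputStream(
            new ByteArrayInputStream(bytes.toByteArray()));
        readFloats(in, data, 0, data.length);
        for (int i = 0; i < data.length; i++) {
            System.out.println(data[i]);
        }
    }
}
```

Reading in fixed-size chunks keeps the temporary byte array small even for very large files. Analogous methods for ints and doubles would differ only in the element width and the conversion step.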
Finally, I assume that everybody in the numerical analysis community would
benefit from much greater I/O rates.