FULL PRODUCT VERSION :
java version "1.7.0_15"
Java(TM) SE Runtime Environment (build 1.7.0_15-b03)
Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)
ADDITIONAL OS VERSION INFORMATION :
Microsoft Windows [Version 6.1.7601]
A DESCRIPTION OF THE PROBLEM :
java.util.Scanner scanner = new Scanner(longString);
String result = scanner.useDelimiter("\\z").next();
I expect the result to be equal to the original input string. However, the returned string is cut off at position 1024, which happens to be the size of Scanner's internal buffer. The internal buffer is supposedly intended to grow as needed for large inputs, so this is clearly a bug.
This is not an academic problem, as I am using Scanner to parse databases given as large text files.
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Compile and run the program below. This program creates two long strings, 1000 and 1100 characters long, and reads until end of input with Scanner. The first string is read OK, while the second is cut off after 1024 chars.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
I expect the whole remainder of an input string to be returned if I perform a scan using 'end of input' as the delimiter. The buffer is an internal structure of Scanner whose size and characteristics I cannot be expected to know anything about.
ACTUAL -
Scanner stops reading at the end of its internal buffer instead of increasing the buffer to accommodate the whole input string.
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
package scannerbug;

import java.util.Scanner;

/**
 * Test program that verifies that Scanner stops reading at the end of its internal buffer while scanning for the end
 * of input of a large string.
 */
public class ScannerBug {
    private static final String string1000 =
        "0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789" +
        "0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789" +
        "0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789" +
        "0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789" +
        "0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789" +
        "0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789" +
        "0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789" +
        "0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789" +
        "0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789" +
        "0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789";

    private static final String string1100 =
        "0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789" +
        "0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789" +
        "0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789" +
        "0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789" +
        "0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789" +
        "0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789" +
        "0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789" +
        "0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789" +
        "0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789" +
        "0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789" +
        "0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789";

    public static void main(String[] args) {
        Scanner scanner1000 = new Scanner(string1000);
        Scanner scanner1100 = new Scanner(string1100);

        String scanned1000 = scanner1000.useDelimiter("\\z").next();
        String scanned1100 = scanner1100.useDelimiter("\\z").next();

        int length1000 = scanned1000.length();
        int length1100 = scanned1100.length();

        if (length1000 == 1000) {
            System.out.println("The length of the first scanned line is 1000, OK!");
        } else {
            System.err.println("The length of the first scanned line is " + length1000 + " while 1000 was expected, failure!");
        }

        if (length1100 == 1100) {
            System.out.println("The length of the second scanned line is 1100, OK!");
        } else {
            System.err.println("The length of the second scanned line is " + length1100 + " while 1100 was expected, failure!");
        }
    }
}
---------- END SOURCE ----------
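To narrow down where the truncation starts, the same check can be run over several input lengths around the reported 1024-character boundary. The following is only a sketch along the lines of the program above; the class name ScannerBoundaryProbe and the helpers digits and scannedLength are hypothetical names introduced here, and the chosen lengths are arbitrary. What it prints depends on the JDK version it is run on.

```java
import java.util.Scanner;

public class ScannerBoundaryProbe {
    /** Builds a string of the given length from repeating decimal digits. */
    static String digits(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('0' + i % 10));
        }
        return sb.toString();
    }

    /** Returns the length of the token obtained by scanning up to end of input. */
    static int scannedLength(String input) {
        Scanner scanner = new Scanner(input);
        try {
            // Same call as in the report: use "end of input" as the delimiter.
            return scanner.useDelimiter("\\z").next().length();
        } finally {
            scanner.close();
        }
    }

    public static void main(String[] args) {
        // Lengths on both sides of the reported 1024-character cut-off.
        int[] lengths = {1000, 1023, 1024, 1025, 1100, 5000};
        for (int length : lengths) {
            int scanned = scannedLength(digits(length));
            System.out.println(length + " -> " + scanned
                    + (scanned == length ? " OK" : " TRUNCATED"));
        }
    }
}
```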
CUSTOMER SUBMITTED WORKAROUND :
By using the end-of-line delimiter $, I can work around this bug, but obviously this only works for strings that don't contain line breaks.
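For inputs that do contain line breaks, another possible workaround is to avoid the delimiter entirely and drain the source with a plain Reader loop. This is a minimal sketch, not part of the original report: the class name ReadRemainder and the method readFully are hypothetical, the 4096-character chunk size is arbitrary, and IOException is wrapped in an IllegalStateException only to keep the example short. It assumes the whole source is read from the start; it does not help once a Scanner has already buffered part of the input.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ReadRemainder {
    /** Reads every remaining character from the reader into one string. */
    static String readFully(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] chunk = new char[4096];
        try {
            int n;
            // read() returns -1 once the end of the input has been reached.
            while ((n = reader.read(chunk)) != -1) {
                sb.append(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Build an 1100-character input that contains line breaks.
        StringBuilder input = new StringBuilder();
        for (int i = 0; i < 1100; i++) {
            input.append(i % 10 == 9 ? '\n' : 'x');
        }
        String result = readFully(new StringReader(input.toString()));
        System.out.println("Read " + result.length() + " of " + input.length() + " characters");
    }
}
```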