Java's regex engine has had a bug for a while now that I'd like to see fixed.
The short of it is that the \z pattern does not set the matcher's requireEnd flag, so requireEnd() returns false when it should return true.
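import java.util.regex.Matcher;
import java.util.regex.Pattern;

public void endTest()
{
    Matcher m = Pattern.compile( "\\z" ).matcher( "" );
    m.find();                              // \z matches the empty input at index 0
    System.out.println( m.requireEnd() );
    assert ( m.requireEnd() );             // only checked when the JVM runs with -ea
}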
This prints 'false'. It shouldn't take much thought to convince yourself that if a pattern requires the end of input, then requireEnd() should return true. There's never a case where you want \z to match anything short of the real end of input. The code snippet above would also make a fine unit test for this bug.
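To make the requireEnd() contract concrete, here is a minimal sketch using the ordinary `$` anchor rather than \z. The class name RequireEndContract and the helper requireEndAfterFind are made up for illustration; only the java.util.regex calls are real API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name and helper are illustrative only.
class RequireEndContract {

    // Runs find() once and reports the matcher's requireEnd flag.
    static boolean requireEndAfterFind(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            throw new IllegalStateException(regex + " did not match");
        }
        return m.requireEnd();
    }

    public static void main(String[] args) {
        // "foo" still matches no matter what follows it, so the end is not required.
        System.out.println(requireEndAfterFind("foo", "foo"));   // false
        // "foo$" matches only because the input stops here; more input could break the match.
        System.out.println(requireEndAfterFind("foo$", "foo"));  // true
    }
}
```

The argument above is that \z belongs in the second group: its match depends entirely on the input ending where it does.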
You can see the effects of this bug in other parts of the API, for example java.util.Scanner. Because requireEnd() returns false for \z, the Scanner treats the end of its own internal buffer (usually 1024 characters) as a match instead of waiting for the end of input.
import java.util.Scanner;

public void demo()
{
    // Build "0,1,2,...,1023," which is longer than the Scanner's buffer.
    StringBuilder str = new StringBuilder( 4 * 1024 );
    for( int i = 0; i < 1024; i++ ) {
        str.append( i );
        str.append( ',' );
    }
    Scanner s = new Scanner( str.toString() );
    // With \z as the delimiter, the first token should be the whole input.
    String result = s.useDelimiter( "\\z" ).next();
    String expected = str.toString();
    System.out.println( result.length()+", "+expected.length() );
    assert( expected.equals( result ) );
}
Output:
C:\Users\Brenden\Dev\proj\Test2\build\classes>java -version
java version "1.8.0"
Java(TM) SE Runtime Environment (build 1.8.0-b129)
Java HotSpot(TM) 64-Bit Server VM (build 25.0-b69, mixed mode)
C:\Users\Brenden\Dev\proj\Test2\build\classes>java -cp . quicktest.RegexBug
1024, 4010
You can see that the length of the matched string is 1024, not the 4010 characters of the original string.
Finally, if you need more convincing, you can test the \Z (capital Z) pattern, which does essentially the same thing as \z. \Z always sets its requireEnd flag, and it works as expected in the tests above.
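Here is a rough sketch of that comparison, running both checks from this post against \z and \Z side by side. The class CapitalZCheck and its helper methods are hypothetical names of my own. What it prints for \z depends on the Java build you run it on, so no output is listed for that case.

```java
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparison harness; names are illustrative only.
class CapitalZCheck {

    // Same check as endTest(): match the pattern against empty input.
    static boolean requireEndOnEmptyInput(String regex) {
        Matcher m = Pattern.compile(regex).matcher("");
        m.find();
        return m.requireEnd();
    }

    // Same input as demo(): "0,1,2,...,1023," (4010 characters).
    static String sampleInput() {
        StringBuilder str = new StringBuilder(4 * 1024);
        for (int i = 0; i < 1024; i++) {
            str.append(i).append(',');
        }
        return str.toString();
    }

    // Returns the first token the Scanner produces with the given delimiter.
    static String firstToken(String input, String delimiter) {
        try (Scanner s = new Scanner(input)) {
            return s.useDelimiter(delimiter).next();
        }
    }

    public static void main(String[] args) {
        String input = sampleInput();
        for (String regex : new String[] { "\\z", "\\Z" }) {
            System.out.println(regex
                    + ": requireEnd=" + requireEndOnEmptyInput(regex)
                    + ", token length=" + firstToken(input, regex).length()
                    + " of " + input.length());
        }
    }
}
```

For \Z the line should report requireEnd=true and a token length of 4010. Keep in mind that \Z also matches just before a final line terminator, so it is only a drop-in substitute for \z when a trailing newline does not matter.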