A DESCRIPTION OF THE PROBLEM :
The java.io.Reader API says, in essence, that if a reader supports mark/reset, then calling reader.mark() with a given read-ahead limit allows the caller to return to the marked position later using reader.reset(), as long as the caller has not asked the reader to read more characters than the limit it passed to mark().

LineNumberReader automatically reads two characters internally when it finds a CRLF sequence, normalizing the sequence to LF. This results in two calls to the underlying read() method. If the caller has called reader.mark(1), and if a CRLF straddles the end of the buffer, then the internal read of the second character will invalidate the mark, even though the caller itself has read only one character. Subsequently calling reset() will result in "java.io.IOException: Mark invalid".

Why this happens is obvious once one is aware of it. However, from the point of view of a consumer of the Reader API, the reader is not following its contract. A consumer holding a Reader reference, not knowing whether it is a LineNumberReader or any other sort of reader, expects that if mark/reset is supported, then mark/reset will work. In other words, the API contract says that some readers will support mark/reset and some won't. The API does _not_ say that some of those which support mark/reset may fail arbitrarily. Moreover, even if LineNumberReader were intended to behave differently (breaking the Liskov Substitution Principle, if not the contract), this is not documented anywhere.

In summary, LineNumberReader behaves unexpectedly in a certain edge case and throws an exception _not related to any fault of the caller_.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1. Create a LineNumberReader with a small buffer size.
2. Provide test data in which CRLF straddles the end of the buffer.
3. Before reading the LF character, call reader.mark(1).
4. After reading the LF character, try to reset using reader.reset().
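To make the consumer's point of view concrete, here is a minimal sketch of the kind of generic code the description has in mind: a one-character lookahead written only against java.io.Reader. The class name `ReaderPeek` and the method `peek` are hypothetical and used purely for illustration; they are not part of java.io. The sketch assumes that a read-ahead limit of 1 is enough for reading a single character, which is what the Reader documentation appears to promise.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ReaderPeek {

    /**
     * Hypothetical helper: returns the next character without consuming it,
     * or -1 at end of stream. It relies only on the general Reader contract.
     */
    public static int peek(Reader reader) throws IOException {
        if (!reader.markSupported()) {
            throw new IOException("mark/reset not supported by this reader");
        }
        reader.mark(1);        // the caller intends to read exactly one character
        int c = reader.read(); // one read from the caller's point of view
        reader.reset();        // expected to succeed for any reader supporting mark
        return c;
    }

    public static void main(String[] args) throws IOException {
        try (Reader reader = new StringReader("abc")) {
            System.out.println((char) peek(reader));   // looks at 'a' without consuming it
            System.out.println((char) reader.read());  // 'a' is still the next character
        }
    }
}
```

A helper like this has no way to know that it should pass a larger limit when the Reader it receives happens to be a LineNumberReader positioned in the middle of a CRLF.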
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The reader should reset to the position it had when mark(1) was called. In the example below, the code should print "bar".
ACTUAL -
java.io.IOException: Mark invalid
	at java.io.BufferedReader.reset(BufferedReader.java:512)
	at java.io.LineNumberReader.reset(LineNumberReader.java:277)
	...

---------- BEGIN SOURCE ----------
import java.io.LineNumberReader;
import java.io.Reader;
import java.io.StringReader;

final String string = "foo\r\nbar";
// Buffer size 5: the buffer ends right after the "\r\n" sequence.
try (final Reader reader = new LineNumberReader(new StringReader(string), 5)) {
  reader.read(); // 'f'
  reader.read(); // 'o'
  reader.read(); // 'o'
  reader.read(); // consumes '\r', reported as '\n'; the LF is still pending
  reader.mark(1);
  reader.read(); // skips the pending LF and returns 'b': two underlying reads
  reader.reset(); //error!
  System.out.print((char)reader.read());
  System.out.print((char)reader.read());
  System.out.println((char)reader.read());
}
---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
Use "reader.mark(2)" instead of "reader.mark(1)". (But how would a caller using only the general Reader API know it needs to do that?)

FREQUENCY : always
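The workaround can be compared side by side with the failing call using a small harness like the one below. This is only a sketch: the class name `MarkLimitDemo` and the method `readAfterReset` are hypothetical names introduced for illustration. It replays the reproduction steps with a configurable read-ahead limit and returns either the text read after reset() or the exception that was thrown.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.io.StringReader;

public class MarkLimitDemo {

    /**
     * Hypothetical helper: replays the reproduction with the given read-ahead
     * limit. Returns the remaining text read after reset(), or the exception's
     * string form if an IOException is thrown.
     */
    public static String readAfterReset(int readAheadLimit) {
        // Buffer size 5, so the buffer ends right after the CRLF in "foo\r\n".
        try (Reader reader = new LineNumberReader(new StringReader("foo\r\nbar"), 5)) {
            for (int i = 0; i < 4; i++) {
                reader.read(); // "foo", then the CR (reported as '\n')
            }
            reader.mark(readAheadLimit);
            reader.read();  // one character for the caller, two underlying reads
            reader.reset(); // fails if the mark was invalidated
            StringBuilder remaining = new StringBuilder();
            for (int c = reader.read(); c != -1; c = reader.read()) {
                remaining.append((char) c);
            }
            return remaining.toString();
        } catch (IOException e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        System.out.println("mark(1): " + readAfterReset(1));
        System.out.println("mark(2): " + readAfterReset(2));
    }
}
```

On a JDK that shows the behavior reported here, the mark(1) line is expected to report the "Mark invalid" exception, while the mark(2) line should report "bar". Other JDK versions may treat the mark(1) case differently.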