JDK-8235668 : LineNumberReader#getLineNumber() returns wrong line number (one fewer) in Lucene test
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.io
  • Affected Version: 14
  • Priority: P3
  • Status: Closed
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2019-12-10
  • Updated: 2020-05-18
  • Resolved: 2019-12-12
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 14 JDK 15
14 b28Fixed 15Fixed
Related Reports
Relates :  
Relates :  
Description
Lucene has a test to check certain broken input files with patterns (hunspell library). The files are read by LineNumberReader.

The test expects a ParseException which is thrown by Lucene's code. The PraseException is initialized using the LineNumberReader#getLineNumber() method to refer to the line number where the parse error occurs. The test asserts that error occurs on correct line number.

With latest JDK14 EA builds (b24), the line number is one-off (23 instead of 24).

Code:
https://github.com/apache/lucene-solr/blob/53064e46ddfc94a0b0e1d9c9f3e94786fb6701cd/lucene/analysis/common/src/test/org/apache/lucene/analysis/hunspell/TestDictionary.java#L127-L142

The code to read the file is in Dictionary:

https://github.com/apache/lucene-solr/blob/78d5cfefe2453345c498984bf0e405d254a9d5bc/lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java#L320-L394

Test file:
https://github.com/apache/lucene-solr/blob/master/lucene/analysis/common/src/test/org/apache/lucene/analysis/hunspell/broken.aff#L24

To reproduce check out Lucene/Solr from Github, run "ant bootstrap" from checkout root and then run "ant test" inside lucene/analysis/common. 

With JDK 11, 12, 13 it works (also Java 8 in previous Lucene release branch), but fails with 14 b24.
Comments
URL: https://hg.openjdk.java.net/jdk/jdk14/rev/de66d41b9486 User: bpb Date: 2019-12-12 22:03:39 +0000
12-12-2019

Brian Burkhalter and I chatted about this issue today. LineNumberReader clearly specifies when the line number is incremented. So the changes in JDK-8230342 are correct but surprising results for code like this: ``` String line; while ((line = lineNumberReader.readLine()) != null) { System.out.format("%3d: %s%n", lineNumberReader.getLineNumber(), line); } ``` when the last line doesn't have a line terminator. The right solution will likely need the line number to be incremented at EOF for cases like this and will probably need a spec clarification. This is too late for JDK 14 so the plan is to revert JDK-8230342 and try to address the issue in the next train.
11-12-2019

It seems to me that the current JDK 14 behavior is correct with respect to the specification: "By default, line numbering begins at 0. This number increments at every line terminator as the data is read, and can be changed with a call to setLineNumber(int)." [1] The output of $ wc -l broken.aff is also 23 on Linux and macOS. What might be questionable however is whether EOF should be given the same weight as line terminators with respect to incrementing the line number. [1] https://docs.oracle.com/en/java/javase/13/docs/api/java.base/java/io/LineNumberReader.html
11-12-2019

I think the fix in JDK-8230342 should rethought: - There is an inconsistency, which is correct and makes it buggy. But a LineNumberReader should by end-user code only be consumed using readLine() while loops - Reading a LineNumberReader with char[] or one char at a time looks like bad practice (the user could also use a plain Reader), so I would't care about inconsistencies too much. But the fix in JDK-8230342 has a significant drawback: Common coding practice to use LineNumberReader is to have a while loop that calls readLine() in the check condition until it returns null. People use getLineNumber() after the readLine() call (in body of while). The nice side-effect is that you have a one-based index, which is good for printing error messages to end-users. The fix breaks this pattern for the very last line of a file, and that's the issue we are seeing. You are breaking "common code" for the sake of "misuse" of LineNumberReader (reading one char at a time).
10-12-2019

From my understanding of the code, the following happens: - we read the file line by line. We get the line number after readLine(), so the number is one-based (as the newline came before). - if we read the broken line (very last line of the file) the following problems occurs: the last broken line does not have a terminating newline. So it is read and returned correctly, but the counter is not updated, as the line did not end in a newline. Depending on how the spec handles a line without trailing NL at end, we may need to fix our test using a trailing newline in test data to work around the issue.
10-12-2019

Need to re-examine the compatibility impact of JDK-8230342.
10-12-2019