JDK-8241020 : LineNumberReader.getLineNumber() behavior is inconsistent with respect to EOF
  • Type: CSR
  • Component: core-libs
  • Sub-Component: java.io
  • Priority: P4
  • Status: Closed
  • Resolution: Approved
  • Fix Versions: 16
  • Submitted: 2020-03-13
  • Updated: 2020-08-04
  • Resolved: 2020-08-04
Description
Summary
-------

Modify `LineNumberReader` to consider end-of-stream to be a line terminator.

Problem
-------

`LineNumberReader` presently recognizes only `\r`, `\n`, and `\r\n` as line terminators; it does not consider end-of-stream to be one. Thus, for example, for a stream such as

```
line 1\n
line 2\n
line 3
```

which ends without a line terminator, only two lines would be reported.
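
The discrepancy can be observed with a small program. The following is a minimal sketch, assuming the stream above is supplied through a `StringReader`; the class and method names (`EofLineCount`, `countWithRead`, `countWithReadLine`) are illustrative and not part of this CSR. It drains the stream once with `read()` and once with `readLine()` and reports `getLineNumber()` afterwards. Per the review comments, the `readLine()` path is not expected to change; the `read()` path is the one affected.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Illustrative helper, not part of the JDK or of this CSR.
class EofLineCount {
    // Drain the stream one character at a time, then ask for the line number.
    static int countWithRead(String text) {
        try (LineNumberReader r = new LineNumberReader(new StringReader(text))) {
            while (r.read() != -1) {
                // characters are discarded; only the line count matters here
            }
            return r.getLineNumber();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Drain the stream line by line, then ask for the line number.
    static int countWithReadLine(String text) {
        try (LineNumberReader r = new LineNumberReader(new StringReader(text))) {
            while (r.readLine() != null) {
                // lines are discarded; only the line count matters here
            }
            return r.getLineNumber();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String unterminated = "line 1\nline 2\nline 3";
        // The read() result depends on the JDK in use: releases without this
        // change are expected to report 2, releases with it 3.
        System.out.println("read():     " + countWithRead(unterminated));
        System.out.println("readLine(): " + countWithReadLine(unterminated));
    }
}
```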

Solution
--------

Modify `LineNumberReader` to consider end-of-stream to be a line terminator.

Specification
-------------

Modify the specification of `LineNumberReader` as follows.

### Class specification

By default, line numbering begins at 0. This number increments at every line terminator as the data is read, and at the end of the stream if the last character in the stream is not a line terminator. This number can be changed with a call to `setLineNumber(int)`. Note, however, that `setLineNumber(int)` does not actually change the current position in the stream; it only changes the value that will be returned by `getLineNumber()`.
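
The distinction between the counter and the stream position can be illustrated as follows. This is a sketch only; `SetLineNumberDemo` and `readFirstLineFrom` are hypothetical names introduced for the example.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Illustrative helper, not part of the JDK or of this CSR.
class SetLineNumberDemo {
    // Renumber to `start`, read one line, and report "<line> -> <line number>".
    static String readFirstLineFrom(String text, int start) {
        try (LineNumberReader r = new LineNumberReader(new StringReader(text))) {
            r.setLineNumber(start);      // changes only the counter, not the position
            String line = r.readLine();  // still returns the first line of the stream
            return line + " -> " + r.getLineNumber();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // prints "alpha -> 11"
        System.out.println(readFirstLineFrom("alpha\nbeta\n", 10));
    }
}
```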

A line is considered to be terminated by any one of a line feed ('\n'), a carriage return ('\r'), or a carriage return followed immediately by a linefeed, or any of the previous terminators followed by end of stream, or end of stream not preceded by another terminator.
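
The termination rule above can be restated as a small reference counter over an in-memory string. This is a sketch of the rule as worded here, not the JDK implementation; `LineRule` and `expectedLineNumber` are hypothetical names, and treating an empty stream as zero lines is an assumption of the sketch.

```java
// Illustrative restatement of the rule, not the JDK implementation.
class LineRule {
    // Every '\n', '\r', or "\r\n" ends a line; end of stream ends one more
    // line only if the last character was not part of a terminator.
    static int expectedLineNumber(CharSequence s) {
        int lines = 0;
        // Assumption: an empty stream has no pending line at end of stream.
        boolean lastWasTerminator = true;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\r') {
                lines++;
                // A carriage return followed immediately by a line feed is one terminator.
                if (i + 1 < s.length() && s.charAt(i + 1) == '\n') {
                    i++;
                }
                lastWasTerminator = true;
            } else if (c == '\n') {
                lines++;
                lastWasTerminator = true;
            } else {
                lastWasTerminator = false;
            }
        }
        return lastWasTerminator ? lines : lines + 1;
    }

    public static void main(String[] args) {
        System.out.println(expectedLineNumber("line 1\nline 2\nline 3"));   // prints 3
        System.out.println(expectedLineNumber("line 1\nline 2\nline 3\n")); // prints 3
    }
}
```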

### read() specification

Read a single character. Line terminators are compressed into single newline ('\n') characters. The current line number is incremented whenever a line terminator is read, or when the end of the stream is reached and the last character in the stream is not a line terminator.
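
As an illustration of the compression of line terminators by `read()`, the following sketch collects every character returned for an input that mixes `\r\n`, `\r`, and `\n`. The names `CompressDemo` and `readAll` are hypothetical.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Illustrative helper, not part of the JDK or of this CSR.
class CompressDemo {
    // Collect everything read() returns until end of stream.
    static String readAll(String text) {
        StringBuilder out = new StringBuilder();
        try (LineNumberReader r = new LineNumberReader(new StringReader(text))) {
            int c;
            while ((c = r.read()) != -1) {
                out.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Newlines are escaped for display only; prints a\nb\nc\n
        System.out.println(readAll("a\r\nb\rc\n").replace("\n", "\\n"));
    }
}
```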

### read(char[],int,int) specification

Read characters into a portion of an array. Line terminators are compressed into single newline ('\n') characters. The current line number is incremented whenever a line terminator is read, or when the end of the stream is reached and the last character in the stream is not a line terminator.
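
With the array form, a `\r\n` pair may be split across two calls and should still count as a single terminator. The sketch below, using the hypothetical names `ChunkedReadDemo` and `countWithBuffer`, drains a stream through a caller-sized buffer and returns the resulting line number. The inputs shown end with a terminator, so the end-of-stream rule does not contribute to the count.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Illustrative helper, not part of the JDK or of this CSR.
class ChunkedReadDemo {
    // Drain the stream through a buffer of the given size (must be positive).
    static int countWithBuffer(String text, int bufferSize) {
        char[] buf = new char[bufferSize];
        try (LineNumberReader r = new LineNumberReader(new StringReader(text))) {
            while (r.read(buf, 0, buf.length) != -1) {
                // buffer contents are ignored; only the line count matters here
            }
            return r.getLineNumber();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A buffer of 2 splits the first "\r\n" across two read calls.
        System.out.println(countWithBuffer("a\r\nb\r\n", 2));  // prints 2
        System.out.println(countWithBuffer("a\r\nb\r\n", 64)); // prints 2
    }
}
```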

### readLine() specification

Read a line of text. Line terminators are compressed into single newline ('\n') characters. The current line number is incremented whenever a line terminator is read, or when the end of the stream is reached and the last character in the stream is not a line terminator.
Comments
Moving amended request to Approved.
04-08-2020

Moving back to finalized; neither reviewer has further comments.
03-08-2020

1. Added "Line terminators are compressed into single newline ('\n') characters." as the second sentence of the specifications of readLine() and read(char[],int,int).
2. Corrected the specification of the return value of read(char[],int,int) to state "characters" instead of "bytes":
   Before: @return The number of bytes read, or -1 if the end of the stream has already been reached
   After: @return The number of characters read, or -1 if the end of the stream has already been reached
24-07-2020

Moving revised request to Approved.
20-07-2020

We need to tread carefully here as this is changing the behavior of a JDK 1.1 era API. The previous attempt to fix this has to be backed out. I've changed the Compatibility Risk to "Medium". Overall I think this proposal is good and benefits from the experience of the first attempt to fix this issue. The common case of using readLine and getLineNumber will not change. I suspect it will be less common to mix read and getLineNumber. A release note is important for this change. It would also be useful to see if we can get some of the open source libraries testing the EA builds to test the change too.
19-07-2020

Moving to Approved, contingent on a release note being written for this change. An alternative and smaller spec edit would just be to redefine line termination. Currently, the spec says: "A line is considered to be terminated by any one of a line feed ('\n'), a carriage return ('\r'), or a carriage return followed immediately by a linefeed." This could be augmented to say: "... or any of the previous terminators followed by end of stream, or end of stream not preceded by another terminator." The current phrasing highlights the difference on each method, which may or may not be an advantage. If you decide to go with this alternative, just revise and re-finalize the spec and I'll re-approve it.
19-03-2020