JDK-8176407 : (scanner) Scanner delimits incorrectly when delimiter spans a buffer boundary
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util
  • Affected Version: 8,9
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • Submitted: 2017-03-09
  • Updated: 2023-10-13
Version (Other): tbd, Unresolved
Related Reports
Cloners : JDK-8072582
Description
FULL PRODUCT VERSION :
$ java -version
java version "1.8.0_31"
Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)


ADDITIONAL OS VERSION INFORMATION :
$ uname -a
Linux myhost 3.17.3-200.fc20.x86_64 #1 SMP Fri Nov 14 19:45:42 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

A DESCRIPTION OF THE PROBLEM :
When using a java.util.Scanner to read text (from a String or a file), the scanner doesn't apply delimiters correctly if a delimiter falls across the internal CharBuffer (buf) boundary.

This appears to occur only when the delimiter is a regex alternation in which one alternative contains another, as "," is contained in "#,#".

For example, if the delimiters are "," and "#,#" (without quotes), and the scanner source contains

    ...ddd#,#eee...

with the internal Scanner CharBuffer spanning up to and including the first #, then the scanner will return "...ddd#" as one token, followed by "#eee..." as the next token.
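
To illustrate why a truncated view of the input matters for this kind of delimiter, the sketch below runs the same regex directly through java.util.regex on three hypothetical cut-off points of the text. It is not Scanner's actual code and not a confirmed diagnosis; the class name TruncatedDelimiterDemo and the sample views are assumptions made for illustration. It only shows that the shorter alternative "," can match when the longer "#,#" is cut off by the end of the visible text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; uses only the regex engine, not Scanner.
class TruncatedDelimiterDemo {
    // Same delimiter regex as in the report.
    static final Pattern DELIM = Pattern.compile("(,)|(#,#)");

    // Describes the first delimiter match in the given view of the input
    // as "<matched text>@<start index>", or "none" if nothing matches.
    static String firstDelimiter(String view) {
        Matcher m = DELIM.matcher(view);
        if (!m.find()) {
            return "none";
        }
        return m.group() + "@" + m.start();
    }

    public static void main(String[] args) {
        // Three assumed views of the same text, cut at different points.
        String[] views = { "ddd#", "ddd#,", "ddd#,#eee" };
        for (String view : views) {
            Matcher m = DELIM.matcher(view);
            m.find();
            // hitEnd() reports whether the engine ran into the end of the
            // input during the last match operation.
            System.out.println(view + " -> " + firstDelimiter(view)
                    + " (hitEnd=" + m.hitEnd() + ")");
        }
    }
}
```

With the view "ddd#," the regex reports "," at index 4, which would leave "ddd#" as the preceding token; with the full text it reports "#,#" at index 3.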

See included example source, which sets up the above scenario, based on the scanner BUFFER_SIZE of 1024.
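
For experimenting with the boundary position, a smaller probe can build the input programmatically instead of using a hand-counted literal. This is a sketch under assumptions: the class and method names are hypothetical, the offsets tried are simply those around the BUFFER_SIZE of 1024 quoted above, and whether any offset triggers the problem depends on the JDK build in use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;

// Hypothetical probe class, not part of the original report.
class ScannerBoundaryProbe {
    // Builds "ddd...d#,#eeee" so that the first '#' sits at index hashIndex.
    static String buildInput(int hashIndex) {
        char[] pad = new char[hashIndex];
        Arrays.fill(pad, 'd');
        return new String(pad) + "#,#eeee";
    }

    // Returns every token that contains '#'. In these inputs each '#' is part
    // of a "#,#" delimiter, so a correct scan yields an empty list.
    static List<String> leakedTokens(String input) {
        List<String> leaked = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter("(,)|(#,#)"); // same delimiters as the report
            while (scanner.hasNext()) {
                String token = scanner.next();
                if (token.contains("#")) {
                    leaked.add(token);
                }
            }
        }
        return leaked;
    }

    public static void main(String[] args) {
        // Try several positions around the assumed 1024-character boundary.
        for (int hashIndex = 1020; hashIndex <= 1026; hashIndex++) {
            int leaks = leakedTokens(buildInput(hashIndex)).size();
            System.out.println("first '#' at index " + hashIndex + ": "
                    + leaks + " token(s) containing '#'");
        }
    }
}
```

Any offset that reports a non-zero count is a position at which the delimiter was split incorrectly on that JDK.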

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Execute the program provided in the "Source code" section below.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
No output is expected: every "#" in the input is part of a "#,#" delimiter, so no token returned by scanner.next() should contain one.
ACTUAL -
Delimiter # found in: dddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd#
Delimiter # found in: #eeeeeeeeeeee


REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
import java.util.Scanner;

public class ScannerBug
{
    public static void main(String[] args)
    {
        Scanner scanner = new Scanner("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa,bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb#,#cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc,dddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd#,#eeeeeeeeeeee");
        scanner.useDelimiter("(,)|(#,#)"); // delimit on "," and "#,#"

        while(scanner.hasNext()){
            String next = scanner.next();
            if(next.contains("#")){
                System.out.println("Delimiter # found in: " + next);
            }
        }
    }
}
---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
Don't use the Scanner class - use an alternative.
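
One possible alternative, sketched here under the assumption that the whole input fits in memory as a String, is to split it with the same regex via java.util.regex.Pattern. The regex then always sees the complete text, so no delimiter can straddle a buffer boundary. The class name SplitWorkaround is hypothetical, and edge cases such as leading or trailing delimiters may be handled differently from Scanner.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper illustrating a split-based alternative to Scanner.
class SplitWorkaround {
    // Same delimiter regex as in the report.
    static final Pattern DELIM = Pattern.compile("(,)|(#,#)");

    // Splits the complete input at once and returns the tokens in order.
    static List<String> tokens(String input) {
        return Arrays.asList(DELIM.split(input));
    }

    public static void main(String[] args) {
        // prints [aaa, bbb, ccc, ddd, eee]
        System.out.println(tokens("aaa,bbb#,#ccc,ddd#,#eee"));
    }
}
```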


Comments
Please take a look. Context: https://mail.openjdk.org/pipermail/core-libs-dev/2023-October/113324.html
13-10-2023

This is a clone of JDK-8072582, which was fixed and then backed out. The test case from the submitter still reproduces the problem in JDK 9 build 157.
09-03-2017