JDK-8072582 : Scanner delimits incorrectly when delimiter spans a buffer boundary
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util
  • Affected Version: 8u31
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: linux
  • CPU: x86_64
  • Submitted: 2015-01-27
  • Updated: 2017-03-09
  • Resolved: 2016-06-25
Version table: JDK 9 | 9 team | Fixed
Description
FULL PRODUCT VERSION :
$ java -version
java version "1.8.0_31"
Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)


ADDITIONAL OS VERSION INFORMATION :
$ uname -a
Linux myhost 3.17.3-200.fc20.x86_64 #1 SMP Fri Nov 14 19:45:42 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

A DESCRIPTION OF THE PROBLEM :
When a java.util.Scanner reads text (from a String or a file), it does not apply delimiters correctly if a delimiter straddles the boundary of the scanner's internal CharBuffer (buf).

This appears to occur only when the delimiter is a disjunctive regex in which one alternative contains another.

For example, suppose the delimiters are "," and "#,#" (without quotes), and the scanner source contains

    ...ddd#,#eee...

If the internal CharBuffer holds the input up to and including the first #, the scanner returns "...ddd#" as one token, followed by "#eee..." as the next token.

See the included example source, which sets up the above scenario based on the Scanner BUFFER_SIZE of 1024.
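
The layout described above can also be generated rather than typed by hand. The following is only a sketch: the class name BoundaryInput, the helper names, and the short "aaa,bbb#,#ccc," prefix are made up for illustration, and the value 1024 is the internal buffer size quoted in this report, not a documented API constant. It pads the input so that the first # of the last "#,#" sits at index 1023, then lists any token that still contains a #. Whether it reports anything depends on the JDK build in use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class BoundaryInput {
    // Assumption taken from the report: Scanner's internal buffer holds 1024 chars.
    static final int ASSUMED_BUFFER_SIZE = 1024;

    // Builds an input whose final "#,#" starts at index bufferSize - 1,
    // i.e. its first '#' is the last char of the first bufferSize chars.
    static String build(int bufferSize) {
        StringBuilder sb = new StringBuilder("aaa,bbb#,#ccc,"); // hypothetical leading tokens
        while (sb.length() < bufferSize - 1) {
            sb.append('d'); // padding token that ends right at the boundary
        }
        return sb.append("#,#eee").toString();
    }

    // Returns every token that still contains '#', which the delimiter should have consumed.
    static List<String> suspectTokens(String input) {
        List<String> suspects = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter("(,)|(#,#)"); // same delimiter as in the report
            while (scanner.hasNext()) {
                String token = scanner.next();
                if (token.contains("#")) {
                    suspects.add(token);
                }
            }
        }
        return suspects;
    }

    public static void main(String[] args) {
        String input = build(ASSUMED_BUFFER_SIZE);
        System.out.println("Final \"#,#\" starts at index " + input.indexOf("#,#eee"));
        for (String token : suspectTokens(input)) {
            System.out.println("Delimiter # found in a token of length " + token.length());
        }
    }
}
```

Passing a different value to build() makes it easy to check other assumed buffer sizes.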

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Execute the program provided in the "Source code" section below.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
No output is expected: a token returned by scanner.next() should never contain a delimiter.
ACTUAL -
Delimiter # found in: dddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd#
Delimiter # found in: #eeeeeeeeeeee


REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
import java.util.Scanner;

public class ScannerBug
{
    public static void main(String[] args)
    {
        // Input sized against Scanner's internal BUFFER_SIZE (1024) so that the
        // final "#,#" delimiter spans the buffer boundary.
        Scanner scanner = new Scanner("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa,bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb#,#cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc,dddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd#,#eeeeeeeeeeee");
        scanner.useDelimiter("(,)|(#,#)"); // delimit on "," and "#,#"

        while(scanner.hasNext()){
            String next = scanner.next();
            if(next.contains("#")){
                System.out.println("Delimiter # found in: " + next);
            }
        }
    }
}
---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
Don't use the Scanner class - use an alternative.
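
One possible alternative, sketched here under the assumption that the whole input fits in memory as a String, is to split with java.util.regex.Pattern instead of Scanner. The regex is then applied to the complete text, so no internal read buffer is involved. The class name SplitWorkaround and the method tokens are hypothetical, and this is not the submitter's own workaround.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class SplitWorkaround {
    // Same delimiter regex as in the report: "," or "#,#".
    private static final Pattern DELIMS = Pattern.compile("(,)|(#,#)");

    // Splits the complete input at once; assumes the input is already in memory.
    static List<String> tokens(String input) {
        return Arrays.asList(DELIMS.split(input));
    }

    public static void main(String[] args) {
        System.out.println(tokens("aaa,bbb#,#ccc")); // prints [aaa, bbb, ccc]
    }
}
```

Note that Pattern.split treats edge cases differently from Scanner: it drops trailing empty strings, and it can return a leading empty string when the input starts with a delimiter.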


Comments
Reopened as JDK-8176407.
09-03-2017

The fix has been backed out, as it triggered the regression reported in JDK-8159545.
24-06-2016

Checked this with JDK 7u76, 8u31, 8u40ea and 9ea and could reproduce the issue.
>java ScannerBug
Delimiter # found in: dddddddddddddddddddddddddddddddddddddddddddddddddddddddddd
dddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd
dddddddddddddddddddddddddddddddddddddddddddddddddd#
Delimiter # found in: #eeeeeeeeeeee
05-02-2015