JDK-6559590 : Pattern.compile(".*").split("") returns incorrect result
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util.regex
  • Affected Version: 6
  • Priority: P4
  • Status: Closed
  • Resolution: Fixed
  • OS: linux
  • CPU: x86
  • Submitted: 2007-05-18
  • Updated: 2013-12-16
  • Resolved: 2013-11-13
The Version table provides details about the release in which this issue/RFE will be addressed.


  • JDK 8: 8 b117 (Fixed)
  • Other: port-stage-ppc-aix (Fixed)
Description
FULL PRODUCT VERSION :
java version "1.6.0"
Java(TM) SE Runtime Environment (build 1.6.0-b105)
Java HotSpot(TM) Client VM (build 1.6.0-b105, mixed mode, sharing)


ADDITIONAL OS VERSION INFORMATION :
Linux matthew-desktop 2.6.20-15-generic #2 SMP Sun Apr 15 07:36:31 UTC 2007 i686 GNU/Linux


A DESCRIPTION OF THE PROBLEM :
I believe that Pattern.split() and String.split() are implemented incorrectly for the case where the input is an empty string, and the pattern can match zero-length subsequences. For example: Pattern.compile(".*").split("") returns an array containing an empty string. The correct behaviour would be for it to return an empty array.

Rationale: the API docs promise that "trailing empty strings will be discarded", always in the one-argument version of split() and when the limit is zero in the two-argument version. That does not happen in the case above.
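For reference, here is a small illustration of the "trailing empty strings will be discarded" rule. It uses String.split(), which the API docs describe as equivalent to Pattern.compile(regex).split(str, limit); the input "a,b,," is just an example chosen for this note:

```java
import java.util.Arrays;

public class TrailingEmptyDemo {
    public static void main(String[] args) {
        String s = "a,b,,";
        // One-argument split (limit 0): trailing empty strings are discarded.
        System.out.println(Arrays.toString(s.split(",")));      // [a, b]
        // Negative limit: the pattern is applied as often as possible and
        // trailing empty strings are kept.
        System.out.println(Arrays.toString(s.split(",", -1)));  // [a, b, , ]
    }
}
```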

The API docs do also say that "If this pattern does not match any subsequence of the input then the resulting array has just one element, namely the input sequence in string form", but that clause does not apply here, because the pattern does match the input (as the test case shows).
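The distinction can be checked directly: a pattern that never matches returns the input as a single element, while ".*" does find a (zero-length) match in the empty string. The patterns "x" and ".*" and the input "abc" are only examples:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class NoMatchDemo {
    public static void main(String[] args) {
        // No match at all: the result is the input itself as a single element.
        System.out.println(Arrays.toString(Pattern.compile("x").split("abc"))); // [abc]
        // ".*" matches the empty string once (a zero-length match at index 0),
        // so the "does not match any subsequence" clause should not apply.
        System.out.println(Pattern.compile(".*").matcher("").find());           // true
    }
}
```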

Looking at the source code for Pattern.split(), it would seem that the test for "no match was found" is incorrect for this particular case.
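The suspected problem can be illustrated with a simplified, hypothetical version of split() that handles only the limit-zero case. This is not the JDK source: it assumes that "no match was found" is detected by checking whether index (the end of the last match) is still 0. That check cannot tell "no match" apart from "the only match was zero-length at position 0", which is exactly what happens with ".*" on "". The flag-based check shown as the alternative is likewise only an illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitSketch {
    // Hypothetical, simplified split(input) with limit 0 (not the JDK code).
    // useMatchFlag == false: "no match" means index is still 0 (assumed check).
    // useMatchFlag == true:  "no match" means find() never succeeded.
    static String[] split(Pattern p, CharSequence input, boolean useMatchFlag) {
        List<String> parts = new ArrayList<>();
        Matcher m = p.matcher(input);
        int index = 0;            // start of the next segment
        boolean matched = false;
        while (m.find()) {
            matched = true;
            parts.add(input.subSequence(index, m.start()).toString());
            index = m.end();
        }
        boolean noMatch = useMatchFlag ? !matched : index == 0;
        if (noMatch) {
            return new String[] { input.toString() };
        }
        // Add the segment after the last match.
        parts.add(input.subSequence(index, input.length()).toString());
        // Limit 0: discard trailing empty strings.
        int size = parts.size();
        while (size > 0 && parts.get(size - 1).isEmpty()) {
            size--;
        }
        return parts.subList(0, size).toArray(new String[0]);
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile(".*");
        System.out.println(split(p, "", false).length); // 1: the reported result
        System.out.println(split(p, "", true).length);  // 0: the expected result
    }
}
```

With the flag-based check, the single zero-length match produces two empty segments, both of which are trailing and therefore discarded, giving the empty array the reporter expected.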

This is not the most earth-shatteringly critical bug, of course ;-)

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run test case.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Regex matches: 1
Number of split() results: 0
ACTUAL -
Regex matches: 1
Number of split() results: 1
split() result 0: ""


REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitTest {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile(".*");

        // Count the matches of ".*" in the empty string (one zero-length match at index 0).
        int count = 0;
        Matcher matcher = pattern.matcher("");
        while (matcher.find())
            count++;
        System.out.println("Regex matches: " + count);

        // Split the same empty input and print every element of the result.
        String[] strings = pattern.split("");
        System.out.println("Number of split() results: " + strings.length);
        for (int i = 0; i < strings.length; i++)
            System.out.println("split() result " + i + ": \"" + strings[i] + "\"");
    }
}

---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
Can't think of any, other than avoiding wacky things like splitting empty strings with weird delimiters.
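One possible workaround, sketched here as a hypothetical helper (splitNonEmpty is not part of the JDK), is to special-case zero-length input before calling split(), so that it yields no fields as the reporter expected:

```java
import java.util.regex.Pattern;

public class SplitWorkaround {
    // Hypothetical helper: a zero-length input produces no fields;
    // any other input is passed to Pattern.split unchanged.
    static String[] splitNonEmpty(Pattern p, CharSequence input) {
        if (input.length() == 0) {
            return new String[0];
        }
        return p.split(input);
    }

    public static void main(String[] args) {
        Pattern comma = Pattern.compile(",");
        System.out.println(comma.split("").length);              // 1: no match, so [""]
        System.out.println(splitNonEmpty(comma, "").length);     // 0
        System.out.println(splitNonEmpty(comma, "a,b").length);  // 2
    }
}
```

Note that the helper also changes the result for patterns that cannot match the empty string, such as ",": Pattern.split returns [""] for them, whereas the helper returns an empty array.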

Comments
This change caused a regression and was undone in JDK-8028321.
16-12-2013

Changed to follow Perl's spec: always return a zero-length result array if the input sequence is zero-length.
11-11-2013

EVALUATION: Need to decide whether to update the implementation or the API doc to address this "split an empty string" case.
27-07-2007