JDK-8020986 : Space or \\s at the end of Regular Expression fails.
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util.regex
  • Affected Version: 6u24
  • Priority: P3
  • Status: Resolved
  • Resolution: Not an Issue
  • OS: windows_7
  • Submitted: 2013-07-22
  • Updated: 2013-07-24
  • Resolved: 2013-07-24
Description
FULL PRODUCT VERSION :
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) Client VM (build 19.1-b02, mixed mode)


ADDITIONAL OS VERSION INFORMATION :
Windows 7 Enterprise edition, Service Pack 1

A DESCRIPTION OF THE PROBLEM :
I need to use a regular expression to validate a string so that special characters are rejected, while space, underscore, and hyphen are allowed.

The java.util.regex classes are used with the following regular expression:
Pattern stringPattern = Pattern.compile("[a-zA-Z0-9_- ]{1,40}");

This results in the following exception:
java.util.regex.PatternSyntaxException: Illegal character range near index 13
[ a-zA-Z0-9_- ]{1,40}
                      ^
at java.util.regex.Pattern.error(Unknown Source)

Alternatively, if we use \\s instead of " " (space), the same exception is observed.

But if the space (or \\s) is placed anywhere other than at the end of the regex, it works just fine:

Pattern stringPattern = Pattern.compile("[ a-zA-Z0-9_-]{1,40}");

This expression has " " (space) at the beginning of the regex and works flawlessly.

Please explain if I am mistaken about anything.




STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Use the following method to validate a string that may contain whitespace but no special characters other than hyphen and underscore.
1)
private static boolean validationRule(String referenceString) {
    boolean valid = Boolean.TRUE;
    if (!referenceString.isEmpty()) {
        Pattern stringPattern = Pattern.compile("[a-zA-Z0-9_- ]{1,40}");
        if (!(stringPattern.matcher(referenceString).matches())) {
            valid = Boolean.FALSE;
        }
    }
    return valid;
}


2)
The following regular expression is accepted, but the one used in the method above is not:
("[ a-zA-Z0-9_-]{1,40}")


EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
A space (either a literal " " or \\s) should be allowed anywhere in the regular expression, including at the end.
ACTUAL -
Illegal character range near index 13
[ a-zA-Z0-9_- ]{1,40}
                       ^
at java.util.regex.Pattern.error(Unknown Source)


ERROR MESSAGES/STACK TRACES THAT OCCUR :
Illegal character

REPRODUCIBILITY :
This bug can always be reproduced.

---------- BEGIN SOURCE ----------
import java.util.regex.Pattern;

public class JustForTest {

    public static void main(String[] args) {
        String refString = "This is Test";
        System.out.println("Is Referenced String valid? " + validationRule(refString));
    }

    // Returns true if the string is empty or consists of 1 to 40 allowed characters.
    private static boolean validationRule(String referenceString) {
        boolean valid = Boolean.TRUE;
        if (!referenceString.isEmpty()) {
            // This is the line reported to throw PatternSyntaxException.
            Pattern stringPattern = Pattern.compile("[a-zA-Z0-9_- ]{1,40}");
            if (!(stringPattern.matcher(referenceString).matches())) {
                valid = Boolean.FALSE;
            }
        }
        return valid;
    }

}

---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
Replace the Pattern.compile line in method validationRule with the following one:

Pattern stringPattern = Pattern.compile("[ a-zA-Z0-9_-]{1,40}");
Comments
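This is not a bug. Pattern.compile() is complaining about an invalid range in the character class []. A range is a pair of characters with a dash in between, e.g. a-z, A-Z, 0-9, and the first character must not come after the second. '_- ' is read as a range from '_' to ' ' (space), which is not a valid range because '_' has a higher code point than the space. To include a literal dash in a character class, it must be either the first or the last symbol in the brackets, or be escaped as \\-. That is why the 'workaround' works: it is simply the correct way to define the character class.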
24-07-2013
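
The explanation above can be checked with a minimal sketch. The class name CharClassHyphenDemo and the helpers compiles() and isValid() are hypothetical names introduced for illustration only, and the escaped-hyphen variant is an additional option that is not part of the original report. It only assumes the standard java.util.regex API.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class CharClassHyphenDemo {

    // Returns true if the regex compiles, false if Pattern.compile rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Returns true if the whole input matches the given regex.
    static boolean isValid(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // "_- " is parsed as a range from '_' (U+005F) to ' ' (U+0020),
        // whose endpoints are in descending order, so compilation fails.
        String reported = "[a-zA-Z0-9_- ]{1,40}";

        // Hyphen as the last symbol in the brackets: treated as a literal.
        String hyphenLast = "[ a-zA-Z0-9_-]{1,40}";

        // Hyphen escaped: treated as a literal in any position.
        String hyphenEscaped = "[a-zA-Z0-9_\\- ]{1,40}";

        System.out.println(compiles(reported));                       // prints false
        System.out.println(compiles(hyphenLast));                     // prints true
        System.out.println(compiles(hyphenEscaped));                  // prints true
        System.out.println(isValid(hyphenLast, "This is Test"));      // prints true
        System.out.println(isValid(hyphenEscaped, "my_file-name 1")); // prints true
        System.out.println(isValid(hyphenLast, "no!specials"));       // prints false
    }
}
```

Either corrected form accepts letters, digits, space, underscore, and hyphen. Escaping the hyphen has the advantage that the class keeps working if more characters are later appended to it.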