JDK-4948767 : String.replaceAll()/replaceFirst(): alternatives without regular expressions
  • Type: Enhancement
  • Component: core-libs
  • Sub-Component: java.lang
  • Affected Version: 1.4.2
  • Priority: P4
  • Status: Closed
  • Resolution: Duplicate
  • OS: windows_nt
  • CPU: x86
  • Submitted: 2003-11-04
  • Updated: 2005-09-19
  • Resolved: 2003-11-05
Related Reports
Duplicate :  
Description
Name: rmT116609			Date: 11/04/2003


A DESCRIPTION OF THE REQUEST :
First, I was happy that there is finally a replaceAll() method in the String class. So I tried to use it to simply do what the method name promises: replace all occurrences of a substring with another substring. Until I got my first PatternSyntaxException! The problem was that I wanted to replace the substring '{reportId}' with the string '4711'. Just a simple task. But String.replaceAll() interprets its first argument as a regular expression, so special characters in the substring to be replaced are treated as regex syntax.


JUSTIFICATION :
Offer a replace function to replace substrings (and not only single chars) without caring about special characters in the substring to be replaced. This should be a simple replace function and not a regular expression function.

Escaping is especially difficult if the strings (original, substring and replacement string) are read dynamically, because the application does not know whether a string contains any characters that are significant for regular expressions.

Another problem is that there is no method for escaping, nor even a well-defined list of all characters that need to be escaped.
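
The kind of function requested here can be sketched without any regular expressions, using only String.indexOf(). The names LiteralReplace and replaceLiteral are hypothetical and chosen for illustration; this is a minimal sketch under those assumptions, not a JDK API. It uses StringBuffer and substring() so that it should also compile on 1.4.2.

```java
// Hypothetical helper: replaces substrings literally, with no regex interpretation.
final class LiteralReplace {
    private LiteralReplace() {}

    /** Replaces every occurrence of search in text with replacement, all taken as plain text. */
    static String replaceLiteral(String text, String search, String replacement) {
        if (text == null || search == null || replacement == null || search.length() == 0) {
            return text; // nothing sensible to replace
        }
        StringBuffer sb = new StringBuffer(text.length());
        int from = 0;                          // start of the not-yet-copied tail
        int hit = text.indexOf(search, from);  // next literal occurrence, or -1
        while (hit >= 0) {
            sb.append(text.substring(from, hit)).append(replacement);
            from = hit + search.length();      // continue after the replaced occurrence
            hit = text.indexOf(search, from);
        }
        sb.append(text.substring(from));       // copy whatever follows the last occurrence
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "The quick brown {animal} jumps over the lazy dog.";
        // prints: The quick brown dog jumps over the lazy dog.
        System.out.println(replaceLiteral(text, "{animal}", "dog"));
    }
}
```

Because neither the search string nor the replacement is parsed, characters such as '{', '.' or '$' need no escaping at all.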

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
1. Original String: "The quick brown {animal} jumps over the lazy dog."
   String ori = "The quick brown {animal} jumps over the lazy dog.";
   String search = "{animal}";
   String newsub = "dog";
2. Replace '{animal}' with 'dog' via
   String newString = ori.replaceAll(search, newsub);
3. Output: "The quick brown dog jumps over the lazy dog."
   System.out.println("New string: "+newString);
ACTUAL -
1. Original String: "The quick brown {animal} jumps over the lazy dog."
2. Replace '{animal}' with 'dog'
3. PatternSyntaxException:
java.util.regex.PatternSyntaxException: Illegal repetition
{animal}
	at java.util.regex.Pattern.error(Pattern.java:1528)
	at java.util.regex.Pattern.closure(Pattern.java:2545)
	at java.util.regex.Pattern.sequence(Pattern.java:1656)
	at java.util.regex.Pattern.expr(Pattern.java:1545)
	at java.util.regex.Pattern.compile(Pattern.java:1279)
	at java.util.regex.Pattern.<init>(Pattern.java:1035)
	at java.util.regex.Pattern.compile(Pattern.java:779)
	at java.lang.String.replaceAll(String.java:1663)
	at de.icomps.prototypes.Test.regularExpressionTest(Test.java:57)
	at de.icomps.prototypes.Test.main(Test.java:34)

---------- BEGIN SOURCE ----------
private void regularExpressionTest() {
        String text = "The quick brown {animal} jumps over the lazy dog.";
        System.out.println("Original: "+text);
        String substring = "{animal}";
        String word = "dog";
        //throws PatternSyntaxException: '{' starts a repetition quantifier in a regular expression
        String result = text.replaceAll(substring, word);
        System.out.println("Replace all '"+substring+"' with "+word+": "+result);
}//regularExpressionTest()

CUSTOMER SUBMITTED WORKAROUND :
Escape every character in the substring that the regular expression API treats specially. In applications, the participating strings are not known at compile time, so there are no fixed string literals in which a backslash could simply be inserted in front of such characters:

    //needs: import java.text.CharacterIterator; import java.text.StringCharacterIterator;
    private void regularExpressionTest() {
        String escapers = "\\([{^$|)?*+."; //characters to escape to avoid PatternSyntaxException
        
        String text = "The quick brown {animal} jumps over the lazy dog.";
        System.out.println("Original: "+text);
        String regExp = "{animal}";
        regExp = escapeChars(regExp, escapers); //escape possible regular expression characters
        String word = "dog";
        String result = text.replaceAll(regExp, word);
        System.out.println("Replace all '"+regExp+"' with "+word+": "+result);
    }//regularExpressionTest()
    
    private static String escapeChars(String string, String characters) {
        String result = string; //default;
        
        if (string != null && characters != null) {
            //A backslash, if present, must come first in 'characters'; otherwise the
            //backslashes inserted for earlier characters would themselves be escaped again.
            StringCharacterIterator sci = new StringCharacterIterator(characters);
            char c = sci.first();
            while (c != CharacterIterator.DONE) {
                result = escape(result, c, '\\'); //escape with leading backslash
                c = sci.next();
            }//next character
        }
        
        return result;
    }//escapeChars()
    
    public static String escape(String string, char character, char escape) {
        String result = string; //default;
        
        if (string != null) {
            StringBuffer sb = new StringBuffer();
            StringCharacterIterator sci = new StringCharacterIterator(string);
            char c = sci.first();
            while (c != CharacterIterator.DONE) {
                if (c == character) {
                    sb.append(escape);
                }
                sb.append(c);
                c = sci.next();
            }//next character
            result = sb.toString();
        }
        
        return result;
    }//escape()
(Incident Review ID: 191785) 
======================================================================

Comments
WORK AROUND Use \Q...\E to literalize a segment of a regular expression. ###@###.### 2003-11-05
05-11-2003
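
The \Q...\E workaround above can be sketched as a small helper that should also work on 1.4.2. One caveat: if the literal itself contains \E, the quoted section ends early, so the sketch closes the quote, emits an escaped backslash followed by E, and reopens the quote. The names RegexQuote and quoteLiteral are hypothetical, chosen for illustration only.

```java
// Hypothetical helper: wraps a literal in \Q...\E so the regex engine matches it verbatim.
final class RegexQuote {
    private RegexQuote() {}

    static String quoteLiteral(String s) {
        if (s.indexOf("\\E") < 0) {
            return "\\Q" + s + "\\E";           // common case: no embedded \E
        }
        StringBuffer sb = new StringBuffer("\\Q");
        int from = 0;
        int hit = s.indexOf("\\E", from);
        while (hit >= 0) {
            sb.append(s.substring(from, hit));
            sb.append("\\E\\\\E\\Q");           // end quote, literal backslash + E, reopen quote
            from = hit + 2;                     // skip the two characters of \E
            hit = s.indexOf("\\E", from);
        }
        sb.append(s.substring(from)).append("\\E");
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "The quick brown {animal} jumps over the lazy dog.";
        // prints: The quick brown dog jumps over the lazy dog.
        System.out.println(text.replaceAll(quoteLiteral("{animal}"), "dog"));
    }
}
```

This quotes only the pattern. The replacement argument of replaceAll() and replaceFirst() still treats '$' and '\' specially, so a dynamically read replacement string needs separate handling.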

EVALUATION This is already done in Tiger. ###@###.### 2003-11-05
05-11-2003
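
The evaluation does not name the Tiger APIs it refers to. Presumably these are the additions made in J2SE 5.0: String.replace(CharSequence, CharSequence), Pattern.quote(String) and Matcher.quoteReplacement(String). The following is a minimal sketch of how they cover the request; the class and method names TigerLiteralReplace, replaceAllLiteral and replaceFirstLiteral are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the J2SE 5.0 APIs for literal replacement.
final class TigerLiteralReplace {
    private TigerLiteralReplace() {}

    /** Literal replace-all: String.replace(CharSequence, CharSequence) uses no regex syntax. */
    static String replaceAllLiteral(String text, String search, String replacement) {
        return text.replace(search, replacement);
    }

    /** Literal replace-first: quote both the pattern and the replacement. */
    static String replaceFirstLiteral(String text, String search, String replacement) {
        return text.replaceFirst(Pattern.quote(search), Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String text = "The quick brown {animal} jumps over the lazy dog.";
        // prints: The quick brown dog jumps over the lazy dog.
        System.out.println(replaceAllLiteral(text, "{animal}", "dog"));
    }
}
```

There is no literal counterpart to replaceFirst() on String itself, which is why the second method quotes both arguments before delegating to the regex-based call.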