JDK-4803179 : java.util.regex.Matcher.appendReplacement replacement string shouldn't allow $g
  • Type: Enhancement
  • Component: core-libs
  • Sub-Component: java.util.regex
  • Affected Version: 1.4.1
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: solaris_7
  • CPU: x86
  • Submitted: 2003-01-14
  • Updated: 2003-06-27
  • Resolved: 2003-06-27
Other
5.0 tiger: Fixed
Description

Name: nt126004			Date: 01/14/2003


FULL PRODUCT VERSION :
java version "1.4.1_01"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1_01-b01)
Java HotSpot(TM) Client VM (build 1.4.1_01-b01, mixed mode)

FULL OPERATING SYSTEM VERSION :
SunOS 5.7 Generic_106542-22 i86pc i386 i86p



A DESCRIPTION OF THE PROBLEM :
Instead of preserving the cleanliness and robustness that
Java regex offers over Perl programming, this method contains
a "trap": capture groups can be referenced ($g) in the
replacement string.
appendReplacement is meant for calculated replacement
strings (as opposed to hand-typed ones); for hand-typed
replacements, methods like replaceAll or replaceFirst should
be used.
The only benefit of the Perl-like $g shortcut is to save a
little typing (namely, writing $2 instead of
" + matcher.group(2) + ").
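To make the shortcut concrete: in a replacement string, $n expands to the text captured by group n, so the two methods below should build the same result. This is a minimal sketch; the class name, pattern and input are illustrative assumptions, not taken from the report.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupRefVsConcat {
    // Illustrative pattern with two capture groups, e.g. "Ada Lovelace"
    private static final Pattern NAME = Pattern.compile("(\\w+) (\\w+)");

    // Perl-like style: "$2, $1" is expanded by appendReplacement
    static String withGroupRefs(String input) {
        Matcher m = NAME.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, "$2, $1");
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Spelled-out style: the replacement is built from matcher.group(n).
    // It still goes through appendReplacement, so it is only safe while
    // the captured text contains no '$' or '\'.
    static String withConcatenation(String input) {
        Matcher m = NAME.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, m.group(2) + ", " + m.group(1));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(withGroupRefs("Ada Lovelace; Alan Turing"));
        System.out.println(withConcatenation("Ada Lovelace; Alan Turing"));
    }
}
```

Both calls should print "Lovelace, Ada; Turing, Alan"; the second form shows that the shortcut saves only a few characters.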

The drawback (I call it a trap) introduced in this method
undermines the robustness of the java.util.regex package:
in a calculated replacement string, one must search for $
characters and escape them. That special treatment is
error-prone, easy to forget, and provokes hard-to-diagnose
runtime bugs, because the replacement string may differ
every time the program runs and only sometimes happens to
contain a $ sign.
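The runtime dependence described above can be illustrated with a small sketch: the same appendReplacement loop succeeds for one computed value and throws for another, so the code can pass testing and fail later. The template, the PRICE token and the sample values are made-up assumptions; the sketch also shows that a trailing backslash, not only $, is treated specially.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DataDependentTrap {
    // Replaces every "PRICE" token in the template with a computed string.
    static String fill(String template, String computed) {
        Matcher m = Pattern.compile("PRICE").matcher(template);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // 'computed' is parsed for $n group references and \ escapes
            m.appendReplacement(sb, computed);
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Returns true if fill() throws for this computed value.
    static boolean fails(String computed) {
        try {
            fill("Total: PRICE", computed);
            return false;
        } catch (RuntimeException e) { // IllegalArgumentException or IndexOutOfBoundsException
            return true;
        }
    }

    public static void main(String[] args) {
        for (String value : new String[] {"12 USD", "$12", "C:\\"}) {
            System.out.println(value + " -> "
                    + (fails(value) ? "exception" : fill("Total: PRICE", value)));
        }
    }
}
```

"12 USD" is inserted normally, while "$12" refers to a nonexistent group 1 and "C:\" ends with an unfinished escape, so both throw.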


EXPECTED VERSUS ACTUAL BEHAVIOR :
The $g feature is a non-feature: I claim its only benefit
is saving a few characters of typing. However, it is the
source of bugs 4497669, 4618713, 4621239, 4684543 and 4509697.

My proposal is to abandon this feature.

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
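import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AppendReplacementTrap {
    // Hypothetical computed replacement; the result happens to contain '$'
    static String replacementMethod(String match) { return "$" + match; }
    public static void main(String[] args) {
        String regex = "cat";
        CharSequence input = "one cat two cats in the yard";
        Pattern p = Pattern.compile(regex);
        Matcher matcher = p.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            // "$cat" is read as a group reference: throws IllegalArgumentException
            matcher.appendReplacement(sb, replacementMethod(matcher.group()));
        }
        matcher.appendTail(sb);
        System.out.println(sb);
    }
}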
---------- END SOURCE ----------

CUSTOMER WORKAROUND :
Either parse the replacement string and replace every \ by
\\ and every $ by \$ before calling appendReplacement, or
program the "clean" ($g-feature-less) appendReplacement yourself:

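            int append = 0; // index just past the previous match
            while (matcher.find()) {
                // copy the unmatched text before this match
                sb.append(input.subSequence(append, matcher.start()));
                append = matcher.end();
                // append the computed replacement verbatim; '$' and '\' are not special here
                sb.append(replacementMethod(matcher.group()));
            }
            // copy the text after the last match
            sb.append(input.subSequence(append, input.length()));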
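The first workaround, escaping the computed string before handing it to appendReplacement, might look like the sketch below. A backslash is also special in the replacement string, so it is escaped together with $; doing both in a single pass avoids escaping the backslashes that were just added. `escapeReplacement` and `replaceLiterally` are hypothetical names, not part of java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapeReplacement {
    // Hypothetical helper: prefixes every '\' and '$' with a backslash so
    // that appendReplacement copies the text literally.
    static String escapeReplacement(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' || c == '$') {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    // Replaces each match of regex with the computed string, taken literally.
    static String replaceLiterally(String input, String regex, String computed) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, escapeReplacement(computed));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceLiterally("one cat two cats", "cat", "$cat"));
    }
}
```

The call in main should print "one $cat two $cats" instead of throwing.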
(Review ID: 178859) 
======================================================================

Comments
CONVERTED DATA
BugTraq+ Release Management Values
COMMIT TO FIX: tiger
FIXED IN: tiger
INTEGRATED IN: tiger tiger-b10
14-06-2004

EVALUATION
We will provide a quoting method for replacement strings in Tiger.
###@###.### 2003-05-08
08-05-2003
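The quoting method promised in the evaluation appears in the Java 5 (Tiger) API as the static method `Matcher.quoteReplacement(String)`, which returns a string that appendReplacement treats literally. A minimal sketch of the reporter's loop using it; `replacementMethod` is again a hypothetical stand-in that deliberately returns both $ and \.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteReplacementDemo {
    // Hypothetical computed replacement that happens to contain '$' and '\'
    static String replacementMethod(String match) {
        return "$" + match + "\\";
    }

    static String replaceAllComputed(String input) {
        Matcher matcher = Pattern.compile("cat").matcher(input);
        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            // quoteReplacement escapes '$' and '\' so the text is copied literally
            matcher.appendReplacement(sb,
                    Matcher.quoteReplacement(replacementMethod(matcher.group())));
        }
        matcher.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceAllComputed("one cat two cats in the yard"));
    }
}
```

With the quoting in place, the program should print "one $cat\ two $cat\s in the yard" rather than throwing.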