JDK-4942835 : Enhance treatment of anchors for java.util.regex.Matcher.find(int)
  • Type: Enhancement
  • Component: core-libs
  • Sub-Component: java.util.regex
  • Affected Version: 1.4.2
  • Priority: P4
  • Status: Closed
  • Resolution: Duplicate
  • OS: windows_2000
  • CPU: x86
  • Submitted: 2003-10-23
  • Updated: 2003-10-23
  • Resolved: 2003-10-23
Related Reports
Duplicate :  
Description

Name: rmT116609			Date: 10/23/2003


A DESCRIPTION OF THE REQUEST :
java.util.regex.Matcher.find(int) should honor anchors "^" and "\A" to match at the given starting position.

JUSTIFICATION :
The documentation of Matcher.find(int index) states that the matcher is reset and then starts matching at the given index.  The method is a way to match efficiently from a given index onward, i.e. without creating another object for the subsequence starting at that index.  Anchors should therefore treat the given index as the starting position.

I'm not sure whether this should rather be regarded as a bug.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The piece of code should match and print "foo" to the console.

ACTUAL -
The piece of code does not print anything since it does not match.



---------- BEGIN SOURCE ----------
Matcher m = Pattern.compile( "^foo" ).matcher( "barfoo" );

if ( m.find( 3 ) ) {
  System.out.println( m.group() );
}

---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
Create a class implementing CharSequence that delegates to another CharSequence with an offset (implementation provided below).  This makes the Matcher believe it is matching at the beginning of a sequence.

/**
 *  Class that serves as a proxy for a CharSequence and cuts off an initial
 *  portion. The instance is initialized to the empty string and can be changed
 *  to point to another sequence via {@link #set(CharSequence, int)}.
 *
 * @author     robert.klemme
 * @created    27.06.2003 11:18:23
 * @version    $Id:$
 * @see        java.lang.CharSequence
 */
public class OffsetCharSequence implements CharSequence {

    public int length() {
        return sequence.length() - offset;
    }


    public char charAt( int index ) {
        return sequence.charAt( index + offset );
    }

    public CharSequence subSequence( int start, int end ) {
        return sequence.subSequence( start + offset, end + offset );
    }


    /**
     *  Initialize this instance with the given sequence and offset.
     *
     * @param  seq                        a sequence, not 'null'
     * @param  offset                     an offset >= 0 and <=
     *      seq.length()
     * @throws  NullPointerException      if seq is 'null'.
     * @throws  IllegalArgumentException  if offset is negative or >
     *      seq.length()
     */
    public void set( CharSequence seq, int offset ) {
        if ( offset < 0 || offset > seq.length() ) {
            throw new IllegalArgumentException( "Illegal offset" );
        }

        this.sequence = seq;
        this.offset = offset;
    }



    /**
     *  Creates a string from the visible sub sequence.
     *
     * @return    the visible subsequence as a String
     */
    public String toString() {
        return subSequence( 0, length() ).toString();
    }


    /**
     *  The sequence to delegate to.
     */
    private CharSequence sequence = "";

    /**
     *  The offset, &gt;= 0 and &lt;= sequence.length().
     */
    private int offset = 0;
}
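
A minimal usage sketch of the workaround, assuming the class above. The names OffsetWorkaroundDemo and findFrom are hypothetical and not part of the original submission, and the nested class is a condensed copy included only so that the sketch compiles on its own.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OffsetWorkaroundDemo {

    // Condensed copy of the OffsetCharSequence class above, repeated here
    // only so that this sketch is self-contained.
    static class OffsetCharSequence implements CharSequence {
        private CharSequence sequence = "";
        private int offset = 0;

        public void set(CharSequence seq, int offset) {
            if (offset < 0 || offset > seq.length()) {
                throw new IllegalArgumentException("Illegal offset");
            }
            this.sequence = seq;
            this.offset = offset;
        }

        public int length() {
            return sequence.length() - offset;
        }

        public char charAt(int index) {
            return sequence.charAt(index + offset);
        }

        public CharSequence subSequence(int start, int end) {
            return sequence.subSequence(start + offset, end + offset);
        }

        public String toString() {
            return subSequence(0, length()).toString();
        }
    }

    // Hypothetical helper: returns the text matched by regex when the input
    // is viewed from the given offset, or null if nothing matches.
    static String findFrom(String regex, CharSequence input, int offset) {
        OffsetCharSequence view = new OffsetCharSequence();
        view.set(input, offset);
        // The matcher only sees the view, so "^" refers to the view's start.
        Matcher m = Pattern.compile(regex).matcher(view);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(findFrom("^foo", "barfoo", 3)); // prints "foo"
        System.out.println(findFrom("^foo", "barfoo", 0)); // prints "null"
    }
}
```

Note that with this approach the values returned by start() and end() are relative to the view, so the offset has to be added to map them back to the underlying sequence.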
(Incident Review ID: 189479) 
======================================================================

Comments
EVALUATION This is already fixed in Tiger as part of the addition of regions. See also 4757029. ###@###.### 2003-10-23
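
For illustration, a minimal sketch of how the region support referred to in the evaluation can be used, assuming J2SE 5.0 or later. The names RegionAnchorDemo, findInRegion and findFromIndex are hypothetical. The sketch relies on Matcher.region(int, int) and on anchoring bounds being enabled by default, so that "^" and "\A" match at the start of the region. Since find(int) resets the matcher, anchors there still refer to the start of the whole input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegionAnchorDemo {

    // Restricts matching to input[start, input.length()).  With anchoring
    // bounds enabled (the default), "^" and "\A" match at the region start.
    static String findInRegion(String regex, String input, int start) {
        Matcher m = Pattern.compile(regex).matcher(input);
        m.region(start, input.length());
        return m.find() ? m.group() : null;
    }

    // find(int) resets the matcher first, which also resets the region to
    // the whole input, so anchors keep referring to index 0.
    static String findFromIndex(String regex, String input, int start) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find(start) ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(findInRegion("^foo", "barfoo", 3));   // prints "foo"
        System.out.println(findInRegion("\\Afoo", "barfoo", 3)); // prints "foo"
        System.out.println(findFromIndex("^foo", "barfoo", 3));  // prints "null"
    }
}
```

Unlike the CharSequence wrapper, a region keeps start() and end() expressed as indices into the original input.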
23-10-2003