JDK-6242664 : String.offsetByCodePoints doesn't work for Strings returned by String.substring
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.lang
  • Affected Version: 5.0
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: linux
  • CPU: x86
  • Submitted: 2005-03-18
  • Updated: 2010-08-03
  • Resolved: 2005-09-12
  • Fixed in (Other): 5.0u19
  • Fixed in (JDK 6): 6 b52
Description
FULL PRODUCT VERSION :
java version "1.5.0_01"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_01-b08)
Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_01-b08, mixed mode)

ADDITIONAL OS VERSION INFORMATION :
Linux hylas 2.6.9-gentoo-r14 #1 Mon Jan 31 13:57:09 EST 2005 x86_64 AMD Athlon(tm) 64 Processor 3200+ AuthenticAMD GNU/Linux


EXTRA RELEVANT SYSTEM CONFIGURATION :
Tested in Netbeans 4.0

A DESCRIPTION OF THE PROBLEM :
If you get a String back from String.substring() and call .offsetByCodePoints(0,1) on it, the index it returns appears to be relative to the source string (the one you called .substring() on) rather than to the substring itself.

This is incorrect, since the specification of String.substring() says that it returns a new String.


STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run the sample code included.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Since I'm running with a basic US locale, I expect the sample code to print

i=1  j=1  k=1

Certainly, there should be no difference between the values printed for j and k.

ACTUAL -
The sample code prints:

i=1  j=4  k=1



ERROR MESSAGES/STACK TRACES THAT OCCUR :
No error codes or exceptions are generated.

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
package foo;

public class BugTestClass {
    
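    public static void main(String[] args) {

        String myString = "abcdef";
        // Offset from index 0 by one code point in the original string.
        int i = myString.offsetByCodePoints(0, 1);
        // Same call on a substring: expected 1, but 4 is returned.
        String sub = myString.substring(3);
        int j = sub.offsetByCodePoints(0, 1);
        // Same call on a fresh copy of the substring returns 1.
        int k = new String(sub).offsetByCodePoints(0, 1);
        System.out.println("i=" + i + "  j=" + j + "  k=" + k);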
    }
}

---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
The workaround is to create a new String as in the example for k above, or like so:

String source = "abcdef";
String sub = new String( source.substring(3) );

Now sub.offsetByCodePoints will work as expected.
2005-03-18 09:18:16 GMT

Comments
EVALUATION The offset value held in the String class needs to be taken into account by String.offsetByCodePoints.
18-08-2005
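
To illustrate the evaluation, here is a minimal model, not the JDK source. SharedBufferModel is a hypothetical class that assumes a 5.0-style String layout: a char[] shared with the parent string, plus offset and count fields. It relies only on the public Character.offsetByCodePoints(char[], int, int, int, int), which returns an index into the array, to show how a result that is not adjusted by the offset comes out relative to the parent string.

```java
public class SharedBufferModel {
    // Hypothetical stand-ins for the fields of a 5.0-style String.
    private final char[] value;  // backing array, shared with the parent
    private final int offset;    // index of this string's first char in value
    private final int count;     // number of chars in this string

    public SharedBufferModel(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // Returns the raw array index, which is relative to the shared array
    // and not to this string.
    public int unadjustedOffset(int index, int codePointOffset) {
        return Character.offsetByCodePoints(value, offset, count,
                                            offset + index, codePointOffset);
    }

    // Subtracts offset so the result is relative to this string again.
    public int adjustedOffset(int index, int codePointOffset) {
        return unadjustedOffset(index, codePointOffset) - offset;
    }

    public static void main(String[] args) {
        char[] shared = "abcdef".toCharArray();
        // Models "abcdef".substring(3): same array, offset 3, count 3.
        SharedBufferModel sub = new SharedBufferModel(shared, 3, 3);
        System.out.println("unadjusted=" + sub.unadjustedOffset(0, 1)); // prints unadjusted=4
        System.out.println("adjusted=" + sub.adjustedOffset(0, 1));     // prints adjusted=1
    }
}
```

The unadjusted value of 4 matches the j=4 seen in the report; whether the shipped fix adjusts the result in this way or follows the suggested fix is not something this model claims.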

WORK AROUND Use Character.offsetByCodePoints(string, index, codePointOffset) in place of string.offsetByCodePoints(index, codePointOffset). (2005-06-03 08:09:51 GMT)
03-06-2005
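
A minimal sketch of this workaround, reusing the strings from the report. The class name OffsetWorkaround and the helper offsetOf are hypothetical; Character.offsetByCodePoints(CharSequence, int, int) is the public static method named in the workaround. The assumption here is that it reads the string only through length() and charAt(), so its result is relative to the string passed in.

```java
public class OffsetWorkaround {
    // Hypothetical helper: delegates to the static Character method
    // instead of calling s.offsetByCodePoints(index, codePointOffset).
    public static int offsetOf(String s, int index, int codePointOffset) {
        return Character.offsetByCodePoints(s, index, codePointOffset);
    }

    public static void main(String[] args) {
        String source = "abcdef";
        String sub = source.substring(3);               // "def"
        System.out.println("j=" + offsetOf(sub, 0, 1)); // prints j=1
    }
}
```

Unlike the new String(...) workaround, this does not allocate a copy of the substring.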

SUGGESTED FIX
1) In String.offsetByCodePoints, change the 3rd parameter of the invocation of Character.offsetByCodePointsImpl from "offset+index" to "index".
2) Make Character.offsetByCodePointsImpl aware of its "start" parameter. Currently, indexing into the array parameter "a" assumes that this array starts at zero rather than at "start" (note the initialization of "x" and the use of "a[x++]", which do not include "start").
3) Add tests of the code point methods involving String.substring, so that they operate on strings whose "offset" fields are nonzero.
(2005-04-05 13:23:00 GMT)
05-04-2005
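
The kind of test asked for in item 3 of the suggested fix could be sketched as follows. This is an illustration, not the test that was added to the JDK: the class SubstringCodePointCheck and its methods are hypothetical, and the reference walk is an independent reimplementation that treats a valid surrogate pair as one code point and an unpaired surrogate as one code point.

```java
public class SubstringCodePointCheck {
    // Independent reference: steps over chars, counting a valid surrogate
    // pair as a single code point. Callers must keep the walk in bounds.
    public static int referenceOffset(CharSequence s, int index, int codePointOffset) {
        int x = index;
        for (int i = 0; i < codePointOffset; i++) {   // forward steps
            char c = s.charAt(x++);
            if (Character.isHighSurrogate(c) && x < s.length()
                    && Character.isLowSurrogate(s.charAt(x))) {
                x++;
            }
        }
        for (int i = codePointOffset; i < 0; i++) {   // backward steps
            char c = s.charAt(--x);
            if (Character.isLowSurrogate(c) && x > 0
                    && Character.isHighSurrogate(s.charAt(x - 1))) {
                x--;
            }
        }
        return x;
    }

    // Counts the (index, offset) pairs for which the substring's own
    // offsetByCodePoints disagrees with the reference walk.
    public static int mismatches(String source, int beginIndex) {
        String sub = source.substring(beginIndex);
        int len = sub.length();
        int bad = 0;
        for (int index = 0; index <= len; index++) {
            int back = sub.codePointCount(0, index);
            int fwd = sub.codePointCount(index, len);
            for (int off = -back; off <= fwd; off++) {
                if (sub.offsetByCodePoints(index, off) != referenceOffset(sub, index, off)) {
                    bad++;
                }
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Contains two supplementary characters, each stored as two chars.
        String source = "ab\uD835\uDD0Acd\uD835\uDD0Be";
        System.out.println("mismatches=" + mismatches(source, 2));
        System.out.println("mismatches=" + mismatches("abcdef", 3));
    }
}
```

On a runtime that contains the fix, both lines should report zero mismatches; on an affected 5.0 build the substring cases would be expected to differ.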