JDK-4179057 : String UTF-8 Encoding of null char ('\u0000')
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.nio.charsets
  • Affected Version: 1.1.5
  • Priority: P4
  • Status: Closed
  • Resolution: Not an Issue
  • OS: windows_nt
  • CPU: x86
  • Submitted: 1998-10-06
  • Updated: 1999-09-09
  • Resolved: 1999-09-09
Description

Name: tb29552			Date: 10/06/98


/*
The code in String that converts to UTF-8
has two problems:

1. It does not handle the null
   char ('\u0000') correctly.
   The null char is mapped to the null byte,
   according to the Unicode standard version 2.0.
   The code here allocates two bytes for it.

2. Why is it not public?  I need it, so I have to
   copy the code.
*/

/**
 * Returns the length of this string's UTF encoded form.
 */
int utfLength() {
    int limit = offset + count;
    int utflen = 0;
    for (int i = offset; i < limit; i++) {
        int c = value[i];
        if ((c >= 0x0001) && (c <= 0x007F)) {
            utflen++;
        } else if (c > 0x07FF) {
            utflen += 3;
        } else {
            utflen += 2;
        }
    }
    return utflen;
}
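
The method above reads String's private fields (value, offset, count), so it cannot be compiled outside the class. A minimal standalone sketch of the same length calculation, written against the public charAt/length API, might look like the following. The class and method names (UtfLengthSketch, modifiedUtfLength) are hypothetical and not part of the JDK; the branch structure is copied from the snippet above.

```java
// Hypothetical helper for illustration; not a JDK class.
final class UtfLengthSketch {

    /**
     * Number of bytes needed for s in the two-byte-NUL encoding that the
     * snippet above computes (no length prefix, no terminator counted).
     */
    static int modifiedUtfLength(String s) {
        int utflen = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                utflen++;        // one byte: ASCII except U+0000
            } else if (c > 0x07FF) {
                utflen += 3;     // three bytes: U+0800 and above, per char
            } else {
                utflen += 2;     // two bytes: U+0000 and U+0080..U+07FF
            }
        }
        return utflen;
    }

    public static void main(String[] args) {
        System.out.println(modifiedUtfLength("A"));       // prints 1
        System.out.println(modifiedUtfLength("\u0000"));  // prints 2
        System.out.println(modifiedUtfLength("\u00E9"));  // prints 2
        System.out.println(modifiedUtfLength("\u20AC"));  // prints 3
    }
}
```

Because the loop counts per char, a supplementary character stored as a surrogate pair contributes 3 + 3 = 6 bytes here, whereas standard UTF-8 would use 4.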

(Review ID: 28739)
======================================================================

Comments
EVALUATION [xueming.shen@Eng 1999-09-09]

Well, this method was supposed to be used internally by java/javac. I don't see any reason we should have a "public method" to calculate the length of a String in UTF-8 format. If you are really curious about why a NULL takes TWO bytes, the following comment may give you some hints...

 * In JAVA/JAVAC, we deviate slightly from the above.
 * 1) The null unicode character is represented using the 2-byte format
 * 2) All UTF strings are null-terminated.
 * In this way, we do not need to separately maintain a length field for the
 * UTF string.

Again, it's an internal-purpose method, so we can do whatever we want; not a bug at all.
11-06-2004
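
The difference described in the evaluation can be observed through public APIs. The sketch below assumes a newer JDK than the one in this report (StandardCharsets needs Java 7, UncheckedIOException needs Java 8) and uses DataOutputStream.writeUTF, which is documented to write modified UTF-8 preceded by a two-byte length. The class and method names (NullCharEncodingDemo, modifiedUtf8, standardUtf8) are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class for illustration; not a JDK class.
final class NullCharEncodingDemo {

    /** Modified UTF-8 bytes from writeUTF, with its 2-byte length prefix removed. */
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            // Not expected for an in-memory stream and short input.
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    /** Standard UTF-8 bytes via the charset API. */
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String nul = "\u0000";
        // Standard UTF-8: a single zero byte.
        System.out.println(Arrays.toString(standardUtf8(nul)));  // prints [0]
        // Modified UTF-8: the two-byte form 0xC0 0x80.
        System.out.println(Arrays.toString(modifiedUtf8(nul)));  // prints [-64, -128]
    }
}
```

With the two-byte form, an encoded string never contains a 0x00 byte, which is what allows a zero byte to serve as a terminator.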

WORK AROUND

Name: tb29552			Date: 10/06/98

None.
======================================================================
11-06-2004