JDK-4949631 : String.getBytes() does not work on some strings larger than 16MB

Details
Type: Bug
Submit Date: 2003-11-05
Status: Resolved
Updated Date: 2004-12-31
Project Name: JDK
Resolved Date: 2004-11-20
Component: core-libs
OS: solaris_9,linux,windows_xp,windows_2000
Sub-Component: java.nio.charsets
CPU: x86,generic
Priority: P3
Resolution: Fixed
Affected Versions: 1.4.2,1.4.2_02,5.0
Fixed Versions:

Description
Name: rmT116609			Date: 11/05/2003


FULL PRODUCT VERSION :
java version "1.4.2"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2-b28)
Java HotSpot(TM) Client VM (build 1.4.2-b28, mixed mode)



FULL OS VERSION :
Linux wks001 2.4.20-19.9 #1 Wed Jul 23 19:06:26 EDT 2003 i686 i686 i386 GNU/Linux
SunOS drip 5.8 Generic_108528-22 sun4u sparc SUNW,UltraAX-i2

A DESCRIPTION OF THE PROBLEM :
When a string is longer than 16777216 characters, calling getBytes() on it throws a java.nio.BufferOverflowException for certain string lengths. Adding one character at a time shows that one length in every four fails.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Create a file of at least 16777216 bytes (the boundary above which the bug starts to occur), e.g.:

  dd if=/dev/zero of=/tmp/inputfile bs=1024 count=16384

Create a test program that reads this file into a string, then repeatedly appends one character to the string and calls getBytes() on it. Every fourth appended character causes a java.nio.BufferOverflowException. See the source code below.



EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
total length = 16777216
now at total length = 16777217
now at total length = 16777218
now at total length = 16777219
now at total length = 16777220
now at total length = 16777221
now at total length = 16777222
now at total length = 16777223
now at total length = 16777224
now at total length = 16777225
now at total length = 16777226
now at total length = 16777227
now at total length = 16777228
now at total length = 16777229
now at total length = 16777230
...etc...
ACTUAL -
total length = 16777216
now at total length = 16777217
Error at total length = 16777217
java.nio.BufferOverflowException
now at total length = 16777218
now at total length = 16777219
now at total length = 16777220
now at total length = 16777221
Error at total length = 16777221
java.nio.BufferOverflowException
now at total length = 16777222
now at total length = 16777223
now at total length = 16777224
now at total length = 16777225
Error at total length = 16777225
java.nio.BufferOverflowException
now at total length = 16777226
now at total length = 16777227
now at total length = 16777228
now at total length = 16777229
Error at total length = 16777229
java.nio.BufferOverflowException
now at total length = 16777230
...etc...

ERROR MESSAGES/STACK TRACES THAT OCCUR :
Exception in thread "main" java.nio.BufferOverflowException
        at java.nio.charset.CoderResult.throwException(CoderResult.java:259)
        at java.lang.StringCoding$CharsetSE.encode(StringCoding.java:340)
        at java.lang.StringCoding.encode(StringCoding.java:374)
        at java.lang.StringCoding.encode(StringCoding.java:380)
        at java.lang.String.getBytes(String.java:590)
        at TestBuffer2.main(TestBuffer2.java:21)


REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
import java.io.*;

class TestBuffer2 {

	public static void main(String[] args) throws IOException {

		StringBuffer output = new StringBuffer();

		byte[] buf = new byte[102400];
		FileInputStream fis = new FileInputStream("/tmp/inputfile");
		long totalLength=0;
		int bytes = 0;
		while((bytes = fis.read(buf))>0) {
			output.append(new String(buf,0,bytes));
			totalLength+=bytes;
		}
		System.out.println("total length = "+totalLength);

		for (int i = 0; i < 10000; i++) {
			try {
				byte bufferoverflow2[] = output.toString().getBytes();
			} catch (Exception e) {
				System.out.println("Error at total length = "+totalLength);
				System.out.println(e);
			}
			output.append("a");
			totalLength += 1;
			System.out.println("now at total length = "+totalLength);
		}
		System.out.println("Done!\n\n");
	}
}
---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
If you are not concerned with the exact format of your output string (e.g. when using it for HTML or XML purposes), you can hack around the problem like this:

if (output.length() > 16777216 && output.length() % 4 == 1) {
    output.append("\n");
}
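
A workaround that leaves the string contents unchanged might be to encode through a Writer rather than String.getBytes(). The sketch below is an assumption, not something verified in this report: the helper name getBytesViaWriter is hypothetical, and it relies on OutputStreamWriter encoding in small chunks, so it never computes a length-scaled buffer size.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class StreamEncode {
    // Hypothetical helper: encodes s in the platform default charset by
    // streaming it through a Writer instead of calling String.getBytes().
    static byte[] getBytesViaWriter(String s) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream(s.length());
            Writer w = new OutputStreamWriter(bos); // default charset, like getBytes()
            w.write(s);
            w.close(); // flushes any buffered, encoded bytes into bos
            return bos.toByteArray();
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        byte[] b = getBytesViaWriter("hello");
        System.out.println("encoded " + b.length + " bytes");
    }
}
```

This holds the output in memory twice for a moment (once inside the stream, once in the returned copy), which matters for strings of this size.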
(Incident Review ID: 223082) 
======================================================================
###@###.### 2004-11-09 00:34:44 GMT

                                    

Comments
EVALUATION

True.  -- ###@###.### 2003/11/15

Here's a more concise test case:

----------------------------------------------------------
class Bug {
    static void test(int size) {
        try {
            new String(new char[size]).getBytes();
        } catch (Throwable t) {
            System.out.println("Failed with size=" + size);
            t.printStackTrace();
        }
    }

    public static void main(String[] args) throws Exception {
        for (int i = 0; i < 10; i++)
            test(16777216 + i);
    }
}
----------------------------------------------------------

which fails in the same manner.

###@###.### 2004-09-02

Ah yes, 16M is 2**24, and 24 bits is the precision of a float,
and floats are used for maxBytesPerChar and friends.
We need to be more careful about losing bits near
Integer.MAX_VALUE.
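
The rounding can be seen in isolation with a small sketch. It assumes an expansion factor of 1.0f (as for a single-byte charset); the class and method names are made up for illustration. floatScale mirrors the float-typed sizing expression, and doubleScale widens to double before multiplying.

```java
public class FloatScaleDemo {
    // int * float is evaluated in float, so len is first rounded to the
    // nearest value representable with a 24-bit significand.
    static int floatScale(int len, float factor) {
        return (int) (factor * len);
    }

    // Widening to double keeps every int value exact before the multiply.
    static int doubleScale(int len, float factor) {
        return (int) (len * (double) factor);
    }

    public static void main(String[] args) {
        float factor = 1.0f; // assumed expansion factor, for illustration only
        for (int len = 16777216; len <= 16777224; len++) {
            int f = floatScale(len, factor);
            int d = doubleScale(len, factor);
            System.out.println(len + ": float=" + f + " double=" + d
                    + (f < len ? "  <-- buffer too small" : ""));
        }
    }
}
```

Between 2**24 and 2**25 a float can represent only even integers, and ties round to the value with an even significand. Odd lengths therefore alternate between rounding down (buffer one element too small) and rounding up (harmless), which matches the one-in-four failure pattern reported above.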

###@###.### 2004-09-05

Analysis reveals that both encoders and decoders have the same bug.
See this program:

class Bug4 {
    public static void main(String[] args) throws Exception {
	try {new String(new char[16777217]).getBytes("ASCII");}
	catch (Throwable t) {t.printStackTrace();}

	try {new String(new byte[16777217],"ASCII");}
	catch (Throwable t) {t.printStackTrace();}
    }
}

###@###.### 2004-09-05
                                     
2004-09-05
CONVERTED DATA

BugTraq+ Release Management Values

COMMIT TO FIX:
mustang


                                     
2004-09-25
SUGGESTED FIX

--- /u/martin/ws/mustang/src/share/classes/java/lang/StringCoding.java	2004-08-27 15:53:50.956130000 -0700
+++ /u/martin/ws/nio/src/share/classes/java/lang/StringCoding.java	2004-09-14 22:21:24.156289000 -0700
@@ -76,6 +76,12 @@
 	return tca;
     }
 
+    private static int scale(int len, float expansionFactor) {
+	// We need to perform double, not float, arithmetic; otherwise
+	// we lose low order bits when len is larger than 2**24.
+	return (int)(len * (double)expansionFactor);
+    }
+
     private static Charset lookupCharset(String csn) {
 	if (Charset.isSupported(csn)) {
 	    try {
@@ -173,7 +179,7 @@
 	}
 
 	char[] decode(byte[] ba, int off, int len) {
-	    int en = (int)(cd.maxCharsPerByte() * len);
+	    int en = scale(len, cd.maxCharsPerByte());
 	    char[] ca = new char[en];
 	    if (len == 0)
 		return ca;
@@ -324,7 +330,7 @@
 	}
 
 	byte[] encode(char[] ca, int off, int len) {
-	    int en = (int)(ce.maxBytesPerChar() * len);
+	    int en = scale(len, ce.maxBytesPerChar());
 	    byte[] ba = new byte[en];
 	    if (len == 0)
 		return ba;
--- /u/martin/ws/mustang/src/share/classes/sun/nio/cs/ext/JISAutoDetect.java	2004-08-27 16:00:37.276131000 -0700
+++ /u/martin/ws/nio/src/share/classes/sun/nio/cs/ext/JISAutoDetect.java	2004-09-14 22:21:25.293210000 -0700
@@ -138,7 +138,9 @@
 		if (! dst.hasRemaining())
 		    return CoderResult.OVERFLOW;
 
-		int cbufsiz = (int) (src.limit() * maxCharsPerByte());
+		// We need to perform double, not float, arithmetic; otherwise
+		// we lose low order bits when src is larger than 2**24.
+		int cbufsiz = (int)(src.limit() * (double)maxCharsPerByte());
 		CharBuffer sandbox = CharBuffer.allocate(cbufsiz);
 
 		// First try ISO-2022-JP, since there is no ambiguity
--- /u/martin/ws/mustang/test/java/lang/StringCoding/Enormous.java	1969-12-31 16:00:00.000000000 -0800
+++ /u/martin/ws/nio/test/java/lang/StringCoding/Enormous.java	2004-09-14 22:21:26.017575000 -0700
@@ -0,0 +1,11 @@
+/* @test @(#)Enormous.java	1.1 04/09/14
+ * @bug 4949631
+ * @summary Check for ability to recode arrays of odd sizes > 16MB
+ */
+
+public class Enormous {
+    public static void main(String[] args) throws Exception {
+	new String(new char[16777217]).getBytes("ASCII");
+	new String(new byte[16777217],"ASCII");
+    }
+}
###@###.### 2004-11-09 00:34:44 GMT
                                     
2004-11-09


