JDK-4949631 : String.getBytes() does not work on some strings larger than 16MB
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.nio.charsets
  • Affected Version: 1.4.2,1.4.2_02,5.0
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS:
    linux,solaris_9,windows_2000,windows_xp
  • CPU: generic,x86
  • Submitted: 2003-11-05
  • Updated: 2021-03-01
  • Resolved: 2004-11-20
  • Other: 1.4.2_08 (Fixed)
  • JDK 6: 6 b14 (Fixed)
Related Reports
Duplicate :  
Duplicate :  
Relates :  
Description
Name: rmT116609			Date: 11/05/2003


FULL PRODUCT VERSION :
java version "1.4.2"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2-b28)
Java HotSpot(TM) Client VM (build 1.4.2-b28, mixed mode)



FULL OS VERSION :
Linux wks001 2.4.20-19.9 #1 Wed Jul 23 19:06:26 EDT 2003 i686 i686 i386 GNU/Linux
SunOS drip 5.8 Generic_108528-22 sun4u sparc SUNW,UltraAX-i2

A DESCRIPTION OF THE PROBLEM :
When a string exceeds a certain length (16777216 characters), calling getBytes() on it throws a java.nio.BufferOverflowException for certain string lengths. Adding one character at a time shows that one length in every four fails.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Create a file of at least 16777216 characters (this is the boundary at which the bug starts to occur). e.g.:

  dd if=/dev/zero of=/tmp/inputfile bs=1024 count=16384

Create a test program that reads this file into a string, then repeatedly appends one character to the string and calls getBytes() on it. Every fourth appended character causes a java.nio.BufferOverflowException. See the source code example below.



EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
total length = 16777216
now at total length = 16777217
now at total length = 16777218
now at total length = 16777219
now at total length = 16777220
now at total length = 16777221
now at total length = 16777222
now at total length = 16777223
now at total length = 16777224
now at total length = 16777225
now at total length = 16777226
now at total length = 16777227
now at total length = 16777228
now at total length = 16777229
now at total length = 16777230
...etc...
ACTUAL -
total length = 16777216
now at total length = 16777217
Error at total length = 16777217
java.nio.BufferOverflowException
now at total length = 16777218
now at total length = 16777219
now at total length = 16777220
now at total length = 16777221
Error at total length = 16777221
java.nio.BufferOverflowException
now at total length = 16777222
now at total length = 16777223
now at total length = 16777224
now at total length = 16777225
Error at total length = 16777225
java.nio.BufferOverflowException
now at total length = 16777226
now at total length = 16777227
now at total length = 16777228
now at total length = 16777229
Error at total length = 16777229
java.nio.BufferOverflowException
now at total length = 16777230
...etc...
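
The failing lengths above match what single-precision rounding predicts. The following is a minimal sketch, not the JDK code: it assumes the pre-fix buffer size was computed as (int)(maxBytesPerChar * len), and it assumes a maxBytesPerChar of 1.0f, as a single-byte charset would report. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class UndersizedLengths {

    // Same shape as the pre-fix expression: the int length is converted to
    // float before the multiplication, so it is rounded to 24 significant bits.
    static int floatEstimate(int len, float maxBytesPerChar) {
        return (int) (maxBytesPerChar * len);
    }

    // Collects every length in [start, start + count) whose estimate is
    // smaller than the length itself, i.e. whose output buffer would be too small.
    static List<Integer> undersized(int start, int count) {
        List<Integer> result = new ArrayList<Integer>();
        for (int len = start; len < start + count; len++) {
            // Assumption: 1.0f stands in for a single-byte charset's maxBytesPerChar().
            if (floatEstimate(len, 1.0f) < len) {
                result.add(len);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(undersized(16777216, 15));
    }
}
```

For the fifteen lengths shown in the ACTUAL output this prints [16777217, 16777221, 16777225, 16777229], the same four lengths that raised the exception. The other odd lengths round upward, so their buffers are one byte too large rather than one byte too small.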

ERROR MESSAGES/STACK TRACES THAT OCCUR :
Exception in thread "main" java.nio.BufferOverflowException
        at java.nio.charset.CoderResult.throwException(CoderResult.java:259)
        at java.lang.StringCoding$CharsetSE.encode(StringCoding.java:340)
        at java.lang.StringCoding.encode(StringCoding.java:374)
        at java.lang.StringCoding.encode(StringCoding.java:380)
        at java.lang.String.getBytes(String.java:590)
        at TestBuffer2.main(TestBuffer2.java:21)


REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
import java.io.*;

class TestBuffer2 {

	public static void main(String[] args) throws IOException {

		StringBuffer output = new StringBuffer();

		byte[] buf = new byte[102400];
		FileInputStream fis = new FileInputStream("/tmp/inputfile");
		long totalLength=0;
		int bytes = 0;
		while((bytes = fis.read(buf))>0) {
			output.append(new String(buf,0,bytes));
			totalLength+=bytes;
		}
		System.out.println("total length = "+totalLength);

		for (int i = 0; i < 10000; i++) {
			try {
				byte bufferoverflow2[] = output.toString().getBytes();
			} catch (Exception e) {
				System.out.println("Error at total length = "+totalLength);
				System.out.println(e);
			}
			output.append("a");
			totalLength += 1;
			System.out.println("now at total length = "+totalLength);
		}
		System.out.println("Done!\n\n");
	}
}
---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
If you are not concerned with the exact format of your output string (e.g. when using it for HTML or XML purposes), you can hack around the problem like this:

if (output.length() >= 16777217 && output.length() % 4 == 1) {
    output.append("\n");
}
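
Where changing the content of the string is not acceptable, another possible approach is to avoid the single large getBytes() call and encode the string in pieces through an OutputStreamWriter. The following is only a sketch and has not been tried on the affected releases: ChunkedEncode, encode and chunkChars are hypothetical names, and the code uses try-with-resources and UncheckedIOException, which do not exist in 1.4.2 (there it would need try/finally and a checked IOException).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

class ChunkedEncode {

    // Encodes s by writing it to an in-memory stream chunkChars characters
    // at a time, so the caller never asks for one huge encode operation.
    static byte[] encode(String s, Charset cs, int chunkChars) {
        if (chunkChars <= 0) {
            throw new IllegalArgumentException("chunkChars must be positive");
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, cs)) {
            for (int off = 0; off < s.length(); off += chunkChars) {
                int len = Math.min(chunkChars, s.length() - off);
                w.write(s, off, len);
            }
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked.
            throw new UncheckedIOException(e);
        }
        // The writer has been closed (and therefore flushed) at this point.
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] out = encode("hello", Charset.forName("US-ASCII"), 2);
        System.out.println(out.length);
    }
}
```

The trade-off is an extra copy of the encoded bytes inside the ByteArrayOutputStream, which matters for strings of this size.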
(Incident Review ID: 223082) 
======================================================================
###@###.### 2004-11-09 00:34:44 GMT

Comments
SUGGESTED FIX

--- /u/martin/ws/mustang/src/share/classes/java/lang/StringCoding.java 2004-08-27 15:53:50.956130000 -0700
+++ /u/martin/ws/nio/src/share/classes/java/lang/StringCoding.java 2004-09-14 22:21:24.156289000 -0700
@@ -76,6 +76,12 @@
         return tca;
     }
 
+    private static int scale(int len, float expansionFactor) {
+        // We need to perform double, not float, arithmetic; otherwise
+        // we lose low order bits when len is larger than 2**24.
+        return (int)(len * (double)expansionFactor);
+    }
+
     private static Charset lookupCharset(String csn) {
         if (Charset.isSupported(csn)) {
             try {
@@ -173,7 +179,7 @@
         }
 
         char[] decode(byte[] ba, int off, int len) {
-            int en = (int)(cd.maxCharsPerByte() * len);
+            int en = scale(len, cd.maxCharsPerByte());
             char[] ca = new char[en];
             if (len == 0)
                 return ca;
@@ -324,7 +330,7 @@
         }
 
         byte[] encode(char[] ca, int off, int len) {
-            int en = (int)(ce.maxBytesPerChar() * len);
+            int en = scale(len, ce.maxBytesPerChar());
             byte[] ba = new byte[en];
             if (len == 0)
                 return ba;
--- /u/martin/ws/mustang/src/share/classes/sun/nio/cs/ext/JISAutoDetect.java 2004-08-27 16:00:37.276131000 -0700
+++ /u/martin/ws/nio/src/share/classes/sun/nio/cs/ext/JISAutoDetect.java 2004-09-14 22:21:25.293210000 -0700
@@ -138,7 +138,9 @@
             if (! dst.hasRemaining())
                 return CoderResult.OVERFLOW;
 
-            int cbufsiz = (int) (src.limit() * maxCharsPerByte());
+            // We need to perform double, not float, arithmetic; otherwise
+            // we lose low order bits when src is larger than 2**24.
+            int cbufsiz = (int)(src.limit() * (double)maxCharsPerByte());
             CharBuffer sandbox = CharBuffer.allocate(cbufsiz);
 
             // First try ISO-2022-JP, since there is no ambiguity
--- /u/martin/ws/mustang/test/java/lang/StringCoding/Enormous.java 1969-12-31 16:00:00.000000000 -0800
+++ /u/martin/ws/nio/test/java/lang/StringCoding/Enormous.java 2004-09-14 22:21:26.017575000 -0700
@@ -0,0 +1,11 @@
+/* @test @(#)Enormous.java 1.1 04/09/14
+ * @bug 4949631
+ * @summary Check for ability to recode arrays of odd sizes > 16MB
+ */
+
+public class Enormous {
+    public static void main(String[] args) throws Exception {
+        new String(new char[16777217]).getBytes("ASCII");
+        new String(new byte[16777217],"ASCII");
+    }
+}

###@###.### 2004-11-09 00:34:44 GMT
09-11-2004
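
The effect of the scale() helper in the suggested fix can be checked in isolation. In the sketch below, scale() has the same body as in the patch and oldEstimate() has the shape of the expression it replaces; the class name, the oldEstimate name and the 1.0f factor are assumptions made for the demonstration.

```java
class ScaleDemo {

    // Shape of the pre-fix expression: float arithmetic, so len is rounded
    // to 24 significant bits before the multiplication.
    static int oldEstimate(int len, float expansionFactor) {
        return (int) (expansionFactor * len);
    }

    // Same body as scale() in the suggested fix: the multiplication is done
    // in double, which represents every int value exactly.
    static int scale(int len, float expansionFactor) {
        return (int) (len * (double) expansionFactor);
    }

    public static void main(String[] args) {
        int len = 16777217; // 2**24 + 1, the size used by Enormous.java
        // Assumption: 1.0f stands in for the charset's expansion factor.
        System.out.println("float:  " + oldEstimate(len, 1.0f));
        System.out.println("double: " + scale(len, 1.0f));
    }
}
```

This prints 16777216 for the float version and 16777217 for the double version, so only the latter allocates a buffer large enough for all 16777217 elements.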

CONVERTED DATA

BugTraq+ Release Management Values

COMMIT TO FIX:
mustang
25-09-2004

EVALUATION

True.
-- ###@###.### 2003/11/15

Here's a more concise test case:
----------------------------------------------------------
class Bug {
    static void test(int size) {
        try {
            new String(new char[size]).getBytes();
        } catch (Throwable t) {
            System.out.println("Failed with size="+size);
            t.printStackTrace();
        }
    }
    public static void main(String[] args) throws Exception {
        for (int i = 0; i < 10; i++)
            test(16777216+i);
    }
}
----------------------------------------------------------
which fails in the same manner.
###@###.### 2004-09-02

Ah yes, 16MB is 24 bits, which is the range of accuracy of a float, and floats are used for maxBytesPerChar and friends. We need to be more careful with losing bits near Integer.MAX_VALUE.
###@###.### 2004-09-05

Analysis reveals that both encoders and decoders have the same bug. See this program:

class Bug4 {
    public static void main(String[] args) throws Exception {
        try {
            new String(new char[16777217]).getBytes("ASCII");
        } catch (Throwable t) {
            t.printStackTrace();
        }
        try {
            new String(new byte[16777217],"ASCII");
        } catch (Throwable t) {
            t.printStackTrace();
        }
    }
}
###@###.### 2004-09-05
05-09-2004
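
The evaluation's remark about 24 bits and about values near Integer.MAX_VALUE can be illustrated with Math.ulp, which reports the gap between a float and the next larger float. This is a small sketch with hypothetical class and method names; Math.ulp(float) is available from Java 5 onward.

```java
class FloatSpacing {

    // Distance between (float) value and the next larger float.
    static float spacingAt(int value) {
        return Math.ulp((float) value);
    }

    public static void main(String[] args) {
        int[] samples = { 1 << 23, 1 << 24, 1 << 25, 1 << 30 };
        for (int i = 0; i < samples.length; i++) {
            System.out.println(samples[i] + " -> spacing " + spacingAt(samples[i]));
        }
    }
}
```

The spacing is 1.0 at 2**23, 2.0 at 2**24, 4.0 at 2**25 and 128.0 at 2**30. Below 2**24 every int length is exactly representable as a float, which is why the problem only appears above 16777216, and the rounding error a float-based size estimate can carry grows as the length approaches Integer.MAX_VALUE.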