JDK-8209576 : java.nio.file.Files.writeString writes garbled UTF-16 instead of UTF-8
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.nio.charsets
  • Affected Version: 11
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: linux_ubuntu
  • CPU: x86_64
  • Submitted: 2018-08-14
  • Updated: 2022-08-04
  • Resolved: 2018-08-20
Fixed versions:
  • JDK 11: 11.0.2 (Fixed)
  • JDK 12: 12 b08 (Fixed)
Description
ADDITIONAL SYSTEM INFORMATION :
openjdk version "11-ea" 2018-09-25
OpenJDK Runtime Environment 18.9 (build 11-ea+25)
OpenJDK 64-Bit Server VM 18.9 (build 11-ea+25, mixed mode)

A DESCRIPTION OF THE PROBLEM :
Certain Unicode characters (see the test) trigger a bug in charset conversion
in Files.writeString, which can result in data loss. The attached test program outputs:

---- OUTPUT FILE: Files.write.txt
Files.write “Hello” (NOTE: old method)
Files.readAllBytes: “Hello” (length = 7)
Files.readAllBytes: [-30, -128, -100, 72, 101, 108, 108, 111, -30, -128, -99]

---- OUTPUT FILE: Files.writeString-ASCII.txt
Files.writeString ASCII
Files.readString: ASCII (length = 5)
Files.readAllBytes: [65, 83, 67, 73, 73]

---- OUTPUT FILE: Files.writeString-Unicode.txt
Files.writeString “Hello”
Files.readString: ..H.e.l.l.o..  (length = 14)
Files.readAllBytes: [28, 32, 72, 0, 101, 0, 108, 0, 108, 0, 111, 0, 29, 32]

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Text written in UTF-8 as documented
ACTUAL -
Text written as what appears to be UTF-16LE without a byte order mark (the leading bytes 28, 32 are U+201C in little-endian order)
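
The byte dumps above can be checked by hand. The following is a minimal sketch, not part of the original report: the class name ByteDumpCheck and its methods are hypothetical. It decodes the reported bytes as UTF-16LE and prints the bytes UTF-8 would have produced for the same string.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class for inspecting the byte dump from the report.
public class ByteDumpCheck {

    // Bytes reported by Files.readAllBytes for Files.writeString-Unicode.txt
    static final byte[] ACTUAL =
            {28, 32, 72, 0, 101, 0, 108, 0, 108, 0, 111, 0, 29, 32};

    // Interpret the reported bytes as UTF-16LE (no byte order mark).
    static String decodeActualAsUtf16Le() {
        return new String(ACTUAL, StandardCharsets.UTF_16LE);
    }

    // Bytes that a UTF-8 encoder produces for the test string.
    static byte[] expectedUtf8() {
        return "\u201CHello\u201D".getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // True if the dump is the test string encoded as UTF-16LE.
        System.out.println(decodeActualAsUtf16Le().equals("\u201CHello\u201D"));
        // The UTF-8 bytes, for comparison with the Files.write.txt dump.
        System.out.println(Arrays.toString(expectedUtf8()));
    }
}
```

Running main prints true, followed by the same 11-byte array shown for Files.write.txt, which is consistent with the file holding UTF-16LE code units rather than UTF-8.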

---------- BEGIN SOURCE ----------
import java.nio.file.*;
import java.util.Arrays;

public class Test {

	public static void main(String... args) throws Exception {
		final String text = "\u201CHello\u201D"; // <-- quotation char causes problem 
		//final String text = "\u017Cółw"; // some other Unicode chars don't cause this problem

		oldWrite(Path.of("Files.write.txt"), text); // OK
		newWrite(Path.of("Files.writeString-ASCII.txt"), "ASCII"); // OK
		newWrite(Path.of("Files.writeString-Unicode.txt"), text); // <-- BUG
	}

	static void oldWrite(Path output, String text) throws Exception {
		System.out.println();
		System.out.println("---- OUTPUT FILE: " + output);

		System.out.println("Files.write " + text);
		Files.write(output, text.getBytes("UTF-8"));
		String actual = new String(Files.readAllBytes(output), "UTF-8");
		System.out.println("Files.readAllBytes: " + actual + " (length = " + actual.length() + ")");
		System.out.println("Files.readAllBytes: " + Arrays.toString(Files.readAllBytes(output)));
	}

	static void newWrite(Path output, String text) throws Exception {
		System.out.println();
		System.out.println("---- OUTPUT FILE: " + output);

		System.out.println("Files.writeString " + text);
		Files.writeString(output, text); // <-- writes UTF-16 instead of UTF-8
		String actual = Files.readString(output);
		System.out.println("Files.readString: " + actual + " (length = " + actual.length() + ")");
		System.out.println("Files.readAllBytes: " + Arrays.toString(Files.readAllBytes(output)));
	}

}
---------- END SOURCE ----------

FREQUENCY : always
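
POSSIBLE WORKAROUND (sketch, not part of the original report) :
The reproducer shows Files.write with explicitly encoded bytes producing correct UTF-8. On an affected build, one option might be to route writes through a small helper like the one below. The names Utf8WriteWorkaround, writeUtf8, and writeAndReadBack are hypothetical and exist only for this illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical workaround class: encodes explicitly instead of calling Files.writeString.
public class Utf8WriteWorkaround {

    // Encode the text to UTF-8 ourselves and write the raw bytes.
    static void writeUtf8(Path output, String text) throws IOException {
        Files.write(output, text.getBytes(StandardCharsets.UTF_8));
    }

    // Write to a temporary file, read the bytes back, then delete the file.
    static byte[] writeAndReadBack(String text) {
        try {
            Path tmp = Files.createTempFile("writeUtf8-", ".txt");
            try {
                writeUtf8(tmp, text);
                return Files.readAllBytes(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Same test string as the reproducer.
        System.out.println(Arrays.toString(writeAndReadBack("\u201CHello\u201D")));
    }
}
```

This mirrors the oldWrite path in the test program. It sidesteps Files.writeString entirely, so it says nothing about the cause of the bug.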



Comments
From [~jeff] (erroneously added to backport): 11u Fix Request: This is a TCK-red issue. Currently three JCK test cases are failing and have to be excluded. The fix is a one-line change and will apply cleanly (from the fix in 12) since the source code is identical. The test should apply cleanly as well, since the original test was added in 11.
16-10-2018

URL: http://hg.openjdk.java.net/jdk/jdk/rev/8dfed4387312 User: joehw Date: 2018-08-20 17:12:05 +0000
20-08-2018

Reproducible with the attached test case:
JDK 11-ea+25 - Fail

Output:
---- OUTPUT FILE: Files.write.txt
Files.write “Hello”
Files.readAllBytes: “Hello” (length = 7)
Files.readAllBytes: [-30, -128, -100, 72, 101, 108, 108, 111, -30, -128, -99]

---- OUTPUT FILE: Files.writeString-ASCII.txt
Files.writeString ASCII
Files.readString: ASCII (length = 5)
Files.readAllBytes: [65, 83, 67, 73, 73]

---- OUTPUT FILE: Files.writeString-Unicode.txt
Files.writeString “Hello”
Files.readString:  H e l l o  (length = 14)
Files.readAllBytes: [28, 32, 72, 0, 101, 0, 108, 0, 108, 0, 111, 0, 29, 32]
16-08-2018