JDK-8231717 : Improve performance of charset decoding when charset is always compactable
  • Type: Enhancement
  • Component: core-libs
  • Sub-Component: java.lang
  • Affected Version: 11,14
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2019-10-01
  • Updated: 2021-10-13
  • Resolved: 2019-10-11
  • JDK 11: 11.0.13 (Fixed)
  • JDK 14: 14 b19 (Fixed)
Description
With the move from JDK 8 to JDK 9+, the internal representation of String changed from a char[] to a byte[] plus int coder (LATIN1 or UTF16).
At the same time, the StringCoding implementation changed to take advantage of this, optimizing the decoding of ASCII-compatible charsets and of the specific common charsets UTF_8, ISO_8859_1 and US_ASCII by enabling direct byte array copies.
However, the same change had an adverse performance effect on charset decoding where no ASCII fast path (byte array copy) is possible, e.g. most EBCDIC charsets. The main reason is that the internal representation is now a coded byte[]: after the decoder has decoded to a char[], an additional char[]->byte[] copy is needed, which did not happen before.
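
The extra copy described above can be illustrated with a minimal sketch. It is not the JDK's StringCoding code: the class name, method names and the toy lookup table are hypothetical and exist only to show the two passes (bytes to char[], then char[] to a LATIN1 byte[]).

```java
final class TwoStepDecodeSketch {

    // Hypothetical toy table (an assumption for illustration): identity
    // mapping, except two EBCDIC-style code points mapped to 'A' and 'B'.
    static char[] toyTable() {
        char[] b2c = new char[256];
        for (int i = 0; i < 256; i++) {
            b2c[i] = (char) i;
        }
        b2c[0xC1] = 'A';
        b2c[0xC2] = 'B';
        return b2c;
    }

    // Pass 1: table-driven single-byte decode into a char[].
    static char[] decodeToChars(byte[] src, char[] b2c) {
        char[] dst = new char[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = b2c[src[i] & 0xFF];
        }
        return dst;
    }

    // Pass 2: the additional char[] -> byte[] copy. Returns null when some
    // char does not fit in one byte (a UTF16 representation would be needed).
    static byte[] compress(char[] chars) {
        byte[] out = new byte[chars.length];
        for (int i = 0; i < chars.length; i++) {
            if (chars[i] > 0xFF) {
                return null;
            }
            out[i] = (byte) chars[i];
        }
        return out;
    }
}
```

Every decoded string pays for both loops and for the intermediate char[] allocation, even when the result always fits in a LATIN1 byte[].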

This enhancement improves the performance of charset decoding when COMPACT_STRINGS is enabled. It takes advantage of the fact that if a charset is "always compactable", i.e. every mapping maps to a single <=0xff value, then SingleByte.decode() can map straight to a LATIN1 byte[] rather than to a char[] (followed by a conversion to a LATIN1 byte[]).
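
As a rough illustration of the idea, the sketch below checks whether a single-byte mapping table is "always compactable" and, if so, decodes directly into a byte[]. It is a sketch under stated assumptions, not the actual patch: the class and helper names are hypothetical, and the table is derived through the public String API rather than from the JDK's internal decoder tables.

```java
import java.nio.charset.Charset;

final class Latin1DirectDecodeSketch {

    // Hypothetical helper: builds a 256-entry byte-to-char table for a
    // single-byte charset by decoding each byte value with the public API.
    // Unmappable bytes decode to U+FFFD, which is above 0xFF.
    static char[] tableFor(Charset cs) {
        char[] b2c = new char[256];
        for (int i = 0; i < 256; i++) {
            String s = new String(new byte[] {(byte) i}, cs);
            b2c[i] = (s.length() == 1) ? s.charAt(0) : '\uFFFD';
        }
        return b2c;
    }

    // "Always compactable": every byte maps to a single value <= 0xFF.
    static boolean isLatin1Decodable(char[] b2c) {
        for (char c : b2c) {
            if (c > 0xFF) {
                return false;
            }
        }
        return true;
    }

    // Single pass straight into a LATIN1 byte[]; no intermediate char[].
    // Only valid when isLatin1Decodable(b2c) is true.
    static byte[] decodeToLatin1(byte[] src, char[] b2c) {
        byte[] dst = new byte[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = (byte) b2c[src[i] & 0xFF];
        }
        return dst;
    }
}
```

The compactability check depends only on the charset's table, so it can be evaluated once per charset rather than once per decoded string. Charsets with any mapping above 0xFF keep using the char[] path.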

Performance benchmarks show up to a 100% performance improvement for typical charset decoding for charsets that fall into this category, with no impact on other charsets.
 
Comments
Fix Request (11u): This change improves charset decoding for a number of encodings that are "Latin1Decodable". The original patch applies cleanly. The patch is small, targets a specific subset of encodings, and has been integrated in 14, so the risk is low.
PR: https://github.com/openjdk/jdk11u-dev/pull/58
Testing: tier1, tier2, sun/nio/cs on aarch64 and x86. StrCodingBenchmark shows improvements for the target encodings (IBM037 etc.) on both aarch64 and x86.
23-06-2021

URL: https://hg.openjdk.java.net/jdk/jdk/rev/3968bf3673c5
User: rriggs
Date: 2019-10-11 14:57:04 +0000
11-10-2019