While trying to replace the ASCII fast loop in `String.encodeUTF8_UTF16`, I noticed that restructuring the code so that `char c` is scoped locally to each loop improves the method's performance by helping C2 optimize each loop better. I narrowed it down to something as simple as this:
```
diff --git a/src/java.base/share/classes/java/lang/String.java b/src/java.base/share/classes/java/lang/String.java
index abb35ebaeb1..f84d60f92cc 100644
--- a/src/java.base/share/classes/java/lang/String.java
+++ b/src/java.base/share/classes/java/lang/String.java
@@ -1284,14 +1284,17 @@ public final class String
         int sp = 0;
         int sl = val.length >> 1;
         byte[] dst = new byte[sl * 3];
-        char c;
-        while (sp < sl && (c = StringUTF16.getChar(val, sp)) < '\u0080') {
+        while (sp < sl) {
+            char c = StringUTF16.getChar(val, sp);
+            if (c >= '\u0080') {
+                break;
+            }
             // ascii fast loop;
             dst[dp++] = (byte)c;
             sp++;
         }
         while (sp < sl) {
-            c = StringUTF16.getChar(val, sp++);
+            char c = StringUTF16.getChar(val, sp++);
             if (c < 0x80) {
                 dst[dp++] = (byte)c;
             } else if (c < 0x800) {
```
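For anyone who wants to look at the two loop shapes outside the JDK, here is a minimal standalone sketch. It is not the JDK code: the class and method names (`LoopShapeSketch`, `sharedLocal`, `loopLocal`, `put`) are made up for illustration, it reads from a plain `char[]` instead of going through `StringUTF16.getChar`, and it rejects surrogates to stay short. It only shows that the two shapes are functionally equivalent; it makes no claim about reproducing the performance difference.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LoopShapeSketch {

    // Hypothetical helper: writes one BMP char as UTF-8, returns the new dst position.
    // Assumption: surrogates are out of scope here (the real method handles them).
    static int put(byte[] dst, int dp, char c) {
        if (Character.isSurrogate(c)) {
            throw new IllegalArgumentException("surrogates are out of scope for this sketch");
        }
        if (c < 0x80) {
            dst[dp++] = (byte) c;
        } else if (c < 0x800) {
            dst[dp++] = (byte) (0xc0 | (c >> 6));
            dst[dp++] = (byte) (0x80 | (c & 0x3f));
        } else {
            dst[dp++] = (byte) (0xe0 | (c >> 12));
            dst[dp++] = (byte) (0x80 | ((c >> 6) & 0x3f));
            dst[dp++] = (byte) (0x80 | (c & 0x3f));
        }
        return dp;
    }

    // Original shape: a single `char c` declared before both loops and shared by them.
    static byte[] sharedLocal(char[] val) {
        int dp = 0, sp = 0, sl = val.length;
        byte[] dst = new byte[sl * 3];
        char c;
        while (sp < sl && (c = val[sp]) < '\u0080') {
            dst[dp++] = (byte) c; // ASCII fast loop
            sp++;
        }
        while (sp < sl) {
            c = val[sp++];
            dp = put(dst, dp, c);
        }
        return Arrays.copyOf(dst, dp);
    }

    // Patched shape: each loop declares its own `char c`.
    static byte[] loopLocal(char[] val) {
        int dp = 0, sp = 0, sl = val.length;
        byte[] dst = new byte[sl * 3];
        while (sp < sl) {
            char c = val[sp];
            if (c >= '\u0080') {
                break;
            }
            dst[dp++] = (byte) c; // ASCII fast loop
            sp++;
        }
        while (sp < sl) {
            char c = val[sp++];
            dp = put(dst, dp, c);
        }
        return Arrays.copyOf(dst, dp);
    }

    public static void main(String[] args) {
        String s = "h\u00e9llo w\u00f6rld \u20ac";
        byte[] expected = s.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(expected, sharedLocal(s.toCharArray())));
        System.out.println(Arrays.equals(expected, loopLocal(s.toCharArray())));
    }
}
```

Both methods should agree with `String.getBytes(StandardCharsets.UTF_8)` for any input without surrogates, so any difference between them would be in how the JIT treats the loops, not in the result.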
Results on a few microbenchmarks that I'm updating to better stress this code:
Baseline:
```
Benchmark (charsetName) Mode Cnt Score Error Units
StringEncode.WithCharset.encodeUTF16 UTF-8 avgt 15 171.853 ± 10.275 ns/op
StringEncode.WithCharset.encodeUTF16LongEnd UTF-8 avgt 15 1991.586 ± 82.234 ns/op
StringEncode.WithCharset.encodeUTF16LongStart UTF-8 avgt 15 8422.458 ± 473.161 ns/op
```
Patch:
```
Benchmark (charsetName) Mode Cnt Score Error Units
StringEncode.WithCharset.encodeUTF16 UTF-8 avgt 15 128.525 ± 6.573 ns/op
StringEncode.WithCharset.encodeUTF16LongEnd UTF-8 avgt 15 1843.455 ± 72.984 ns/op
StringEncode.WithCharset.encodeUTF16LongStart UTF-8 avgt 15 4124.791 ± 308.683 ns/op
```
Going back, this seems to have been an issue with this code since its inception with JEP 254 in JDK 9.
The micro `encodeUTF16LongEnd` encodes a longer string which is mostly ASCII but with a non-ASCII codepoint at the end. This exaggerates the usefulness of the ASCII loop. `encodeUTF16LongStart` tests the same string but with the non-ASCII codepoint moved to the front. This stresses the non-ASCII loop. We see that the patch above helps in general, but mainly improves the microbenchmark that spends its time in the second loop.
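To illustrate why the two inputs land in different loops, here is a rough sketch. The strings are stand-ins, not the actual benchmark data, and `MicroInputSketch`, `longEnd`, `longStart` and `asciiPrefixLength` are hypothetical names. The non-ASCII character is `\u20ac` rather than a Latin-1 character so that, with compact strings, the string is stored as UTF-16 and would be encoded on the UTF-16 path. `String.repeat` assumes JDK 11 or later.

```java
public class MicroInputSketch {

    // Mostly-ASCII string with one non-ASCII char at the end.
    static String longEnd(int asciiLen) {
        return "a".repeat(asciiLen) + "\u20ac";
    }

    // Same content with the non-ASCII char moved to the front.
    static String longStart(int asciiLen) {
        return "\u20ac" + "a".repeat(asciiLen);
    }

    // Counts how many leading chars an ASCII fast loop would consume
    // before falling through to the general loop.
    static int asciiPrefixLength(String s) {
        int sp = 0;
        while (sp < s.length()) {
            char c = s.charAt(sp);
            if (c >= '\u0080') {
                break;
            }
            sp++;
        }
        return sp;
    }

    public static void main(String[] args) {
        int n = 1000; // arbitrary length chosen for the sketch
        System.out.println("longEnd   fast-loop chars: " + asciiPrefixLength(longEnd(n)));
        System.out.println("longStart fast-loop chars: " + asciiPrefixLength(longStart(n)));
    }
}
```

With the non-ASCII character at the end, the fast loop consumes everything but the last char. With it at the front, the fast loop exits immediately and the whole string goes through the second loop.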
There's likely a compiler bug hiding in plain sight here where the potentially uninitialized local `char c` messes up the loop optimization of the second loop. I think the above patch is reasonable to put back into the JDK while we investigate if/how C2 can better handle this pattern.