JDK-8278518 : String(byte[], int, int, Charset) constructor and String.translateEscapes() miss bounds check elimination
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 17,18,19
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2021-12-09
  • Updated: 2022-02-10
  • Resolved: 2022-01-27
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 19
19 b08Fixed
Related Reports
Relates :  
Relates :  
Relates :  
Description
First this was spotted by Amir Hadadi in https://stackoverflow.com/questions/70272651/missing-bounds-checking-elimination-in-string-constructor

It looks like in the following code

while (offset < sl) {
    int b1 = bytes[offset];
    if (b1 >= 0) {
        dst[dp++] = (byte)b1;
        offset++;                     // <---
        continue;
    }
    if ((b1 == (byte)0xc2 || b1 == (byte)0xc3) &&
            offset + 1 < sl) {
        int b2 = bytes[offset + 1];
        if (!isNotContinuation(b2)) {
            dst[dp++] = (byte)decode2(b1, b2);
            offset += 2;
            continue;
        }
    }
    // anything not a latin1, including the repl
    // we have to go with the utf16
    break;
}

bounds check elimination is not executed when accessing byte array via bytes[offset].

The reason, I guess, is that offset variable is modified within the loop (marked with arrow).

Possible fix for this could be changing:

while (offset < sl)      --->   while (offset >= 0 && offset < sl)

However the best is to invest in C2 optimization to handle all such cases.

The following benchmark demonstrates good improvement:

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class StringConstructorBenchmark {
  private byte[] array;

  @Setup
  public void setup() {
    String str = "Quizdeltagerne spiste jordbær med fløde, mens cirkusklovnen. Я"; // Latin1 ending with Russian
    array = str.getBytes(StandardCharsets.UTF_8);
  }

  @Benchmark
  public String newString()  {
      return new String(array, 0, array.length, StandardCharsets.UTF_8);
  }
}

//baseline
Benchmark                             Mode  Cnt    Score   Error  Units
StringConstructorBenchmark.newString  avgt   50  173,092 ± 3,048  ns/op

//patched
Benchmark                             Mode  Cnt    Score   Error  Units
StringConstructorBenchmark.newString  avgt   50  126,908 ± 2,355  ns/op

The same is observed in String.translateEscapes() for the same String as in the benchmark above:

//baseline
Benchmark                                    Mode  Cnt   Score   Error  Units
StringConstructorBenchmark.translateEscapes  avgt  100  53,627 ± 0,850  ns/op

//patched
Benchmark                                    Mode  Cnt   Score   Error  Units
StringConstructorBenchmark.translateEscapes  avgt  100  48,087 ± 1,129  ns/op
Comments
Changeset: 0dba1707 Author: Roland Westrelin <roland@openjdk.org> Date: 2022-01-27 08:44:58 +0000 URL: https://git.openjdk.java.net/jdk/commit/0dba1707910734d03c318424764b8682b028a8e0
27-01-2022

A pull request was submitted for review. URL: https://git.openjdk.java.net/jdk/pull/7034 Date: 2022-01-11 16:06:27 +0000
11-01-2022

I looked at this. As I mentioned in the PR, the benchmark is broken. With the benchmark fixed I get: StringBenchmark.safeDecoding avgt 5 142.667 ± 52.041 ns/op StringBenchmark.unsafeDecoding avgt 5 123.170 ± 77.454 ns/op The loop has 2 backedges. One is frequently taken, the other is not. The loop head is cloned for the least frequent backedge by ciTypeFlow. C2 then builds an outer Loop with the most frequently taken backedge and an inner counted loop with the least frequently taken backedge. Preventing ciTypeFlow from cloning any loop head causes c2 to split the loop head with 2 backedges and to pick the most frequent backedge for the inner loop, which becomes a counted loop and is fully optimized. That leads to the following performance numbers: StringBenchmark.safeDecoding avgt 5 65.404 ± 0.723 ns/op StringBenchmark.unsafeDecoding avgt 5 73.004 ± 12.800 ns/op So it would seem that in the case of multiple backedges, ciTypeFlow should not get in the way and leave c2 choose the inner loop.
21-12-2021

My guess is that it is related to how checkBoundsOffCount() checks for offset < 0: if ((length | fromIndex | size) < 0 || size > length - fromIndex) using binary OR to combine three values.
17-12-2021

Reassigned to hotspot/compiler
17-12-2021