Bug ID: JDK-8269230 C2: main loop in micro benchmark never executed

Versions (Unresolved/Resolved/Fixed)

JDK 17: 17 b30 (Fixed)
JDK 18: 18 (Fixed)

The attached benchmark has interesting results:

```
Benchmark                               (size)  Mode  Cnt    Score   Error  Units
TestLoadBytes.arrayScalar                 1024  avgt   10  241.256 ± 1.028  ns/op
TestLoadBytes.arrayScalarConst            1024  avgt   10  244.251 ± 5.218  ns/op
TestLoadBytes.bufferNativeScalar          1024  avgt   10  262.128 ± 1.251  ns/op
TestLoadBytes.bufferNativeScalarConst     1024  avgt   10  250.552 ± 2.710  ns/op
TestLoadBytes.segmentNativeScalar         1024  avgt   10  722.670 ± 6.427  ns/op
TestLoadBytes.segmentNativeScalarConst    1024  avgt   10  253.419 ± 3.043  ns/op

Access through a segment is almost 3x slower than through byte buffers (722.670 vs. 262.128 ns/op). Looking at the generated compiled code, it seems that all the time is spent in the post-loop, and that the main loop (which seems to unroll correctly) is never executed.
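
To picture the symptom described above, here is a minimal sketch that models a counted loop split into a pre-loop, an unrolled main loop, and a post-loop. It is not C2 code: the class name, the `UNROLL` factor, and the `preLimit`/`mainLimit` parameters are assumptions made for illustration. The second call shows one hypothetical way the main loop could end up doing nothing (its limit being too low), which leaves every remaining iteration to the slow post-loop.

```java
import java.util.Arrays;

// Hypothetical model of a counted loop split into pre-, main- and post-loops.
// This is NOT C2 code; names and the unroll factor are assumptions.
final class LoopSplitSketch {
    static final int UNROLL = 4; // assumed unroll factor

    /** Returns {preIters, mainIters, postIters} for a loop over [0, limit). */
    static int[] split(int limit, int preLimit, int mainLimit) {
        int i = 0;
        int pre = 0, main = 0, post = 0;
        // Pre-loop: a few single iterations.
        for (; i < Math.min(preLimit, limit); i++) {
            pre++;
        }
        // Main loop: unrolled body, runs only while a full group fits below mainLimit.
        for (; i + UNROLL <= Math.min(mainLimit, limit); i += UNROLL) {
            main += UNROLL;
        }
        // Post-loop: one element per iteration, handles everything left over.
        for (; i < limit; i++) {
            post++;
        }
        return new int[] {pre, main, post};
    }

    public static void main(String[] args) {
        // Healthy case: almost all iterations run in the unrolled main loop.
        System.out.println(Arrays.toString(split(1024, 1, 1024))); // [1, 1020, 3]
        // Degenerate case: the main loop never runs, the post-loop does all the work.
        System.out.println(Arrays.toString(split(1024, 1, 0)));    // [1, 0, 1023]
    }
}
```

In the degenerate case the loop still computes the right result; it just runs one element at a time, which is consistent with a slowdown rather than a failure.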

Changeset: c67a7b03
Author: Roland Westrelin <roland@openjdk.org>
Date: 2021-07-01 07:41:22 +0000
URL: https://git.openjdk.java.net/jdk17/commit/c67a7b039de0dbb379123fb49780ae5b246dcf74
01-07-2021
I agree that the micro benchmark should be added, possibly as part of the PR.
30-06-2021
I don't see this test in test/micro/org/openjdk/bench/jdk/incubator/foreign/. I think we should add it as part of the PR.
30-06-2021
Paul or Maurizio, can you point me to the code in the foreign memory API that produces this code pattern? Never mind, you attached the test case.
29-06-2021
Okay, I got it. Thank you for providing this information, because without it this looks like a corner case. I will review the PR.
29-06-2021
I can attempt a fix in Java, but I somewhat disagree that this is not a bug: C2 is effectively generating dead code, which seems like buggy behavior. As for this being a corner case, I can understand that, except that, as [~psandoz] said, checks like these are the basis of workarounds found throughout the implementation of the Panama memory access API. Those workarounds are required because we don't have full support for RCE over long loops [1], so our API is stuck between two bad places at the moment. [1] - https://github.com/openjdk/jdk/pull/2045
29-06-2021
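
As background for the remark about long loops, the following is a generic sketch of one way a loop with a `long` index can be rewritten by hand as an outer `long` loop over chunks and an inner `int` counted loop. This is an illustration only, not the workaround used in the memory access API implementation and not the transformation in the linked PR. The class name, the `CHUNK` size, and the use of a `byte[]` as a stand-in for native memory are all assumptions.

```java
// Generic hand-written chunking of a long-indexed loop (illustrative assumption,
// not the memory access API's actual workaround).
final class LongLoopChunkSketch {
    static final int CHUNK = 1 << 16; // assumed chunk size

    /** Sums data[from..to) using a long outer loop and an int inner loop. */
    static long sum(byte[] data, long from, long to) {
        long total = 0;
        for (long base = from; base < to; base += CHUNK) {
            // Number of elements in this chunk; always fits in an int.
            int n = (int) Math.min(CHUNK, to - base);
            // Inner loop has an int induction variable and a fixed int bound.
            for (int j = 0; j < n; j++) {
                // The cast is safe here only because a byte[] index fits in an int.
                total += data[(int) (base + j)];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] data = new byte[200_000];
        java.util.Arrays.fill(data, (byte) 1);
        System.out.println(sum(data, 0, data.length)); // 200000
    }
}
```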
[~psandoz] Yes, it would be nice if you can find a way to fix it in Java. The pattern `(i < max_jint && i > min_jint)`, where `i` is an integer, is very confusing for the RCE (range check elimination) code, and we may need time to investigate and fix it.
29-06-2021
[~kvn] it's not so rare because it affects loops over memory segments of the foreign memory API, which underneath works around the current C2 code generation limitations for long loops. Perhaps there is another way to work around this in Java?
29-06-2021
This is a performance issue for a very rare corner case: `for (int i = 0; i < limit; i++) { if (!(i < max_jint && i > min_jint)) { uncommon_trap(); } }`. The suggested fix may affect other cases. And this is not a bug IMHO; it is an RFE that we should spend more time investigating. I would suggest deferring it to JDK 18.
29-06-2021
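
For readers who want to see the shape of the pattern quoted above in plain Java, here is a minimal sketch. The class and method names are hypothetical, `Integer.MAX_VALUE` and `Integer.MIN_VALUE` stand in for `max_jint` and `min_jint`, and a thrown exception stands in for `uncommon_trap()`. It is not taken from the attached benchmark, and it is not claimed to reproduce the slowdown on any particular JDK build.

```java
// Hypothetical Java rendering of the guarded-loop pattern from the comment above.
final class IntRangeGuardSketch {
    /** Sums data[0..limit); the caller must ensure limit <= data.length. */
    static long sum(byte[] data, int limit) {
        long total = 0;
        for (int i = 0; i < limit; i++) {
            // Always true for i in [0, limit): stands in for the
            // (i < max_jint && i > min_jint) guard from the comment.
            if (!(i < Integer.MAX_VALUE && i > Integer.MIN_VALUE)) {
                // Stand-in for uncommon_trap(); never reached in this sketch.
                throw new IllegalStateException("index outside guarded range: " + i);
            }
            total += data[i];
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1024];
        java.util.Arrays.fill(data, (byte) 1);
        System.out.println(sum(data, data.length)); // 1024
    }
}
```

The guard can never fail for an `int` index that is already bounded by `0 <= i < limit`, which is why the comment treats it as a check the compiler should be able to remove.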
[~mcimadamore], this should affect 18 as well, correct?
23-06-2021
ILW = suboptimal code, one benchmark, no workaround = MMH = P3
23-06-2021
[~roland], could you take a look at this?
23-06-2021

Relates :	JDK-8308660 - C2 compilation hits 'node must be dead' assert
Relates :	JDK-8272372 - Performance regression in memory access API