JDK-8269230 : C2: main loop in micro benchmark never executed
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 17, 18, repo-panama
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2021-06-23
  • Updated: 2024-01-05
  • Resolved: 2021-07-01
Versions
  • JDK 17: Fixed in 17 b30
  • JDK 18: Fixed in 18
Related Reports
Relates :  
Relates :  
Description
The attached benchmark has interesting results:

```
Benchmark                               (size)  Mode  Cnt    Score   Error  Units
TestLoadBytes.arrayScalar                 1024  avgt   10  241.256 ± 1.028  ns/op
TestLoadBytes.arrayScalarConst            1024  avgt   10  244.251 ± 5.218  ns/op
TestLoadBytes.bufferNativeScalar          1024  avgt   10  262.128 ± 1.251  ns/op
TestLoadBytes.bufferNativeScalarConst     1024  avgt   10  250.552 ± 2.710  ns/op
TestLoadBytes.segmentNativeScalar         1024  avgt   10  722.670 ± 6.427  ns/op
TestLoadBytes.segmentNativeScalarConst    1024  avgt   10  253.419 ± 3.043  ns/op
```

Access using a memory segment is roughly 3x slower than using byte buffers (722.670 vs. 262.128 ns/op). Inspecting the generated compiled code suggests that all the time is spent in the post-loop, and that the main loop (which does appear to be unrolled correctly) is never executed.
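For context on the terms used here: when C2 unrolls a counted loop, it typically splits it into a pre-loop, an unrolled main loop, and a post-loop that executes the remaining iterations one at a time. The following hand-written Java approximation of that shape is only a sketch. The class and method names are hypothetical, the unroll factor of 4 and the pre-loop length are assumptions, and C2 performs the real transformation on its intermediate representation, not on source code. If the main loop's entry condition is never satisfied, as reported here, every iteration ends up in the single-step post-loop.

```java
public class LoopShapeSketch {
    // The loop as written in source: one element per iteration.
    static long sumSimple(byte[] a) {
        long sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    // Hypothetical hand-written approximation of C2's pre/main/post split,
    // assuming an unroll factor of 4 and a caller-chosen pre-loop length.
    static long sumPreMainPost(byte[] a, int preIters) {
        int n = a.length;
        long sum = 0;
        int i = 0;

        // Pre-loop: a few single-step iterations before the main loop
        // (C2 uses it, among other things, to set up range check elimination).
        int preLimit = Math.min(preIters, n);
        for (; i < preLimit; i++) {
            sum += a[i];
        }

        // Main loop: unrolled by 4, entered only while at least 4 iterations remain.
        int mainLimit = n - 3;
        for (; i < mainLimit; i += 4) {
            sum += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
        }

        // Post-loop: whatever is left, one element at a time.
        for (; i < n; i++) {
            sum += a[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1024];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        // Both shapes compute the same result; prints "true".
        System.out.println(sumSimple(data) == sumPreMainPost(data, 3));
    }
}
```

The split only changes how the iterations are grouped, not the result. The cost difference comes from which of the three loops actually runs the bulk of the iterations.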
Comments
Changeset: c67a7b03
Author: Roland Westrelin <roland@openjdk.org>
Date: 2021-07-01 07:41:22 +0000
URL: https://git.openjdk.java.net/jdk17/commit/c67a7b039de0dbb379123fb49780ae5b246dcf74
01-07-2021

I agree that the micro benchmark should be added, possibly as part of the PR.
30-06-2021

I don't see this test in test/micro/org/openjdk/bench/jdk/incubator/foreign/. I think we should add it as part of the PR.
30-06-2021

Paul or Maurizio, can you point me to the code in the foreign memory API that produces this code pattern? Never mind, you attached the test case.
29-06-2021

Okay, I got it. Thank you for providing this information; without it, this looked like a corner case. I will review the PR.
29-06-2021

I can attempt a fix in Java, but I somewhat disagree that this is not a bug, in the sense that C2 is effectively generating dead code, which seems like buggy behavior. As for this being a corner case, I can understand that view, except that, as [~psandoz] said, checks like these are the basis of workarounds used throughout the implementation of the Panama memory access API. Those workarounds are required because we don't have full support for RCE over long loops [1], so our API is stuck between two bad places at the moment.
[1] https://github.com/openjdk/jdk/pull/2045
29-06-2021

[~psandoz] Yes, if you can find a way to fix it in Java, that would be nice. The pattern `(i < max_jint && i > min_jint)`, where `i` is an `int`, is very confusing for the RCE (range check elimination) code, and we may need time to investigate and fix it.
29-06-2021

[~kvn], it's not that rare: it affects loops over memory segments in the foreign memory API, whose implementation works around the current C2 code generation limitations for long loops. Perhaps there is another way to work around this in Java?
29-06-2021

This is a performance issue for a very rare corner case: `for (int i = 0; i < limit; i++) { if (!(i < max_jint && i > min_jint)) { uncommon_trap(); } }`. The suggested fix may affect other cases. And IMHO this is not a bug; it is an RFE that we should spend more time investigating. I would suggest deferring it to JDK 18.
29-06-2021
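The loop shape quoted in the 29-06-2021 comment can be written as plain Java to show why its guard is redundant. This is a minimal sketch that assumes `max_jint`/`min_jint` correspond to `Integer.MAX_VALUE`/`Integer.MIN_VALUE`. `uncommon_trap()` is a C2-internal deoptimization point, not a Java method, so it is modeled here by a hypothetical `uncommonTrap` method that throws. For a counted loop with `0 <= i < limit <= Integer.MAX_VALUE`, both comparisons always hold, so the trap can never be reached.

```java
public class RangeCheckPatternSketch {
    // Hypothetical stand-in for C2's uncommon trap: compiled code would
    // deoptimize here; this sketch simply throws.
    static void uncommonTrap(int i) {
        throw new IllegalStateException("unexpected index " + i);
    }

    // The loop shape from the comment, with max_jint/min_jint written as
    // Integer.MAX_VALUE/Integer.MIN_VALUE (assumed correspondence).
    static int countIterations(int limit) {
        int count = 0;
        for (int i = 0; i < limit; i++) {
            // i >= 0 > Integer.MIN_VALUE, and i < limit <= Integer.MAX_VALUE,
            // so this condition is never true and the trap is dead code.
            if (!(i < Integer.MAX_VALUE && i > Integer.MIN_VALUE)) {
                uncommonTrap(i);
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countIterations(1024)); // prints 1024
    }
}
```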

[~mcimadamore], this should affect 18 as well, correct?
23-06-2021

ILW = suboptimal code, one benchmark, no workaround = MMH = P3
23-06-2021

[~roland], could you take a look at this?
23-06-2021