JDK-8272372 : Performance regression in memory access API
  • Type: Bug
  • Component: core-libs
  • Affected Version: 17
  • Priority: P3
  • Status: Resolved
  • Resolution: Won't Fix
  • Submitted: 2021-08-12
  • Updated: 2021-12-08
  • Resolved: 2021-12-08
Description
Following the correctness fix in JDK-8266371, a number of performance regressions have been observed throughout the memory access benchmarks. All the regressions have some characteristics in common:

* they disappear when tiered compilation is enabled
* they disappear when small segment optimization is disabled
* they show up as the benchmark starting off fast, then suddenly slowing down by 4x or so (probably the effect of a bad recompilation).

The most affected benchmark is UnrolledAccess.handle_loop which is 5x slower than UnrolledAccess.handle_loop_static.

The fix for JDK-8269230 should have helped, but it didn't fix all cases.

Until we can remove small segment optimizations, the more reliable fix is to use `<=` instead of `<` and `>=` instead of `>` inside AbstractMemorySegmentImpl::checkBounds:

diff --git a/src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/AbstractMemorySegmentImpl.java b/src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/AbstractMemorySegmentImpl.java
index de95a2c5d87..ecb918d778e 100644
--- a/src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/AbstractMemorySegmentImpl.java
+++ b/src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/AbstractMemorySegmentImpl.java
@@ -398,8 +398,8 @@ public abstract non-sealed class AbstractMemorySegmentImpl extends MemorySegment
 
     private void checkBounds(long offset, long length) {
         if (isSmall() &&
-                offset < Integer.MAX_VALUE && length < Integer.MAX_VALUE &&
-                offset > Integer.MIN_VALUE && length > Integer.MIN_VALUE) {
+                offset <= Integer.MAX_VALUE && length <= Integer.MAX_VALUE &&
+                offset >= Integer.MIN_VALUE && length >= Integer.MIN_VALUE) {
             checkBoundsSmall((int)offset, (int)length);
         } else {
             if (length < 0 ||
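
The semantic difference between the two guards can be sketched as follows. This is only an illustration, not code from the JDK: the class and method names are hypothetical, and the report itself does not say why the inclusive form compiles better. What the sketch does show is that with `<=` and `>=` the guard is true exactly for the values representable as an `int`, so the narrowing casts in the fast path are lossless for every accepted value, whereas the strict form sends `Integer.MIN_VALUE` and `Integer.MAX_VALUE` to the slow path.

```java
public class BoundsFastPath {
    // Strict comparisons, as before the proposed change:
    // the two extreme int values are rejected.
    static boolean fitsStrict(long v) {
        return v < Integer.MAX_VALUE && v > Integer.MIN_VALUE;
    }

    // Inclusive comparisons, as in the proposed change:
    // true exactly when v is representable as an int.
    static boolean fitsInclusive(long v) {
        return v <= Integer.MAX_VALUE && v >= Integer.MIN_VALUE;
    }

    // Reference definition: the narrowing cast round-trips without loss.
    static boolean roundTrips(long v) {
        return (int) v == v;
    }

    public static void main(String[] args) {
        long[] samples = {
            Integer.MIN_VALUE - 1L, Integer.MIN_VALUE, -1L, 0L, 1L,
            Integer.MAX_VALUE, Integer.MAX_VALUE + 1L
        };
        for (long v : samples) {
            System.out.printf("%d strict=%b inclusive=%b roundTrips=%b%n",
                    v, fitsStrict(v), fitsInclusive(v), roundTrips(v));
        }
    }
}
```

For every sample, `fitsInclusive` agrees with `roundTrips`; `fitsStrict` differs only at the two boundary values.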

Comments
A regression in an incubating API is not high enough priority to fix in an update release. Moreover, further real-world validation showed that the regression is far less pronounced than the benchmark suggests. The issue is addressed in JDK 18.
08-12-2021

Benchmark output - note how the first few iterations are significantly faster than the remaining ones:
```
# Run progress: 0.00% complete, ETA 00:00:22
# Fork: 1 of 3
WARNING: Using incubator modules: jdk.incubator.foreign
# Warmup Iteration 1: 0.545 us/op
# Warmup Iteration 2: 0.354 us/op
# Warmup Iteration 3: 2.368 us/op
# Warmup Iteration 4: 2.351 us/op
# Warmup Iteration 5: 2.347 us/op
Iteration 1: 2.349 us/op
Iteration 2: 2.376 us/op
Iteration 3: 2.375 us/op
Iteration 4: 2.370 us/op
Iteration 5: 2.357 us/op
Iteration 6: 2.376 us/op
Iteration 7: 2.382 us/op
Iteration 8: 2.383 us/op
Iteration 9: 2.378 us/op
Iteration 10: 2.354 us/op

# Run progress: 33.33% complete, ETA 00:00:15
# Fork: 2 of 3
WARNING: Using incubator modules: jdk.incubator.foreign
# Warmup Iteration 1: 0.595 us/op
# Warmup Iteration 2: 0.371 us/op
# Warmup Iteration 3: 2.359 us/op
# Warmup Iteration 4: 2.357 us/op
# Warmup Iteration 5: 2.358 us/op
Iteration 1: 2.352 us/op
Iteration 2: 2.356 us/op
Iteration 3: 2.351 us/op
Iteration 4: 2.351 us/op
Iteration 5: 2.351 us/op
Iteration 6: 2.370 us/op
Iteration 7: 2.364 us/op
Iteration 8: 2.348 us/op
Iteration 9: 2.350 us/op
Iteration 10: 2.352 us/op

# Run progress: 66.67% complete, ETA 00:00:07
# Fork: 3 of 3
WARNING: Using incubator modules: jdk.incubator.foreign
# Warmup Iteration 1: 0.528 us/op
# Warmup Iteration 2: 0.358 us/op
# Warmup Iteration 3: 2.349 us/op
# Warmup Iteration 4: 2.330 us/op
# Warmup Iteration 5: 2.334 us/op
Iteration 1: 2.331 us/op
Iteration 2: 2.357 us/op
Iteration 3: 2.352 us/op
Iteration 4: 2.335 us/op
Iteration 5: 2.333 us/op
Iteration 6: 2.333 us/op
Iteration 7: 2.341 us/op
Iteration 8: 2.330 us/op
Iteration 9: 2.329 us/op
Iteration 10: 2.336 us/op
```
12-08-2021
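
As a rough way to quantify the slowdown visible in the benchmark output, the following sketch compares the fastest warmup score of fork 1 with its last warmup score. The class and method names are hypothetical, and the numbers are copied by hand from the log, so this is a back-of-the-envelope check rather than part of the benchmark. For fork 1 it gives a ratio of about 6.6x.

```java
public class SlowdownRatio {
    // Ratio between a steady-state score and the fastest score (both in us/op).
    static double slowdown(double fastest, double steady) {
        return steady / fastest;
    }

    public static void main(String[] args) {
        // Fork 1, warmup iterations 1-5, copied from the benchmark output.
        double[] warmup = {0.545, 0.354, 2.368, 2.351, 2.347};

        // Find the fastest (lowest) warmup score.
        double fastest = Double.MAX_VALUE;
        for (double w : warmup) {
            fastest = Math.min(fastest, w);
        }

        // Treat the last warmup iteration as the steady-state score.
        double steady = warmup[warmup.length - 1];

        System.out.printf("fastest=%.3f steady=%.3f slowdown=%.1fx%n",
                fastest, steady, slowdown(fastest, steady));
    }
}
```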