JDK-8367158 : C2: create better fill and copy benchmarks, taking alignment into account
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 26
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • Submitted: 2025-09-09
  • Updated: 2025-11-26
Other: tbd (Unresolved)
Description
First investigation into benchmarks done here:
https://github.com/openjdk/jdk/pull/26747#issuecomment-3269114783 / JDK-8365290.

It seems to me that people are making decisions about fill and copy intrinsics based on benchmarks that are noisy and do not properly control for alignment. Such benchmarks can give us misleading results.

It turns out that we barely have any fill and copy benchmarks that really test automatic alignment.

We should also compare to auto-vectorization performance.

We should test Arrays.fill and System.arraycopy, but also some MemorySegment bulk operations. We should then compare these to naive loops, both with the fill intrinsic enabled and disabled: -XX:-OptimizeFill
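As a rough illustration of the variants to compare, here is a minimal sketch in plain Java. The class and method names are hypothetical and do not come from the JDK benchmark suite. It only checks that the naive loops and the library calls agree; real measurements would need a harness such as JMH, run once with default flags and once with -XX:-OptimizeFill.

```java
import java.util.Arrays;

// Hypothetical helper class for illustration only; not part of the JDK tests.
final class FillCopyVariants {
    // Naive fill loop. Depending on flags, C2 may replace it with a fill stub
    // (OptimizeFill) or auto-vectorize it.
    static void fillLoop(int[] a, int from, int to, int val) {
        for (int i = from; i < to; i++) {
            a[i] = val;
        }
    }

    // Library fill over the same range.
    static void fillLibrary(int[] a, int from, int to, int val) {
        Arrays.fill(a, from, to, val);
    }

    // Naive copy loop over len elements.
    static void copyLoop(int[] src, int srcPos, int[] dst, int dstPos, int len) {
        for (int i = 0; i < len; i++) {
            dst[dstPos + i] = src[srcPos + i];
        }
    }

    // Library copy over the same range.
    static void copyLibrary(int[] src, int srcPos, int[] dst, int dstPos, int len) {
        System.arraycopy(src, srcPos, dst, dstPos, len);
    }

    public static void main(String[] args) {
        int[] a = new int[1024];
        int[] b = new int[1024];
        fillLoop(a, 3, 1000, 42);
        fillLibrary(b, 3, 1000, 42);
        boolean fillOk = Arrays.equals(a, b);

        int[] c = new int[1024];
        int[] d = new int[1024];
        copyLoop(a, 3, c, 5, 900);
        copyLibrary(a, 3, d, 5, 900);
        boolean copyOk = Arrays.equals(c, d);

        System.out.println("fill variants agree: " + fillOk);
        System.out.println("copy variants agree: " + copyOk);
    }
}
```

Each pair of methods would become one pair of benchmark methods, so that the loop and the library call see identical ranges and values.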

Also look at JDK-8299808, and the discussion there.

We could take an approach similar to the one in JDK-8355094, with:
test/micro/org/openjdk/bench/vm/compiler/VectorAutoAlignment.java
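One way to control for alignment, sketched here under assumptions and not taken from VectorAutoAlignment.java, is to sweep the start offset of the filled range so that every misalignment relative to element 0 is covered. All names below are hypothetical. Note that this only controls alignment relative to the start of the array data: heap objects are 8-byte aligned by default, so the absolute address of element 0 relative to a 64-byte boundary is not known from Java code.

```java
// Hypothetical sketch of an offset sweep for an int[] fill.
final class AlignmentSweep {
    static final int ELEM_BYTES = Integer.BYTES;

    // Byte distance of element `index` from element 0, modulo a power-of-two
    // boundary. This is relative misalignment, not the absolute address.
    static int relativeMisalignment(int index, int boundaryBytes) {
        return (index * ELEM_BYTES) & (boundaryBytes - 1);
    }

    // Fill len elements starting at offset; the loop under test.
    static void fillRange(int[] a, int offset, int len, int val) {
        for (int i = offset; i < offset + len; i++) {
            a[i] = val;
        }
    }

    public static void main(String[] args) {
        int boundaryBytes = 64;   // assumed vector or cache-line width
        int len = 1000;
        int offsets = boundaryBytes / ELEM_BYTES;
        int[] a = new int[len + offsets];
        for (int offset = 0; offset < offsets; offset++) {
            // A real benchmark would time this call per offset.
            fillRange(a, offset, len, 42);
            System.out.println("offset " + offset + " -> relative misalignment "
                    + relativeMisalignment(offset, boundaryBytes) + " bytes");
        }
    }
}
```

In a JMH benchmark the offset would typically be a parameter, so that results for each offset are reported separately instead of being averaged together.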

We should also go through the benchmarks mentioned in
https://github.com/openjdk/jdk/pull/26747#issuecomment-3269114783
and see if they still behave as the comments in them suggest:
- alignment assumptions
- performance assumptions / comparison with SuperWord, especially after JDK-8324751.

This is also a really good way to better understand the performance of auto-vectorization (SuperWord) on small iteration counts. This is where the intrinsics are currently much better than auto-vectorization. See also JDK-8344085. But it is possible that auto-vectorization is actually faster with large iteration counts.

For MemorySegment, we already have:
- ./test/micro/org/openjdk/bench/java/lang/foreign/BulkOps.java
- ./test/micro/org/openjdk/bench/java/lang/foreign/SegmentBulkFill.java
- ./test/micro/org/openjdk/bench/java/lang/foreign/SegmentBulkCopy.java

We should also make sure to check fill with zero separately, since some platforms are much faster when they zero out memory.

We should also check the impact of Lilliput / CompactObjectHeaders, as those change the alignment of some element types.

We should also benchmark Oop copy / fill. Auto-vectorization could pay off here too, though it would be harder because of GC barriers in the vectorized LoadP and StoreP.
./java -XX:CompileCommand=compileonly,TestOopCopy::copy* -XX:CompileCommand=printcompilation,TestOopCopy::copy* -Xbatch TestOopCopy.java
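The TestOopCopy.java file used in the command above is not shown in this issue. A minimal sketch of what such a single-file test could look like follows; the method names are hypothetical and only chosen so that they match the TestOopCopy::copy* pattern, and the iteration counts are arbitrary values meant to trigger compilation.

```java
// Hypothetical sketch; the actual TestOopCopy.java is not part of this issue.
final class TestOopCopy {
    // Naive reference copy loop. Each store is an oop store, so the compiled
    // code may include GC barriers depending on the collector in use.
    static void copyLoop(Object[] src, Object[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i];
        }
    }

    // Library copy of the same elements.
    static void copyArraycopy(Object[] src, Object[] dst) {
        System.arraycopy(src, 0, dst, 0, src.length);
    }

    public static void main(String[] args) {
        Object[] src = new Object[1000];
        for (int i = 0; i < src.length; i++) {
            src[i] = Integer.valueOf(i);
        }
        Object[] dst1 = new Object[src.length];
        Object[] dst2 = new Object[src.length];

        // Repeat the calls so that both methods get hot enough to be compiled.
        for (int iter = 0; iter < 10_000; iter++) {
            copyLoop(src, dst1);
            copyArraycopy(src, dst2);
        }

        // Sanity check: both copies must hold the same references as src.
        for (int i = 0; i < src.length; i++) {
            if (dst1[i] != src[i] || dst2[i] != src[i]) {
                throw new AssertionError("mismatch at index " + i);
            }
        }
        System.out.println("done");
    }
}
```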
Comments
Draft: https://github.com/openjdk/jdk/pull/27315
31-10-2025