JDK-8327209 : C2 MemorySegment: missing RCE and vectorization
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 23
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • Submitted: 2024-03-04
  • Updated: 2025-05-05
Other
tbd : Unresolved
Sub Tasks
JDK-8356176 :  
JDK-8356184 :  
JDK-8356185 :  
Description
I run it with:
java -XX:CompileCommand=compileonly,Test3::* -XX:CompileCommand=printcompilation,Test3::* -XX:CompileCommand=TraceAutoVectorization,Test3::*,SW_REJECTIONS,PRECONDITIONS -XX:+TraceNewVectors -XX:+TraceLoopOpts -Xbatch Test3.java

We see that some of the cases vectorize, and some do not. One reason seems to be that we do not always remove the RangeChecks. Another failure is in SuperWord, where we just don't create packs. I have not yet investigated what is wrong there.
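For readers without the attachment, here is a minimal sketch of the kind of loop this issue is about. It is not the original Test3.java: the class name SegmentLoopSketch and both method names are hypothetical, and no claim is made about which variant C2 vectorizes. It only shows a per-element MemorySegment access whose bounds checks would need to be eliminated before SuperWord can pack the loads and stores, once with an int induction variable and once with a long one. It assumes a JDK where java.lang.foreign is final (22 or later).

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical illustration only; not the Test3.java referenced in this issue.
public class SegmentLoopSketch {

    // int induction variable: each get/set carries a bounds check on the segment.
    static void incrementIntIndex(MemorySegment ms) {
        int size = (int) ms.byteSize();
        for (int i = 0; i < size; i++) {
            byte v = ms.get(ValueLayout.JAVA_BYTE, i);
            ms.set(ValueLayout.JAVA_BYTE, i, (byte) (v + 1));
        }
    }

    // long induction variable: same body, but the loop counter is a long.
    static void incrementLongIndex(MemorySegment ms) {
        long size = ms.byteSize();
        for (long i = 0; i < size; i++) {
            byte v = ms.get(ValueLayout.JAVA_BYTE, i);
            ms.set(ValueLayout.JAVA_BYTE, i, (byte) (v + 1));
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[1024];
        MemorySegment ms = MemorySegment.ofArray(data);
        // Repeat often enough that the methods get compiled by C2.
        for (int rep = 0; rep < 10_000; rep++) {
            incrementIntIndex(ms);
            incrementLongIndex(ms);
        }
        System.out.println("done, data[0] = " + data[0]);
    }
}
```

To inspect a sketch like this, the command line above can be reused with Test3 replaced by the class name.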
Comments
I split this issue into subtasks since it contains problems with distinct causes.
05-05-2025

I found another case, not sure if it is related or not.

java -Xbatch -XX:CompileCommand=compileonly,TestLoop::test* -XX:CompileCommand=printcompilation,TestLoop::test* -XX:+TraceNewVectors -XX:+TraceLoopOpts TestLoop.java

The issue is with "test10". It never loses the "rc" marking, it never removes all RangeChecks from the main loop. That leads to this pattern:
- Counted
- PreMainPost
- Unroll 4x
- Peel (why???)
- Main-loop re-discovered as Counted
- PreMainPost again
- Unroll 2x
- Peel (why?)
- PreMainPost
...

And we end up with this nasty nested structure from a single initial loop:

Loop: N0/N0 has_sfpt
Loop: N2090/N2087 predicated sfpts={ 2170 }
Loop: N2502/N2501 predicated counted [1,int),+1 (4 iters) pre rc
Loop: N2851/N2848 limit_check sfpts={ 3056 }
Loop: N3792/N3796 predicated counted [4,int),+4 (4 iters) pre rc
Loop: N4251/N4248 limit_check sfpts={ 4395 }
Loop: N4723/N4729 counted [8,int),+8 (4 iters) pre rc
Loop: N4504/N4503 limit_check sfpts={ 4506 }
Loop: N4505/N1855 limit_check counted [int,int),+8 (40920 iters) main rc has_sfpt strip_mined
Loop: N4609/N4615 counted [int,int),+8 (4 iters) post rc
Loop: N3560/N3564 limit_check counted [int,int),+4 (4 iters) post rc
Loop: N2406/N2405 limit_check counted [int,int),+1 (4 iters) post rc

In other cases, we get something like this (test11), with a nice loop-nest for long-iv, pre-main-post, and even vectorization:

Loop: N0/N0 has_sfpt
Loop: N2012/N2009 predicated sfpts={ 2123 }
Loop: N2572/N2574 predicated counted [1,int),+1 (4 iters) pre
Loop: N2260/N2259 limit_check sfpts={ 2262 }
Loop: N3561/N1789 limit_check counted [int,int),+64 (40920 iters) main vector has_sfpt strip_mined
Loop: N3162/N3165 limit_check counted [int,int),+8 (8 iters) post vector
Loop: N2441/N2443 limit_check counted [int,int),+1 (4 iters) post
07-04-2025