JDK-8348096 : C2 SuperWord: investigate failed RCE for MemorySegment test
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 25
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • Submitted: 2025-01-20
  • Updated: 2025-04-24
Description
In the attached Test.java, there are 2 methods: testArgs, which loads its values via arguments (vectorizes), and testFields, which loads them via fields (does not vectorize).

It seems the issue is a missing RCE (range check elimination). Investigate the difference.
This case is related to JDK-8331659 and JDK-8347545.

emanuel@emanuel-oracle:/oracle-work/jdk-fork1/build/linux-x64-debug/jdk/bin$ ./java -XX:CompileCommand=compileonly,Test::test* -XX:CompileCommand=printcompilation,Test::* -XX:+TraceNewVectors -XX:CompileCommand=TraceAutoVectorization,Test::test*,PRECONDITIONS Test.java
CompileCommand: compileonly Test.test* bool compileonly = true
CompileCommand: PrintCompilation Test.* bool PrintCompilation = true
CompileCommand: TraceAutoVectorization Test.test* const char* TraceAutoVectorization = 'PRECONDITIONS'
9301  107 %     3       Test::testFields @ 6 (63 bytes)
9322  108       3       Test::testFields (63 bytes)
9451  109 %     3       Test::testArgs @ 2 (56 bytes)
9471  110       3       Test::testArgs (56 bytes)
9482  111 %     4       Test::testFields @ 6 (63 bytes)
9505  114 %     4       Test::testArgs @ 2 (56 bytes)
9525  115       4       Test::testFields (63 bytes)

VLoop::check_preconditions
          Loop: N4056/N1701  limit_check counted [int,int),+8 (38044 iters)  main rc  has_sfpt rce strip_mined
 4056  CountedLoop  === 4056 3348 1701  [[ 4056 4058 4059 4067 4068 ]] inner stride: 8 main of N4056 strip mined !orig=[3349],[3035],[2749],[2378],[2243],[2142],[1982],[1876],[1793],[1749],[138] !jvms: Test::testFields @ bci:19 (line 20)
VLoop::check_preconditions: fails because of control flow.
  cl_exit 3344 3344  CountedLoopEnd  === 2248 3343  [[ 4077 1701 ]] [lt] P=0.999974, C=114000.000000 !orig=[3036],[3010],[2746]
  cl_exit->in(0) 2248 2248  IfTrue  === 2247  [[ 2230 2225 2223 1609 882 3344 2135 2137 ]] #1 !orig=[2152],[1322],[1596] !jvms: AbstractMemorySegmentImpl::checkBounds @ bci:16 (line 382) AbstractMemorySegmentImpl::checkAccess @ bci:9 (line 336) AbstractMemorySegmentImpl::checkEnclosingLayout @ bci:10 (line 341) VarHandleSegmentAsBytes::checkSegment @ bci:18 (line 83) VarHandleSegmentAsBytes::set @ bci:10 (line 113) 0x00000000651b0400::invokeStatic @ bci:20 0x00000000651b1400::invoke @ bci:37 VarHandleGuards::guard_LJI_V @ bci:134 (line 1017) AbstractMemorySegmentImpl::set @ bci:10 (line 679) Test::testFields @ bci:50 (line 22)
  lpt->_head 4056 4056  CountedLoop  === 4056 3348 1701  [[ 4056 4058 4059 4067 4068 ]] inner stride: 8 main of N4056 strip mined !orig=[3349],[3035],[2749],[2378],[2243],[2142],[1982],[1876],[1793],[1749],[138] !jvms: Test::testFields @ bci:19 (line 20)
          Loop: N4056/N1701  limit_check counted [int,int),+8 (38044 iters)  main rc  has_sfpt rce strip_mined
VLoop::check_preconditions: failed: control flow in loop not allowed
9632  116       4       Test::testArgs (56 bytes)

VLoop::check_preconditions
      Loop: N2221/N1615  limit_check counted [int,int),+2 (45467 iters)  main has_sfpt strip_mined
 2221  CountedLoop  === 2221 1946 1615  [[ 2214 2217 2220 2221 857 2225 1559 1942 ]] inner stride: 2 main of N2221 strip mined !orig=[1947],[1756],[1655],[1650],[132] !jvms: Test::testArgs @ bci:13 (line 29)

VLoop::check_preconditions
      Loop: N3072/N1615  limit_check counted [int,int),+64 (45467 iters)  main has_sfpt strip_mined
 3072  CountedLoop  === 3072 1946 1615  [[ 2879 2881 2882 2884 2885 2887 2888 2890 2891 2893 2894 2896 2897 2899 2900 2902 2903 2905 2906 2908 2909 2911 2912 2914 2915 2917 2918 2920 2921 2923 2924 2926 2927 2929 2930 2932 2933 2935 2936 2938 2939 2941 2942 2944 2945 2947 2948 2950 2952 2954 2955 2957 2958 2960 2961 2963 2964 2966 2967 2969 2970 2972 2973 3036 3071 3072 2655 3075 2657 2658 2660 2661 2663 2664 2666 2667 2669 2670 2672 2673 2675 2676 2678 2680 2682 2683 2685 2686 2688 2689 2691 2692 2694 2695 2697 2698 2700 2701 2732 2217 2510 2512 2513 2515 2516 2518 2519 2521 2522 2524 2525 2527 2528 2530 2531 2547 1942 2405 2407 2408 2410 2411 2413 2414 2422 857 2318 2320 2321 2325 1559 2214 ]] inner stride: 64 main of N3072 strip mined !orig=[2752],[2559],[2425],[2328],[2221],[1947],[1756],[1655],[1650],[132] !jvms: Test::testArgs @ bci:13 (line 29)
TraceNewVectors [AutoVectorization]:  3262  Replicate  === _ 62  [[ ]]  #vectorz<B,64>
TraceNewVectors [AutoVectorization]:  3263  LoadVector  === 3072 3075 3037  [[ ]]  @rawptr:BotPTR, idx=Raw; mismatched #vectorz<B,64> (does not depend only on test, unknown control)
TraceNewVectors [AutoVectorization]:  3264  AddVB  === _ 3263 3262  [[ ]]  #vectorz<B,64>
TraceNewVectors [AutoVectorization]:  3265  StoreVector  === 3072 3075 3037 3264  [[ ]]  @rawptr:BotPTR, idx=Raw; mismatched  Memory: @rawptr:BotPTR, idx=Raw;

VLoop::check_preconditions
      Loop: N4148/N1615  limit_check counted [int,int),+4096 (45467 iters)  main vector has_sfpt strip_mined
 4148  CountedLoop  === 4148 1946 1615  [[ 3988 3990 3991 3993 3994 3996 3997 3999 4000 4002 4003 4005 4006 4008 4009 4011 4012 4014 4015 4017 4018 4020 4021 4023 4024 4026 4027 4029 4030 4032 4033 4035 4036 4038 4039 4041 4042 4044 4045 4047 4048 4050 4051 4053 4054 4056 4057 4059 4060 4062 4063 4065 4066 4068 4069 4071 4072 4074 4075 4077 4078 4080 4081 4145 4148 3759 4151 4184 3761 3762 3764 3765 3767 3768 3770 3771 3773 3774 3776 3777 3779 3780 3782 3783 3785 3786 3788 3789 3791 3792 3794 3795 3797 3798 3800 3801 3803 3804 3836 3330 3610 3612 3613 3615 3616 3618 3619 3621 3622 3624 3625 3627 3628 3630 3631 3647 3263 3501 3503 3504 3506 3507 3509 3510 3518 1942 3412 3414 3415 3419 3265 3328 ]] inner stride: 4096 main of N4148 strip mined !orig=[3839],[3650],[3521],[3422],[3333],[3072],[2752],[2559],[2425],[2328],[2221],[1947],[1756],[1655],[1650],[132] !jvms: Test::testArgs @ bci:13 (line 29)
VLoop::check_preconditions: failed: loop already vectorized

Comments
An obvious fix is to take into account loop level when doing PhaseIdealLoop::split_thru_phi, i.e. don't count the win in the outer loop level. This still leaves some patterns such as `i - 1` but it is a harder decision to call because in those cases the split does reduce some work while splitting `| 7` through `Phi(0, iv + 16)` does absolutely nothing.
31-01-2025
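The `| 7` case mentioned in the comment above can be modeled in plain Java. This is only a sketch of the value flow, not of C2 itself: the class and method names are hypothetical, and the loop-carried variable `phi` stands in for the Phi node. It shows that after the split the entry input folds to the constant 7, but the backedge still performs one OR per iteration, so the loop body does the same amount of work.

```java
import java.util.Arrays;

// Hypothetical model; names are illustrative and not taken from HotSpot.
final class OrThroughPhi {
    // Original shape: iv = Phi(0, iv + 16), and the loop body computes iv | 7.
    static int[] original(int iterations) {
        int[] out = new int[iterations];
        int iv = 0;
        for (int k = 0; k < iterations; k++) {
            out[k] = iv | 7;      // one OR per iteration
            iv = iv + 16;
        }
        return out;
    }

    // Split shape: the OR is pushed into both Phi inputs,
    // giving Phi(0 | 7, (iv + 16) | 7) = Phi(7, (iv + 16) | 7).
    static int[] split(int iterations) {
        int[] out = new int[iterations];
        int iv = 0;
        int phi = 7;              // entry input, constant-folded
        for (int k = 0; k < iterations; k++) {
            out[k] = phi;
            phi = (iv + 16) | 7;  // backedge input: still one OR per iteration
            iv = iv + 16;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.equals(original(10), split(10)));
    }
}
```

Both methods produce the same sequence; the only "win" from the split is the folded constant on the loop entry, which lies outside the loop body.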

A possibly related issue with ops getting pushed through phi: test/hotspot/jtreg/compiler/vectorization/TestPopulateIndex.java
    for (int i = 0; i < count; i++) {
        dst[i] = src[i] * (i | 7);
    }
Looks like the OrI gets pushed through the phi, which destroys the pattern we try to match for.
31-01-2025

The Phi has been altered by "PhaseIdealLoop::split_thru_phi", as expected. But there seem to be some special exceptions in "split_thru_phi", like "split_thru_phi_could_prevent_vectorization". Maybe we are missing an exception in this case too?
1) At this point, the loop is still LoopNode, and not CountedLoop, so we don't skip it for that reason.
2) "split_thru_phi_could_prevent_vectorization": Only works for CountedLoop too.
3) There is a special check like this:
    // Do not clone the trip counter through on a CountedLoop
    // (messes up the canonical shape).
    if (((n_blk->is_CountedLoop() || (n_blk->is_Loop() && n_blk->as_Loop()->is_loop_nest_inner_loop())) && n->Opcode() == Op_AddI) ||
        (n_blk->is_LongCountedLoop() && n->Opcode() == Op_AddL)) {
      return n;
    }
Why does it fail? Well it is not yet a counted loop. And also not is_loop_nest_inner_loop. Though it will be later... Wow this looks very fragile!
What we have:
n_blk: 1749 Loop === 1749 133 1701 [[ 1749 1714 141 155 ]] inner !orig=[138] !jvms: Test::testFields @ bci:19 (line 29)
n: 145 AddL === _ 141 144 [[ 855 813 158 175 175 576 576 870 568 562 562 562 562 581 576 576 ]] !jvms: Test::testFields @ bci:22 (line 29)
More details:
24 LoadI === _ 7 23 [[ 1668 59 1635 1599 925 106 117 128 1326 144 175 1313 1272 562 576 813 855 870 ]] @java/lang/Class (java/io/Serializable,java/lang/constant/Constable,java/lang/reflect/AnnotatedElement,java/lang/invoke/TypeDescriptor,java/lang/reflect/GenericDeclaration,java/lang/reflect/Type,java/lang/invoke/TypeDescriptor$OfField):exact+120 *, name=zeroInvarI, idx=4; #int !jvms: Test::testFields @ bci:0 (line 27)
1634 AddL === _ 141 483 [[ 1668 1635 1698 1668 141 ]] !jvms: Test::testFields @ bci:57 (line 28)
25 ConL === 0 [[ 1326 1313 59 59 1272 576 141 88 562 92 554 106 117 128 ]] #long:0
1749 Loop === 1749 133 1701 [[ 1749 1714 141 155 ]] inner !orig=[138] !jvms: Test::testFields @ bci:19 (line 29)
144 ConvI2L === _ 24 [[ 145 ]] #long:minint..maxint:www !jvms: Test::testFields @ bci:21 (line 29)
141 Phi === 1749 25 1634 [[ 1599 1634 145 175 1326 1313 562 576 813 855 870 925 1272 ]] #long !jvms: Test::testFields @ bci:19 (line 29)
145 AddL === _ 141 144 [[ 855 813 158 175 175 576 576 870 568 562 562 562 562 581 576 576 ]] !jvms: Test::testFields @ bci:22 (line 29)
But again: why does the other case work here?
27-01-2025
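To make the reasoning in point 3 above easier to follow, here is a small Java model of the quoted guard condition. It is a sketch under stated assumptions: the enums `Block` and `Op` and the method `keepsTripCounter` are hypothetical, and the enum collapses HotSpot's node class hierarchy into four mutually exclusive kinds, which is a simplification. A result of `true` corresponds to the early `return n;`, meaning the node is not split through the Phi.

```java
// Hypothetical model of the quoted guard; not HotSpot code.
final class SplitThruPhiGuard {
    // Simplified stand-ins for the kinds of n_blk the condition distinguishes.
    enum Block { LOOP, LOOP_NEST_INNER_LOOP, COUNTED_LOOP, LONG_COUNTED_LOOP }

    // Simplified stand-ins for n->Opcode().
    enum Op { ADD_I, ADD_L, OTHER }

    // True when the guard fires and the trip counter is left alone.
    static boolean keepsTripCounter(Block blk, Op op) {
        boolean intCase =
            (blk == Block.COUNTED_LOOP || blk == Block.LOOP_NEST_INNER_LOOP) && op == Op.ADD_I;
        boolean longCase = blk == Block.LONG_COUNTED_LOOP && op == Op.ADD_L;
        return intCase || longCase;
    }

    public static void main(String[] args) {
        // The situation described above: a plain Loop (not yet counted) and an AddL.
        System.out.println(keepsTripCounter(Block.LOOP, Op.ADD_L));
        // The same AddL once the loop has become a LongCountedLoop.
        System.out.println(keepsTripCounter(Block.LONG_COUNTED_LOOP, Op.ADD_L));
    }
}
```

In this model the first call returns false, matching the observation that the guard does not protect `145 AddL` while `n_blk` is still `1749 Loop`; the second returns true, which is why the outcome depends on when the loop is recognized as counted.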

Ok, I'm now trying to see why the "testArgs" creates an int-RangeCheck. I see that the CmpU (instead of CmpUL) is generated during "PhaseIdealLoop::create_loop_nest", in "PhaseIdealLoop::transform_long_range_checks".
Now looking at "testFields":
-> Looks like "range_checks" list is empty, even though we do have RangeChecks:
(rr) p inner_head->dump_bfs(1,0,"-#c")
dist dump
---------------------------------------------
0 1876 Loop === 1876 1852 1701 [[ 1876 1868 1714 141 570 1770 ]] !orig=[1793],[1749],[138] !jvms: Test::testFields @ bci:19 (line 29)
1 570 RangeCheck === 1876 569 [[ 571 574 ]] P=0.999999, C=-1.000000 !jvms: AbstractMemorySegmentImpl::checkBounds @ bci:16 (line 382) AbstractMemorySegmentImpl::checkAccess @ bci:9 (line 336) AbstractMemorySegmentImpl::checkEnclosingLayout @ bci:10 (line 341) VarHandleSegmentAsBytes::checkSegment @ bci:18 (line 83) VarHandleSegmentAsBytes::get @ bci:10 (line 104) 0x00000000091ae800::invokeStatic @ bci:18 0x00000000091b0000::invoke @ bci:35 VarHandleGuards::guard_LJ_I @ bci:80 (line 1002) AbstractMemorySegmentImpl::get @ bci:8 (line 673) Test::testFields @ bci:31 (line 30)
Seems like in "PhaseIdealLoop::extract_long_range_checks" we collect the long range checks. The loop seems to contain this one:
570 RangeCheck === 1793 569
with
568 CmpUL === _ 1770 567
We look at it in the loop, check "is_range_check_if": -> returns false.
exp = Phi(ConvI2L(LoadI of zeroInvarI), ..complicated..) #long
I'm having a suspicion that the ConvI2L(LoadI) was moved past the phi, and this confuses the pattern matching logic.
The iv-phi is:
    iv = Phi( 0L, iv + 1)
exp is:
    invarL = ConvI2L(LoadI of zeroInvarI)
    exp = Phi(invarL, (pre_loop_phi + invarL) + 1L)
Ok, let's go look again at the "testArgs" example to compare the form there....
Aha, here "is_scaled_iv_plus_offset" can return true, because the AddL has not been split through the Phi:
(rr) p exp->dump_bfs(2,0,"#")
dist dump
---------------------------------------------
2 11 Parm === 3 [[ 1584 138 551 53 1276 122 111 100 ]] Parm1: int !jvms: Test::testArgs @ bci:-1 (line 37)
2 1583 AddL === _ 135 454 [[ 1584 1652 135 ]] !orig=[1651],... !jvms: Test::testArgs @ bci:50 (line 37)
2 22 ConL === 0 [[ 1276 111 53 53 122 551 135 82 100 86 ]] #long:0
2 1655 LongCountedLoop === 1655 127 1615 [[ 1655 545 135 1628 ]] inner !orig=[1650],[132] !jvms: Test::testArgs @ bci:13 (line 38)
1 138 ConvI2L === _ 11 [[ 139 ]] #long:minint..maxint:www !jvms: Test::testArgs @ bci:15 (line 38)
1 135 Phi === 1655 22 1583 [[ 551 1583 139 1276 ]] #long:0..max-1:www #tripcount !jvms: Test::testArgs @ bci:13 (line 38)
0 139 AddL === _ 135 138 [[ 551 551 551 551 556 543 ]] !jvms: Test::testArgs @ bci:16 (line 38)
27-01-2025

Thanks [~qamai] for having a first look!
Here is the graph of the RangeCheck in testArgs:
7 1777 MinL === _ 542 1772 [[ 1921 ]] !orig=[1776]
7 1964 AddL === _ 1873 138 [[ 1962 ]] !orig=[1759],[1791] !jvms: Test::testArgs @ bci:16 (line 38)
7 138 ConvI2L === _ 11 [[ 1970 1964 1962 1968 ]] #long:minint..maxint:www !orig=[1935],[1835],556 !jvms: Test::testArgs @ bci:15 (line 38)
6 22 ConL === 0 [[ 1782 111 53 53 122 551 1920 82 100 86 1734 1739 ]] #long:0
6 1921 MaxL === _ 1962 1777 [[ 1782 ]] !orig=1920,[1767]
6 1962 Phi === 1733 138 1964 [[ 1833 1877 1921 1770 1920 1785 1919 ]] #long !orig=[1759],[1791] !jvms: Test::testArgs @ bci:16 (line 38)
5 1920 MaxL === _ 1962 22 [[ 1785 1783 ]] !orig=[1767]
5 1782 MaxL === _ 1921 22 [[ 1783 ]] !orig=[1781]
5 1785 SubL === _ 1962 1920 [[ 1786 ]] !orig=[1816]
5 1939 AddI === _ 1749 62 [[ 1749 1940 ]] !orig=[1750]
5 62 ConI === 0 [[ 879 95 106 117 1749 1939 1906 1898 1890 1824 1929 ]] #int:1
4 1783 SubL === _ 1782 1920 [[ 1784 ]] !orig=[1814]
4 1786 ConvL2I === _ 1785 [[ 1788 1856 ]] #int !orig=[1817]
4 1749 Phi === 1947 62 1939 [[ 1939 1754 1788 ]] #int:1..max-3:www #tripcount !orig=[1787]
3 1784 ConvL2I === _ 1783 [[ 1789 1856 ]] #int:>=0:www !orig=[1815]
3 1788 AddI === _ 1749 1786 [[ 1789 ]] !orig=[1819]
2 1789 CmpU === _ 1788 1784 [[ 544 ]] !orig=[1820]
1 544 Bool === _ 1789 [[ 545 ]] [lt] !orig=[1270] !jvms: AbstractMemorySegmentImpl::checkBounds @ bci:16 (line 382) AbstractMemorySegmentImpl::checkAccess @ bci:9 (line 336) AbstractMemorySegmentImpl::checkEnclosingLayout @ bci:10 (line 341) VarHandleSegmentAsBytes::checkSegment @ bci:18 (line 83) VarHandleSegmentAsBytes::get @ bci:10 (line 104) 0x00000000b71ae800::invokeStatic @ bci:18 0x00000000b71b0000::invoke @ bci:35 VarHandleGuards::guard_LJ_I @ bci:80 (line 1002) AbstractMemorySegmentImpl::get @ bci:8 (line 673) Test::testArgs @ bci:25 (line 39)
0 545 RangeCheck === 1947 544 [[ 546 1865 ]] P=0.999999, C=-1.000000 !jvms: AbstractMemorySegmentImpl::checkBounds @ bci:16 (line 382) AbstractMemorySegmentImpl::checkAccess @ bci:9 (line 336) AbstractMemorySegmentImpl::checkEnclosingLayout @ bci:10 (line 341) VarHandleSegmentAsBytes::checkSegment @ bci:18 (line 83) VarHandleSegmentAsBytes::get @ bci:10 (line 104) 0x00000000b71ae800::invokeStatic @ bci:18 0x00000000b71b0000::invoke @ bci:35 VarHandleGuards::guard_LJ_I @ bci:80 (line 1002) AbstractMemorySegmentImpl::get @ bci:8 (line 673) Test::testArgs @ bci:25 (line 39)
(rr) p iv->dump()
1749 Phi === 1947 62 1939 [[ 1939 1754 1788 ]] #int:1..max-3:www #tripcount !orig=[1787]
(rr) p range->dump()
1784 ConvL2I === _ 1783 [[ 1789 1856 ]] #int:>=0:www !orig=[1815]
(rr) p offset->dump()
1786 ConvL2I === _ 1785 [[ 1788 1856 ]] #int !orig=[1817]
(rr) p scale
1
    int iv = 0
    Loop:
      int x = (iv + offset)
      int y = range
      RangeCheck(x <u y)
      iv = iv + 1
What is interesting: the long loop and long-rangecheck have been completely converted to an int-loop and int-rangecheck.
-------------------------------------------------
testFields
6 84 ConL === 0 [[ 85 ]] #long:16
6 55 CheckCastPP === 1717 44 [[ 85 85 576 576 576 1262 1262 1326 1326 1326 652 652 716 716 ]] #jdk/internal/foreign/NativeMemorySegmentImpl (java/lang/foreign/MemorySegment,java/lang/foreign/SegmentAllocator,java/util/function/BiFunction):NotNull:exact * Oop:jdk/internal/foreign/NativeMemorySegmentImpl (java/lang/foreign/MemorySegment,java/lang/foreign/SegmentAllocator,java/util/function/BiFunction):NotNull:exact * !orig=[172] !jvms: Test::testFields @ bci:10 (line 28)
6 3198 AddL === _ 3199 3332 [[ 3197 4208 ]] !orig=[2731],[2319],[2193],[1988]
6 4202 CastLL === 2246 1993 [[ 4208 ]] #long:-6442450922..max-14:www !orig=[4178],[4136],[4093],[2245],[2147],[581] !jvms: AbstractMemorySegmentImpl::checkBounds @ bci:16 (line 382) AbstractMemorySegmentImpl::checkAccess @ bci:9 (line 336) AbstractMemorySegmentImpl::checkEnclosingLayout @ bci:10 (line 341) VarHandleSegmentAsBytes::checkSegment @ bci:18 (line 83) VarHandleSegmentAsBytes::get @ bci:10 (line 104) 0x00000000971ae800::invokeStatic @ bci:18 0x00000000971b0000::invoke @ bci:35 VarHandleGuards::guard_LJ_I @ bci:80 (line 1002) AbstractMemorySegmentImpl::get @ bci:8 (line 673) Test::testFields @ bci:31 (line 30)
6 4058 Phi === 4056 3764 3341 [[ 4217 4057 4086 4087 4145 3341 4186 4213 4215 ]] #int:8..max-12:www #tripcount !orig=3028,[3067]
5 85 AddP === _ 55 55 84 [[ 86 ]] Oop:jdk/internal/foreign/NativeMemorySegmentImpl (java/lang/foreign/MemorySegment,java/lang/foreign/SegmentAllocator,java/util/function/BiFunction):NotNull:exact+16 * !orig=[548] !jvms: AbstractMemorySegmentImpl::byteSize @ bci:1 (line 210) Test::testFields @ bci:10 (line 28)
5 4208 AddL === _ 4202 3198 [[ 4138 ]] !orig=[4179],[4136],[4093],[2245],[2147],[581] !jvms: AbstractMemorySegmentImpl::checkBounds @ bci:16 (line 382) AbstractMemorySegmentImpl::checkAccess @ bci:9 (line 336) AbstractMemorySegmentImpl::checkEnclosingLayout @ bci:10 (line 341) VarHandleSegmentAsBytes::checkSegment @ bci:18 (line 83) VarHandleSegmentAsBytes::get @ bci:10 (line 104) 0x00000000971ae800::invokeStatic @ bci:18 0x00000000971b0000::invoke @ bci:35 VarHandleGuards::guard_LJ_I @ bci:80 (line 1002) AbstractMemorySegmentImpl::get @ bci:8 (line 673) Test::testFields @ bci:31 (line 30)
5 4057 ConvI2L === _ 4058 [[ 4045 4047 4055 4138 ]] #long:8..maxint-12:www !orig=[3292] !jvms: AbstractMemorySegmentImpl::checkBounds @ bci:16 (line 382) AbstractMemorySegmentImpl::checkAccess @ bci:9 (line 336) AbstractMemorySegmentImpl::checkEnclosingLayout @ bci:10 (line 341) VarHandleSegmentAsBytes::checkSegment @ bci:18 (line 83) VarHandleSegmentAsBytes::get @ bci:10 (line 104) 0x00000000971ae800::invokeStatic @ bci:18 0x00000000971b0000::invoke @ bci:35 VarHandleGuards::guard_LJ_I @ bci:80 (line 1002) AbstractMemorySegmentImpl::get @ bci:8 (line 673) Test::testFields @ bci:31 (line 30)
4 86 LoadL === _ 7 85 [[ 88 92 1942 567 1856 1855 ]] @jdk/internal/foreign/AbstractMemorySegmentImpl (java/lang/foreign/MemorySegment,java/lang/foreign/SegmentAllocator,java/util/function/BiFunction)+16 *, name=length, idx=7; #long !orig=[549] !jvms: AbstractMemorySegmentImpl::byteSize @ bci:1 (line 210) Test::testFields @ bci:10 (line 28)
4 2333 ConL === 0 [[ 3867 3969 2530 2569 2457 2804 2857 2908 3170 3196 3113 3395 3448 3499 3613 3637 3765 3789 4095 4054 4185 3918 ]] #long:4
4 4138 AddL === _ 4057 4208 [[ 4135 4095 ]] !orig=[4093],[2245],[2147],[581] !jvms: AbstractMemorySegmentImpl::checkBounds @ bci:16 (line 382) AbstractMemorySegmentImpl::checkAccess @ bci:9 (line 336) AbstractMemorySegmentImpl::checkEnclosingLayout @ bci:10 (line 341) VarHandleSegmentAsBytes::checkSegment @ bci:18 (line 83) VarHandleSegmentAsBytes::get @ bci:10 (line 104) 0x00000000971ae800::invokeStatic @ bci:18 0x00000000971b0000::invoke @ bci:35 VarHandleGuards::guard_LJ_I @ bci:80 (line 1002) AbstractMemorySegmentImpl::get @ bci:8 (line 673) Test::testFields @ bci:31 (line 30)
3 567 CastLL === 97 86 [[ 576 2489 1326 3121 1883 1885 2002 2004 2060 2062 2491 3141 2493 3143 2218 2220 2401 2433 2465 2485 2481 3057 3089 3588 3590 3740 3742 4029 4031 ]] #long:>=0:www !jvms: AbstractMemorySegmentImpl::checkBounds @ bci:16 (line 382) AbstractMemorySegmentImpl::checkAccess @ bci:9 (line 336) AbstractMemorySegmentImpl::checkEnclosingLayout @ bci:10 (line 341) VarHandleSegmentAsBytes::checkSegment @ bci:18 (line 83) VarHandleSegmentAsBytes::get @ bci:10 (line 104) 0x00000000971ae800::invokeStatic @ bci:18 0x00000000971b0000::invoke @ bci:35 VarHandleGuards::guard_LJ_I @ bci:80 (line 1002) AbstractMemorySegmentImpl::get @ bci:8 (line 673) Test::testFields @ bci:31 (line 30)
3 4095 AddL === _ 4138 2333 [[ 4089 2220 ]] !orig=[2245],[2147],[581] !jvms: AbstractMemorySegmentImpl::checkBounds @ bci:16 (line 382) AbstractMemorySegmentImpl::checkAccess @ bci:9 (line 336) AbstractMemorySegmentImpl::checkEnclosingLayout @ bci:10 (line 341) VarHandleSegmentAsBytes::checkSegment @ bci:18 (line 83) VarHandleSegmentAsBytes::get @ bci:10 (line 104) 0x00000000971ae800::invokeStatic @ bci:18 0x00000000971b0000::invoke @ bci:35 VarHandleGuards::guard_LJ_I @ bci:80 (line 1002) AbstractMemorySegmentImpl::get @ bci:8 (line 673) Test::testFields @ bci:31 (line 30)
2 2220 CmpUL === _ 4095 567 [[ 2219 ]] !orig=[2132],[1319] !jvms: AbstractMemorySegmentImpl::checkBounds @ bci:16 (line 382) AbstractMemorySegmentImpl::checkAccess @ bci:9 (line 336) AbstractMemorySegmentImpl::checkEnclosingLayout @ bci:10 (line 341) VarHandleSegmentAsBytes::checkSegment @ bci:18 (line 83) VarHandleSegmentAsBytes::set @ bci:10 (line 113) 0x00000000971b0400::invokeStatic @ bci:20 0x00000000971b1400::invoke @ bci:37 VarHandleGuards::guard_LJI_V @ bci:134 (line 1017) AbstractMemorySegmentImpl::set @ bci:10 (line 679) Test::testFields @ bci:50 (line 31)
1 2219 Bool === _ 2220 [[ 2247 ]] [lt] !orig=[2131],[1320] !jvms: AbstractMemorySegmentImpl::checkBounds @ bci:16 (line 382) AbstractMemorySegmentImpl::checkAccess @ bci:9 (line 336) AbstractMemorySegmentImpl::checkEnclosingLayout @ bci:10 (line 341) VarHandleSegmentAsBytes::checkSegment @ bci:18 (line 83) VarHandleSegmentAsBytes::set @ bci:10 (line 113) 0x00000000971b0400::invokeStatic @ bci:20 0x00000000971b1400::invoke @ bci:37 VarHandleGuards::guard_LJI_V @ bci:134 (line 1017) AbstractMemorySegmentImpl::set @ bci:10 (line 679) Test::testFields @ bci:50 (line 31)
0 2247 RangeCheck === 2246 2219 [[ 2248 4074 ]] P=0.999999, C=-1.000000 !orig=[2151],[1321] !jvms: AbstractMemorySegmentImpl::checkBounds @ bci:16 (line 382) AbstractMemorySegmentImpl::checkAccess @ bci:9 (line 336) AbstractMemorySegmentImpl::checkEnclosingLayout @ bci:10 (line 341) VarHandleSegmentAsBytes::checkSegment @ bci:18 (line 83) VarHandleSegmentAsBytes::set @ bci:10 (line 113) 0x00000000971b0400::invokeStatic @ bci:20 0x00000000971b1400::invoke @ bci:37 VarHandleGuards::guard_LJI_V @ bci:134 (line 1017) AbstractMemorySegmentImpl::set @ bci:10 (line 679) Test::testFields @ bci:50 (line 31)
Looks like we have an int loop, but the RangeCheck is still a long-rangecheck.
    RangeCheck(adr <uL byteSize)
This is the trip-count:
4058 Phi === 4056 3764 3341 [[ 4217 4057 4086 4087 4145 3341 4186 4213 4215 ]] #int:8..max-12:www #tripcount !orig=3028,[3067]
I'll have to investigate more...
23-01-2025
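The two range-check shapes contrasted in the comment above, `RangeCheck(x <u y)` on ints and `RangeCheck(adr <uL byteSize)` on longs, can be sketched in Java with the standard `Integer.compareUnsigned` and `Long.compareUnsigned` methods. The class and method names below are hypothetical and only illustrate the comparison semantics, not what C2 emits. The unsigned comparison covers both the lower and the upper bound at once: a negative index becomes a very large unsigned value and therefore fails the check.

```java
// Hypothetical helpers; only the compareUnsigned calls are standard Java API.
final class UnsignedRangeCheck {
    // Int form, as in the converted testArgs check: (iv + offset) <u range.
    static boolean inBoundsInt(int iv, int offset, int range) {
        return Integer.compareUnsigned(iv + offset, range) < 0;
    }

    // Long form, as still present in testFields: adr <uL byteSize.
    static boolean inBoundsLong(long adr, long byteSize) {
        return Long.compareUnsigned(adr, byteSize) < 0;
    }

    public static void main(String[] args) {
        System.out.println(inBoundsInt(3, 2, 10));   // index 5 of 10
        System.out.println(inBoundsInt(0, -1, 10));  // index -1 of 10
        System.out.println(inBoundsLong(10L, 10L));  // one past the end
    }
}
```

Both forms express the same check; the difference discussed in this issue is only whether the loop nest transformation managed to narrow the long form to the int form.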

The reason is that the range check has the form:
    int i0 = LoadI(zeroInvarI);
    long l0 = ConvI2L(i0);
    LongCountedLoop {
      long lhs = Phi(l0, (iv + l0) + 1);
      rangecheck(lhs, rhs);
    }
This lhs does not have the correct form to be recognized by IdealLoopTree::is_range_check_if, so it cannot be transformed to a CmpU when the LongCountedLoop is transformed into a loop nest. Either we can be more versatile when recognizing the is_range_check_if pattern, or we can try to make the pattern of testFields the same as that of testArgs, which is:
    long lhs = l0 + iv (which is l0 + Phi(0, iv + 1))
20-01-2025
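The two shapes of `lhs` described above compute the same values, which can be checked with a small Java model. This is a sketch under stated assumptions: the class and method names are hypothetical, and ordinary loop-carried variables stand in for the Phi nodes. It does not show how C2 matches the pattern, only that the split form `Phi(l0, (iv + l0) + 1)` and the unsplit form `l0 + Phi(0, iv + 1)` are equivalent per iteration.

```java
import java.util.Arrays;

// Hypothetical model of the two index shapes; names are illustrative only.
final class RangeCheckShapes {
    // Shape described for testArgs: lhs = l0 + iv, with iv = Phi(0, iv + 1).
    static long[] unsplit(long l0, int iterations) {
        long[] lhs = new long[iterations];
        long iv = 0;
        for (int k = 0; k < iterations; k++) {
            lhs[k] = l0 + iv;     // invariant plus induction variable
            iv = iv + 1;
        }
        return lhs;
    }

    // Shape described for testFields: the add was split through the Phi,
    // so lhs is itself loop-carried: Phi(l0, (iv + l0) + 1).
    static long[] split(long l0, int iterations) {
        long[] lhs = new long[iterations];
        long iv = 0;
        long phi = l0;            // entry input
        for (int k = 0; k < iterations; k++) {
            lhs[k] = phi;
            phi = (iv + l0) + 1;  // backedge input
            iv = iv + 1;
        }
        return lhs;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.equals(unsplit(5, 8), split(5, 8)));
    }
}
```

Since the values agree, either remedy named in the comment (a more permissive pattern match, or avoiding the split so the testArgs shape is preserved) would leave program behavior unchanged.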

./java -XX:CompileCommand=compileonly,Test::test* -XX:CompileCommand=printcompilation,Test::* -XX:+TraceNewVectors -XX:CompileCommand=TraceAutoVectorization,Test::test*,PRECONDITIONS -XX:+TraceLoopOpts -XX:-UseOnStackReplacement Test.java
I can see that the testArgs manages to do this before PreMainPost, and testFields does not do it:
Predicate RC Loop: N1947/N1615 limit_check profile_predicated predicated counted [1,int),+1 (40041 iters) has_sfpt rce strip_mined
49647 109 4 Test::testFields (63 bytes)
Loop: N0/N0 has_sfpt
Loop: N1749/N1701 limit_check profile_predicated predicated sfpts={ 1635 }
Counted Loop: N1793/N1701 limit_check profile_predicated predicated sfpts={ 1635 }
Loop: N0/N0 has_sfpt
Loop: N1793/N1701 limit_check profile_predicated predicated sfpts={ 1635 }
Predicate IC Loop: N1793/N1701 limit_check profile_predicated predicated sfpts={ 1635 }
Predicate IC Loop: N1793/N1701 limit_check profile_predicated predicated sfpts={ 1635 }
Predicate IC Loop: N1793/N1701 limit_check profile_predicated predicated sfpts={ 1635 }
Predicate IC Loop: N1793/N1701 limit_check profile_predicated predicated sfpts={ 1635 }
Predicate IC Loop: N1793/N1701 limit_check profile_predicated predicated sfpts={ 1635 }
Predicate IC Loop: N1793/N1701 limit_check profile_predicated predicated sfpts={ 1635 }
Loop: N0/N0 has_sfpt
Loop: N1793/N1701 limit_check profile_predicated predicated sfpts={ 1635 }
Peel Loop: N1876/N1701 sfpts={ 1635 }
Exceeding node budget: 0 < 103
Counted Loop: N1982/N1701 counted [1,int),+1 (-1 iters)
Loop: N0/N0 has_sfpt
Loop: N1852/N1849 limit_check profile_predicated predicated sfpts={ 1895 }
Loop: N1981/N1980 limit_check profile_predicated predicated
Loop: N1982/N1701 limit_check profile_predicated predicated counted [1,int),+1 (-1 iters) has_sfpt strip_mined
PreMainPost Loop: N1982/N1701 limit_check profile_predicated predicated counted [1,int),+1 (40041 iters) has_sfpt strip_mined
vs
49718 112 4 Test::testArgs (56 bytes)
Counted Loop: N1655/N1615 limit_check profile_predicated predicated sfpts={ 1584 }
Loop: N0/N0 has_sfpt
Loop: N1655/N1615 limit_check profile_predicated predicated sfpts={ 1584 }
Loop: N0/N0 has_sfpt
Loop: N1655/N1615 limit_check profile_predicated predicated sfpts={ 1584 }
Predicate IC Loop: N1655/N1615 limit_check profile_predicated predicated sfpts={ 1584 }
Predicate IC Loop: N1655/N1615 limit_check profile_predicated predicated sfpts={ 1584 }
Predicate IC Loop: N1655/N1615 limit_check profile_predicated predicated sfpts={ 1584 }
Predicate IC Loop: N1655/N1615 limit_check profile_predicated predicated sfpts={ 1584 }
Predicate IC Loop: N1655/N1615 limit_check profile_predicated predicated sfpts={ 1584 }
Loop: N0/N0 has_sfpt
Loop: N1655/N1615 limit_check profile_predicated predicated sfpts={ 1584 }
Peel Loop: N1756/N1615 sfpts={ 1584 }
Exceeding node budget: 0 < 152
Counted Loop: N1947/N1615 counted [1,int),+1 (-1 iters)
Loop: N0/N0 has_sfpt
Loop: N1733/N1730 limit_check profile_predicated predicated sfpts={ 1838 }
Loop: N1946/N1945 limit_check profile_predicated predicated
Loop: N1947/N1615 limit_check profile_predicated predicated counted [1,int),+1 (-1 iters) has_sfpt strip_mined
Predicate RC Loop: N1947/N1615 limit_check profile_predicated predicated counted [1,int),+1 (40041 iters) has_sfpt rce strip_mined
Loop: N0/N0 has_sfpt
Loop: N1733/N1730 limit_check profile_predicated predicated sfpts={ 1838 }
Loop: N1946/N1945 limit_check profile_predicated predicated sfpts={ 1948 }
Loop: N1947/N1615 limit_check profile_predicated predicated counted [1,int),+1 (40041 iters) has_sfpt strip_mined
PreMainPost Loop: N1947/N1615 limit_check profile_predicated predicated counted [1,int),+1 (40041 iters) has_sfpt strip_mined
20-01-2025