JDK-8313689 : C2: compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java fails intermittently with -XX:-TieredCompilation
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 22
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2023-08-03
  • Updated: 2024-01-02
  • Resolved: 2023-08-22
JDK 22: 22 b12 (Fixed)
Description
On some machines (x64 and AArch64), compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java fails with -XX:-TieredCompilation because some allocations that the test expects to be removed are still present in the compiled code.

Running with different warm-up counts (e.g. -DWarmup=1000, 2000, 10000) gives a different number of failures. This suggests that on some machines the warm-up count is high enough for the test to pass, while on others it is not. However, even with a very high warm-up count (10000), I still got 5 failures.

We should find the root cause of why the allocations cannot be removed for some warm-up counts, and fix the test or the code accordingly.
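
For readers unfamiliar with the test, the kind of code it exercises can be sketched as follows. This is a hypothetical illustration, not the source of AllocationMergesTests: the class name, method names, and loop bounds are assumptions, and the shape is only inferred loosely from the test name and the output below. Two non-escaping allocations merge after the branch, and the fields of the merged object are loaded inside a loop. The second method shows what the code is conceptually equivalent to once both allocations have been scalar-replaced.

```java
// Hypothetical sketch; not copied from AllocationMergesTests.java.
public class AllocationMergeSketch {
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Two allocations merge after the if; neither object escapes the method,
    // so in principle C2 could remove both of them.
    static int sumWithMerge(boolean cond, int x, int y) {
        Point p = new Point(x, y);
        if (cond) {
            p = new Point(y, x);
        }
        int res = 0;
        for (int i = 0; i < 100; i++) { // loop bounds are illustrative
            res += p.x + p.y + i;
        }
        return res;
    }

    // Hand-written equivalent with the object replaced by plain scalars.
    static int sumScalarized(boolean cond, int x, int y) {
        int px = cond ? y : x;
        int py = cond ? x : y;
        int res = 0;
        for (int i = 0; i < 100; i++) {
            res += px + py + i;
        }
        return res;
    }

    public static void main(String[] args) {
        // Both variants compute the same value for either branch.
        System.out.println(sumWithMerge(true, 1, 2) == sumScalarized(true, 1, 2));
        System.out.println(sumWithMerge(false, 1, 2) == sumScalarized(false, 1, 2));
    }
}
```

Whether C2 actually removes the allocations in a method like this depends on the graph shape it sees at compile time, which is what this issue is about.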

Output:

Compilation of Failed Method
----------------------------
1) Compilation of "int compiler.c2.irTests.scalarReplacement.AllocationMergesTests.testNoEscapeWithLoadInLoop_C2(boolean,int,int)":
> Phase "PrintOptoAssembly":
----------------------- MetaData before Compile_id = 374 ------------------------
{method}
 - this oop:          0x00007f1a134210d8
 - method holder:     'compiler/c2/irTests/scalarReplacement/AllocationMergesTests'
 - constants:         0x00007f1a1341c000 constant pool [641] {0x00007f1a1341c000} for 'compiler/c2/irTests/scalarReplacement/AllocationMergesTests' cache=0x00007f1a13424780
 - access:            0x0  
 - flags:             0x5080   queued_for_compilation  dont_inline  has_loops_flag_init 
 - name:              'testNoEscapeWithLoadInLoop_C2'
 - signature:         '(ZII)I'
 - max stack:         5
 - max locals:        4
 - size of params:    4
 - method size:       14
 - vtable index:      13
 - i2i entry:         0x00007f1a98b50a40
 - adapters:          AHE@0x00007f1aa42061b0: 0xbaaa i2c: 0x00007f1a98bb8600 c2i: 0x00007f1a98bb86f7 c2iUV: 0x00007f1a98bb86c5 c2iNCI: 0x00007f1a98bb8731
 - compiled entry     0x00007f1a98bb86f7
 - code size:         8
 - code start:        0x00007f1a134210c0
 - code end (excl):   0x00007f1a134210c8
 - method data:       0x00007f1a13499d20
 - checked ex length: 0
 - linenumber start:  0x00007f1a134210c8
 - localvar length:   0

------------------------ OptoAssembly for Compile_id = 374 -----------------------
#
#  int ( compiler/c2/irTests/scalarReplacement/AllocationMergesTests:NotNull *, int, int, int )
#
000     N273: #	out( B1 ) <- BLOCK HEAD IS JUNK  Freq: 1
000     movl    rscratch1, [j_rarg0 + oopDesc::klass_offset_in_bytes()]	# compressed klass
	decode_klass_not_null rscratch1, rscratch1
	cmpq    rax, rscratch1	 # Inline cache check
	jne     SharedRuntime::_ic_miss_stub
	nop	# nops to align entry point

        nop 	# 4 bytes pad for loops and calls

020     B1: #	out( B12 B2 ) <- BLOCK HEAD IS JUNK  Freq: 1
020     # stack bang (304 bytes)
	pushq   rbp	# Save rbp
	subq    rsp, #64	# Create frame

03a     movl    [rsp + #12], R8	# spill
03f     movl    [rsp + #8], RCX	# spill
043     movq    [rsp + #0], RSI	# spill
047     movl    [rsp + #16], RDX	# spill
04b     testl   RDX, RDX
04d     je     B12  P=0.100000 C=-1.000000

053     B2: #	out( B13 B3 ) <- in( B1 )  Freq: 0.9
053     # TLS is in R15
053     movq    RAX, [R15 + #456 (32-bit)]	# ptr
05a     movq    R10, RAX	# spill
05d     addq    R10, #24	# ptr
061     cmpq    R10, [R15 + #472 (32-bit)]	# raw ptr
068     jae,u   B13  P=0.000100 C=-1.000000

06e     B3: #	out( B4 ) <- in( B2 )  Freq: 0.89991
06e     movq    [R15 + #456 (32-bit)], R10	# ptr
075     PREFETCHW [R10 + #192 (32-bit)]	# Prefetch allocation into level 1 cache and mark modified
07d     movq    [RAX], #1	# long
084     movl    [RAX + #8 (8-bit)], narrowklass: precise compiler/c2/irTests/scalarReplacement/AllocationMergesTests$Point: 0x00007f19e81bddc0:Constant:exact *	# compressed klass ptr
08b     movl    [RAX + #12 (8-bit)], R12	# int (R12_heapbase==0)
08f     movq    [RAX + #16 (8-bit)], R12	# long (R12_heapbase==0)

093     B4: #	out( B16 B5 ) <- in( B14 B3 )  Freq: 0.9
093     
093     MEMBAR-storestore (empty encoding)
093     movq    RBP, RAX	# spill
096     # checkcastPP of RBP
096     movq    RSI, RBP	# spill
099     movl    RDX, [rsp + #12]	# spill
09d     movl    RCX, [rsp + #8]	# spill
        nop 	# 2 bytes pad for loops and calls
0a3     call,static  compiler.c2.irTests.scalarReplacement.AllocationMergesTests$Point::<init>
        # compiler.c2.irTests.scalarReplacement.AllocationMergesTests::testNoEscapeWithLoadInLoop @ bci:24 (line 949) L[0]=rsp + #0 L[1]=rsp + #16 L[2]=rsp + #8 L[3]=rsp + #12 L[4]=#ScObj0 L[5]=#0 L[6]=_ STK[0]=RBP
        # ScObj0 compiler/c2/irTests/scalarReplacement/AllocationMergesTests$Point={ [x :0]=rsp + #8, [y :1]=rsp + #12 }
        # compiler.c2.irTests.scalarReplacement.AllocationMergesTests::testNoEscapeWithLoadInLoop_C2 @ bci:4 (line 962) L[0]=rsp + #0 L[1]=rsp + #16 L[2]=rsp + #8 L[3]=rsp + #12
        # OopMap {rbp=Oop [0]=Oop off=168/0xa8}

0b0     B5: #	out( B6 ) <- in( B4 )  Freq: 0.899982
        # Block is sole successor of call
0b0     movl    R10, [RBP + #16 (8-bit)]	# int ! Field: compiler/c2/irTests/scalarReplacement/AllocationMergesTests$Point.y
0b4     movl    RAX, [RBP + #12 (8-bit)]	# int ! Field: compiler/c2/irTests/scalarReplacement/AllocationMergesTests$Point.x

0b7     B6: #	out( B8 ) <- in( B5 B12 )  Freq: 0.999982
0b7     leal    R11, [RAX + R10]
0bb     leal    R8, [R11 + #3342]
0c2     movl    R9, #3343	# int
0c8     jmp,s   B8
        nop 	# 6 bytes pad for loops and calls

0d0     B7: #	out( B8 ) <- in( B8 ) top-of-loop Freq: 903.513
0d0     movl    R9, RCX	# spill

0d3     B8: #	out( B7 B9 ) <- in( B6 B7 ) Loop( B8-B7 inner main of N10) Freq: 904.513
0d3     leal    RBX, [R9 + R11]
0d7     addl    R8, RBX	# int
0da     addl    R8, RBX	# int
0dd     addl    R8, RBX	# int
0e0     addl    R8, RBX	# int
0e3     addl    R8, RBX	# int
0e6     addl    R8, RBX	# int
0e9     addl    R8, RBX	# int
0ec     addl    R8, RBX	# int
0ef     addl    R8, RBX	# int
0f2     addl    R8, RBX	# int
0f5     addl    R8, RBX	# int
0f8     addl    R8, RBX	# int
0fb     addl    R8, RBX	# int
0fe     addl    R8, RBX	# int
101     addl    R8, RBX	# int
104     addl    R8, RBX	# int
107     addl    R8, RBX	# int
10a     addl    R8, RBX	# int
10d     addl    R8, RBX	# int
110     addl    R8, RBX	# int
113     addl    R8, RBX	# int
116     addl    R8, RBX	# int
119     addl    R8, RBX	# int
11c     addl    R8, RBX	# int
11f     addl    R8, RBX	# int
122     addl    R8, RBX	# int
125     addl    R8, RBX	# int
128     addl    R8, RBX	# int
12b     addl    R8, RBX	# int
12e     addl    R8, RBX	# int
131     addl    R8, RBX	# int
134     addl    R8, RBX	# int
137     addl    R8, #496	# int
13e     leal    RCX, [R9 + #32]
142     cmpl    RCX, #4207
148     jl,s   B7	# loop end  P=0.998894 C=20781.000000

14a     B9: #	out( B10 ) <- in( B8 )  Freq: 0.999982
14a     # castII of R9
14a     addl    R9, #32	# int

14e     B10: #	out( B10 B11 ) <- in( B9 B10 ) Loop( B10-B10 inner post of N309) Freq: 1.99996
14e     leal    RCX, [R11 + R9]
152     addl    R8, RCX	# int
155     incl    R9	# int
        nop 	# 8 bytes pad for loops and calls
160     cmpl    R9, #4234
167     jl,s   B10	# loop end  P=0.500000 C=20781.000000

169     B11: #	out( N273 ) <- in( B10 )  Freq: 0.999982
169     addl    RAX, R8	# int
16c     addl    RAX, R10	# int
16f     addq    rsp, 64	# Destroy frame
	popq    rbp
	cmpq    rsp, poll_offset[r15_thread] 
	ja      #safepoint_stub	# Safepoint: poll for GC

181     ret

182     B12: #	out( B6 ) <- in( B1 )  Freq: 0.1
182     movl    R10, R8	# spill
185     movl    RAX, RCX	# spill
187     jmp     B6

18c     B13: #	out( B15 B14 ) <- in( B2 )  Freq: 9.00149e-05
18c     movq    RSI, precise compiler/c2/irTests/scalarReplacement/AllocationMergesTests$Point: 0x00007f19e81bddc0:Constant:exact *	# ptr
196     movq    RBP, [rsp + #0]	# spill
        nop 	# 1 bytes pad for loops and calls
19b     call,static  wrapper for: _new_instance_Java
        # compiler.c2.irTests.scalarReplacement.AllocationMergesTests::testNoEscapeWithLoadInLoop @ bci:18 (line 949) L[0]=RBP L[1]=rsp + #16 L[2]=rsp + #8 L[3]=rsp + #12 L[4]=#ScObj0 L[5]=#0 L[6]=_
        # ScObj0 compiler/c2/irTests/scalarReplacement/AllocationMergesTests$Point={ [x :0]=rsp + #8, [y :1]=rsp + #12 }
        # compiler.c2.irTests.scalarReplacement.AllocationMergesTests::testNoEscapeWithLoadInLoop_C2 @ bci:4 (line 962) L[0]=RBP L[1]=rsp + #16 L[2]=rsp + #8 L[3]=rsp + #12
        # OopMap {rbp=Oop [0]=Oop off=416/0x1a0}

1a8     B14: #	out( B4 ) <- in( B13 )  Freq: 9.00131e-05
        # Block is sole successor of call
1a8     jmp     B4

1ad     B15: #	out( B17 ) <- in( B13 )  Freq: 9.00149e-10
1ad     # exception oop is in rax; no code emitted
1ad     movq    RSI, RAX	# spill
1b0     jmp,s   B17

1b2     B16: #	out( B17 ) <- in( B4 )  Freq: 9e-06
1b2     # exception oop is in rax; no code emitted
1b2     movq    RSI, RAX	# spill

1b5     B17: #	out( N273 ) <- in( B16 B15 )  Freq: 9.0009e-06
1b5     addq    rsp, 64	# Destroy frame
	popq    rbp

1ba     jmp     rethrow_stub

--------------------------------------------------------------------------------

STDERR:

Command Line:
/scratch/chagedor/jdk/open/jdk-22/fastdebug/bin/java -DReproduce=true -cp /scratch/chagedor/jdk/open/JTwork/classes/compiler/c2/irTests/scalarReplacement/AllocationMergesTests.d:/scratch/chagedor/jdk/open/test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement:/scratch/chagedor/jdk/open/JTwork/classes/test/lib:/scratch/chagedor/jdk/open/JTwork/classes:/home/chagedor/jtreg/lib/javatest.jar:/home/chagedor/jtreg/lib/jtreg.jar:/home/chagedor/jtreg/lib/junit-platform-console-standalone-1.9.2.jar:/home/chagedor/jtreg/lib/testng-7.3.0.jar:/home/chagedor/jtreg/lib/jcommander-1.78.jar:/home/chagedor/jtreg/lib/guice-4.2.3.jar -Djava.library.path=. -Xbootclasspath/a:. -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI -DWarmup=2000 -XX:+CreateCoredumpOnCrash -ea -esa -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation -Dir.framework.server.port=37699 -XX:+UnlockDiagnosticVMOptions -XX:+ReduceAllocationMerges -XX:+TraceReduceAllocationMerges -XX:+DeoptimizeALot -XX:CompileCommand=exclude,*::dummy* -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:CompilerDirectivesFile=test-vm-compile-commands-pid-13910.log -XX:CompilerDirectivesLimit=421 -XX:-OmitStackTraceInFastThrow -DShouldDoIRVerification=true -XX:-BackgroundCompilation -XX:CompileCommand=quiet compiler.lib.ir_framework.test.TestVM compiler.c2.irTests.scalarReplacement.AllocationMergesTests

One or more @IR rules failed:

Failed IR Rules (1) of Methods (1)
----------------------------------
1) Method "int compiler.c2.irTests.scalarReplacement.AllocationMergesTests.testNoEscapeWithLoadInLoop_C2(boolean,int,int)" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(applyIfCPUFeatureAnd={}, phase={DEFAULT}, applyIf={}, applyIfCPUFeatureOr={}, applyIfCPUFeature={}, counts={}, applyIfAnd={}, failOn={"_#ALLOC#_"}, applyIfOr={}, applyIfNot={})"
     > Phase "PrintOptoAssembly":
       - failOn: Graph contains forbidden nodes:
         * Constraint 1: "(.*precise .*\R((.*(?i:mov|mv|xorl|nop|spill).*|\s*)\R)*.*(?i:call,static).*wrapper for: _new_instance_Java)"
           - Matched forbidden node:
             * 18c     movq    RSI, precise compiler/c2/irTests/scalarReplacement/AllocationMergesTests$Point: 0x00007f19e81bddc0:Constant:exact *	# ptr
               196     movq    RBP, [rsp + #0]	# spill
                       nop 	# 1 bytes pad for loops and calls
               19b     call,static  wrapper for: _new_instance_Java
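
The failOn constraint above is an ordinary regular expression, so the match can be reproduced outside the IR framework with java.util.regex. The sketch below is only an illustration: the class and method names are hypothetical, and the way the framework applies the pattern internally may differ. The pattern string is Constraint 1 from the report, and the sample text is the matched block from the report.

```java
import java.util.regex.Pattern;

// Hypothetical helper; not part of the IR framework.
public class ForbiddenAllocRegexSketch {
    // Constraint 1 from the failure report, escaped as a Java string literal.
    static final Pattern ALLOC = Pattern.compile(
        "(.*precise .*\\R((.*(?i:mov|mv|xorl|nop|spill).*|\\s*)\\R)*"
        + ".*(?i:call,static).*wrapper for: _new_instance_Java)");

    // The four lines reported as "Matched forbidden node".
    static final String MATCHED_BLOCK = String.join("\n",
        "18c     movq    RSI, precise compiler/c2/irTests/scalarReplacement/"
            + "AllocationMergesTests$Point: 0x00007f19e81bddc0:Constant:exact *\t# ptr",
        "196     movq    RBP, [rsp + #0]\t# spill",
        "        nop \t# 1 bytes pad for loops and calls",
        "19b     call,static  wrapper for: _new_instance_Java");

    // True if the text contains a klass constant load followed, after only
    // mov/nop/spill-style or blank lines, by a call to _new_instance_Java.
    static boolean containsAllocation(String optoAssembly) {
        return ALLOC.matcher(optoAssembly).find();
    }

    public static void main(String[] args) {
        System.out.println(containsAllocation(MATCHED_BLOCK));
        System.out.println(containsAllocation("0b7     leal    R11, [RAX + R10]"));
    }
}
```

The pattern matches the slow-path allocation call in block B13 of the output, so the rule fails whenever any allocation of this form survives in the compiled method.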

Comments
Changeset: 02ef859f
Author: Cesar Soares Lucas <cslucas@openjdk.org>
Committer: Tobias Hartmann <thartmann@openjdk.org>
Date: 2023-08-22 07:58:51 +0000
URL: https://git.openjdk.org/jdk/commit/02ef859f79cbc2e6225998001af299ba36fe991b
22-08-2023

A pull request was submitted for review.
URL: https://git.openjdk.org/jdk/pull/15367
Date: 2023-08-21 18:21:38 +0000
21-08-2023

Hello again, Christian. I was able to reproduce all failures locally. Most of them were caused by the RAM (ReduceAllocationMerges) optimization bailing out because of the IR graph shape: the test used random values for control flow conditions, so the IR graph shape varied from run to run. I'm going to create a PR that fixes all the issues I found.
21-08-2023

Thanks, [~cslucas]!
04-08-2023

Hi, [~chagedorn]. Thank you for pinging me. I'll take a look today.
03-08-2023

ILW = Single IR test failure, only intermittently on some machines in tier2, no workaround = MMH = P3
03-08-2023

Hi [~cslucas], can you have a look at that?
03-08-2023