JDK-8332765 : Test compiler/loopopts/superword/TestAlignVectorFuzzer.java still times out after JDK-8327978
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 23
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • CPU: x86_64
  • Submitted: 2024-05-23
  • Updated: 2024-05-27
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 23
23Unresolved
Related Reports
Relates :  
Relates :  
Relates :  
Relates :  
Description
This is the same failure as reported in JDK-8323641 and JDK-8327978.

I suspect that it was a regression of JDK-8324517: CastII nodes took up 90%+ of compile time, and compile time was extremely high 100sec+.
AVX512 seems required to reach the required unrolling factor.
Now this change was backed-out, and may be re-introduced in JDK-8332827.

Test: compiler/loopopts/superword/TestAlignVectorFuzzer.java#VerifyAlignVector-NoTieredCompilation-Xbatch

ITERATION 2 of 5. Test testDUBCFH, time allowance: 110626, until timeout: 418626
62649  403    b        compiler.loopopts.superword.TestAlignVectorFuzzer::testDUBCFH (1237 bytes)
Timeout signalled after 480 seconds
TEST PASSED: hit maximal time allownance during iteration 2, time allowance: -383253, until timeout: -75253

Comments
I am not saying that JDK-8324517 was necessarily the root cause, it may just have triggered a underlying issue.
27-05-2024

I just pulled from master, and this benchmark passes/is fast again. That confirms it for me that this was due to JDK-8324517. [~roland] This will probably come back when you work on JDK-8332827, I sent you an email as well.
27-05-2024

I attached a Test.java, where I can cleanly reproduce the timemout: timeout -s SIGKILL 30 ./java -XX:LoopUnrollLimit=250 -XX:CompileCommand=printcompilation,Test::* -XX:-TieredCompilation -Xbatch Test.java
27-05-2024

Build search told me it comes from JDK-8324517, i.e. this change made it much slower. It is now backout-ed, and supposed to be REDO-ne: JDK-8332827 I think it is plausible that it was due to JDK-8324517, since that patch touched the Cast code, and I see in my benchmarking that we spend excessive time in Cast code.
27-05-2024

I reduced the test a bit, and see this for CITime: C2 Compile Time: 185.757 s IdealLoop: 182.414 s AutoVectorize: 3.357 s This is what I get from perf: + 98.74% 0.00% C2 CompilerThre libpthread-2.31.so [.] start_thread + 98.74% 0.00% C2 CompilerThre libjvm.so [.] thread_native_entry + 98.74% 0.00% C2 CompilerThre libjvm.so [.] Thread::call_run + 98.74% 0.00% C2 CompilerThre libjvm.so [.] JavaThread::thread_main_inner + 98.74% 0.00% C2 CompilerThre libjvm.so [.] CompileBroker::compiler_thread_loop + 98.73% 0.00% C2 CompilerThre libjvm.so [.] CompileBroker::invoke_compiler_on_method + 98.73% 0.00% C2 CompilerThre libjvm.so [.] C2Compiler::compile_method + 98.71% 0.00% C2 CompilerThre libjvm.so [.] Compile::Compile + 97.33% 0.00% C2 CompilerThre libjvm.so [.] Compile::Optimize + 97.07% 0.00% C2 CompilerThre libjvm.so [.] PhaseIdealLoop::optimize + 97.02% 0.00% C2 CompilerThre libjvm.so [.] Compile::optimize_loops + 94.39% 0.92% C2 CompilerThre libjvm.so [.] ConstraintCastNode::dominating_cast + 93.45% 0.14% C2 CompilerThre libjvm.so [.] Type::meet_helper + 93.44% 0.09% C2 CompilerThre libjvm.so [.] ConstraintCastNode::higher_equal_types + 92.03% 0.39% C2 CompilerThre libjvm.so [.] Type::hashcons + 74.30% 63.09% C2 CompilerThre libjvm.so [.] Type::Initialize_shared(Compile*)::{lambda(void const*, void const*)#1}::_FUN + 64.79% 0.82% C2 CompilerThre libjvm.so [.] VerifyMeetResult::meet + 51.77% 0.02% C2 CompilerThre libjvm.so [.] PhaseIterGVN::optimize + 51.73% 0.02% C2 CompilerThre libjvm.so [.] PhaseIterGVN::transform_old + 51.27% 0.00% C2 CompilerThre libjvm.so [.] CastIINode::Identity + 45.54% 0.00% C2 CompilerThre libjvm.so [.] PhaseIdealLoop::build_and_optimize + 43.73% 0.01% C2 CompilerThre libjvm.so [.] PhaseIdealLoop::split_if_with_blocks + 43.65% 17.49% C2 CompilerThre libjvm.so [.] Dict::Insert + 43.15% 0.00% C2 CompilerThre libjvm.so [.] PhaseIdealLoop::split_if_with_blocks_pre + 40.21% 0.06% C2 CompilerThre libjvm.so [.] Type::check_symmetrical + 12.58% 9.68% C2 CompilerThre libjvm.so [.] TypeInt::eq It seems there is a LOT of time spent on Cast nodes during loopopts... - 97.07% 0.00% C2 CompilerThre libjvm.so [.] PhaseIdealLoop::optimize - PhaseIdealLoop::optimize - 51.54% PhaseIterGVN::optimize - 51.51% PhaseIterGVN::transform_old - 51.27% CastIINode::Identity - 51.23% ConstraintCastNode::dominating_cast - 50.74% ConstraintCastNode::higher_equal_types - 50.68% Type::meet_helper + 21.39% Type::check_symmetrical + 15.76% Type::hashcons + 13.39% VerifyMeetResult::meet - 45.52% PhaseIdealLoop::build_and_optimize - 43.73% PhaseIdealLoop::split_if_with_blocks - 43.15% PhaseIdealLoop::split_if_with_blocks_pre - 43.11% ConstraintCastNode::dominating_cast - 42.67% ConstraintCastNode::higher_equal_types - 42.62% Type::meet_helper + 18.72% Type::check_symmetrical + 12.46% Type::hashcons + 11.34% VerifyMeetResult::meet + 0.58% PhaseIdealLoop::split_if_with_blocks_post + 1.73% PhaseIdealLoop::auto_vectorize I think I need to extract a test, and do a build-search with that to find the root cause - to check if this is indeed a regression or just an edge case that never worked.
24-05-2024

Actually, I think it really only times out with AVX512, since it creates a larger unrolling factor than with AVX2. But even with AVX2, I see that the same compilation already takes 16sec on my laptop. With higher unrolling for AVX512, this time becomes higher. Investigating more...
23-05-2024

Ah right, JDK-8332537 keeps track of the memory limit issue. So some of the failures were just incorrectly linked then.
23-05-2024

It does not seem to depend on AVX512, also works with AVX2, to get a timeout. It seems it might not even be a SuperWord issue, it seems it just gets incredibly slow during unrolling. Investigating... ------------------ About the MemLimit issue: [~kvn] Disabled the MemLimit with JDK-8332538, until this is fixed JDK-8332537, which is supposed to fix the excessive memory usage.
23-05-2024

Investigating the timeout issue first: ~/Documents/jtreg/bin/jtreg -va -s -jdk:/oracle-work/jdk-fork4/build/linux-x64-debug/jdk -javaoptions:"-Djdk.test.lib.random.seed=-1938832907645781393 -XX:UseAVX=3" -J-Djavatest.maxOutputSize=10000000 /oracle-work/jdk-fork4/open/test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java This reproduces the timeout on code state of Wed May 22 08:50:15 2024 +0000 But it seems not to have reproduces on older code: Wed May 15 13:16:08 2024 +0000 Maybe it is a recent regression, but will have a look at the root cause now.
23-05-2024

JDK-8327978 was supposed to fx this but that fix is included in the failing build. We also seem to hit the memory limit: [...] hand_unrolling3: 1 ITERATION 2 of 5. Test testUUBBBH, time allowance: 158604, until timeout: 466604 13981 351 b compiler.loopopts.superword.TestAlignVectorFuzzer::testUUBBBH (1253 bytes) # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/workspace/open/src/hotspot/share/compiler/compilationMemoryStatistic.cpp:559), pid=3826037, tid=3826053 # fatal error: c2 compiler/loopopts/superword/TestAlignVectorFuzzer::testUUBBBH(([B[B[B)[Ljava/lang/Object;): Hit MemLimit (limit: 1073741824 now: 1073746392) But the failures were linked to JDK-8331337. ILW = Same as JDK-8327978 = P4
23-05-2024