Bug ID: JDK-8258225 compiler/c2/cr6340864/TestIntVect.java runs faster in interpreter

Type: Bug
Component: hotspot
Sub-Component: compiler
Affected Version: 15,16,17

Priority: P3
Status: Resolved
Resolution: Fixed
CPU: aarch64

Submitted: 2020-12-14
Updated: 2021-12-16
Resolved: 2020-12-17

JDK 16	JDK 17
16 Fixed	17 b03 Fixed

When browsing my test results I noticed that the run times of this test stood out. The numbers seemed unreasonable: 35 minutes for all 5 test cases, when it takes well under a minute interpreted on my laptop.

On aarch64 a single test case takes about 60 seconds with -Xint. But running with default settings (+Tiered) takes 400 seconds.

These tests being slow seems to have at least partially been solved by "C1: Relax inlining checks for not yet initialized classes".
16-12-2021
https://bugs.openjdk.java.net/browse/JDK-8267806 seems to have reduced the runtime of these tests significantly on x64. I can see one test go from 10 seconds to 3-4, and the reduction comes from a much shorter Emit LIR time.
10-12-2021
Changeset: cb5a6b1a
Author: Nils Eliasson <neliasso@openjdk.org>
Date: 2020-12-17 17:50:53 +0000
URL: https://git.openjdk.java.net/jdk/commit/cb5a6b1a
17-12-2020
Conclusion: This is a debug-build problem only, caused by IR::verify being expensive and being run for every block by BlockMerger::try_merge. TestByteVect and TestShortVect are not affected because they hit the HugeMethodLimit. Removing -Xbatch also helps, because then the program can make progress while the compiler threads are busy verifying the IR.
Solution: Will exclude the test method from compilation in all the tests.
Suggestion: It would be nice to get some feedback when huge methods are ignored from compilation. IR::verify needs to be improved so that we don't have to exclude methods that are fine in product builds.
17-12-2020
BlockMerger iterates over all the blocks. For every block it calls verify at least three times: twice in block_do and at least once in try_merge. So all blocks are iterated in preorder and postorder at least three times for every block.

  virtual void block_do(BlockBegin* block) {
    _hir->verify();
    // repeat since the same block may merge again
    while (try_merge(block)) {
      _hir->verify();
    }
  }
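
To illustrate why this pattern scales badly for a huge method, here is a rough cost model. It is a sketch and not HotSpot code: the function and constant names are hypothetical. It assumes that each verify call walks every block twice (once preorder, once postorder) and that verify runs exactly three times per block, which is the lower bound stated above.

```cpp
#include <cstddef>

// Toy cost model, NOT HotSpot code. All names here are hypothetical.
// Assumption: verify runs 3 times per block (the stated lower bound).
constexpr std::size_t kVerifiesPerBlock = 3;
// Assumption: one verify walks all blocks twice (preorder + postorder).
constexpr std::size_t kTraversalsPerVerify = 2;

// Block visits when the whole IR is verified for every block processed:
// grows quadratically with the number of blocks.
constexpr std::size_t blocks_visited_with_per_block_verify(std::size_t num_blocks) {
    return num_blocks * kVerifiesPerBlock * kTraversalsPerVerify * num_blocks;
}

// Block visits if the IR were verified only once after the whole pass:
// grows linearly with the number of blocks.
constexpr std::size_t blocks_visited_with_single_verify(std::size_t num_blocks) {
    return kTraversalsPerVerify * num_blocks;
}

// Compile-time sanity checks of the model.
static_assert(blocks_visited_with_per_block_verify(100) == 60000u,
              "100 blocks * 3 verifies * 2 traversals * 100 blocks");
static_assert(blocks_visited_with_single_verify(100) == 200u,
              "2 traversals * 100 blocks");
```

Under these assumptions, a method with ten times as many blocks costs a hundred times as many block visits in verification, which fits the observation that only the huge test() method is affected.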
16-12-2020
It's actually IR::verify in optimize_blocks that spends most of the time, but LinearScan is using a lot too. Yes - the test() method should be excluded. But I also need to figure out why the test() method isn't compiled in TestByteVect and TestShortVect. And why only aarch64 is seeing this problem.
16-12-2020
Baseline:

  Accumulated compiler times
  ----------------------------------------------------------
  Total compilation time   : 272,790 s
    Standard compilation   : 8,108 s, Average : 0,012 s
    Bailed out compilation : 0,000 s, Average : 0,000 s
    On stack replacement   : 264,682 s, Average : 2,302 s
    Invalidated            : 0,000 s, Average : 0,000 s

  C1 Compile Time: 272,513 s
    Setup time: 0,002 s
    Build HIR: 171,405 s
      Parse: 7,193 s
      Optimize blocks: 157,277 s
      GVN: 0,617 s
      Null checks elim: 0,145 s
      Range checks elim: 4,465 s
      Other: 1,707 s
    Emit LIR: 64,469 s
      LIR Gen: 1,209 s
      Linear Scan: 63,253 s
      Other: 0,007 s
    Code Emission: 33,495 s
    Code Installation: 3,140 s
    Other: 0,002 s

  Total compiled methods    : 770 methods
    Standard compilation    : 655 methods
    On stack replacement    : 115 methods
  Total compiled bytecodes  : 1000695 bytes
    Standard compilation    : 64568 bytes
    On stack replacement    : 936127 bytes
  Average compilation speed : 3668 bytes/s

  nmethod code size         : 14788096 bytes
  nmethod total size        : 5607104 bytes

With test()-method excluded:

  Accumulated compiler times
  ----------------------------------------------------------
  Total compilation time   : 0,888 s
    Standard compilation   : 0,854 s, Average : 0,001 s
    Bailed out compilation : 0,000 s, Average : 0,000 s
    On stack replacement   : 0,033 s, Average : 0,001 s
    Invalidated            : 0,000 s, Average : 0,000 s

  C1 Compile Time: 0,875 s
    Setup time: 0,002 s
    Build HIR: 0,481 s
      Parse: 0,263 s
      Optimize blocks: 0,105 s
      GVN: 0,014 s
      Null checks elim: 0,007 s
      Range checks elim: 0,057 s
      Other: 0,035 s
    Emit LIR: 0,251 s
      LIR Gen: 0,039 s
      Linear Scan: 0,209 s
      Other: 0,002 s
    Code Emission: 0,094 s
    Code Installation: 0,046 s
    Other: 0,001 s

  Total compiled methods    : 705 methods
    Standard compilation    : 654 methods
    On stack replacement    : 51 methods
  Total compiled bytecodes  : 52337 bytes
    Standard compilation    : 51019 bytes
    On stack replacement    : 1318 bytes
  Average compilation speed : 58967 bytes/s

  nmethod code size         : 869528 bytes
  nmethod total size        : 386944 bytes
16-12-2020
I remember that C1 linear scan RA had trouble with these tests. I agree with excluding the test() method from compilation. Please consider doing the same for other similar tests in compiler/c2/cr6340864/
15-12-2020
I think I know what's going on. It's a single-threaded test, running with -Xbatch, so only a single compiler thread is doing work. The test method is huge and gets OSR-compiled for every test loop (about 75). Turning off OSR compiles makes the test program as fast as the interpreter (so it is probably running in the interpreter). Excluding the main test loop from compilation makes the problem go away.
15-12-2020
Running with -XX:-TieredCompilation gives fast results. So only tiered and c1-only are affected.
14-12-2020
I got strange numbers with c1 too.
For byte and short: interpreter 60-70 sec, c1 and c2 about 5 sec.
For int and long: interpreter takes 60-70 sec, c1 and c2 about 300-400 sec.
So the interpreter is equally slow on all types, but c1 and c2 are up to 80 times slower on ints and longs.
14-12-2020
I also suspect compiler/codegen/TestCharVect2.java, which shows very long run times. Not confirmed.
14-12-2020
ILW = Slow execution of jitted code, single test on aarch64, disable compilation of affected method = MMM = P3
14-12-2020
This seems to apply to compiler/c2/cr6340864/TestLongVect.java too. TestByteVect and TestShortVect run quickly, and are both much faster than with -Xint. TestIntVect and TestLongVect are very slow: each subtest takes an increasing amount of time. With -Xint they are faster than compiled code. There are about 75 subtests, but I don't think they are all wrong. To me it looks like the number of iterations is increasing.
14-12-2020

Relates :	JDK-8258603 - c1 IR::verify is expensive
Relates :	JDK-8267806 - C1: Relax inlining checks for not yet initialized classes