JDK-8258225 : compiler/c2/cr6340864/TestIntVect.java runs faster in interpreter
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 15,16,17
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • CPU: aarch64
  • Submitted: 2020-12-14
  • Updated: 2021-12-16
  • Resolved: 2020-12-17
Fixed in:
  • JDK 16: 16
  • JDK 17: 17 b03
Description
While going through my test results I noticed that the run times of this test stood out. The numbers seemed unreasonable: 35 minutes for all 5 test cases, when it takes well under a minute interpreted on my laptop.

On aarch64 a single test case takes about 60 seconds with -Xint, but about 400 seconds with default settings (+Tiered).
Comments
These tests being slow seems to have at least partially been solved by "C1: Relax inlining checks for not yet initialized classes".
16-12-2021

https://bugs.openjdk.java.net/browse/JDK-8267806 seems to have reduced the runtime of these tests significantly on x64. I can see one test go from 10 seconds to 3-4 seconds, and the reduction comes from a much shorter Emit LIR time.
10-12-2021

Changeset: cb5a6b1a
Author: Nils Eliasson <neliasso@openjdk.org>
Date: 2020-12-17 17:50:53 +0000
URL: https://git.openjdk.java.net/jdk/commit/cb5a6b1a
17-12-2020

Conclusion: This is a debug-build-only problem. It is caused by IR::verify being expensive and being run for every block by BlockMerger::try_merge. TestByteVect and TestShortVect are not affected because they hit the HugeMethodLimit. Removing -Xbatch also helps, because the program can then make progress while the compiler threads are busy verifying the IR.
Solution: Exclude the test method from compilation in all of the tests.
Suggestion: It would be nice to get some feedback when huge methods are ignored for compilation. IR::verify needs to be improved so that we don't have to exclude methods that are fine in product builds.
17-12-2020

BlockMerger iterates over all the blocks. For every block it calls verify at least three times: twice in block_do and at least once in try_merge. So all blocks are iterated in preorder and postorder at least three times for every block.

    virtual void block_do(BlockBegin* block) {
      _hir->verify();
      // repeat since the same block may merge again
      while (try_merge(block)) {
        _hir->verify();
      }
    }
16-12-2020
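
The cost described in the comment above can be illustrated with a toy model. The Java sketch below is not HotSpot code: the class and method names are hypothetical, and it assumes that each verify walks every block twice (preorder and postorder) and that verify runs three times per block, the minimum stated above. Under those assumptions the number of block visits grows quadratically with the number of blocks.

```java
final class VerifyCostModel {
    // Assumption: one verify() pass touches every block twice
    // (one preorder walk plus one postorder walk).
    static long visitsPerVerify(long blocks) {
        return 2 * blocks;
    }

    // Total block visits if verify() runs verifiesPerBlock times for each block.
    static long totalVisits(long blocks, int verifiesPerBlock) {
        return blocks * verifiesPerBlock * visitsPerVerify(blocks);
    }

    public static void main(String[] args) {
        // Three verify calls per block is the lower bound quoted in the comment.
        for (long n : new long[] {100, 1_000, 10_000}) {
            System.out.println(n + " blocks -> " + totalVisits(n, 3) + " block visits");
        }
    }
}
```

In this model a tenfold increase in block count gives a hundredfold increase in verification work, which would fit a huge method being affected far more than a small one.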

IR::verify in optimize_blocks actually accounts for most of the time, but LinearScan uses a lot too. Yes, the test() method should be excluded. But I also need to figure out why the test() method isn't compiled in TestByteVect and TestShortVect, and why only aarch64 is seeing this problem.
16-12-2020

Baseline:

Accumulated compiler times
----------------------------------------------------------
  Total compilation time   : 272,790 s
    Standard compilation   :   8,108 s, Average : 0,012 s
    Bailed out compilation :   0,000 s, Average : 0,000 s
    On stack replacement   : 264,682 s, Average : 2,302 s
    Invalidated            :   0,000 s, Average : 0,000 s

  C1 Compile Time:      272,513 s
    Setup time:           0,002 s
    Build HIR:          171,405 s
      Parse:                7,193 s
      Optimize blocks:    157,277 s
      GVN:                  0,617 s
      Null checks elim:     0,145 s
      Range checks elim:    4,465 s
      Other:                1,707 s
    Emit LIR:            64,469 s
      LIR Gen:              1,209 s
      Linear Scan:         63,253 s
      Other:                0,007 s
    Code Emission:       33,495 s
    Code Installation:    3,140 s
    Other:                0,002 s

  Total compiled methods    :      770 methods
    Standard compilation    :      655 methods
    On stack replacement    :      115 methods
  Total compiled bytecodes  :  1000695 bytes
    Standard compilation    :    64568 bytes
    On stack replacement    :   936127 bytes
  Average compilation speed :     3668 bytes/s

  nmethod code size         : 14788096 bytes
  nmethod total size        :  5607104 bytes

With test()-method excluded:

Accumulated compiler times
----------------------------------------------------------
  Total compilation time   :   0,888 s
    Standard compilation   :   0,854 s, Average : 0,001 s
    Bailed out compilation :   0,000 s, Average : 0,000 s
    On stack replacement   :   0,033 s, Average : 0,001 s
    Invalidated            :   0,000 s, Average : 0,000 s

  C1 Compile Time:        0,875 s
    Setup time:           0,002 s
    Build HIR:            0,481 s
      Parse:                0,263 s
      Optimize blocks:      0,105 s
      GVN:                  0,014 s
      Null checks elim:     0,007 s
      Range checks elim:    0,057 s
      Other:                0,035 s
    Emit LIR:             0,251 s
      LIR Gen:              0,039 s
      Linear Scan:          0,209 s
      Other:                0,002 s
    Code Emission:        0,094 s
    Code Installation:    0,046 s
    Other:                0,001 s

  Total compiled methods    :      705 methods
    Standard compilation    :      654 methods
    On stack replacement    :       51 methods
  Total compiled bytecodes  :    52337 bytes
    Standard compilation    :    51019 bytes
    On stack replacement    :     1318 bytes
  Average compilation speed :    58967 bytes/s

  nmethod code size         :   869528 bytes
  nmethod total size        :   386944 bytes
16-12-2020
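
To make the proportions in the output above easier to read, the following Java sketch computes a few shares from the reported figures. The class name and constant names are hypothetical; the values are copied from the two runs above, with the decimal comma written as a point.

```java
final class CompileTimeShares {
    // Seconds, copied from the baseline run above.
    static final double TOTAL = 272.790;
    static final double OSR = 264.682;
    static final double C1_TOTAL = 272.513;
    static final double OPTIMIZE_BLOCKS = 157.277;
    static final double LINEAR_SCAN = 63.253;
    // Seconds, copied from the run with the test() method excluded.
    static final double TOTAL_EXCLUDED = 0.888;

    // Returns part as a percentage of whole.
    static double percent(double part, double whole) {
        return 100.0 * part / whole;
    }

    public static void main(String[] args) {
        System.out.printf("OSR share of total time:      %.1f%%%n", percent(OSR, TOTAL));
        System.out.printf("Optimize blocks share of C1:  %.1f%%%n", percent(OPTIMIZE_BLOCKS, C1_TOTAL));
        System.out.printf("Linear Scan share of C1:      %.1f%%%n", percent(LINEAR_SCAN, C1_TOTAL));
        System.out.printf("Baseline / excluded total:    %.0fx%n", TOTAL / TOTAL_EXCLUDED);
    }
}
```

With these inputs, on-stack replacement accounts for roughly 97% of the total compilation time, and "Optimize blocks" plus "Linear Scan" together account for roughly 81% of the C1 compile time.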

I remember that C1 linear scan RA had trouble with these tests. I agree with excluding test() method from compilation. Please consider doing the same for other similar tests in compiler/c2/cr6340864/
15-12-2020

I think I know what's going on. It's a single-threaded test running with -Xbatch, so only a single compiler thread is doing work. The test method is huge and gets OSR-compiled for every test loop (about 75). Turning off OSR compiles makes the test program as fast as the interpreter (so it is probably running in the interpreter). Excluding the main test loop from compilation makes the problem go away.
15-12-2020
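
The experiments described in the comment above can be tried on a small stand-in program. The Java sketch below is hypothetical (OsrDemo and its test method are not part of the actual tests); it only shows where the flags go. -Xbatch, -XX:-UseOnStackReplacement and -XX:CompileCommand=exclude are existing HotSpot options, but the exact command lines used for the real tests are not given in this report, so the ones in the comments are assumptions.

```java
import java.lang.management.ManagementFactory;

// Possible launches (source-file mode, JDK 11 or later):
//   java -Xbatch OsrDemo.java
//   java -Xbatch -XX:-UseOnStackReplacement OsrDemo.java
//   java -Xbatch -XX:CompileCommand=exclude,OsrDemo::test OsrDemo.java
final class OsrDemo {
    // Hypothetical stand-in for the huge test() method: a single long-running
    // loop, which the JIT would normally compile via on-stack replacement.
    static long test(int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += i % 7;
        }
        return sum;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long result = test(50_000_000);
        long millis = (System.nanoTime() - start) / 1_000_000;
        // Echo the JVM options so each run records the configuration it used.
        System.out.println("JVM flags: "
                + ManagementFactory.getRuntimeMXBean().getInputArguments());
        System.out.println("result=" + result + " time=" + millis + " ms");
    }
}
```

For a small method like this one, excluding it from compilation will typically make the run slower, not faster. The behaviour reported here depends on the method being huge and on the debug-build verification cost.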

Running with -XX:-TieredCompilation gives fast results. So only tiered and c1-only are affected.
14-12-2020

I got strange numbers with c1 too.
For byte and short: interpreter 60-70 sec; c1 and c2 about 5 sec.
For int and long: interpreter 60-70 sec; c1 and c2 about 300-400 sec.
So the interpreter is equally slow on all types, but c1 and c2 are up to 80 times slower on ints and longs.
14-12-2020

I also suspect compiler/codegen/TestCharVect2.java, which shows very long run times. Not confirmed.
14-12-2020

ILW = Slow execution of jitted code, single test on aarch64, disable compilation of affected method = MMM = P3
14-12-2020

This seems to apply to compiler/c2/cr6340864/TestLongVect.java too. TestByteVect and TestShortVect run quickly, and both are much faster than with -Xint. TestIntVect and TestLongVect are very slow: each subtest takes an increasing amount of time, and with -Xint they are faster than with compiled code. There are about 75 subtests, but I don't think they are all wrong. To me it looks like the number of iterations is increasing.
14-12-2020