JDK-8173112 : A lot of tests time out while executed with specific TieredStopAtLevel
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 9,10,11,12,13,14,15,16,17
  • Priority: P3
  • Status: Closed
  • Resolution: Not an Issue
  • Submitted: 2017-01-20
  • Updated: 2022-08-16
  • Resolved: 2021-07-14
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 18
18Resolved
Related Reports
Duplicate :  
Duplicate :  
Duplicate :  
Relates :  
Relates :  
Relates :  
Relates :  
Description
There compiler atr testing includes execution of various tests with all tiered levels. There are a lot of timeouts. Mostly when level = 3.
It seems that timeout should be increased or list of tests adjusted.

Please note that failure rules could match all bugs. So might be it make a sense to add test-ignore.
Comments
There are only a few tests time out in JDK 18 ATR. Should be covered by separate more specific issues.
14-07-2021

I'll try to clarify the parts of this fix a little bit. The JVMCI changes are just to address the problem that if you mark a method as not OSR compilable at tier 4, you will never switch out of tier 3. In regular programs this might not be hugely noticeable but in the context of scimark with lots of threads it might get into a state where it never makes progress. That was the cause of the hang instead of just seeing slow performance as you would expect. From the Graal side, the problem was that we were treating an unreachable OSR entry point as a permanent error. Again this was because the profile counters and the counters for triggering an OSR were separate and inconsistent. The backedge total counter might say this is hot while the profile information says that we've never reached this block when you have multiple loops in the same method. This is caused by the incrementing by 0 resetting counter back to 0. I didn't want to touch this part because it requires different LIR or a new LIR operation to make it conditional. So instead which treat an unreached OSR entry point as a transient error and just hope we'll see correct profiles when it tries again.
30-05-2019

Yes, this bug about tier3 is slow. But it could be fixed by falling back to level1 if level4 is not available. We still need counters in level3 updated incrementally to know when to fall back.
29-05-2019

The description of the bug says the problem is slow execution with TieredStopAtLevel=3 though.
29-05-2019

We are not addressing level3 code performance. The issue is that we stack in level3 code for much longer than it should be.
29-05-2019

I don't think it's related to this particular bug though. It would affect the time it takes to go from level 3 to 4, but it won't particularly affect the speed of execution at level 3.
29-05-2019

A quick fix can add another counter to MethodCounters and increment it when it's a forward branch. And we'd just ignore it.
29-05-2019

What can probably be done is to make the cmove select addresses of the memory location to add to instead of selecting the value to add. So we need some global blackhole address perhaps per thread that we'd increment if it's a forward branch.
29-05-2019

Should we use access_atomic_add_at() then? Or it is too expensive?
29-05-2019

You can't safely express jumps within a basic block. Although it might be possible with some register allocator hackery to prevent spills on the sides of the diamond.
29-05-2019

Here is C1 code in question: http://hg.openjdk.java.net/jdk/jdk/file/ae908641e726/src/hotspot/share/c1/c1_LIRGenerator.cpp#l3280
29-05-2019

[~never] Tom, why not fix LIRGenerator::increment_backedge_counter_conditionally() to avoid adding 0 and generate jump instead?
29-05-2019

Ouch, if some hot methods intermittently get stuck in tier 3 code then that could definitely trigger situations very similar to the worst hangs/timeouts we've seen historically, even on product builds running with OOTB tiered compilation settings.
29-05-2019

I am going port next Tom's change in Lab's JVMCI repo which applicable to our JDK code too: https://github.com/graalvm/graal-jvmci-8/commit/b12b6430c5d75fc959a391f4effcdd7dd18d8b06 [~never] Tom Rodriguez added a comment - 06/May/19: "The timeouts seems to be a symptom of being stuck in tier 3 code which ends up hammering on the MDO. Some experiments I did with -XX:-TieredStopAtLevel=3 showed that sometimes it just ran forever without making any progress. Presumably the general cache and NUMA overhead of 72 threads hammering on the same MDO just brought things to a halt. In the end this seems likely to be a deopt/recompile loop that's leads to marking something not compilable, like the earlier report. I could see something like that in my local runs. I'll investigate that along with why we seem stuck in tier3. If it's not compilable we should revert to tier 1." Tom Rodriguez added a comment - 6 days ago: "I finally understand what's happening. We aren't compiling the OSR entry point because a longer path leading to the OSR entry point doesn't appear to be taken. This shouldn't be possible since we are in level 3 code but I think we are occasionally hiding updates because of the way LIRGenerator::increment_backedge_counter_conditionally works. Depending on the result of the test it needs to only update the counter if it takes the branch. This is accomplished by either adding 0 or step. The problem is that if you read 0 and add 0 then you can overwrite another threads write of 1 back to 0. Since so many threads are writing to this MDO that's not just a theoretical possibility. It's a transient condition so a non-permanent bailout is probably the right solution to this. I noticed that when I fail with an error the rertry compilation actually ends up succeeding since the profile has been updated and the program runs normally."
29-05-2019

Just thinking out loud, but if tests are multi-threaded and/or more than one test is being run on the same JVM along with TierStopAtLevel=3 - and to some extent =2 - then you could be hitting a variant of JDK-8134940 That is: threads could be spending a lot of time contending for the opportunity to update method data counters of common methods, and in extreme cases the cache coherency traffic can cause program threads to make very little progress, if at all. We know that running some SPECjvm2008 benchmarks at TierStopAtLevel=3 can make it hang effectively forever on a dual-socket SandyBridge machine.
29-05-2019

Here is the example of hs-promotion-comp run: https://java-dev.se.oracle.com:10067/mdash/jobs/lmesnik-long-20190201-1940-262547?search=result.status%3AEXECUTED_WITH_FAILURE Unfortunately, it contains timeouts caused by JDK-8207367, JDK-8200756 and other possible bugs.
06-02-2019

These failures reproduced in ATR testing only. It is needed to adjust the list of tests executed with different tiered levels.
21-12-2018

no such failures have been reported from recent regular testing, closing as CNR.
27-07-2018

Some of tests time out with client and Xcomp only on linux-arm32: http://aurora.us.oracle.com/functional/faces/RunDetails.xhtml?names=2336758.PROMOTION.VM-95
17-07-2017

This CR is matching a lot of unrelated failures. For example: [2017-05-03T07:38:07.05] # ERROR: ex03t001.c, 204: Expected ClassUnload event was not received. [2017-05-03T07:38:32.26] # Test level exit status: 97 I'm not sure why ExitCode 97 and other non-timeout related RULEs were added here. Also, we need to be real careful with lumping multiple test failures under the same "timeout" bug. They often are unrelated failures.
03-05-2017

RULE "java/lang/Long/ToString.java" StatusError Program `...java' timed out (timeout set to ...ms, elapsed time including timeout handling was ...ms). RULE "closed/com/oracle/jfr/jvm/TestLargeJavaEvent512k.java" StatusError Program `...java' timed out (timeout set to ...ms, elapsed time including timeout handling was ...ms). RULE "java/util/Map/Collisions.java" StatusError Program `...java' timed out (timeout set to ...ms, elapsed time including timeout handling was ...ms). RULE "nsk/jdi/BScenarios/hotswap/tc05x002" Timeout none RULE "nsk/jvmti/scenarios/events/EM02/em02t005" ExitCode 97 RULE "gc/gctests/WeakReference/weak006" Timeout none RULE "heapdump/OnOOMToFileMetaspace" Exception java.lang.OutOfMemoryError: Metaspace
20-01-2017

RULE "java/util/Map/InPlaceOpsCollisions.java" StatusError Program `...java' timed out (timeout set to ...ms, elapsed time including timeout handling was ...ms). RULE "java/util/regex/RegExTest.java" StatusError Program `...java' timed out (timeout set to ...ms, elapsed time including timeout handling was ...ms). RULE "java/util/stream/boottest/java.base/java/util/stream/SpinedBufferTest.java" StatusError Program `...java' timed out (timeout set to ...ms, elapsed time including timeout handling was ...ms). RULE "java/util/stream/test/org/openjdk/tests/java/util/stream/CollectorsTest.java" StatusError Program `...java' timed out (timeout set to ...ms, elapsed time including timeout handling was ...ms). RULE "java/util/stream/test/org/openjdk/tests/java/util/stream/CountLargeTest.java" StatusError Program `...java' timed out (timeout set to ...ms, elapsed time including timeout handling was ...ms) RULE "java/util/stream/test/org/openjdk/tests/java/util/stream/FlatMapOpTest.java" StatusError Program `...java' timed out (timeout set to ...ms, elapsed time including timeout handling was ...ms). RULE "java/util/stream/test/org/openjdk/tests/java/util/stream/InfiniteStreamWithLimitOpTest.java" StatusError Program `...java' timed out (timeout set to ...ms, elapsed time including timeout handling was ...ms). RULE "java/util/stream/test/org/openjdk/tests/java/util/stream/SliceOpTest.java" StatusError Program `...java' timed out (timeout set to ...ms, elapsed time including timeout handling was ...ms). RULE "java/util/stream/test/org/openjdk/tests/java/util/stream/StreamBuilderTest.java" StatusError Program `...java' timed out (timeout set to ...ms, elapsed time including timeout handling was ...ms). RULE "java/util/stream/test/org/openjdk/tests/java/util/stream/StreamLinkTest.java" StatusError Program `...java' timed out (timeout set to ...ms, elapsed time including timeout handling was ...ms). RULE "java/util/stream/test/org/openjdk/tests/java/util/stream/WhileOpTest.java" StatusError Program `...java' timed out (timeout set to ...ms, elapsed time including timeout handling was ...ms).
20-01-2017

Following tests failed because of internal timeout RULE "nsk/jvmti/GarbageCollectionStart/gcstart001" Timeout none RULE "nsk/jvmti/scenarios/events/EM02/em02t002" ExitCode 97 RULE "nsk/jvmti/scenarios/events/EM02/em02t003" ExitCode 97 RULE "nsk/jvmti/scenarios/events/EM02/em02t005" Timeout none RULE "nsk/jvmti/scenarios/events/EM02/em02t006" ExitCode 97 RULE "nsk/jvmti/scenarios/events/EM07/em07t002" ExitCode 97 RULE "nsk/jvmti/scenarios/extension/EX03/ex03t001" ExitCode 97
20-01-2017

RULE "java/math/BigInteger/BigIntegerTest.java" StatusError Program `...java' timed out (timeout set to ...ms, elapsed time including timeout handling was ...ms). RULE "java/math/BigInteger/PrimeTest.java" StatusError Program `...java' timed out (timeout set to ...ms, elapsed time including timeout handling was ...ms)
20-01-2017

RULE "closed/compiler/c1/4477197/AVTest.java" StatusError Program `...java' timed out (timeout set to ...ms, elapsed time including timeout handling was ...ms). RULE "runtime/memory/LargePages/TestLargePagesFlags.java" StatusError Program `...java' timed out (timeout set to ...ms, elapsed time including timeout handling was ...ms). RULE "tools/jlink/basic/BasicTest.java" StatusError test was interrupted! (timeout?) RULE "tools/jlink/plugins/CompressorPluginTest.java" StatusError Program `...java' timed out (timeout set to ...ms, elapsed time including timeout handling was ...ms). RULE "closed/java/lang/Character/UnicodeCasingTest.java" StatusError Program `...java' timed out (timeout set to ...ms, elapsed time including timeout handling was ...ms). RULE "closed/java/lang/String/UnicodeCasingTest.java" StatusError Program `...java' timed out (timeout set to ...ms, elapsed time including timeout handling was ...ms). RULE "java/lang/reflect/PublicMethods/PublicMethodsTest.java" StatusError Program `...java' timed out (timeout set to ...ms, elapsed time including timeout handling was ...ms).
20-01-2017

RULE "gc/gctests/StringInternSyncWithGC" Timeout none RULE "gc/gctests/StringInternSyncWithGC2" Timeout none
20-01-2017

RULE "nsk/monitoring/ThreadMXBean/GetThreadAllocatedBytes/doubleAllocationTest_directly_string" Timeout none RULE "nsk/monitoring/ThreadMXBean/GetThreadAllocatedBytes/doubleAllocationTest_proxy_custom_string" Timeout none RULE "nsk/monitoring/ThreadMXBean/GetThreadAllocatedBytes/doubleAllocationTest_proxy_default_string" Timeout none RULE "nsk/monitoring/ThreadMXBean/GetThreadAllocatedBytes/doubleAllocationTest_server_custom_string" Timeout none RULE "nsk/monitoring/ThreadMXBean/GetThreadAllocatedBytes/doubleAllocationTest_server_default_string" Timeout none RULE "nsk/monitoring/ThreadMXBean/GetThreadAllocatedBytes/equalThreadsTest_directly_string" Timeout none RULE "nsk/monitoring/ThreadMXBean/GetThreadAllocatedBytes/equalThreadsTest_proxy_custom_string" Timeout none RULE "nsk/monitoring/ThreadMXBean/GetThreadAllocatedBytes/equalThreadsTest_proxy_default_string" Timeout none RULE "nsk/monitoring/ThreadMXBean/GetThreadAllocatedBytes/equalThreadsTest_server_custom_string" Timeout none RULE "nsk/monitoring/ThreadMXBean/GetThreadAllocatedBytes/equalThreadsTest_server_default_string" Timeout none
20-01-2017

RULE "compiler/codegen/Test7100757.java" StatusError Program `...java' timed out (timeout set to ...ms, elapsed time including timeout handling was ...ms). RULE "compiler/intrinsics/unsafe/DirectByteBufferTest.java" StatusError Program `...java' timed out (timeout set to ...ms, elapsed time including timeout handling was ...ms). RULE "compiler/intrinsics/unsafe/HeapByteBufferTest.java" StatusError Program `...java' timed out (timeout set to ...ms, elapsed time including timeout handling was ...ms). RULE "gc/arguments/TestMaxMinHeapFreeRatioFlags.java" StatusError Program `...java' timed out (timeout set to ...ms, elapsed time including timeout handling was ...ms). RULE "gc/arguments/TestShrinkHeapInSteps.java" StatusError Program `...java' timed out (timeout set to ...ms, elapsed time including timeout handling was ...ms).
20-01-2017

Link to the report: http://aurora.us.oracle.com/functional/faces/ChessBoard.xhtml?reportName=J2SEFailures&parameters=%5Bjdkproduct%5DJava%28TM%29%25%5Brelease%5Dnone%5Bbuild%5D%25%5Bvmproduct%5DJava%25VM%25%5Bvmrelease%5D%25%5Bvmbuild%5D%25%5Bcomponent%5D%25%5Btags%5D%25%5BstartDate%5DSep+9%2C+2009+10%3A00%3A00+PM+UTC%5BendDate%5DSep+9%2C+2020+10%3A00%3A00+PM+UTC%5BrunNames%5D%27%27%5BbatchNames%5D1926005.PROMOTION.VM%2C1926009.PROMOTION.VM%2C1926010.PROMOTION.VM%2C1926008.PROMOTION.VM%2C1926002.PROMOTION.VM%2C1926007.PROMOTION.VM%2C1926006.PROMOTION.VM%2C1926003.PROMOTION.VM%2C1926004.PROMOTION.VM%5BbatchNameFilterRegExp%5D%25%5BrunNameFilterRegExp%5D%25%5BtestsuiteNameFilterRegExp%5D%25%5BtsrFilter%5D%28tsr.total+%3E+0+or+tsr.invalid+%3D+0%29%5Bst_id%5D%27%27%5BstRun%5D%27%27%5BstBatch%5D%27%27%5Bbundles%5Dnone%5Bstatuses%5Dfailure.status+%3D+com.sun.aurora.data.entity.reporting.Status.FAILED+AND+failure.isValid+%3D+true+AND+%28%3Amatches%29%5Bmatches%5D%28NOT+failure.isKnown+%3D+true%29&filter=
20-01-2017

[~iignatyev], could you please take a look or re-assign? Thanks!
20-01-2017

ILW = Lots of tests time out during testing, with TieredStopAtLevel=3, no workaround = MMH = P3
20-01-2017