JDK-8243455 : Many SA tests can fail due to trying to get the stack trace of an active method
  • Type: Bug
  • Component: hotspot
  • Sub-Component: svc-agent
  • Affected Version: 15,16,17
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2020-04-23
  • Updated: 2023-07-25
  • Resolved: 2021-03-12
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 17
17 b14Fixed
Related Reports
Blocks :  
Duplicate :  
Duplicate :  
Duplicate :  
Duplicate :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Sub Tasks
JDK-8262271 :  
Description
There are a number of SA tests that can fail (although only very rarely) due to the expectation of finding the stack trace for the LingeredApp "main" thread in the test output after requesting SA to dump all thread stacks. Usually the test searches for LingeredApp.main in the output, although there are some tests that vary somewhat.

The problem is that main thread is not always blocked, and when it's not blocked the stack may be in a state that prevents SA from being able to produce a stack trace. So the stack trace for the main method ends up empty, or in very very rare occasions includes an invalid frame with the notation "Compiled frame; information may be imprecise".

These are the test that are affected:

serviceability/sa/ClhsdbCDSJstackPrintAll.java
serviceability/sa/ClhsdbDumpheap
serviceability/sa/ClhsdbFindPC.java
serviceability/sa/ClhsdbJdis.java
serviceability/sa/ClhsdbJstack.java
serviceability/sa/ClhsdbPrintAs.java
serviceability/sa/ClhsdbSource
serviceability/sa/ClhsdbWhere.java
serviceability/sa/sadebugd/DebugdConnectTest.java
sun/tools/jhsdb/HeapDumpTest.java


They can be made to fail much more readily by modifying LingeredApp.main() so it only sleeps for 1ms instead of 1000ms. This way the test is much more apt to request the stack trace when the main thread's stack is in an state that prevents SA from getting the stack trace (or more precisely, getting the current frame).

This bug is the cause of a few outstanding bugs that will be closed as duplicates:

JDK-8211923: [Testbug] serviceability/sa/ClhsdbFindPC.java ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
JDK-8240781: serviceability/sa/ClhsdbJdis.java fails with "java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1"
JDK-8242411: serviceability/sa/ClhsdbCDSJstackPrintAll.java fails with Test ERROR java.lang.RuntimeException: 'LingeredApp.main' missing from stdout/stderr

The AIOOBE is due to the test trying to first split the output at the "LingeredApp.main" symbol, and then access String[1] of the result, which doesn't exist since no splitting was done due to "LingeredApp.main" missing from the output. 

[Note, the following tests used to be in the above list, but I moved them out since they don't query the LingeredApp "main" thread, nor any other thread that is not stable.]

serviceability/sa/TestJhsdbJstackLock.java
serviceability/sa/TestJhsdbJstackMixed.java 
Comments
Changeset: 43524cc4 Author: Chris Plummer <cjplummer@openjdk.org> Date: 2021-03-12 23:35:47 +0000 URL: https://git.openjdk.java.net/jdk/commit/43524cc4
12-03-2021

Here's a log file snippet for the jdk-16+24-1551-tier1 sighting: serviceability/sa/ClhsdbCDSJstackPrintAll.java stderr: [ + jstack -v Deadlock Detection: No deadlocks found. "main" #1 prio=5 tid=0x0000fffeb40268e0 nid=0x292a runnable [0x0000000000000000] java.lang.Thread.State: RUNNABLE JavaThread state: _thread_in_java Locked ownable synchronizers: - None "Reference Handler" #2 daemon prio=10 tid=0x0000fffeb43767c0 nid=0x2939 waiting on condition [0x0000fffe80ffe000] java.lang.Thread.State: RUNNABLE JavaThread state: _thread_blocked - java.lang.ref.Reference.waitForReferencePendingList() @bci=0, pc=0x0000fffe9da23c2c, Method*=0x00000008003f07b0 (Interpreted frame) - java.lang.ref.Reference.processPendingReferences() @bci=0, line=241, pc=0x0000fffe9da1f7e4, Method*=0x00000008003f0900 (Interpreted frame) - java.lang.ref.Reference$ReferenceHandler.run() @bci=0, line=213, pc=0x0000fffe9da1f7e4, Method*=0x0000000800175468 (Interpreted frame) Locked ownable synchronizers: - None "Finalizer" #3 daemon prio=8 tid=0x0000fffeb4376da0 nid=0x293a in Object.wait() [0x0000fffe80dfe000] java.lang.Thread.State: WAITING (on object monitor) JavaThread state: _thread_blocked - java.lang.Object.wait(long) @bci=0, pc=0x0000fffe9da23c2c, Method*=0x0000000800458c48 (Interpreted frame) - waiting on <0x00000006b7c02d30> (a java.lang.ref.ReferenceQueue$Lock) - java.lang.ref.ReferenceQueue.remove(long) @bci=59, line=155, pc=0x0000fffe9da1f7e4, Method*=0x00000008001ea548 (Interpreted frame) - locked <0x00000006b7c02d30> (a java.lang.ref.ReferenceQueue$Lock) - java.lang.ref.ReferenceQueue.remove() @bci=2, line=176, pc=0x0000fffe9da1f580, Method*=0x00000008001ea5a8 (Interpreted frame) - java.lang.ref.Finalizer$FinalizerThread.run() @bci=37, line=170, pc=0x0000fffe9da1f580, Method*=0x0000000800422a70 (Interpreted frame) Locked ownable synchronizers: - None "Signal Dispatcher" #4 daemon prio=9 tid=0x0000fffeb4396fc0 nid=0x293b runnable [0x0000000000000000] java.lang.Thread.State: RUNNABLE JavaThread state: _thread_blocked Locked ownable synchronizers: - None "Common-Cleaner" #12 daemon prio=8 tid=0x0000fffeb4447ff0 nid=0x2943 in Object.wait() [0x0000fffe53bde000] java.lang.Thread.State: TIMED_WAITING (on object monitor) JavaThread state: _thread_blocked - java.lang.Object.wait(long) @bci=0, pc=0x0000fffe9da23c2c, Method*=0x0000000800458c48 (Interpreted frame) - waiting on <0x00000006b7c4b298> (a java.lang.ref.ReferenceQueue$Lock) - java.lang.ref.ReferenceQueue.remove(long) @bci=59, line=155, pc=0x0000fffe9da1f7e4, Method*=0x00000008001ea548 (Interpreted frame) - locked <0x00000006b7c4b298> (a java.lang.ref.ReferenceQueue$Lock) - jdk.internal.ref.CleanerImpl.run() @bci=45, line=140, pc=0x0000fffe9da1f580, Method*=0x00000008002794c0 (Interpreted frame) - java.lang.Thread.run() @bci=11, line=831, pc=0x0000fffe9da1fd64, Method*=0x000000080002bea8 (Interpreted frame) - jdk.internal.misc.InnocuousThread.run() @bci=20, line=134, pc=0x0000fffe9da1f7e4, Method*=0x00000008003d6520 (Interpreted frame) Locked ownable synchronizers: - None ] exitValue = -1 java.lang.RuntimeException: Test ERROR java.lang.RuntimeException: 'LingeredApp.main' missing from stdout/stderr at ClhsdbCDSJstackPrintAll.main(ClhsdbCDSJstackPrintAll.java:117) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:564) at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127) at java.base/java.lang.Thread.run(Thread.java:831) Caused by: java.lang.RuntimeException: 'LingeredApp.main' missing from stdout/stderr at jdk.test.lib.process.OutputAnalyzer.shouldMatch(OutputAnalyzer.java:308) at ClhsdbLauncher.runCmd(ClhsdbLauncher.java:163) at ClhsdbLauncher.run(ClhsdbLauncher.java:200) at ClhsdbCDSJstackPrintAll.main(ClhsdbCDSJstackPrintAll.java:113) ... 6 more JavaTest Message: Test threw exception: java.lang.RuntimeException: Test ERROR java.lang.RuntimeException: 'LingeredApp.main' missing from stdout/stderr JavaTest Message: shutting down test STATUS:Failed.`main' threw exception: java.lang.RuntimeException: Test ERROR java.lang.RuntimeException: 'LingeredApp.main' missing from stdout/stderr ----------rerun:(34/7242)*----------
11-11-2020

I bumped this from P4 -> P3 since this intermittent failure has now been seen in a Tier1 job set.
04-07-2020

Here are some of the more common failure messages: java.lang.RuntimeException: Test ERROR java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1 java.lang.RuntimeException: Test ERROR java.lang.RuntimeException: 'public static void main' missing from stdout/stderr java.lang.RuntimeException: Test ERROR (with -Xcomp=true) java.lang.RuntimeException: 'LingeredApp.main' missing from stdout/stderr java.lang.RuntimeException: Test ERROR (with -Xcomp=false) java.lang.RuntimeException: 'LingeredApp.main' missing from stdout/stderr java.lang.RuntimeException: Test ERROR java.lang.RuntimeException: 'LingeredAppWithExtendedChars.main' missing from stdout/stderr
23-04-2020

There are two solutions I can think of: (1) Allow the tests to pass even if the main thread is not dumped. I don't like this idea because it means if there is a real issue that is turning up frequently, the tests won't catch it. (2) Have these tests use a special LingeredApp that has an idle thread (one that is blocked on monitor). In order not to lose test SA test coverage, a new test will be added that stresses getting stack traces from an active thread. A number of other bugs have turned up the past few years due to SA throwing exceptions while trying to do a stack dump for an active thread, such as the infamous "illegal bci" assert. We don't want to lose this type of stress testing. I'll start work on (2) since I think it's the better solution.
23-04-2020