JDK-8316401 : sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock"
  • Type: Bug
  • Component: hotspot
  • Sub-Component: svc-agent
  • Affected Version: 21,22,repo-lilliput-17
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: linux
  • CPU: x86_64
  • Submitted: 2023-09-18
  • Updated: 2024-04-08
  • Resolved: 2023-09-29
  • JDK 21: 21.0.2 (Fixed)
  • JDK 22: 22 b18 (Fixed)
Description
The following test failed in an adhoc Tier5 job that is testing:

    JDK-8315880 Change LockingMode default from LM_LEGACY to LM_LIGHTWEIGHT

Here's a snippet from the log file:

#section:main
----------messages:(7/346)----------
command: main JStackStressTest
reason: User specified action: run main/timeout=240 JStackStressTest 
started: Sun Sep 17 22:21:56 UTC 2023
Mode: othervm [test requires non-default system module]
Additional options from @modules: --add-modules jdk.hotspot.agent,java.management
finished: Sun Sep 17 22:22:08 UTC 2023
elapsed time (seconds): 11.96
----------configuration:(3/61)----------
Boot Layer
  add modules: jdk.hotspot.agent java.management

----------System.out:(135/8723)----------
Starting Jshell
[JShell]:/opt/mach5/mesos/work_dir/jib-master/install/2023-09-17-1736580.daniel.daugherty.8315880_for_jdk22.git/linux-x64-debug.jdk/jdk-22/fastdebug/bin/jshell -J-Xmx768m -J-XX:MaxRAMPercentage=4.16667 -J-Dtest.boot.jdk=/opt/mach5/mesos/work_dir/jib-master/install/jdk/20/36/bundles/linux-x64/jdk-20_linux-x64_bin.tar.gz/jdk-20 -J-Djava.io.tmpdir=/opt/mach5/mesos/work_dir/slaves/cd627e65-f015-4fb1-a1d2-b6c9b8127f98-S179455/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/647b6357-7ad2-47b6-a884-93007503a691/runs/fb791c2d-470c-40ea-99a6-ffd196fc0b7a/testoutput/test-support/jtreg_open_test_jdk_svc_tools/tmp -J-ea -J-esa
[2023-09-17T22:21:56.889643057Z] Gathering output for process 590799
###### Starting jstack iteration 1 against 590799
[2023-09-17T22:21:57.092332068Z] Gathering output for process 590855
[2023-09-17T22:21:59.775374274Z] Waiting for completion for process 590855
[2023-09-17T22:21:59.775620992Z] Waiting for completion finished for process 590855
Output and diagnostic info for process 590855 was saved into 'pid-590855-output.log'
jhsdb jstack stdout:
Attaching to process ID 590799, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 22-internal-2023-09-17-1736580.daniel.daugherty.8315880forjdk22.git
Deadlock Detection:

No deadlocks found.

"main" #1 prio=5 tid=0x00007f5b6802b560 nid=590802 runnable [0x00007f5b6de81000]
   java.lang.Thread.State: RUNNABLE
   JavaThread state: _thread_in_vm
 - java.util.stream.Nodes.builder() @bci=0, line=179 (Interpreted frame)
 - java.util.stream.Nodes.builder(long, java.util.function.IntFunction) @bci=26, line=169 (Interpreted frame)
 - java.util.stream.ReferencePipeline.makeNodeBuilder(long, java.util.function.IntFunction) @bci=2, line=135 (Interpreted frame)
 - java.util.stream.AbstractPipeline.evaluate(java.util.Spliterator, boolean, java.util.function.IntFunction) @bci=23, line=573 (Interpreted frame)
 - java.util.stream.AbstractPipeline.evaluateToArrayNode(java.util.function.IntFunction) @bci=74, line=260 (Interpreted frame)
 - java.util.stream.ReferencePipeline.toArray(java.util.function.IntFunction) @bci=4, line=616 (Interpreted frame)
 - jdk.internal.jshell.tool.JShellTool$Options.selectOptions(java.util.function.Predicate) @bci=35, line=326 (Interpreted frame)
 - jdk.internal.jshell.tool.JShellTool$Options.remoteVmOptions() @bci=6, line=330 (Interpreted frame)
 - jdk.internal.jshell.tool.JShellTool.resetState() @bci=148, line=1106 (Interpreted frame)
 - jdk.internal.jshell.tool.JShellTool.start(java.lang.String[]) @bci=44, line=955 (Interpreted frame)
 - jdk.internal.jshell.tool.JShellToolBuilder.start(java.lang.String[]) @bci=5, line=261 (Interpreted frame)
 - jdk.internal.jshell.tool.JShellToolProvider.main(java.lang.String[]) @bci=4, line=120 (Interpreted frame)

<snip>

jhsdb jstack stderr:

###### End of all output for iteration 1 which took 2711ms
[2023-09-17T22:21:59.781166361Z] Waiting for completion for process 590855
[2023-09-17T22:21:59.781288311Z] Waiting for completion finished for process 590855
###### Starting jstack iteration 2 against 590799
[2023-09-17T22:21:59.799234929Z] Gathering output for process 591292
[2023-09-17T22:22:01.895444885Z] Waiting for completion for process 591292
[2023-09-17T22:22:01.895737860Z] Waiting for completion finished for process 591292
Output and diagnostic info for process 591292 was saved into 'pid-591292-output.log'
jhsdb jstack stdout:
Attaching to process ID 590799, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 22-internal-2023-09-17-1736580.daniel.daugherty.8315880forjdk22.git
Deadlock Detection:


jhsdb jstack stderr:
java.lang.InternalError: We should have found a thread that owns the anonymous lock
	at jdk.hotspot.agent/sun.jvm.hotspot.runtime.Threads.owningThreadFromMonitor(Threads.java:244)
	at jdk.hotspot.agent/sun.jvm.hotspot.runtime.DeadlockDetector.print(DeadlockDetector.java:88)
	at jdk.hotspot.agent/sun.jvm.hotspot.runtime.DeadlockDetector.print(DeadlockDetector.java:39)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:62)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:45)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.JStack.run(JStack.java:67)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.startInternal(Tool.java:278)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.start(Tool.java:241)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.execute(Tool.java:134)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.JStack.runWithArgs(JStack.java:90)
	at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.runJSTACK(SALauncher.java:302)
	at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.main(SALauncher.java:500)

###### End of all output for iteration 2 which took 2112ms
[2023-09-17T22:22:01.896570437Z] Waiting for completion for process 591292
[2023-09-17T22:22:01.896724810Z] Waiting for completion finished for process 591292
[2023-09-17T22:22:01.896844778Z] Waiting for completion for process 591292
[2023-09-17T22:22:01.896938326Z] Waiting for completion finished for process 591292
[2023-09-17T22:22:01.910460800Z] Waiting for completion for process 591292
[2023-09-17T22:22:01.910652533Z] Waiting for completion finished for process 591292
[JShell] |  Welcome to JShell -- Version 22-internal
[JShell] |  For an introduction type: /help intro
[JShell] |  Goodbye
jshell Output: |  Welcome to JShell -- Version 22-internal
|  For an introduction type: /help intro
|  Goodbye

jshell> 
----------System.err:(40/2464)----------
 stdout: [Attaching to process ID 590799, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 22-internal-2023-09-17-1736580.daniel.daugherty.8315880forjdk22.git
Deadlock Detection:

];
 stderr: [java.lang.InternalError: We should have found a thread that owns the anonymous lock
	at jdk.hotspot.agent/sun.jvm.hotspot.runtime.Threads.owningThreadFromMonitor(Threads.java:244)
	at jdk.hotspot.agent/sun.jvm.hotspot.runtime.DeadlockDetector.print(DeadlockDetector.java:88)
	at jdk.hotspot.agent/sun.jvm.hotspot.runtime.DeadlockDetector.print(DeadlockDetector.java:39)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:62)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:45)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.JStack.run(JStack.java:67)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.startInternal(Tool.java:278)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.start(Tool.java:241)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.execute(Tool.java:134)
	at jdk.hotspot.agent/sun.jvm.hotspot.tools.JStack.runWithArgs(JStack.java:90)
	at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.runJSTACK(SALauncher.java:302)
	at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.main(SALauncher.java:500)
]
 exitValue = 1

[JShell] jshell> 
java.lang.RuntimeException: Test ERROR java.lang.RuntimeException: Expected to get exit value of [0], exit value is: [1]
	at JStackStressTest.testjstack(JStackStressTest.java:78)
	at JStackStressTest.main(JStackStressTest.java:107)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at com.sun.javatest.regtest.agent.MainWrapper$MainTask.run(MainWrapper.java:138)
	at java.base/java.lang.Thread.run(Thread.java:1570)
Caused by: java.lang.RuntimeException: Expected to get exit value of [0], exit value is: [1]
	at jdk.test.lib.process.OutputAnalyzer.shouldHaveExitValue(OutputAnalyzer.java:490)
	at JStackStressTest.testjstack(JStackStressTest.java:70)
	... 5 more

JavaTest Message: Test threw exception: java.lang.RuntimeException: Test ERROR java.lang.RuntimeException: Expected to get exit value of [0], exit value is: [1]
JavaTest Message: shutting down test

STATUS:Failed.`main' threw exception: java.lang.RuntimeException: Test ERROR java.lang.RuntimeException: Expected to get exit value of [0], exit value is: [1]
----------rerun:(38/7573)*----------

This error message from jhsdb:

> jhsdb jstack stderr:
> java.lang.InternalError: We should have found a thread that owns the
> anonymous lock

could indicate a bug in the LM_LIGHTWEIGHT locking code.
I have not seen any sightings of this test failing like this in
the Oracle CIs. I'm starting this bug off in hotspot/runtime for
initial triage.
Comments
A pull request was submitted for review. URL: https://git.openjdk.org/lilliput-jdk17u/pull/71 Date: 2024-04-05 15:31:29 +0000
05-04-2024

[jdk21u-fix-request] Approval Request from Roman Kennke: The fix relaxes an exception in the SA related to lightweight locking to print a warning instead. The fix has been in mainline since the end of September and has not accrued a bug-tail. Lightweight locking has been implemented in JDK 21 but is not enabled by default.
24-11-2023

A pull request was submitted for review. URL: https://git.openjdk.org/jdk21u/pull/402 Date: 2023-11-24 13:25:26 +0000
24-11-2023

Changeset: bd918f49
Author: Roman Kennke <rkennke@openjdk.org>
Date: 2023-09-29 11:14:13 +0000
URL: https://git.openjdk.org/jdk/commit/bd918f49d29bcbc699e07b4ef8d23cfe1abd32df
29-09-2023

Yep sorry that is what I meant.
28-09-2023

> So I think all we can do here in this JBS issue is change the error message to something less dramatic.

The error message isn't the problem. It's the exception being thrown. So the fix is to not throw the exception and instead just provide the error (or warning) message.
28-09-2023

Thanks [~dlong], I had missed that! As Chris says, this is just an intermediate state that we happen to observe. I'm wondering though whether we can detect that a GC is in progress and so report that the owner can't be located, rather than that we have an error. In theory we could teach the SA to look for forwarding pointers, but I'm not sure it is worth the effort - we would need to examine the internals of every object in the lockStacks, and we would need to check in both directions. It may also be GC specific. So I think all we can do here in this JBS issue is change the error message to something less dramatic.
28-09-2023

[~dholmes] For EventSetImpl objects, the elementData and pkt fields being the same is a sure sign that they are the same objects.

[~dlong] Well, not missing, but GC hasn't gotten to it yet. This is the type of situation that can arise when SA inspects the JVM while it's in the middle of a GC.
27-09-2023

Nice catch. The mark word of the first object, 0xd050ce33, matches the address of the second object, so it appears the first is being forwarded to the second. Does this mean there is a missing resolve() call?
27-09-2023

[~cjplummer] had a theory that we may be inspecting things during a GC and so the target oop may have been moved, but the move is not yet reflected in the oop stored in the lockStack - hence it is not found. This seems to be the case. I added some diagnostics to print information about the target object and the threads and their lockstacks at the time of the failure.

Here is the object we are looking for:

Oop for com/sun/tools/jdi/EventSetImpl @ 0x00000000d07fd688 (object size = 40)
 - _mark: {0} :3494956595
 - _metadata._compressed_klass: {8} :InstanceKlass for com/sun/tools/jdi/EventSetImpl
 - modCount: {12} :0
 - elementData: {20} :ObjArray @ 0x00000000d004ac18
 - size: {16} :0
 - vm: {28} :Oop for com/sun/tools/jdi/VirtualMachineImpl @ 0x00000000d076a9f0
 - pkt: {32} :Oop for com/sun/tools/jdi/Packet @ 0x00000000d07fd5f8
 - suspendPolicy: {24} :0
 - internalEventSet: {36} :null

and here is the lockStack[0] entry for the "JDI Internal Event Handler":

Oop for com/sun/tools/jdi/EventSetImpl @ 0x00000000d050ce30 (object size = 40)
 - _mark: {0} :2143279601282
 - _metadata._compressed_klass: {8} :InstanceKlass for com/sun/tools/jdi/EventSetImpl
 - modCount: {12} :0
 - elementData: {20} :ObjArray @ 0x00000000d004ac18
 - size: {16} :0
 - vm: {28} :Oop for com/sun/tools/jdi/VirtualMachineImpl @ 0x00000000d076a9f0
 - pkt: {32} :Oop for com/sun/tools/jdi/Packet @ 0x00000000d07fd5f8
 - suspendPolicy: {24} :0
 - internalEventSet: {36} :null

Whilst the "addresses" of these two oops are different, the fields appear to have the same values, indicating they are in fact the same object. Also note that the oop we are looking for has a markWord that indicates that the oop has been marked by the GC.
27-09-2023
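The forwarding diagnosis in the comment above can be checked arithmetically. In HotSpot, a markWord whose low two lock bits are 0b11 is "marked" (forwarded by a copying GC), and the remaining bits hold the forwarding pointer. A minimal standalone sketch (not SA code) using the decimal _mark value from the diagnostics:

```java
public class ForwardingCheck {
    public static void main(String[] args) {
        // _mark of the oop at 0xd07fd688, as printed (in decimal) in the dump above
        long mark = 3494956595L;
        long lockBits = mark & 0b11;     // low two markWord bits; 0b11 means "marked"
        long forwardee = mark & ~0b11L;  // strip the lock bits to recover the forwarding pointer
        System.out.println("mark      = 0x" + Long.toHexString(mark));
        System.out.println("lock bits = " + lockBits);
        System.out.println("forwardee = 0x" + Long.toHexString(forwardee));
    }
}
```

The recovered forwardee, 0xd050ce30, is exactly the address of the lockStack[0] entry, consistent with the first oop being forwarded to the second.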

Ah! Now I understand. So not at a safepoint but in a fixed state. In that case this is not a SA issue as the anonymous owner should indeed be found.
26-09-2023

There is no snapshot. The process is suspended and its thread and memory state are analyzed. There is no execution of the process while this is done. If SA resumes the process (which is what SA calls a "detach"), then SA drops all knowledge it had of the process that it learned while it was suspended, and will start from scratch the next time the process is attached to and suspended again.
26-09-2023

If the thread snapshot and heap snapshot are not taken together atomically - which they are not for a live process - then the two can easily be inconsistent with each other. In that case any "errors" should not produce warnings because there is nothing (likely) wrong with the application code and so nothing for the end user to fix based on the warning. The flaw is in the SA methodology when applied to live processes.
26-09-2023

It is not (usually) at a safepoint, and it does (usually) still work well. However, the whole point of this test is to at least sometimes have the JVM in a state where it can't work, and make sure there is some sort of reasonable or graceful failure. The SA jstack code is supposed to catch these failures and print a warning message, and not allow the failure exception to propagate, resulting in an exit(1). Note SA is mostly meant for debugging a hung or crashed process (core dump) where there is little else that can be done but attempt to get data from a JVM whose current state may be inconsistent. This is very different from the jcmd jstack or attach API stackdump (dumpstack?) support, which is done on a running JVM that can be forced to a safepoint (if it is not stuck) before dumping the stack. If the process is stuck, then you are back to using SA.
26-09-2023

[~cjplummer] surely jstack does its thread dump and deadlock detection at a safepoint? It could not reasonably work otherwise.
26-09-2023

SA makes a best effort to do whatever the user requested, with the understanding that there may be inaccuracies or errors (exceptions) due to the inconsistent state of the JVM. So it has a "try and prepare to fail" approach. If it had a "don't try because it might not be safe" approach, most of the SA functionality would be eliminated.
25-09-2023

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/15907 Date: 2023-09-25 16:16:51 +0000
25-09-2023

Ok, that is interesting to know. So I think that this particular code path should not iterate threads to begin with. If it encounters an anon-locked object, it should just say so (or ignore it). Iterating threads outside of a safepoint might not only lead to wrong results (threads can change their locked state at any time), but even worse, threads may go out of existence or new threads may appear, possibly leading to crashes or other errors. It looks to me like this code is only ever called from printing code. And since it's imprecise to begin with for the above-mentioned reasons, even with the old stack-locking impl, we could simply remove the line that throws the exception and return null instead. It looks to me like the old stack-locking code (in owningThreadFromMonitor(Address)) also iterates threads. Is that safe (enough)? In particular, when new threads are added to the threads list or threads cease to exist?
25-09-2023
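A minimal sketch of the relaxation proposed above (and described in the later fix-request comment): if no thread's lock stack contains the anonymously locked oop, emit a warning and return null instead of throwing InternalError, so that deadlock detection degrades gracefully. The types and method names below are illustrative stand-ins, not the actual sun.jvm.hotspot code:

```java
import java.util.List;

public class OwnerLookup {
    // Toy stand-ins for SA's thread/lock-stack model (illustrative, not real SA APIs).
    record SaThread(String name, List<Long> lockStack) {}

    // Return the thread whose lock stack holds the oop, or null (with a warning)
    // if none does -- e.g. because a concurrent GC moved the object between reads.
    static SaThread owningThread(List<SaThread> threads, long oop) {
        for (SaThread t : threads) {
            if (t.lockStack().contains(oop)) {
                return t;
            }
        }
        System.err.println("WARNING: could not find the owner of the anonymous lock");
        return null; // previously this path threw InternalError
    }

    public static void main(String[] args) {
        List<SaThread> threads =
            List.of(new SaThread("JDI Internal Event Handler", List.of(0xd050ce30L)));
        // The lock stack holds the pre-move address; looking up the moved copy fails softly.
        System.out.println(owningThread(threads, 0xd07fd688L));        // null
        System.out.println(owningThread(threads, 0xd050ce30L).name()); // JDI Internal Event Handler
    }
}
```

Returning null pushes the "owner unknown" case up to the printing code, which can report it without aborting the whole jstack run.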

This particular test is one that does attach to an active process, and then does a stack dump. SA is expected to have issues on occasion with the stack dump, but is expected to deal with them gracefully, and not produce "Error" messages or exit with an exception. So if anything the issue here might just be that the error message and exception are unexpected, but the error itself is not. SA might just need to learn to expect this issue during deadlock detection.
25-09-2023

I should add that in order to avoid many test failures due to this, we try to make sure the test's debuggee process has reached some sort of stable point, and is hopefully idle at the time of the attach.
25-09-2023

[~rkennke]
> I am trying to reproduce. In the meantime, can you confirm if the relevant SA code runs only at a safepoint? Because if not, then this would be a race.

I can confirm that that is NOT the case. SA can attach to the JVM at any point during the JVM's execution. Hotspot data structures can possibly be in an inconsistent state when the attach is done (or the core is dumped). SA attempts to at least detect errors and catch exceptions that arise from this, but often falls on its face when the timing of the attach is "unfortunate".
25-09-2023

Is there anything special needed to reproduce the failure? I'm trying:

    CONF=linux-x86_64-server-fastdebug make run-test TEST='sun/tools/jhsdb/JStackStressTest.java'

But several attempts did not fail.
25-09-2023

I am trying to reproduce. In the meantime, can you confirm if the relevant SA code runs only at a safepoint? Because if not, then this would be a race.
25-09-2023

FWIW we are starting to see a number of new bugs being filed with regard to broken locking with the new locking mode ref: JDK-8316746, JDK-8316751.
24-09-2023

[~mseledtsov]
> it looks like SA might need to know about anonymous owners for light-weight locking mode.

Based on this part of the error message:

> We should have found a thread that owns the anonymous lock

I think we have at least some support for LM_LIGHTWEIGHT locking in SA. "anonymous lock" is a concept that came in with that code IIRC...
22-09-2023

The fix for:

    JDK-8315880 Change LockingMode default from LM_LEGACY to LM_LIGHTWEIGHT

was integrated in jdk-22+17-1240 and this failure in the JDK22 CI happened after that integration.
21-09-2023

I don't believe this relates to the reason the test was problem listed. More likely it has to do with using LM_LIGHTWEIGHT. I wonder if this is related to the problem seen in JDK-8316417, which is currently out for review. Although JDK-8316417 was only noticed by observation, not any test failure, I wonder if using LM_LIGHTWEIGHT has made it start to cause a problem with this test, and whether the fix for JDK-8316417 will fix this CR also. UPDATE: Actually, now I don't think this issue will be addressed by JDK-8316417. I think Mikhailo is closer to pointing out the underlying issue.
19-09-2023

Runtime Triage: change the sub-component to SA because it looks like SA might need to know about anonymous owners for light-weight locking mode.
19-09-2023

Thanks for the heads up!
18-09-2023

Note this test was only just removed from the ProblemList - see JDK-8316148
18-09-2023