JDK-8290043 : serviceability/attach/ConcAttachTest.java failed "guarantee(!CheckJNICalls) failed: Attached JNI thread exited without being detached"
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 17,20,21
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: linux
  • CPU: x86_64,aarch64
  • Submitted: 2022-07-09
  • Updated: 2025-02-11
  • Resolved: 2025-01-14
  • Fix Versions: JDK 24 (24.0.1, Fixed), JDK 25 (25 b06, Fixed)
Sub Tasks
JDK-8292061
Description
The following test failed in the JDK20 CI:

serviceability/attach/ConcAttachTest.java

Here's a snippet from the log file:

----------stdout:(21/1864)----------
# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc:  SuppressErrorAt=/virtualMemoryTracker.cpp:363
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/opt/mach5/mesos/work_dir/slaves/0c72054a-24ab-4dbb-944f-97f9341a1b96-S8422/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/6e89d8b9-911d-425b-9920-73a9bd0caece/runs/93ec339e-22f4-47ee-8e3a-ddf8e7c3dd20/workspace/open/src/hotspot/share/services/virtualMemoryTracker.cpp:363), pid=1381980, tid=1391565
#  guarantee(!CheckJNICalls) failed: Attached JNI thread exited without being detached
#
# JRE version: Java(TM) SE Runtime Environment (20.0+6) (fastdebug build 20-ea+6-268)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 20-ea+6-268, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x1b60ff7]  VirtualMemoryTracker::add_reserved_region(unsigned char*, unsigned long, NativeCallStack const&, MEMFLAGS)+0x1037
#
# Core dump will be written. Default location: /opt/mach5/mesos/work_dir/slaves/0c72054a-24ab-4dbb-944f-97f9341a1b96-S19281/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/6eacd903-d30d-438a-ba0a-9ab1aca97499/runs/7ce86dcf-cf49-464a-ac9e-92e9c6ca9f68/testoutput/test-support/jtreg_open_test_hotspot_jtreg_tier1/scratch/3/core
#
# An error report file with more information is saved as:
# /opt/mach5/mesos/work_dir/slaves/0c72054a-24ab-4dbb-944f-97f9341a1b96-S19281/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/6eacd903-d30d-438a-ba0a-9ab1aca97499/runs/7ce86dcf-cf49-464a-ac9e-92e9c6ca9f68/testoutput/test-support/jtreg_open_test_hotspot_jtreg_tier1/scratch/3/hs_err_pid1381980.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#
result: Error. Agent communication error: java.io.EOFException; check console log for any additional details


Here's the crashing thread's stack:

---------------  T H R E A D  ---------------

Current thread (0x00007f73c003e1b0):  JavaThread "process reaper" daemon [_thread_new, id=1391565, stack(0x00007f73fc8e9000,0x00007f73fc90f000)]

Stack: [0x00007f73fc8e9000,0x00007f73fc90f000],  sp=0x00007f73fc90dce0,  free space=147k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x1b60ff7]  VirtualMemoryTracker::add_reserved_region(unsigned char*, unsigned long, NativeCallStack const&, MEMFLAGS)+0x1037
V  [libjvm.so+0x1a9f10f]  ThreadStackTracker::new_thread_stack(void*, unsigned long, NativeCallStack const&)+0x4f
V  [libjvm.so+0x1a876f8]  Thread::register_thread_stack_with_NMT()+0x88
V  [libjvm.so+0x1a88ded]  Thread::call_run()+0x7d
V  [libjvm.so+0x174bf14]  thread_native_entry(Thread*)+0x104
Comments
[~epavlova] that test is mentioned a couple of times already.
11-02-2025

One more test crashed because of this: java/foreign/TestUpcallAsync.java
10-02-2025

Fix Request: JDK 24u This fix addresses an accounting error with NMT in relation to thread stacks. In a debug build this can lead to assertion failures, and in a product build it can produce incorrect/misleading NMT data. The underlying cause and fix were quite straight-forward: the NMT accounting was missing for JNI detaching threads. The fix is low-risk and has been baking in mainline for a little while now. The backport is clean. Thanks.
22-01-2025

A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk24u/pull/26 Date: 2025-01-22 02:34:47 +0000
22-01-2025

This bug was introduced by JDK-8252921, which moved the unregister call from the thread destructor (because it was not called for all threads) to the post_run() method. But this totally overlooked native threads that attach and detach! This is easily fixed by moving the unregister call to the end of JavaThread::exit. [EDIT: actual fix was different as there was a broader problem.]

I think the problem was exacerbated/obscured by the fact that the code allows for recursive registration (see JDK-8198226), which meant that the duplicate thread stack, when a native thread re-attached, was (incorrectly) considered to be a recursive registration - which it was not. This means that if we do have a case of a native thread not detaching before exiting and the OS reuses the stack, then we will only detect it if it is of a different size - that is a bug that needs fixing.

But it is unclear to me whether "recursive registration" is actually a thing these days, as JDK-8198226 was closed with: "pd_attempt_reserve_memory_at no longer calls os::reserve_memory in any implementation. Therefore there is no longer any double accounting, so I'm closing this as Not an Issue."
14-01-2025

Changeset: 9b1bed0a Branch: master Author: David Holmes <dholmes@openjdk.org> Date: 2025-01-14 19:49:55 +0000 URL: https://git.openjdk.org/jdk/commit/9b1bed0aa416c615a81d429e2f1f33bc4f679109
14-01-2025

A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk/pull/22924 Date: 2025-01-06 09:33:31 +0000
06-01-2025

Okay this is at least in part a NMT issue. With additional logging I can see that the same section of virtual memory is being reserved over and over:

[1.019s][debug][nmt,thread] Add reserved region 'Thread Stack' (0x0000000171af8000, 1048576)
test TestUpcallAsync.testUpcallsAsync(0, "f0_V__", VOID, [], []): success
[1.067s][debug][nmt,thread] Add reserved region 'Thread Stack' (0x0000000171af8000, 1048576)
test TestUpcallAsync.testUpcallsAsync(17, "f0_V_S_DI", VOID, [STRUCT], [DOUBLE, INT]): success
[1.104s][debug][nmt,thread] Add reserved region 'Thread Stack' (0x0000000171af8000, 1048576)
test TestUpcallAsync.testUpcallsAsync(34, "f0_V_S_IDF", VOID, [STRUCT], [INT, DOUBLE, FLOAT]): success
[1.114s][debug][nmt,thread] Add reserved region 'Thread Stack' (0x0000000171af8000, 1048576)
test TestUpcallAsync.testUpcallsAsync(51, "f0_V_S_FDD", VOID, [STRUCT], [FLOAT, DOUBLE, DOUBLE]): success
[1.126s][debug][nmt,thread] Add reserved region 'Thread Stack' (0x0000000171af8000, 1048576)
test TestUpcallAsync.testUpcallsAsync(68, "f0_V_S_DDP", VOID, [STRUCT], [DOUBLE, DOUBLE, POINTER]): success
[1.136s][debug][nmt,thread] Add reserved region 'Thread Stack' (0x0000000171af8000, 1048576)
test TestUpcallAsync.testUpcallsAsync(85, "f0_V_S_PPI", VOID, [STRUCT], [POINTER, POINTER, INT]): success
[1.148s][debug][nmt,thread] Add reserved region 'Thread Stack' (0x0000000171af8000, 1048576)
test TestUpcallAsync.testUpcallsAsync(102, "f0_V_IS_FF", VOID, [INT, STRUCT], [FLOAT, FLOAT]): success
[1.162s][debug][nmt,thread] Add reserved region 'Thread Stack' (0x0000000171af8000, 1048576)
test TestUpcallAsync.testUpcallsAsync(119, "f0_V_IS_IFD", VOID, [INT, STRUCT], [INT, FLOAT, DOUBLE]): success
[1.171s][debug][nmt,thread] Add reserved region 'Thread Stack' (0x0000000171af8000, 1048576)
test TestUpcallAsync.testUpcallsAsync(136, "f0_V_IS_FFP", VOID, [INT, STRUCT], [FLOAT, FLOAT, POINTER]): success
[1.179s][debug][nmt,thread] Add reserved region 'Thread Stack' (0x0000000171af8000, 1048576)
test TestUpcallAsync.testUpcallsAsync(153, "f0_V_IS_DDI", VOID, [INT, STRUCT], [DOUBLE, DOUBLE, INT]): success
[1.187s][debug][nmt,thread] Add reserved region 'Thread Stack' (0x0000000171af8000, 1048576)
test TestUpcallAsync.testUpcallsAsync(170, "f0_V_IS_PDF", VOID, [INT, STRUCT], [POINTER, DOUBLE, FLOAT]): success
[1.190s][debug][nmt,thread] Add reserved region 'Thread Stack' (0x0000000171af8000, 2097152)
[1.190s][debug][nmt,thread] XXX reserved region 'Thread Stack' (0x0000000171af8000, 1048576)

This works because of the same_region check that is present for recursive definitions. In the failing case the reserved region has the same base but a new size (typically because the stack is for a compiler thread and it uses a different stack size). So we take the path that hits the guarantee.
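To make that decision concrete, here is a small standalone sketch (plain C++, not the HotSpot sources - the names only echo the VirtualMemoryTracker code quoted later in this issue): a same-base/same-size registration is accepted as a re-registration, while a same-base/different-size thread stack takes the branch that trips guarantee(!CheckJNICalls).

#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

struct Region { uintptr_t base; size_t size; };

static bool check_jni_calls = true;   // stands in for CheckJNICalls (-Xcheck:jni)

// Simplified stand-in for VirtualMemoryTracker::add_reserved_region() when the
// new thread-stack region collides with an existing reservation.
static void add_reserved_region(Region& existing, uintptr_t base, size_t size) {
  if (existing.base == base && existing.size == size) {
    // "same_region" case: treated as a benign re-registration and overwritten
    printf("re-registered region (0x%" PRIxPTR ", %zu)\n", base, size);
    return;
  }
  // Overlap with a different extent: taken to mean an attached JNI thread exited
  // without detaching (or, per this issue, that NMT never heard about the detach).
  if (check_jni_calls) {
    fprintf(stderr, "guarantee(!CheckJNICalls) failed: "
                    "Attached JNI thread exited without being detached\n");
    abort();
  }
  existing.base = base;   // product behaviour: silently overwrite with the new region
  existing.size = size;
}

int main() {
  Region stack{0x171af8000, 1048576};
  add_reserved_region(stack, 0x171af8000, 1048576);  // same base + size: accepted
  add_reserved_region(stack, 0x171af8000, 2097152);  // same base, new size: aborts
  return 0;
}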
20-12-2024

Found the underlying problem: we don't de-register the thread stacks for detaching Java threads! It is only done in JavaThread::post_run() which is never executed by a native thread that attaches/detaches.
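For reference, here is the lifecycle in question - a rough sketch using the standard JNI invocation API (g_vm is assumed to have been stashed by the embedding code; error handling omitted) of a native thread that attaches and detaches and therefore never goes through JavaThread::post_run():

#include <jni.h>

static JavaVM* g_vm = nullptr;   // assumed: saved earlier from JNI_CreateJavaVM or JNI_OnLoad

// Intended to run on a thread the VM did not create, e.g. started with pthread_create().
static void* native_worker(void*) {
  JNIEnv* env = nullptr;
  // Turns this native thread into a JavaThread; its stack gets registered with NMT.
  g_vm->AttachCurrentThread(reinterpret_cast<void**>(&env), nullptr);

  // ... call into Java via env ...

  // Tears the JavaThread down again. Before this bug was fixed, NMT was never
  // told that the stack registered above had gone away.
  g_vm->DetachCurrentThread();
  return nullptr;   // the OS may later hand this stack range to a brand-new thread
}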
20-12-2024

An interesting observation. In a failing case the log for TestUpcallAsync has the final entry before the crash as:

test TestUpcallAsync.testUpcallsAsync(7939, "f13_F_FDS_PDD", NON_VOID, [FLOAT, DOUBLE, STRUCT], [POINTER, DOUBLE, DOUBLE]): success

and the hs_err file shows there are 12 JavaThreads, all of which are the expected system threads plus main. However, I added further logging to the NMT ThreadStackTracker, and in a successful run of the test when we get to that log line we see:

test TestUpcallAsync.testUpcallsAsync(7939, "f13_F_FDS_PDD", NON_VOID, [FLOAT, DOUBLE, STRUCT], [POINTER, DOUBLE, DOUBLE]): success
[4.062s][debug][nmt,thread] Add reserved region 'Thread Stack' (0x00007f95b19e4000, 1048576)
[4.062s][debug][nmt,thread] Added thread 509: (0x00007f95b19e4000,1048576)
[4.062s][debug][nmt,thread] Add committed region 'Thread Stack'(0x00007f95b19e4000, 16384) Succeeded
test TestUpcallAsync.testUpcallsAsync(7956, "f13_F_FPS_IP", NON_VOID, [FLOAT, POINTER, STRUCT], [INT, POINTER]): success

which indicates that NMT still considers there to be 509 active threads! So in a successful run we create hundreds of non-terminating threads, but in a crashing run we don't.
20-12-2024

There is no sign of any failures during region removals. In one case I see:

[4.407s][debug][nmt,thread] Remove uncommitted region 'Thread Stack' (0x000000016db3c000, 2097152) Succeeded
[4.407s][debug][nmt,thread] Removed region 'Thread Stack' (0x000000016db3c000, 2097152) from _reserved_regions Succeeded
[4.407s][debug][nmt,thread] Remove uncommitted region 'Thread Stack' (0x000000016e160000, 2097152) Succeeded
[4.407s][debug][nmt,thread] Removed region 'Thread Stack' (0x000000016e160000, 2097152) from _reserved_regions Succeeded
[4.722s][debug][nmt,thread] Add reserved region 'Thread Stack' (0x000000016db3c000, 2097152)
[4.722s][debug][nmt,thread] XXX reserved region 'Thread Stack' (0x000000016db3c000, 1048576)
#
# A fatal error has been detected by the Java Runtime Environment:

which shows a region with the same base, but a different size, getting removed. What we don't see here is the region allocation ...
19-12-2024

I added some logging to show the details of the new stack region and the existing one, just to check it is indeed a stack region, and it seems that is the case. Here is the output from a failing run of java/foreign/loaderLookup/TestLoaderLookupJNI.java:

[0.980s][debug][nmt,thread] Add reserved region 'Thread Stack' (0x000000016dd88000, 2097152)
[0.980s][debug][nmt,thread] XXX reserved region 'Thread Stack' (0x000000016dd88000, 1048576)
#
# A fatal error has been detected by the Java Runtime Environment:

XXX is the pre-existing region. But according to previous analysis this means that a thread never detached, and so its JavaThread still exists and so does its recorded stack mapping. Yet if we look in the hs_err file there is no sign of any such thread:

Threads class SMR info:
_java_thread_list=0x0000600000d796d0, length=12, elements={
0x000000012d00bc10, 0x000000010500fc10, 0x000000012f00d010, 0x000000012a80ac10,
0x000000012f00de10, 0x000000012a80b610, 0x000000012a008210, 0x000000012e808810,
0x000000012a815010, 0x000000012a818610, 0x0000000105012010, 0x000000012f19ce10
}
_java_thread_list_alloc_cnt=15, _java_thread_list_free_cnt=13, _java_thread_list_max=12, _nested_thread_list_max=0
_tlh_cnt=65, _tlh_times=0, avg_tlh_time=0.00, _tlh_time_max=0
_deleted_thread_cnt=1, _deleted_thread_times=0, avg_deleted_thread_time=0.00, _deleted_thread_time_max=0
_delete_lock_wait_cnt=0, _delete_lock_wait_max=0
_to_delete_list_cnt=0, _to_delete_list_max=1

Java Threads: ( => current thread )
0x000000012d00bc10 JavaThread "main" [_thread_blocked, id=9987, stack(0x000000016b824000,0x000000016ba27000) (2060K)]
0x000000010500fc10 JavaThread "Reference Handler" daemon [_thread_blocked, id=24067, stack(0x000000016c910000,0x000000016cb13000) (2060K)]
0x000000012f00d010 JavaThread "Finalizer" daemon [_thread_blocked, id=24579, stack(0x000000016cb1c000,0x000000016cd1f000) (2060K)]
0x000000012a80ac10 JavaThread "Signal Dispatcher" daemon [_thread_blocked, id=29955, stack(0x000000016cd28000,0x000000016cf2b000) (2060K)]
0x000000012f00de10 JavaThread "Service Thread" daemon [_thread_blocked, id=25091, stack(0x000000016cf34000,0x000000016d137000) (2060K)]
0x000000012a80b610 JavaThread "Monitor Deflation Thread" daemon [_thread_blocked, id=29443, stack(0x000000016d140000,0x000000016d343000) (2060K)]
0x000000012a008210 JavaThread "C2 CompilerThread0" daemon [_thread_in_vm, id=29187, stack(0x000000016d34c000,0x000000016d54f000) (2060K)]
0x000000012e808810 JavaThread "C1 CompilerThread0" daemon [_thread_in_native, id=28931, stack(0x000000016d558000,0x000000016d75b000) (2060K)]
0x000000012a815010 JavaThread "Notification Thread" daemon [_thread_blocked, id=28675, stack(0x000000016d764000,0x000000016d967000) (2060K)]
0x000000012a818610 JavaThread "Common-Cleaner" daemon [_thread_blocked, id=26627, stack(0x000000016d970000,0x000000016db73000) (2060K)]
0x0000000105012010 JavaThread "MainThread" [_thread_in_vm, id=26883, stack(0x000000016db7c000,0x000000016dd7f000) (2060K)]
=>0x000000012f19ce10 JavaThread "C2 CompilerThread1" daemon [_thread_new, id=27399, stack(0x000000016dd88000,0x000000016df8b000) (2060K)]
Total: 12

Other Threads:
0x000000012da085c0 VMThread "VM Thread" [id=19459, stack(0x000000016c678000,0x000000016c87b000) (2060K)]
0x000000012d908cb0 WatcherThread "VM Periodic Task Thread" [id=17159, stack(0x000000016c46c000,0x000000016c66f000) (2060K)]
0x000000010500ee10 WorkerThread "GC Thread#0" [id=12803, stack(0x000000016ba30000,0x000000016bc33000) (2060K)]
0x000000012e811e10 ConcurrentGCThread "G1 Main Marker" [id=13059, stack(0x000000016bc3c000,0x000000016be3f000) (2060K)]
0x000000012e812810 WorkerThread "G1 Conc#0" [id=16387, stack(0x000000016be48000,0x000000016c04b000) (2060K)]
0x000000012e8b4010 ConcurrentGCThread "G1 Refine#0" [id=21251, stack(0x000000016c054000,0x000000016c257000) (2060K)]
0x000000012e8b4a10 ConcurrentGCThread "G1 Service" [id=16899, stack(0x000000016c260000,0x000000016c463000) (2060K)]
Total: 7

Could it be that the problem actually lies with the removal of the previous stack mapping? Investigating ...
19-12-2024

[~cjplummer] The AttachListenerThreads are JavaThreads, not native threads that attach/detach explicitly. They cannot exit without "detaching". As ConcAttachTest has been on the PL for so long we don't know if it would even still exhibit this problem today. So I'm focusing on the more recent failures.
19-12-2024

Backing up a bit ... "Originally" NMT asserted that reserved regions could never overlap. Then it was discovered that if an attached thread exited without detaching then the stack regions could overlap, because NMT was not informed that the thread had exited. Exiting without detaching is considered a programming error, so JDK-8002273 added the original code to detect this with -Xcheck:jni:

+ // Overlapping stack regions indicate that a JNI thread failed to
+ // detach from the VM before exiting. This leaks the JavaThread object.
+ if (CheckJNICalls) {
+   guarantee(FLAGS_TO_MEMORY_TYPE(reserved_region->flags()) != mtThreadStack ||
+             !reserved_region->overlaps_region(rec),
+             "Attached JNI thread exited without being detached");
+ }

and that code morphed into what we have today:

} else {
  assert(reserved_rgn->overlap_region(base_addr, size), "Must be");
  // Overlapped reservation.
  // It can happen when the regions are thread stacks, as JNI
  // thread does not detach from VM before exits, and leads to
  // leak JavaThread object
  if (reserved_rgn->mem_tag() == mtThreadStack) {
    guarantee(!CheckJNICalls, "Attached JNI thread exited without being detached");

We started using -Xcheck:jni on a lot more tests and we started seeing this guarantee fail. The implication is that the tests have attached threads that exit without detaching - in which case we should adjust those tests so that they don't run with -Xcheck:jni, or, if the exit-without-detach is unintentional, fix the test to do the detach. The tests we have seen fail are:

- serviceability/attach/ConcAttachTest.java
  This was the original report back in 2022, and it was never determined what thread was apparently exiting without detaching.
- runtime/jni/nativeStack/TestNativeStack.java
  This cropped up in 2023, but again this test does not obviously have a thread that exits without detaching.
- java/foreign/TestUpcallAsync.java
  The first of the new batch of failures. The test itself is not suspect, so we have to look to the upcall mechanism itself. This is harder to reason about. It should be the case that for every UpCallContext, if we attach a non-attached thread then we will detach it when the UpCallContext is destroyed.
- runtime/jni/terminatedThread/TestTerminatedThread.java
  The second of the new batch of failures. Now this test does have an attached thread that exits without detaching. And if the OS decided to re-allocate its stack to another thread then we will crash - as seen in the failing test cases. So this test needs to disable CheckJNICalls.
- runtime/jni/codegenAttachThread/TestCodegenAttach.java
  A more recent failure. This test does create a native thread, but it does not exit without detaching ... unless there is a fatal error and exit() is called, but that should not be visible as the thread exit and process exit are co-existent.
- java/foreign/loaderLookup/TestLoaderLookupJNI.java
  This test also creates a native thread, but again it detaches before exit.

I took a look at the hs_err file from a TestCodegenAttach failure and the thread listing does not show any overlapping stack regions. Unfortunately we do not know what regions were being compared, as that is not captured by the guarantee - so I think we need to fix that first and try to see what is actually happening. But I am suspecting a NMT bug.
19-12-2024

> - serviceability/attach/ConcAttachTest.java
>
> This was the original report back in 2022, and it was never determined what thread was apparently exiting without detaching

This test does a bunch of concurrent attaches. It is testing to make sure we don't end up with multiple Attach Listener threads (see JDK-8225690). I'm not sure how the fix was done, but possibly we still have multiple Attach Listener threads starting up, but all but one exits, and possibly these exiting threads are not detaching first (if indeed they even attach in the first place).
19-12-2024

Thanks [~dcubed]. I will take it.
18-12-2024

[~dholmes] - I'm no longer the RE for this bug. I don't remember grabbing it.
18-12-2024

[~dcubed] can we take over this bug from you?
13-12-2024

I ran the ConcAttach test 100 times on both x64 and aarch64 in our CI; no crashes when the guarantee is removed.
12-12-2024

Maybe the guarantee should be removed from NMT? It seems strange to me that NMT guarantees something about an entirely different piece of machinery. This is way back from Zhengyu's implementation in 2014. From reading the code: NMT isn't going to crash if we remove the guarantee. EDIT: I'm running a build with the guarantee removed now, trying to hit the issue and see if we have a new error.
12-12-2024

Okay we need NMT experts to step in and address [~jsjolen]'s comment. Are there other cases where regions can legitimately overlap? Otherwise do we have a NMT bug here? Note that multiple tests can fail this way, not just serviceability/attach/ConcAttachTest.java.
12-12-2024

+1 for removing the guarantee(). I think it highlights the accounting problem for threads that are not detaching and thus do not unregister their thread stack regions? Existing code tries to compensate for that case, and it does not really need the guarantee() for the rest to work.
12-12-2024

Just to pile on a bit, we have recently caught a similar-stacked crash in pre-prod environments running 17.0.13 with -Xcheck:jni -XX:NativeMemoryTracking=summary. The workload does use JNI heavily and threads seem to churn quite a bit there. Unfortunately, I do not have a standalone reproducer. This is to highlight it does happen in the wild, not only in OpenJDK tests. :)
11-12-2024

This is getting quite noisy in the CI - over 50 links.
10-12-2024

I'm coming to the conclusion that the guarantee is incorrect, as the threads do not appear to be exiting while attached.
29-11-2024

Need to check the affected tests, but it may simply be that we need to explicitly disable -Xcheck:jni, as they do have JNI-attached threads that don't detach.
25-11-2024

FYI the resurgence in sightings is due to -Xcheck:jni now being applied to more test tasks after JDK-8344585
25-11-2024

We have this in virtualMemoryTracker.cpp:

// Overlapped reservation.
// It can happen when the regions are thread stacks, as JNI
// thread does not detach from VM before exits, and leads to
// leak JavaThread object
if (reserved_rgn->flag() == mtThreadStack) {
  guarantee(!CheckJNICalls, "Attached JNI thread exited without being detached");
  // Overwrite with new region ...

Is that guarantee correct? There's not necessarily a bug in NMT as much as there might be an outdated guarantee.
22-11-2024

Reopening the ticket in order to avoid a failure related to ProblemListing.
26-07-2024

[~dholmes], thank you for explaining, I wasn't aware of that. I'm re-opening the ticket.
26-07-2024

[~jsjolen] we can't just close this as it is associated with a ProblemListed test, and that will now cause the PL verification logic to flag this as a malformed PL entry. I would suggest just keeping this open and then close as duplicate if the VMATree rewrite does fix it.
26-07-2024

Will most probably be fixed when VirtualMemoryTracker has been ported to use VMATree
25-07-2024

Continuing with my comment 2 above regarding SIGQUIT and all the extra thread dumps I see in the LingeredApp output, I think I know what is going on. It looks like SIGQUIT is only needed when the socket file (.java_pid<pid>) doesn't already exist. Once a SIGQUIT is sent, it should get created and subsequent attaches won't need to send a SIGQUIT again. However, when there are multiple attaches racing to be the first attach, they will all send a SIGQUIT and wait for the file to be created. The first SIGQUIT to arrive will start the process of starting up the listener thread and creating the socket file, and all the other SIGQUITs probably are ignored by the attach mechanism and are passed through to the exception handler to do the thread dump. Generally speaking that means that whenever you do an attach, you might end up triggering a thread dump if someone else is trying to do the same at the same time (and this is the first attach to the process).
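Put as code, the handshake described above looks roughly like this (a simplified standalone C++ sketch, not the actual Attach API client; the socket path, timeout and omitted protocol details are illustrative):

#include <csignal>
#include <string>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// True once the target VM's attach socket file (.java_pid<pid>) exists.
static bool socket_exists(pid_t pid) {
  struct stat st;
  std::string path = "/tmp/.java_pid" + std::to_string(pid);
  return ::stat(path.c_str(), &st) == 0;
}

// One attaching client. SIGQUIT is only needed when the socket file does not
// already exist; if several clients race through here, each sends a SIGQUIT,
// the first one starts the attach listener, and the surplus SIGQUITs fall
// through to the default handler and show up as thread dumps.
static bool attach_handshake(pid_t pid) {
  if (socket_exists(pid)) {
    return true;                   // listener already running, no signal needed
  }
  ::kill(pid, SIGQUIT);            // ask the VM to start its attach listener
  for (int i = 0; i < 50; i++) {   // then poll for the socket file to appear
    if (socket_exists(pid)) {
      return true;
    }
    ::usleep(100 * 1000);
  }
  return false;
}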
09-11-2023

There's also a minor bug in the test. It does the "vm = VirtualMachine.attach()" in a try block and then "vm.detach()" in the finally block. The problem is if the attach() failed then vm will be null, resulting in an NPE. So the test fails with the NPE rather than the exception thrown during the attach() call. There's no need for the detach() (or anything else) to be in the finally block.
08-11-2023

One interesting thing I just noticed about linux runs of this test is that stderr for the LingeredApp contains a bunch of thread dumps. Initially they can be confused for the dump you get from the Thread.print jcmd that the test is also doing, but since this is LingeredApp output and not the main test output, it can't be from the jcmd. I believe these thread dumps are being generated due to SIGQUITs being received. The Attach API sends a SIGQUIT to initiate the attach handshake. I believe it retries if there is no response within a short period of time. So I'm guessing some of these initial SIGQUIT handshakes are not being handled by the attach listener and instead are passed on to the SIGQUIT handler, which does the thread dump. Eventually the Attach API retries with another SIGQUIT that is successful. This might be normal behavior when doing a bunch of simultaneous attaches. Maybe the first SIGQUIT triggers the starting of the listener thread, and subsequent SIGQUITs are missed until the listener thread is done starting up. However, I'm only seeing it on Linux, not OSX.
08-11-2023

I would be more suspicious of a bug relating to virtual memory tracking than having a thread actually exit without detaching.
09-08-2022

This intermittent test failure occurs in tier4-rt-check-jni test tasks on linux-aarch64 and linux-x64. The -Xcheck:jni option is catching the fact that the "process reaper" daemon thread is in VirtualMemoryTracker::add_reserved_region() for its stack and runs into an overlap with an existing reserved region. That should only happen when a JNI attached thread has exited without detaching. Since the -Xcheck:jni option is specified, the guarantee() catches this situation and we crash to report the JNI issue/problem. It's not clear from the log files which JNI attached thread has exited.
08-08-2022

ILW = HLL = P4
12-07-2022