JDK-8344671 : Few JFR streaming tests fail with application not alive error on MacOS 15
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 21,24,25
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: os_x
  • CPU: x86_64,aarch64
  • Submitted: 2024-11-21
  • Updated: 2025-06-25
  • Resolved: 2025-04-02
JDK 17          JDK 21                JDK 25
17.0.17 Fixed   21.0.9-oracle Fixed   25 b17 Fixed
Related Reports
Duplicate :  
Duplicate :  
Duplicate :  
Relates :  
Sub Tasks
JDK-8351987 :  
Description
Tests that fail:
  jdk/jfr/api/consumer/streaming/TestJVMCrash.java
  jdk/jfr/api/consumer/streaming/TestJVMExit.java
  jdk/jfr/api/consumer/streaming/TestOutOfProcessMigration.java 

How they fail:
  - see many repeated messages "Application not alive when..." or "Process  is no longer alive, exit value = 131". 
  - eventually tests time out
Comments
Fix request [21u,17u]
I am backporting this for parity with 21.0.9-oracle and 17.0.17-oracle. Low risk: the change is limited to macOS and to attaching to a JVM. Resolved ProblemList; the 17u change is a clean backport from 21u. SAP nightly testing passed, including the mentioned tests.
18-06-2025

A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk21u-dev/pull/1874 Date: 2025-06-15 17:43:46 +0000
15-06-2025

A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk17u-dev/pull/3638 Date: 2025-06-15 17:42:07 +0000
15-06-2025

We are past RDP2 for 24.0.2. If this issue is critical, please follow the critical request process detailed at https://wiki.openjdk.org/display/JDKUpdates/JDK+24u
08-05-2025

[jdk24u-fix-request] Approval Request from Ivan Bereziuk
08-05-2025

A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk24u/pull/212 Date: 2025-04-29 14:17:33 +0000
01-05-2025

I would like to backport the fix to older LTSes as they seem to be affected as well. Please approve backport to 24u. Apart from ProblemList.txt file changes (the tests are not problem listed on older LTSes), the fix applied without further conflicts.
29-04-2025

Changeset: d979bd85 Branch: master Author: Larry Cable <larry.cable@yahoo.com> Committer: Kevin Walls <kevinw@openjdk.org> Date: 2025-04-02 16:00:14 +0000 URL: https://git.openjdk.org/jdk/commit/d979bd859215a16e6398ae627acfd40e8d71102c
02-04-2025

A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk/pull/24085 Date: 2025-03-17 18:26:57 +0000
26-03-2025

[~lcable] Bugs are automatically marked as Resolved->Fixed when the associated changeset is integrated.
18-03-2025

Tests are being ProblemListed.
13-03-2025

If resolving the product bug becomes too time-consuming and noisy in Mach5, we could consider problem-listing the tests on platforms that fail. Since the tests uncovered an actual issue, I don't think we should modify them.
26-02-2025

> I think for now we can harden the tests as suggested whilst a RFE is filed against the attach mechanism.
I've filed https://bugs.openjdk.org/browse/JDK-8350766 to track the improvement in the VM attach mechanism for macosx.
26-02-2025

We encountered this issue with the JMS agent inadvertently killing JVM processes by attaching prematurely. We have a small fragment of native code to obtain the signal masks, which I can easily add to remedy the issue. I am assigning this to myself.
25-02-2025

Per off-line discussions, the Linux code for the attach mechanism tries to be a little more clever in deciding whether to send SIGQUIT, by looking for an installed signal handler: https://github.com/openjdk/jdk/blob/master/src/jdk.attach/linux/classes/sun/tools/attach/VirtualMachineImpl.java#L89 which was added fairly recently by JDK-8342449. So we may be able to improve the robustness on macOS as well, if there is some kind of system API to query the target process. I think for now we can harden the tests as suggested whilst an RFE is filed against the attach mechanism.
18-02-2025
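
For illustration only, here is a minimal Java sketch of the kind of "is a SIGQUIT handler installed?" check discussed in the comment above. It is not the JDK's implementation. It assumes the Linux /proc/<pid>/status format, in which the SigBlk, SigIgn and SigCgt lines are hexadecimal bitmasks and signal n corresponds to bit n-1. The class and method names (SigquitHandlerCheck, maskHasSignal, readyForSigquit) are hypothetical.

```java
import java.util.List;

// Hypothetical helper, not JDK code: decides from /proc/<pid>/status lines
// whether a target process would handle SIGQUIT instead of dying from it.
final class SigquitHandlerCheck {
    private static final int SIGQUIT = 3;

    // True if the hex bitmask has the bit for the given signal set
    // (assumption: signal n is stored in bit n-1, as in Linux /proc status).
    public static boolean maskHasSignal(String hexMask, int signal) {
        long mask = Long.parseUnsignedLong(hexMask.trim(), 16);
        return (mask & (1L << (signal - 1))) != 0;
    }

    // True only if SIGQUIT is caught and neither blocked nor ignored.
    public static boolean readyForSigquit(List<String> statusLines) {
        boolean caught = false;
        boolean blocked = false;
        boolean ignored = false;
        for (String line : statusLines) {
            if (line.startsWith("SigCgt:")) {
                caught = maskHasSignal(line.substring("SigCgt:".length()), SIGQUIT);
            } else if (line.startsWith("SigBlk:")) {
                blocked = maskHasSignal(line.substring("SigBlk:".length()), SIGQUIT);
            } else if (line.startsWith("SigIgn:")) {
                ignored = maskHasSignal(line.substring("SigIgn:".length()), SIGQUIT);
            }
        }
        return caught && !blocked && !ignored;
    }

    public static void main(String[] args) {
        // Sample status lines (made up for this sketch).
        List<String> sample = List.of(
                "SigBlk:\t0000000000000000",
                "SigIgn:\t0000000000000000",
                "SigCgt:\t0000000000000004");
        System.out.println("safe to send SIGQUIT: " + readyForSigquit(sample));
    }
}
```

An attaching process could run such a check first and defer the SIGQUIT while the result is false, which would avoid killing a target that is still in the launcher's native code. Whether macOS offers an equivalent way to query another process's signal state is the open question raised above.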

I had a look at this one today. I have a local macosx aarch64 machine, currently on 15.1.1, and I can reproduce these failures. The 3 failing tests are:

  jdk/jfr/api/consumer/streaming/TestJVMCrash.java
  jdk/jfr/api/consumer/streaming/TestJVMExit.java
  jdk/jfr/api/consumer/streaming/TestOutOfProcessMigration.java

After looking into these failures locally, I can confirm that all 3 tests fail for the same underlying reason. The code in all 3 tests follows a similar pattern:

1. jtreg launches the test code. The main thread of the test process "P1" then uses a ProcessBuilder to launch a java "TestProcess" program "P2" (JVM args or other details do not matter in this context).
2. The P2 "TestProcess" main() method merely generates some JFR events and then expects a test specific file on the filesystem to contain an instruction on how to proceed. The instruction in that file is either "exit" or "crash", and depending on what's in that file, the corresponding action (intentionally crashing the JVM or System.exit()) is done by the TestProcess main() method.
3. After the test code in P1 has launched this java process P2, the test code in P1 immediately initiates the JVM "attach" against process P2 to get hold of the value of the "jdk.jfr.repository" system property of the launched P2 process.
4. The JVM "attach" implementation in the JDK involves a handshake between the processes. As part of the handshake, the process which is initiating the attach sends a SIGQUIT to the process to which it wants to attach.
5. On macosx 15.x (both in our continuous integration setup and my local instance), it appears that when the test process P1 sends the SIGQUIT (as part of the attach) to P2, the P2 java program is still in the very early stages of the JVM being launched. In fact, based on what I am seeing, the libjvm library is still being loaded (through dlopen). That means the launch process of P2 is still in the launcher's native code when P2 receives that SIGQUIT from P1. That causes the process P2 to exit/crash. As a result, the attach from P1 never happens and the attach call fails with an IOException. The test code in P1 then retries attaching against that process, notices that the process P2 is no longer around, and thus fails the test.

All 3 tests have this same root cause, leading to either explicit test failures or timeouts of the test. The root cause is that, for some reason, on macosx 15 the Java application launch appears to have slowed down (I don't know by how much).

When these tests were failing/crashing locally, on a few occasions (not all), macos generated a diagnostic file (in its usual location on the filesystem, "~/Library/Logs/DiagnosticReports"). Interestingly, those crash report files have this (only the relevant snippet):

  Process:              java [50986]
  Identifier:           java
  Code Type:            ARM-64 (Native)
  Parent Process:       java [50985]
  Responsible:          Terminal [XXX]
  ...
  OS Version:           macOS 15.1.1 (24B91)
  ...
  Crashed Thread:       0  Dispatch queue: com.apple.main-thread
  Exception Type:       EXC_CRASH (SIGQUIT)
  Exception Codes:      0x0000000000000000, 0x0000000000000000
  Termination Reason:   Namespace SIGNAL, Code 3 Quit: 3
  Terminating Process:  java [50985]

  Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
  0   libsystem_kernel.dylib   0x18f3d1e34 mach_msg2_trap + 8
  1   libsystem_kernel.dylib   0x18f3e45d0 mach_msg2_internal + 80
  2   libsystem_kernel.dylib   0x18f3da9d8 mach_msg_overwrite + 480
  3   libsystem_kernel.dylib   0x18f3d217c mach_msg + 24
  4   CoreFoundation           0x18f4f9edc __CFRunLoopServiceMachPort + 160
  5   CoreFoundation           0x18f4f873c __CFRunLoopRun + 1212
  6   CoreFoundation           0x18f4f7bc4 CFRunLoopRunSpecific + 588
  7   libjli.dylib             0x1020c8e84 CreateExecutionEnvironment + 404
  8   libjli.dylib             0x1020c4984 JLI_Launch + 1152
  9   java                     0x10207fbb4 main + 404
  10  dyld                     0x18f090274 start + 2840

  Thread 1:
  0   dyld                     0x18f0a7fec invocation function for block in dyld4::Loader::findAndRunAllInitializers(dyld4::RuntimeState&) const + 940
  1   dyld                     0x18f0e629c invocation function for block in dyld3::MachOAnalyzer::forEachInitializer(Diagnostics&, dyld3::MachOAnalyzer::VMAddrConverter const&, void (unsigned int) block_pointer, void const*) const + 172
  2   dyld                     0x18f0d9c38 invocation function for block in dyld3::MachOFile::forEachSection(void (dyld3::MachOFile::SectionInfo const&, bool, bool&) block_pointer) const + 496
  3   dyld                     0x18f08c2dc dyld3::MachOFile::forEachLoadCommand(Diagnostics&, void (load_command const*, bool&) block_pointer) const + 300
  4   dyld                     0x18f0d8bcc dyld3::MachOFile::forEachSection(void (dyld3::MachOFile::SectionInfo const&, bool, bool&) block_pointer) const + 192
  5   dyld                     0x18f0db5a0 dyld3::MachOFile::forEachInitializerPointerSection(Diagnostics&, void (unsigned int, unsigned int, bool&) block_pointer) const + 160
  6   dyld                     0x18f0e5f90 dyld3::MachOAnalyzer::forEachInitializer(Diagnostics&, dyld3::MachOAnalyzer::VMAddrConverter const&, void (unsigned int) block_pointer, void const*) const + 432
  7   dyld                     0x18f0a7bb4 dyld4::Loader::findAndRunAllInitializers(dyld4::RuntimeState&) const + 176
  8   dyld                     0x18f0af190 dyld4::JustInTimeLoader::runInitializers(dyld4::RuntimeState&) const + 36
  9   dyld                     0x18f0a8270 dyld4::Loader::runInitializersBottomUp(dyld4::RuntimeState&, dyld3::Array<dyld4::Loader const*>&, dyld3::Array<dyld4::Loader const*>&) const + 312
  10  dyld                     0x18f0ac560 dyld4::Loader::runInitializersBottomUpPlusUpwardLinks(dyld4::RuntimeState&) const::$_0::operator()() const + 180
  11  dyld                     0x18f0a8460 dyld4::Loader::runInitializersBottomUpPlusUpwardLinks(dyld4::RuntimeState&) const + 412
  12  dyld                     0x18f0c089c dyld4::APIs::dlopen_from(char const*, int, void*) + 2432
  13  libjli.dylib             0x1020c9284 LoadJavaVM + 56
  14  libjli.dylib             0x1020c49a4 JLI_Launch + 1184
  15  java                     0x10207fbb4 main + 404
  16  libjli.dylib             0x1020c9e44 apple_main + 88
  17  libsystem_pthread.dylib  0x18f4132e4 _pthread_start + 136
  18  libsystem_pthread.dylib  0x18f40e0fc thread_start + 8

This is the crash report of the P2 process that was being launched by the test, and the one against which an attach (SIGQUIT) was issued. So it's natural that this process is reported as crashed due to a SIGQUIT. What's interesting though is "Thread 1":

  ...
  12  dyld                     0x18f0c089c dyld4::APIs::dlopen_from(char const*, int, void*) + 2432
  13  libjli.dylib             0x1020c9284 LoadJavaVM + 56
  14  libjli.dylib             0x1020c49a4 JLI_Launch + 1184

This is where the java launcher loads libjvm (through a dlopen() call):

  JLI_TraceLauncher("JVM path is %s\n", jvmpath);
  if (!JLI_IsStaticallyLinked()) {
      libjvm = dlopen(jvmpath, RTLD_NOW + RTLD_GLOBAL);

The stack frames in that crash report seem to indicate that the dlopen() call on macos 15 is running some initialization code (an internal implementation detail of dlopen), which I suspect is now taking more time compared to older macosx versions (I'm basing this theory only on the names of the functions in the stack frames originating from the call to dlopen()). And while this dlopen() is in progress, to load the JVM, the SIGQUIT arrives and causes the process to crash.

I wanted to be sure that this issue isn't due to some changes we have in mainline around libraries (I haven't followed the static library/build changes). So I ran these tests with Java 23 (and even Java 21) on my local macosx 15.1.1 version (and even passed _JAVA_LAUNCHER_DEBUG to get some launcher logs). There too these tests fail with this same issue (and on one occasion generated a crash report with similar stack frames):

  Thread 1:
  0   dyld                     0x18f08b5b4 _kernelrpc_mach_vm_protect_trap + 8
  1   dyld                     0x18f08f540 vm_protect + 52
  2   dyld                     0x18f0b87e0 lsl::MemoryManager::writeProtect(bool) + 204
  3   dyld                     0x18f0a7fe4 invocation function for block in dyld4::Loader::findAndRunAllInitializers(dyld4::RuntimeState&) const + 932
  4   dyld                     0x18f0e629c invocation function for block in dyld3::MachOAnalyzer::forEachInitializer(Diagnostics&, dyld3::MachOAnalyzer::VMAddrConverter const&, void (unsigned int) block_pointer, void const*) const + 172
  5   dyld                     0x18f0d9c38 invocation function for block in dyld3::MachOFile::forEachSection(void (dyld3::MachOFile::SectionInfo const&, bool, bool&) block_pointer) const + 496
  6   dyld                     0x18f08c2dc dyld3::MachOFile::forEachLoadCommand(Diagnostics&, void (load_command const*, bool&) block_pointer) const + 300
  7   dyld                     0x18f0d8bcc dyld3::MachOFile::forEachSection(void (dyld3::MachOFile::SectionInfo const&, bool, bool&) block_pointer) const + 192
  8   dyld                     0x18f0db5a0 dyld3::MachOFile::forEachInitializerPointerSection(Diagnostics&, void (unsigned int, unsigned int, bool&) block_pointer) const + 160
  9   dyld                     0x18f0e5f90 dyld3::MachOAnalyzer::forEachInitializer(Diagnostics&, dyld3::MachOAnalyzer::VMAddrConverter const&, void (unsigned int) block_pointer, void const*) const + 432
  10  dyld                     0x18f0a7bb4 dyld4::Loader::findAndRunAllInitializers(dyld4::RuntimeState&) const + 176
  11  dyld                     0x18f0af190 dyld4::JustInTimeLoader::runInitializers(dyld4::RuntimeState&) const + 36
  12  dyld                     0x18f0a8270 dyld4::Loader::runInitializersBottomUp(dyld4::RuntimeState&, dyld3::Array<dyld4::Loader const*>&, dyld3::Array<dyld4::Loader const*>&) const + 312
  13  dyld                     0x18f0ac560 dyld4::Loader::runInitializersBottomUpPlusUpwardLinks(dyld4::RuntimeState&) const::$_0::operator()() const + 180
  14  dyld                     0x18f0a8460 dyld4::Loader::runInitializersBottomUpPlusUpwardLinks(dyld4::RuntimeState&) const + 412
  15  dyld                     0x18f0c089c dyld4::APIs::dlopen_from(char const*, int, void*) + 2432
  16  libjli.dylib             0x1025515b4 LoadJavaVM + 56
  17  libjli.dylib             0x10254d2c0 JLI_Launch + 1160
  18  java                     0x10250bbb4 main + 404
  19  libjli.dylib             0x102552148 apple_main + 88
  20  libsystem_pthread.dylib  0x18f4132e4 _pthread_start + 136
  21  libsystem_pthread.dylib  0x18f40e0fc thread_start + 8

This implies that the cause is not the Java version, or any potential change to the libjvm library (which might have triggered additional initialization). It's very likely some change in macosx 15.x, which appears to be running some kind of (additional slow?) code within the dlopen() code path and so slowing down the library load process. It's not yet clear to me whether this impacts only a specific library or all libraries loaded through dlopen() on macosx 15.

Just to be extra sure that this indeed is the root cause, I made a local change to the JVM attach mechanism to slightly delay the code which sends the SIGQUIT. With that change, the P1 process which wants to attach to the P2 process waits a few milliseconds in its attach implementation before sending the SIGQUIT, thus allowing P2 some more time to complete loading the JVM and launching the application code. I added this patch to the JDK source code locally:

  diff --git a/src/jdk.attach/macosx/classes/sun/tools/attach/VirtualMachineImpl.java b/src/jdk.attach/macosx/classes/sun/tools/attach/VirtualMachineImpl.java
  index e869c08cd91..8956f998df9 100644
  --- a/src/jdk.attach/macosx/classes/sun/tools/attach/VirtualMachineImpl.java
  +++ b/src/jdk.attach/macosx/classes/sun/tools/attach/VirtualMachineImpl.java
  @@ -72,6 +72,14 @@ public class VirtualMachineImpl extends HotSpotVirtualMachine {
           if (!socket_file.exists()) {
               File f = createAttachFile(pid);
               try {
  +                long m = 100;
  +                System.err.println("waiting " + m + " milli seconds before sending sigquit to " + pid);
  +                try {
  +                    Thread.sleep(m);
  +                } catch (InterruptedException e) {
  +                    throw new RuntimeException(e);
  +                }
  +                System.err.println("now sending sigquit to " + pid);
                   sendQuitTo(pid);
   
                   // give the target VM time to start the attach mechanism

I re-built the JDK and reran the JFR streaming tests. With this change, these tests now complete successfully every single time on my local system.

I will look around the macosx forums to understand if/what changed in dlopen() on macosx 15.x and whether it needs to be reported to Apple.

As for these tests, in theory, I think it might be OK to change them so that the test code initiates the VM attach only after the "TestProcess" main() code has started executing. That way it is certain that the process has launched and is ready for the attach and subsequent testing. Someone more familiar with these tests would have to decide if it's the right thing to do. Having said that, I believe the underlying issue (the slowdown in dlopen() on macosx 15.x) is a genuine issue and needs to be investigated, since I think it can/will have an impact in other ways.
18-02-2025
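
As a rough illustration of the test hardening suggested in the comment above (attach only once the "TestProcess" main() has started executing), here is a minimal Java sketch. It is not the actual test code. It assumes a hypothetical convention in which the child process creates a marker file as the first action in main(), and the parent polls for that file before attaching. The class name AttachWhenReady, the marker file, and the timeout values are all assumptions made for this sketch.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of the JFR tests: delays an attach attempt
// until the target process has signalled that its main() is running.
final class AttachWhenReady {

    // Polls until the marker file exists; returns false if the timeout elapses first.
    public static boolean awaitReady(Path marker, Duration timeout) throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!Files.exists(marker)) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(10);
        }
        return true;
    }

    // Runs the attach action only after the target has signalled readiness.
    public static <T> T attachWhenReady(Path marker, Duration timeout, Callable<T> attach)
            throws Exception {
        if (!awaitReady(marker, timeout)) {
            throw new IllegalStateException("target did not signal readiness: " + marker);
        }
        return attach.call();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the file the child would create at the start of main().
        Path marker = Files.createTempFile("ready", ".marker");
        // Stand-in for the real attach action.
        String result = attachWhenReady(marker, Duration.ofSeconds(1), () -> "attached");
        System.out.println(result);
        Files.delete(marker);
    }
}
```

In a real test the Callable would wrap the attach itself, for example a call to com.sun.tools.attach.VirtualMachine.attach(pid) followed by reading the "jdk.jfr.repository" system property. Because the marker file only appears once Java code is running in the target, the SIGQUIT can no longer arrive while the launcher is still inside dlopen(). Note that a later comment in this thread argues against modifying the tests, since they uncovered a real product issue.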

Links to failures would be useful. See also JDK-8345147.
28-11-2024