JDK-8361462 : JVM crashed with assert(ret == 0) failed: Failed to wait on semaphore
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 26
  • Priority: P4
  • Status: In Progress
  • Resolution: Unresolved
  • OS: os_x
  • CPU: aarch64
  • Submitted: 2025-07-07
  • Updated: 2025-10-14
Sub Tasks
JDK-8361647
Description
SUMMARY: We have established that during VM termination, once `exit()` has been invoked, we can still execute code that tries to use Semaphores whose static destructors have already run, resulting in the observed assertion failures. Separate issues have been created for the numerous (directly or indirectly) statically defined Semaphore instances, so that their safe usage can be evaluated and corrected if needed. A Semaphore is unsafe if it can be accessed by a `NonJavaThread`, or by a `JavaThread` in a safepoint-safe state.
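For illustration, here is a minimal standalone sketch of that failure mode (hypothetical code, not taken from HotSpot): an object with static storage duration owning a Mach semaphore is destroyed by the C runtime once `exit()` runs, while a still-running thread continues to wait on it, which is the condition the assertion catches.

// Hedged sketch of the hazard; compile on macOS with: clang++ -std=c++17 static_sem_race.cpp
#include <mach/mach.h>
#include <cstdio>
#include <cstdlib>
#include <thread>

class StaticSemaphore {
  semaphore_t _sem;
 public:
  StaticSemaphore()  { semaphore_create(mach_task_self(), &_sem, SYNC_POLICY_FIFO, 1); }
  ~StaticSemaphore() { semaphore_destroy(mach_task_self(), _sem); }  // runs during exit()
  void wait() {
    kern_return_t kr = semaphore_wait(_sem);
    if (kr != KERN_SUCCESS) {
      // After the destructor has run this may fail (the analogue of the assert firing).
      printf("wait failed: kr=%d\n", kr);
      abort();
    }
  }
  void signal() { semaphore_signal(_sem); }
};

static StaticSemaphore _mutex_semaphore;  // static storage duration, as in the VM cases below

int main() {
  std::thread t([] {                      // stands in for a thread still running at termination
    for (;;) { _mutex_semaphore.wait(); _mutex_semaphore.signal(); }
  });
  t.detach();
  return 0;  // returning from main calls exit(), which runs ~StaticSemaphore()
}

Whether the failure is observed depends on timing, as in the real sightings, since the process may finish exiting before the racing thread touches the destroyed semaphore again.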

---

For random generator using seed: 5323127887678115242
To re-run test with same seed value please add "-Djdk.test.lib.random.seed=5323127887678115242" to command line.
Stress process main method is started.
[234.178s][error][jvmti] Posting Resource Exhausted event: Java heap space
[310.199s][error][jvmti] Posting Resource Exhausted event: Requested array size exceeds VM limit
[572.061s][error][jvmti] Posting Resource Exhausted event: Requested array size exceeds VM limit
Fatal error in jvmti native agent: (ForceEarlyReturnVoid) unexpected error: (113). GetErrorName finished with error (0). 
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/System/Volumes/Data/mesos/work_dir/slaves/d2398cde-9325-49c3-b030-8961a4f0a253-S577077/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/1725c443-f2fc-4cb3-8d36-4912f92abfb1/runs/bf13cee2-c1c1-49dc-af9a-95488455fd59/workspace/open/src/hotspot/os/bsd/semaphore_bsd.cpp:65), pid=69909, tid=58739
#  assert(ret == 0) failed: Failed to wait on semaphore
#
# JRE version: Java(TM) SE Runtime Environment (26.0+6) (fastdebug build 26-ea+6-492)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 26-ea+6-492, mixed mode, tiered, compressed oops, compressed class ptrs, serial gc, bsd-aarch64)
# Core dump will be written. Default location: core.69909
#
[thread 66547 also had an error]
# JFR recording file will be written. Location: /System/Volumes/Data/mesos/work_dir/slaves/d2398cde-9325-49c3-b030-8961a4f0a253-S576526/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/881efc11-8a0a-4b23-ab2a-a751f36d48e2/runs/d04547ca-129a-4075-a52d-b1f851337dcc/testoutput/test-support/jtreg_closed_test_hotspot_jtreg_applications_runthese_RunThese30M_java/scratch/0/hs_err_pid69909.jfr
#
Unsupported internal testing APIs have been used.

Interestingly the hs_err file is truncated:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/System/Volumes/Data/mesos/work_dir/slaves/d2398cde-9325-49c3-b030-8961a4f0a253-S577077/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/1725c443-f2fc-4cb3-8d36-4912f92abfb1/runs/bf13cee2-c1c1-49dc-af9a-95488455fd59/workspace/open/src/hotspot/os/bsd/semaphore_bsd.cpp:65), pid=69909, tid=58739
#  assert(ret == 0) failed: Failed to wait on semaphore
#
# JRE version: Java(TM) SE Runtime Environment (26.0+6) (fastdebug build 26-ea+6-492)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 26-ea+6-492, mixed mode, tiered, compressed oops, compressed class ptrs, serial gc, bsd-aarch64)
# Core dump will be written. Default location: core.69909
#
# JFR recording file will be written. Location: /System/Volumes/Data/mesos/work_dir/slaves/d2398cde-9325-49c3-b030-8961a4f0a253-S576526/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/881efc11-8a0a-4b23-ab2a-a751f36d48e2/runs/d04547ca-129a-4075-a52d-b1f851337dcc/testoutput/test-support/jtreg_closed_test_hotspot_jtreg_applications_runthese_RunThese30M_java/scratch/0/hs_err_pid69909.jfr
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#

---------------  S U M M A R Y ------------

Command Line: -Xbootclasspath/a:/System/Volumes/Data/mesos/work_dir/slaves/d2398cde-9325-49c3-b030-8961a4f0a253-S576526/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/881efc11-8a0a-4b23-ab2a-a751f36d48e2/runs/d04547ca-129a-4075-a52d-b1f851337dcc/testoutput/test-support/jtreg_closed_test_hotspot_jtreg_applications_runthese_RunThese30M_java/scratch/0/wb.jar -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI -XX:MaxRAMPercentage=6.25 -Dtest.boot.jdk=/System/Volumes/Data/mesos/work_dir/jib-master/install/jdk/24/36/bundles/macos-aarch64/jdk-24_macos-aarch64_bin.tar.gz/jdk-24.jdk/Contents/Home -Djava.io.tmpdir=/System/Volumes/Data/mesos/work_dir/slaves/d2398cde-9325-49c3-b030-8961a4f0a253-S576526/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/881efc11-8a0a-4b23-ab2a-a751f36d48e2/runs/d04547ca-129a-4075-a52d-b1f851337dcc/testoutput/test-support/jtreg_closed_test_hotspot_jtreg_applications_runthese_RunThese30M_java/tmp -XX:+UseSerialGC -XX:MaxRAMPercentage=50 -Djava.net.preferIPv6Addresses=false -XX:+DisplayVMOutputToStderr -Xlog:gc*,gc+heap=debug:gc.log:uptime,timemillis,level,tags -XX:+DisableExplicitGC -XX:+StartAttachListener -XX:CompileCommand=memlimit,*.*,0 -Xlog:monitorinflation=info:file=../monitorinflation.log::filesize=500m -Djava.io.tmpdir=/System/Volumes/Data/mesos/work_dir/slaves/d2398cde-9325-49c3-b030-8961a4f0a253-S576526/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/881efc11-8a0a-4b23-ab2a-a751f36d48e2/runs/d04547ca-129a-4075-a52d-b1f851337dcc/testoutput/test-support/jtreg_closed_test_hotspot_jtreg_applications_runthese_RunThese30M_java/scratch/0/java.io.tmpdir -Duser.home=/System/Volumes/Data/mesos/work_dir/slaves/d2398cde-9325-49c3-b030-8961a4f0a253-S576526/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/881efc11-8a0a-4b23-ab2a-a751f36d48e2/runs/d04547ca-129a-4075-a52d-b1f851337dcc/testoutput/test-support/jtreg_closed_test_hotspot_jtreg_applications_runthese_RunThese30M_java/scratch/0/user.home -agentpath:/System/Volumes/Data/mesos/work_dir/jib-master/install/jdk-26+6-492/macosx-aarch64-debug.test/hotspot/jtreg/native/libJvmtiStressModule.dylib -Xverify:all -javaagent:redefineagent.jar -XX:NativeMemoryTracking=detail -Djdk.test.lib.random.seed=5323127887678115242 applications.kitchensink.process.stress.Main /System/Volumes/Data/mesos/work_dir/slaves/d2398cde-9325-49c3-b030-8961a4f0a253-S576526/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/881efc11-8a0a-4b23-ab2a-a751f36d48e2/runs/d04547ca-129a-4075-a52d-b1f851337dcc/testoutput/test-support/jtreg_closed_test_hotspot_jtreg_applications_runthese_RunThese30M_java/scratch/0/kitchensink.final.properties

Host: "Mac14,3" arm64, 8 cores, 16G, Darwin 22.3.0, macOS 13.2.1 (22D68)
Time: Sun Jul  6 01:11:19 2025 GMT elapsed time: 924.390394 seconds (0d 0h 15m 24s)

---------------  T H R E A D  ---------------

Current thread (0x0000000298d41a10):  JavaThread "Thread-7972" daemon [_thread_in_vm, id=58739, stack(0x000000028ab04000,0x000000028ad07000) (2060K)]

Stack: [0x000000028ab04000,0x000000028ad07000],  sp=0x000000028ad05c30,  free space=2055k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.dylib+0x12125c8]  VMError::report(outputStream*, bool)+0x1b00  (semaphore_bsd.cpp:65)
V  [libjvm.dylib+0x1215e68]  VMError::report_and_die(int, char const*, char const*, char*, Thread*, unsigned char*, void const*, void const*, char const*, int, unsigned long)+0x55c
V  [libjvm.dylib+0x5a8448]  print_error_for_unit_test(char const*, char const*, char*)+0x0
V  [libjvm.dylib+0xfd8330]  OSXSemaphore::trywait()+0x0
V  [libjvm.dylib+0x98cd00]  JfrThreadGroup::thread_group_id_internal(JfrThreadGroupsHelper&)+0x30
V  [libjvm.dylib+0x98cc10]  JfrThreadGroup::thread_group_id(JavaThread const*, Thread*)+0xe8
V  [libjvm.dylib+0x99ef70]
Comments
These are the direct, or indirect, static Semaphores declared in the VM, and the issues filed to address them:

1. NonJavaThread::_the_list (contains a SingleWriterSynchronizer, which contains a Semaphore) - JDK-8369250
2. Semaphore ConfigurationLock::_semaphore(1); (unified logging) - JDK-8369336
3. Semaphore PhaseTypeGuard::_mutex_semaphore(1); (compilerEvents) - JDK-8369337
4. Semaphore PdhMutex::_semaphore(1); (os_perf_windows.cpp) - JDK-8369338
5. Semaphore ThreadIdExclusiveAccess::_mutex_semaphore(1); (JFR objectSampleCheckpoint.cpp) - JDK-8369255
6. Semaphore ThreadGroupExclusiveAccess::_mutex_semaphore(1); (jfrThreadGroupManager.cpp) - JDK-8369255
7. Semaphore SerializerRegistrationGuard::_mutex_semaphore(1); (jfrTypeManager.cpp) - JDK-8369255
8. Semaphore ZipLibraryLoaderLock::_lock(1); (utilities/zipLibrary.cpp) - JDK-8369472
9. sr_semaphore (signals_posix.cpp) - JDK-8369631

I have also examined the use of the SingleWriterSynchronizer by OopStorage, and none of those uses appears to involve a static entity containing the SWS, and thus the Semaphore.

Continuing the audit of other non-static Semaphores, to check they are not embedded in a static entity (TBC):

./share/cds/archiveUtils.hpp: Semaphore _end_semaphore; - Safe: embedded within ArchiveWorkers, which is a StackObj
./share/gc/g1/g1FreeIdSet.hpp: Semaphore _sem; - Safe: embedded in G1FreeIdSet, which appears unused since JDK-8342382
./share/gc/shared/workerThread.hpp: Semaphore _start_semaphore;
./share/gc/shared/workerThread.hpp: Semaphore _end_semaphore; - Safe: embedded in WorkerTaskDispatcher, embedded in WorkerThreads, which is dynamically allocated
./share/gc/z/zFuture.hpp: Semaphore _sema; - Embedded in ZFuture - Embedded in ZDriverPortEntry - Embedded in ZPageAllocation
./share/jfr/periodic/sampling/jfrThreadSampler.cpp: Semaphore _sample;
./share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp: Semaphore _sample;
./share/logging/logFileOutput.hpp: Semaphore _rotation_semaphore;
./share/logging/logAsyncWriter.hpp: Semaphore _flush_sem;
./share/runtime/mutex.hpp: Semaphore _sem;
./share/utilities/waitBarrier_generic.hpp: Semaphore _sem;

EDIT: This became unmanageable, so it was split up into separate issues for specific Semaphores, to let the code owners for each Semaphore determine whether their usage is safe or not. Where I was able to determine that a usage was safe, I did not create an issue.
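For reference, the pattern the audit is looking for can be sketched as follows (stand-in types only, not the real HotSpot declarations): a Semaphore is a problem if it is itself static, or is embedded in an object that is static, because in either case its destructor runs during exit().

// Illustrative stand-ins; the real types live in the files listed above.
class Semaphore {
 public:
  explicit Semaphore(int value = 0) { /* create OS semaphore */ }
  ~Semaphore() { /* destroy OS semaphore -- runs during exit() if statically reachable */ }
};

// Unsafe: directly static.
static Semaphore _mutex_semaphore(1);

// Also unsafe: indirectly static, e.g. a static list head owning a synchronizer
// that owns a Semaphore (the NonJavaThread::_the_list case).
struct SynchronizerLike { Semaphore _wakeup; };
static SynchronizerLike _the_list_synchronizer;

// Safe: the Semaphore is embedded in a stack- or heap-allocated object, so its
// lifetime ends independently of static destruction (e.g. ArchiveWorkers, WorkerThreads).
struct WorkerLike { Semaphore _end_semaphore; };
void run_task() { WorkerLike w; /* ... */ }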
14-10-2025

[~dholmes] Unsure, since I would have to trace how the shutdown and exit() interactions occur. I just wanted to point out that the analysis says JavaThreads running _thread_in_native are problematic - but in the case above, a JavaThread running _thread_in_vm is hitting the destroyed semaphore. Is there a race in how we think about the shutdown sequence? What is preventing the threads from re-entering? Is it the Threads_lock, or the SafepointSynchronize::wait semaphore, or both? It might not be an issue with the general shutdown sequence; a reasonable hypothesis can be found in JDK-8369255.
08-10-2025

[~mgronlun] The last safepoint never terminates, so a JavaThread can never leave it once it blocks. The problem is JavaThreads running _thread_in_native - which I suspect is what the JFR code is doing. Semaphores don't participate in thread-state modifications, so there is no requirement to be _thread_in_vm to use one. Which `block_if_vm_exited` check do you think is being bypassed?
07-10-2025

[~dholmes] Can a JavaThread return to the VM after the last safepoint? Or are they blocked from then onwards? I think I found the answer, which should be "no", as per:

static void block_if_vm_exited() {
  if (_vm_exited) {
    wait_if_vm_exited();
  }
}

But the reason for asking is this:

Current thread (0x0000000298d41a10):  JavaThread "Thread-7972" daemon [_thread_in_vm, id=58739, stack(0x000000028ab04000,0x000000028ad07000) (2060K)]

Stack: [0x000000028ab04000,0x000000028ad07000],  sp=0x000000028ad05c30,  free space=2055k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.dylib+0x12125c8]  VMError::report(outputStream*, bool)+0x1b00  (semaphore_bsd.cpp:65)
V  [libjvm.dylib+0x1215e68]  VMError::report_and_die(int, char const*, char const*, char*, Thread*, unsigned char*, void const*, void const*, char const*, int, unsigned long)+0x55c
V  [libjvm.dylib+0x5a8448]  print_error_for_unit_test(char const*, char const*, char*)+0x0
V  [libjvm.dylib+0xfd8330]  OSXSemaphore::trywait()+0x0
V  [libjvm.dylib+0x98cd00]  JfrThreadGroup::thread_group_id_internal(JfrThreadGroupsHelper&)+0x30
V  [libjvm.dylib+0x98cc10]  JfrThreadGroup::thread_group_id(JavaThread const*, Thread*)+0xe8
V  [libjvm.dylib+0x99ef70]

Should the set_vm_exited() function be called earlier than it is currently done? Perhaps even before the VMThread is torn down?
07-10-2025

[~mgronlun] Quite simply, that is the way it works. We bring the VM to a safepoint to quell most activity by JavaThreads, but NJTs have no such mechanism. And JavaThreads that are safepoint-safe (e.g. in native) will still run.
07-10-2025

[~dholmes] "allowing the DestroyJavaVM thread to invoke `exit()` in the launcher". How can we even allow this to happen if threads are still running in the VM?
07-10-2025

There is a general race between statically allocated Semaphore objects being destroyed by libc as the process exits, and threads still running in the JVM at termination time. The originally reported problem could occur if the JavaThread was executing in a safepoint-safe state at VM termination time, thus allowing the termination safepoint to be reached and process exit to commence. The VMThread example above arises after the VMThread has logically exited, allowing the DestroyJavaVM thread to invoke `exit()` in the launcher, while the VMThread then tries to clean itself up by removing itself from the `NonJavaThread` list. The same problem can occur with GC threads. It is possible there are other cases - we would need to examine every single static Semaphore.
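The ordering can be seen from the embedder's side; the following is a hedged sketch of what any invocation-API embedder (including the java launcher) effectively does, and why static destructors can run while VM threads are still winding down. It is illustrative only, not the actual launcher source.

// Sketch of an invocation-API embedder.
#include <jni.h>
#include <cstdlib>

int main() {
  JavaVM *vm;
  JNIEnv *env;
  JavaVMInitArgs args;
  args.version = JNI_VERSION_10;
  args.nOptions = 0;
  args.options = nullptr;
  args.ignoreUnrecognizedOptions = JNI_TRUE;
  if (JNI_CreateJavaVM(&vm, (void **)&env, &args) != JNI_OK) return 1;
  // ... run the application ...
  vm->DestroyJavaVM();  // returns once the termination safepoint is reached; NonJavaThreads
                        // and safepoint-safe JavaThreads may still be executing VM code
  std::exit(0);         // runs libc static destructors, including ~Semaphore() for any
                        // statically reachable Semaphore those threads may still touch
}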
02-09-2025

From another occurrence we see:

---------------  T H R E A D  ---------------

Current thread (0x000000014702b010):  VMThread "VM Thread" [id=19459, stack(0x000000016fe10000,0x0000000170013000) (2060K)]

Stack: [0x000000016fe10000,0x0000000170013000],  sp=0x0000000170012ca0,  free space=2059k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.dylib+0x12361c4]  VMError::report(outputStream*, bool)+0x1b00  (semaphore_bsd.cpp:112)
V  [libjvm.dylib+0x1239a64]  VMError::report_and_die(int, char const*, char const*, char*, Thread*, unsigned char*, void const*, void const*, char const*, int, unsigned long)+0x55c
V  [libjvm.dylib+0x5ae4e8]  print_error_for_unit_test(char const*, char const*, char*)+0x0
V  [libjvm.dylib+0xff1324]  SerialArguments::initialize()+0x0
V  [libjvm.dylib+0x103f210]  SingleWriterSynchronizer::synchronize()+0xc8
V  [libjvm.dylib+0xe8f3c4]  NonJavaThread::remove_from_the_list()+0xd4
V  [libjvm.dylib+0xe8f5cc]  NonJavaThread::post_run()+0x1c
V  [libjvm.dylib+0x1176dec]  Thread::call_run()+0x134
V  [libjvm.dylib+0xed9724]  thread_native_entry(Thread*)+0x138
C  [libsystem_pthread.dylib+0x6fa8]  _pthread_start+0x94

VM_Operation (0x00000001042b8ee8): Halt, mode: safepoint

So this is definitely a shutdown issue. Though the stack above has incorrect entries.
02-09-2025

hs_err file is truncated:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/System/Volumes/Data/mesos/work_dir/slaves/f7f8bd65-a387-4a2b-b519-702f2fefaf87-S168184/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/fa7af491-0dc2-48b1-af5a-adc06493da62/runs/0d718a07-6317-433b-9fc2-04b08caa6ee4/workspace/open/src/hotspot/os/bsd/semaphore_bsd.cpp:112), pid=92097, tid=19459
#  assert(kr == 0 || kr == 49) failed: Failed to timed-wait on semaphore: Unknown (0xf)
#
# JRE version: Java(TM) SE Runtime Environment (26.0+13) (fastdebug build 26-ea+13-1285)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 26-ea+13-1285, compiled mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, bsd-aarch64)
# Core dump will be written. Default location: core.92097
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#

---------------  S U M M A R Y ------------

Command Line: -Denv.class.path=/System/Volumes/Data/mesos/work_dir/slaves/f7f8bd65-a387-4a2b-b519-702f2fefaf87-S168251/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/443d611e-f0e2-4ff6-9124-22afb60221f9/runs/64de2529-836e-41a9-94db-8e6b06d636eb/testoutput/test-support/jtreg_open_test_jdk_svc_tools/classes/1/sun/tools/jps/TestJps.d:/System/Volumes/Data/mesos/work_dir/jib-master/install/jdk-26+13-1285/src.full/open/test/jdk/sun/tools/jps:/System/Volumes/Data/mesos/work_dir/slaves/f7f8bd65-a387-4a2b-b519-702f2fefaf87-S168251/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/443d611e-f0e2-4ff6-9124-22afb60221f9/runs/64de2529-836e-41a9-94db-8e6b06d636eb/testoutput/test-support/jtreg_open_test_jdk_svc_tools/classes/1/test/lib:/System/Volumes/Data/mesos/work_dir/jib-master/install/jdk-26+13-1285/src.full/open/test/lib:/System/Volumes/Data/mesos/work_dir/jib-master/install/jtreg/7.5.2/1/bundles/jtreg-7.5.2+1.zip/jtreg/lib/jtreg.jar -Dapplication.home=/System/Volumes/Data/mesos/work_dir/jib-master/install/jdk-26+13-1285/macosx-aarch64-debug.jdk/jdk-26/fastdebug -Xms8m -Xmx768m -XX:MaxRAMPercentage=6.25 -Dtest.boot.jdk=/System/Volumes/Data/mesos/work_dir/jib-master/install/jdk/24/36/bundles/macos-aarch64/jdk-24_macos-aarch64_bin.tar.gz/jdk-24.jdk/Contents/Home -Djava.io.tmpdir=/System/Volumes/Data/mesos/work_dir/slaves/f7f8bd65-a387-4a2b-b519-702f2fefaf87-S168251/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/443d611e-f0e2-4ff6-9124-22afb60221f9/runs/64de2529-836e-41a9-94db-8e6b06d636eb/testoutput/test-support/jtreg_open_test_jdk_svc_tools/tmp -ea -esa -Xcomp -ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -XX:+TieredCompilation -XX:+UsePerfData -Djdk.module.main=jdk.jcmd jdk.jcmd/sun.tools.jps.Jps -q -l -V

Host: "Mac14,3" arm64, 8 cores, 16G, Darwin 23.3.0, macOS 14.3.1 (23D60)
Time: Sun Aug 24 22:57:19 2025 GMT elapsed time: 10.383412 seconds (0d 0h 0m 10s)

---------------  T H R E A D  ---------------

Current thread (0x000000013b85dc10):  VMThread "VM Thread" [id=19459, stack(0x000000016c75c000,0x000000016c95f000) (2060K)]

Stack: [0x000000016c75c000,0x000000016c95f000],  sp=0x000000016c95eca0,  free space=2059k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.dylib+0x12305d8]  VMError::report(outputStream*, bool)+0x1b68  (semaphore_bsd.cpp:112)
V  [libjvm.dylib+0x1233ec8]  VMError::report_and_die(int, char const*, char const*, char*, Thread*, unsigned char*, void const*, void const*, char const*, int, unsigned long)+0x55c
V  [libjvm.dylib+0x5a9628]
25-08-2025

Moved to runtime, given the varied nature of the sightings.
28-07-2025

Another sighting with a different test: sun/tools/jps/TestJps.java

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/System/Volumes/Data/mesos/work_dir/slaves/d2398cde-9325-49c3-b030-8961a4f0a253-S577065/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/8d041014-cf6e-4af2-ad48-bb1b5bf89e5f/runs/fece01dc-7e7b-4044-bb3d-172752549d32/workspace/open/src/hotspot/os/bsd/semaphore_bsd.cpp:112), pid=39988, tid=19459
#  assert(kr == 0 || kr == 49) failed: Failed to timed-wait on semaphore: Unknown (0xf)
#
# JRE version: Java(TM) SE Runtime Environment (26.0+9) (fastdebug build 26-ea+9-860)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 26-ea+9-860, compiled mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, bsd-aarch64)

Unfortunately no hs_err file was collected, so we do not know which semaphore was involved.
28-07-2025

That error code corresponds to:

#define KERN_INVALID_NAME 15

Looking in the implementation (https://fergofrog.com/code/codebrowser/xnu/osfmk/kern/sync_sema.c.html#semaphore_timedwait) the only place that gets reported is here:

kern_return_t
port_name_to_semaphore(
    mach_port_name_t name,
    semaphore_t *semaphorep)
{
    ipc_port_t port;
    kern_return_t kr;

    if (!MACH_PORT_VALID(name)) {
        *semaphorep = SEMAPHORE_NULL;
        return KERN_INVALID_NAME;
    }

and that function in turn seems to be used within the signal-handling callbacks for the semaphore functions, e.g.:

kern_return_t
semaphore_timedwait_trap_internal(
    mach_port_name_t name,
    unsigned int sec,
    clock_res_t nsec,
    semaphore_cont_t caller_cont)
{
    semaphore_t semaphore;
    mach_timespec_t wait_time;
    kern_return_t kr;

    wait_time.tv_sec = sec;
    wait_time.tv_nsec = nsec;
    if (BAD_MACH_TIMESPEC(&wait_time)) {
        return KERN_INVALID_VALUE;
    }

    kr = port_name_to_semaphore(name, semaphore: &semaphore);

which suggests this was somehow passed a bad mach port name. Searching further, that seems to come from:

kern_return_t
semaphore_timedwait_trap(
    struct semaphore_timedwait_trap_args *args)
{
    return semaphore_timedwait_trap_internal(name: args->wait_name, sec: args->sec, nsec: args->nsec, thread_syscall_return);
}

but how that gets called I cannot tell. I do not know exactly what these "trap" variants are.

However, given this:

static Semaphore _mutex_semaphore;

and the fact that the hs_err log is truncated, I have a suspicion this is being executed during VM termination, and that a static destructor has been run either for the particular semaphore in question, or for all Semaphores associated with the current process. That said, we shouldn't be that far into VM termination, as we have not hit the final termination safepoint (the current thread is _thread_in_vm). So this still doesn't make a lot of sense. Memory corruption could explain things, of course.
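A small probe can show what semaphore_timedwait() returns once the underlying semaphore is gone (illustrative assumption: that a destroyed Mach semaphore behaves like the stale port suspected here; the analysis above predicts KERN_INVALID_NAME (15) when the port name is no longer valid, though the exact code may be kernel-dependent).

// Build on macOS: clang++ -std=c++17 stale_sem_probe.cpp && ./stale_sem_probe
#include <mach/mach.h>
#include <cstdio>

int main() {
  semaphore_t sem;
  semaphore_create(mach_task_self(), &sem, SYNC_POLICY_FIFO, 1);
  semaphore_destroy(mach_task_self(), sem);        // what a static destructor would have done
  mach_timespec_t ts = {0, 0};
  kern_return_t kr = semaphore_timedwait(sem, ts); // wait on the now-stale port name
  printf("kr=%d\n", kr);                           // compare against KERN_INVALID_NAME == 15
  return 0;
}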
14-07-2025

[~mgronlun] I will create a subtask to have the code report the unexpected return value and then we can see if this re-occurs.
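A sketch of the kind of change the subtask could make (hypothetical helper name, assuming HotSpot's formatted assert and the Mach headers are in scope; the exact wording is up to the subtask): HotSpot's assert accepts printf-style arguments, so the unexpected kern_return_t can simply be included in the message.

// Not the actual semaphore_bsd.cpp source; illustrates reporting the return value.
static void checked_wait(semaphore_t sem) {
  kern_return_t kr = semaphore_wait(sem);
  assert(kr == KERN_SUCCESS, "Failed to wait on semaphore: kr=%d", kr);
}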
08-07-2025

[~dholmes] Hard to see what can go wrong with this construct, and especially why it would start to do so after about 10 years of service:

class ThreadGroupExclusiveAccess : public StackObj {
 private:
  static Semaphore _mutex_semaphore;
 public:
  ThreadGroupExclusiveAccess() {
    _mutex_semaphore.wait();
  }
  ~ThreadGroupExclusiveAccess() {
    _mutex_semaphore.signal();
  }
};

Semaphore ThreadGroupExclusiveAccess::_mutex_semaphore(1);
08-07-2025

One thing that is odd, though: the crash here:

# Internal Error (/System/Volumes/Data/mesos/work_dir/slaves/d2398cde-9325-49c3-b030-8961a4f0a253-S577077/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/1725c443-f2fc-4cb3-8d36-4912f92abfb1/runs/bf13cee2-c1c1-49dc-af9a-95488455fd59/workspace/open/src/hotspot/os/bsd/semaphore_bsd.cpp:65), pid=69909, tid=58739
# assert(ret == 0) failed: Failed to wait on semaphore

indicates this is the assertion in OSXSemaphore::wait(), but the stack shows:

V  [libjvm.dylib+0xfd8330]  OSXSemaphore::trywait()+0x0

and trywait() is implemented in terms of timedwait(), not wait(). But likely this is just an issue of inaccurate stack information. The code in ThreadGroupExclusiveAccess uses wait(), not trywait().
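For context, a try-wait on a Mach semaphore is naturally expressed as a timed wait with a zero timeout, roughly as follows (sketch only, not necessarily the exact HotSpot code; assumes HotSpot's formatted assert is in scope). This is also where the `kr == 0 || kr == 49` form of the assertion comes from, since 49 is KERN_OPERATION_TIMED_OUT.

// Illustrative only.
bool trywait_sketch(semaphore_t sem) {
  mach_timespec_t ts = {0, 0};                     // do not block
  kern_return_t kr = semaphore_timedwait(sem, ts);
  assert(kr == KERN_SUCCESS || kr == KERN_OPERATION_TIMED_OUT,
         "Failed to timed-wait on semaphore: kr=%d", kr);
  return kr == KERN_SUCCESS;
}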
08-07-2025

[~mgronlun] KERN_INVALID_ARGUMENT and KERN_TERMINATED should never happen in correctly written code - hence we assert success. If they do occur, then the code using the semaphore has a bug. Hence this still appears to be a JFR issue to me. Of course, it would be useful if the assert reported the actual error code.
08-07-2025

Looks like semaphore_wait() has several return codes not covered by the HotSpot abstraction: https://web.mit.edu/darwin/src/modules/xnu/osfmk/man/semaphore_wait.html

RETURN VALUES

KERN_INVALID_ARGUMENT  The specified semaphore is invalid.
KERN_TERMINATED        The specified semaphore has been destroyed.
KERN_ABORTED           The caller was blocked due to a negative count on the semaphore, and was awoken for a reason not related to the semaphore subsystem (e.g. thread_terminate).
KERN_SUCCESS           The semaphore wait operation was successful.
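For reference, the numeric values behind these names (standard <mach/kern_return.h> constants) explain how the raw codes seen in the asserts map back to this list; a tiny lookup helper, purely illustrative:

#include <mach/kern_return.h>
#include <cstdio>

static const char *kr_name(kern_return_t kr) {
  switch (kr) {
    case KERN_SUCCESS:             return "KERN_SUCCESS (0)";
    case KERN_INVALID_ARGUMENT:    return "KERN_INVALID_ARGUMENT (4)";
    case KERN_ABORTED:             return "KERN_ABORTED (14)";
    case KERN_INVALID_NAME:        return "KERN_INVALID_NAME (15)";        // the "Unknown (0xf)" above
    case KERN_TERMINATED:          return "KERN_TERMINATED (37)";
    case KERN_OPERATION_TIMED_OUT: return "KERN_OPERATION_TIMED_OUT (49)"; // accepted by the timedwait assert
    default:                       return "other";
  }
}

int main() { printf("%s\n", kr_name(0xf)); return 0; }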
07-07-2025