JDK-8369392 : Safepoint sync should suspend GC and Java threads concurrently
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: gc
  • Priority: P4
  • Status: In Progress
  • Resolution: Unresolved
  • Submitted: 2025-10-08
  • Updated: 2025-10-27
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
Other
tbdUnresolved
Related Reports
Relates :  
Description
Looking at JDK-8369283 logs, I believe we have a minor performance opportunity, look:

[5448421640ns] Safepoint synchronization begins
[5448472227ns] Suspending GC threads 
[5448489199ns] Blocking threads from starting/exiting
[5448504629ns] Arming futex wait barrier
[5448520860ns] Setting thread local yield flag for threads
[5448541169ns] 26 total threads, waiting for 15 threads to block
[5448557199ns] Checking thread status
...
[5448597837ns] Waiting for 15 threads to block 
...
[5448668341ns] Safepoint synchronization finished

So the total synchronization took 246us, out of which we spent 17us waiting for GC (G1 in this case) to suspend its threads. But we can probably do this concurrently: announce the safepoint first, let Java threads block, and before going in and checking for Java thread status, we can ask GC to suspend. This would hide all those 17us in the time we wait for Java threads to respond. Additionally, Java threads that are blocking now would likely free up CPUs to let GC STS suspension to complete more promptly.

I think STS that GCs use to suspend/resume are oblivious of safepoint state, so it should be safe to rearrange. We need to check if it really is.
Comments
A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk/pull/27924 Date: 2025-10-21 20:00:15 +0000
27-10-2025

Hi [~kbarrett], I didn't run more tests for the two solutions, and also ran our internal benchmark workflows for both solutions, I didn't see much performance gain from the more comprehensive solution splitting STS synchronize into two phases, I'll follow [~shade]'s solution which is much simpler with lower risk. (I guess both solution won't benefit ZGC much)
27-10-2025

[~xpeng] For ZGC, moving the generation worker thread synchronization that way still leaves ZGC with little or no benefit from this approach.
11-10-2025

[~kbarrett] For ZGC, I feel we can move generation worker thread synchronization to ZCollectedHeap::safepoint_synchronize(), only keep StackWatermarkSet::safepoint_synchronize_begin() and SuspendibleThreadSet::synchronize_begin() in ZCollectedHeap::safepoint_synchronize_begin(), they will be suspended concurrently as well.
11-10-2025

I have run simple Dacapo benchmark with safepoint log, collected the time to reach safepoint in ns of the two solutions: 1. Split STS synchronize into `synchronize_begin` and `synchronize`: AVG: 109929 ns Min: 7417 ns Max: 1946959 ns 2. Move `Universe::heap()->safepoint_synchronize_begin();` after arming wait barrier AVG: 140230 ns Min: 8750 ns Max: 3357791 ns As expected, solution #1 performs slightly better.
11-10-2025

I fixed the assert, but comparing the [~shade]'s suggestion(https://github.com/openjdk/jdk/compare/master...pengxiaolong:jdk:JDK-8369392-simple-solution?expand=1), the PR seems overly complicated.
10-10-2025

Thanks [~kbarrett]! There is indeed some weird things with the change in PR. I saw assert which indicates it is not really safe to split up, here is the stack frames from the assert: ``` Stack: [0x00000001710c4000,0x00000001712c7000], sp=0x00000001712c6790, free space=2057k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.dylib+0x1424aec] VMError::report(outputStream*, bool)+0x1b68 (suspendibleThreadSet.cpp:133) V [libjvm.dylib+0x14283e8] VMError::report_and_die(int, char const*, char const*, char*, Thread*, unsigned char*, void const*, void const*, char const*, int, unsigned long)+0x55c V [libjvm.dylib+0x5dafb0] print_error_for_unit_test(char const*, char const*, char*)+0x0 V [libjvm.dylib+0x12f802c] SuspendibleThreadSet::desynchronize()+0x0 V [libjvm.dylib+0x10390a0] SafepointSynchronize::begin()+0x2ec V [libjvm.dylib+0x1446590] VMThread::inner_execute(VM_Operation*)+0x27c V [libjvm.dylib+0x14456f4] VMThread::loop()+0x98 V [libjvm.dylib+0x1445480] VMThread::run()+0xc0 V [libjvm.dylib+0x135f860] Thread::call_run()+0xf0 V [libjvm.dylib+0xf24270] thread_native_entry(Thread*)+0x138 C [libsystem_pthread.dylib+0x6bc8] _pthread_start+0x88 VM_Operation (0x0000000175145f20): G1CollectForAllocation, mode: safepoint, requested by thread 0x0000000125023e10 ``` It is weird that SuspendibleThreadSet::desynchronize() is called from SafepointSynchronize::begin(), which I can't find where the call is from.
10-10-2025

[~xpeng] That looks pretty much like what I had in mind for JDK-8152199. I've never (and still don't really) have time to look through things to make sure there aren't any weird interactions that arise from doing that. It looks like zgc might not benefit much from this though. The generation worker thread synchronization request is still blocking until completed. I've even less idea of whether splitting that up is safe. I also noticed that Parallel and Serial have the synchronization conditional on UseStringDedup, since they don't have other threads that run concurrently with mutator threads. But the string dedup thread was changed a while ago to be a Java thread, so STS is no longer relevant for that either.
10-10-2025

I have a draft PR here https://github.com/openjdk/jdk/pull/27739/files to restructure STS synchronize by adding synchronize_begin which sets `_suspend_all_start` to start STS synchronization, and make `synchronize` to wait if all suspendible threads are blocked, it seems not super intrusive.
10-10-2025

Oh, cool, I see JDK-8152199. Looks like it calls for a more comprehensive rewrite of STS that allows for async suspend request and then waiting? I suspect we can get the similar, if not all, bang for the buck if we "just" move `Universe::heap()->safepoint_synchronize_begin();` after announcing the safepoint and arming the wait barrier. In that case, it would matter less if STS suspension is currently synchronous: we are still waiting for Java threads to block. It is much less intrusive compared to STS restructuring as well?
09-10-2025

Probably duplicate.
08-10-2025