JDK-8340547 : Starting many threads can delay safepoints
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 24
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2024-09-20
  • Updated: 2025-02-21
  • Resolved: 2024-10-09
Fix Version: JDK 24 (24 b19, Fixed)
Description
Starting a lot of threads in a burst can significantly delay safepoint synchronization, in some cases by multiple seconds.

JVM_StartThread takes Threads_lock and, under that lock, appends to the threads list for ThreadSMR support, which can take ~0.1ms. If there are many concurrent calls to Thread.start, an arbitrary number of callers can be waiting for the Threads_lock.

Safepoint synchronization also needs to acquire the Threads_lock before arming the safepoint. It has no special priority, so it can be arbitrarily delayed by JVM_StartThread calls.
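
The comments on this issue mention a ThreadsLockThrottle_lock. One plausible reading is an outer lock that thread starters take before Threads_lock, so that at most one starter at a time competes with the safepoint requester. The sketch below is only an analogy in plain Java, not HotSpot code: `mainLock` stands in for Threads_lock, `throttleLock` for the throttle, and the class and method names are hypothetical.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Illustrative model only: Java locks standing in for HotSpot's Threads_lock. */
public class ThrottledLockSketch {
    // Stand-in for Threads_lock: shared by starters and the safepoint-like path.
    private final ReentrantLock mainLock = new ReentrantLock();
    // Hypothetical outer lock, taken only by the starter path.
    private final ReentrantLock throttleLock = new ReentrantLock();
    private int registered = 0; // guarded by mainLock

    /** Starter path: queue on the throttle first, then take the main lock. */
    void registerThread() {
        throttleLock.lock();
        try {
            mainLock.lock();
            try {
                registered++; // stands in for appending to the threads list
            } finally {
                mainLock.unlock();
            }
        } finally {
            throttleLock.unlock();
        }
    }

    /** Safepoint-like path: takes only the main lock, so it races at most one starter. */
    int snapshot() {
        mainLock.lock();
        try {
            return registered;
        } finally {
            mainLock.unlock();
        }
    }

    /** Runs `starters` threads that each register `perStarter` times; returns the final count. */
    int run(int starters, int perStarter) {
        Thread[] ts = new Thread[starters];
        for (int i = 0; i < starters; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perStarter; j++) {
                    registerThread();
                }
            });
            ts[i].start();
        }
        try {
            for (Thread t : ts) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return snapshot();
    }

    public static void main(String[] args) {
        System.out.println("registered = " + new ThrottledLockSketch().run(8, 1000));
    }
}
```

Because the starter path always takes the throttle before the main lock, and the safepoint-like path never takes the throttle, the lock order is consistent in this model and cannot deadlock.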

The attached reproducer demonstrates the issue.

```
java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns'
Reaching safepoint: 1291591 ns
Reaching safepoint: 59962 ns
Reaching safepoint: 1958065 ns
Reaching safepoint: 14456666258 ns <-- 14 seconds!
```
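
The attached ThreadStartTtsp.java is not reproduced here. As a rough idea of the kind of workload described, the following is a minimal sketch, not the original reproducer: several starter threads call Thread.start in a tight loop while a daemon thread periodically calls System.gc(), on the assumption that explicit GC requests a safepoint in the configuration being run. The class name, thread counts, and sleep interval are all hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical workload sketch: bursts of Thread.start alongside periodic safepoint requests. */
public class BurstStartSketch {

    /** Starts `starters` threads that each start `perStarter` no-op threads; returns the total started. */
    static long startBurst(int starters, int perStarter) {
        AtomicLong started = new AtomicLong();
        Thread[] ts = new Thread[starters];
        for (int i = 0; i < starters; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perStarter; j++) {
                    new Thread(() -> { }).start(); // each start goes through JVM_StartThread
                    started.incrementAndGet();
                }
            });
            ts[i].start();
        }
        try {
            for (Thread t : ts) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return started.get();
    }

    public static void main(String[] args) {
        // Assumption: System.gc() leads to a safepoint operation (not true with -XX:+DisableExplicitGC).
        Thread gc = new Thread(() -> {
            while (true) {
                System.gc();
                try {
                    Thread.sleep(100);
                } catch (InterruptedException e) {
                    return;
                }
            }
        });
        gc.setDaemon(true); // do not keep the JVM alive once the burst finishes
        gc.start();

        long n = startBurst(32, 1000);
        System.out.println("started " + n + " threads");
    }
}
```

Run with `java -Xlog:safepoint BurstStartSketch.java`, this should log time-to-safepoint figures in the same format as above; whether it reproduces delays of the same magnitude depends on the machine and JDK build.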

Comments
Hi [~goetz] The introduced lock pattern looks somewhat suspicious with respect to potential deadlocks. However, given its limited usage scope, I see no possibility of such a scenario occurring. The backport for JDK 17 is somewhat fragile and requires a thorough review. I would not target it for the April release either. Testing was done on release and fastdebug configurations: release tier1, fastdebug tier1-3. The reproducers from JDK-8340547 and JDK-8307970 were used as well.
13-02-2025

> This is a rather new change and fiddles with locking during startup. This is a fragile and central part of the code, so I would rate this as a high risk backport. I could imagine approving this for 21, as this is the latest LTS.

Given there is an adjacent Threads_lock acquired in the vicinity, I see much less risk than I would ascribe to this patch otherwise. The only concern that I have is whether there is a lock ranking problem somewhere, but again, since this introduces the only use of ThreadsLockThrottle_lock, I see no real rank inversion (and deadlock) opportunities. I would personally support pulling this backport into 21u. 17u can wait until we expose more users in 21u with this.

> Also, there is a follow-up open.

You mean JDK-8307970? It is not a follow-up, but rather a related issue, which makes this issue worse. So I think the existence of that other issue does not block this one from being backported.
10-02-2025

Hi [~snazarki] This is a rather new change and fiddles with locking during startup. This is a fragile and central part of the code, so I would rate this as a high risk backport. Also, there is a follow-up open. I could imagine approving this for 21, as this is the latest LTS. But I think we should defer it to 17.0.16 (July) to make sure we don't backport a regression. It would help if you describe the risk of this change in your eyes, the relevance of the open relates-to issue, and the testing you have done as it is documented for fix-request in https://wiki.openjdk.org/display/JDKUpdates/How+to+contribute+or+backport+a+fix
07-02-2025

[jdk21u-fix-request] Approval Request from snazarkin: I'd like to backport this fix to eliminate the performance penalty measured in large thread pools. Some users consider this a blocker for transitioning from JDK 8. Reviewed by Paul Hohensee and Oli Gillespie (original author).
05-02-2025

A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk17u-dev/pull/3263 Date: 2025-02-03 08:36:12 +0000
03-02-2025

A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk21u-dev/pull/1365 Date: 2025-01-28 16:44:06 +0000
28-01-2025

Changeset: e704c055 Branch: master Author: Oli Gillespie <ogillespie@openjdk.org> Date: 2024-10-09 15:28:44 +0000 URL: https://git.openjdk.org/jdk/commit/e704c055a4cf2aab77cc2b3d034f5a8b8d9e3331
09-10-2024

Moving to hotspot/runtime for initial triage.
20-09-2024

A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk/pull/21111 Date: 2024-09-20 15:31:42 +0000
20-09-2024

JDK-8307970 shows that SMR makes thread creation expensive, which only makes this worse: not only do we thrash the Threads_lock, we can also hold it long enough (hundreds of microseconds) that the VM Thread is completely parked, and its wakeup does not get a good chance to acquire the lock before another Java thread manages to acquire it.
20-09-2024