JDK-8307970 : Thread.start is slow with large number of threads
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 17.0.7,21
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • Submitted: 2023-05-12
  • Updated: 2025-05-20
Fix Version (Other): tbd, Unresolved
Related Reports
Blocks :  
Relates :  
Relates :  
Relates :  
Description
We see a performance regression after migrating from JDK 8 to JDK 17/21. As with JDK-8305670, the regression is related to Thread-SMR and a large number of background (idle) threads. Async-profiler shows that most of the time is spent in ThreadsSMRSupport::free_list, called from JVM_StartThread (see attached JVM_StartThread.png).

I'm attaching the JMH benchmark that demonstrates the issue: ThreadStart.java
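The attached ThreadStart.java is not reproduced in this report. As a rough, hypothetical stand-in (not the attached benchmark, and not the "simplified non-JMH version" mentioned in the comments), the plain Java program below parks a given number of idle threads on a latch and then times Thread.start() for a no-op thread. The class and method names are invented for illustration, and without JMH the numbers are only indicative (no forking, minimal warm-up).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical approximation of the scenario in this report; not the attached ThreadStart.java.
final class ThreadStartSketch {

    /** Starts threadCount daemon threads that stay alive (idle) until the latch is released. */
    static List<Thread> startIdleThreads(int threadCount, CountDownLatch release) {
        List<Thread> idle = new ArrayList<>(threadCount);
        for (int i = 0; i < threadCount; i++) {
            Thread t = new Thread(() -> {
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.setDaemon(true);
            t.start();
            idle.add(t);
        }
        return idle;
    }

    /** Mean wall-clock nanoseconds spent inside Thread.start() for a thread with an empty body. */
    static long meanStartNanos(int iterations) throws InterruptedException {
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            Thread t = new Thread(() -> { });
            long begin = System.nanoTime();
            t.start();
            total += System.nanoTime() - begin;
            t.join(); // not timed; keeps the number of live threads stable between iterations
        }
        return total / iterations;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] counts = {0, 1000, 10000};
        for (int threadCount : counts) {
            CountDownLatch release = new CountDownLatch(1);
            List<Thread> idle = startIdleThreads(threadCount, release);
            meanStartNanos(1_000); // crude warm-up
            System.out.printf("[%d] Thread.start took: %d ns%n", threadCount, meanStartNanos(1_000));
            release.countDown();
            for (Thread t : idle) {
                t.join();
            }
        }
    }
}
```

Only start() is inside the timed region, since the profile attributes the cost to JVM_StartThread; the join() afterwards just keeps the idle-thread count at the intended value.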

On JDK 8, the performance of Thread.start does not depend much on the number of threads:

Benchmark                (threadCount)  Mode  Cnt  Score   Error  Units
ThreadStart.threadStart              0  avgt    5  0.072 ± 0.006  ms/op
ThreadStart.threadStart           1000  avgt    5  0.073 ± 0.003  ms/op
ThreadStart.threadStart          10000  avgt    5  0.079 ± 0.007  ms/op
ThreadStart.threadStart          30000  avgt    5  0.088 ± 0.009  ms/op

However, on JDK 21, the latency of Thread.start degrades significantly with the number of idle threads:

Benchmark                (threadCount)  Mode  Cnt  Score   Error  Units
ThreadStart.threadStart              0  avgt    5  0.088 ± 0.002  ms/op
ThreadStart.threadStart           1000  avgt    5  0.111 ± 0.007  ms/op
ThreadStart.threadStart          10000  avgt    5  0.391 ± 0.024  ms/op
ThreadStart.threadStart          30000  avgt    5  0.828 ± 0.026  ms/op

The slowdown is caused by a linear scan over all threads in ThreadsSMRSupport::free_list.
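Why the scan is linear can be illustrated with a deliberately simplified Java model of hazard-pointer reclamation. This is not HotSpot's code, and HotSpot's real free_list differs in detail: Snapshot, ModelThread and freeList are hypothetical names. The model only shows the essential step: to decide whether a retired threads-list may be deleted, every live thread's hazard pointer has to be read, so each call costs time proportional to the number of threads.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified, hypothetical model of hazard-pointer based reclamation; not HotSpot code.
final class HazardScanModel {

    /** Stand-in for a ThreadsList snapshot; compared by identity, like a pointer. */
    static final class Snapshot { }

    /** A thread that may protect one snapshot through its hazard pointer (null means none). */
    static final class ModelThread {
        volatile Snapshot hazardPtr;
    }

    /**
     * Returns the pending snapshots that must stay alive; the others are reclaimable
     * (in this model they are simply dropped and left to the garbage collector).
     */
    static List<Snapshot> freeList(List<ModelThread> liveThreads, List<Snapshot> pendingDelete) {
        // Pass 1: O(threads). Gather every snapshot that some thread currently protects.
        Set<Snapshot> protectedSnapshots = new HashSet<>();
        for (ModelThread t : liveThreads) {
            Snapshot hp = t.hazardPtr; // read once; the field may change concurrently
            if (hp != null) {
                protectedSnapshots.add(hp);
            }
        }
        // Pass 2: O(pending). Keep only the snapshots that are still protected.
        List<Snapshot> stillPending = new ArrayList<>();
        for (Snapshot s : pendingDelete) {
            if (protectedSnapshots.contains(s)) {
                stillPending.add(s);
            }
        }
        return stillPending;
    }
}
```

With 30000 mostly idle threads, pass 1 visits 30000 entries even when almost none of them hold a hazard pointer, which matches the shape of the JDK 21 numbers above.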
Comments
I did find one piece of low-hanging fruit that I will fix separately. We have the following code at the end of free_list:

    ValidateHazardPtrsClosure validate_cl;
    threads_do(&validate_cl);

but ValidateHazardPtrsClosure::do_thread is defined as:

    virtual void do_thread(Thread* thread) {
      assert_locked_or_safepoint(Threads_lock);
      if (thread == nullptr) return;
      ThreadsList *hazard_ptr = thread->get_threads_hazard_ptr();
      if (hazard_ptr == nullptr) return;
      // If the hazard ptr is unverified, then ignore it since it could
      // be deleted at any time now.
      if (Thread::is_hazard_ptr_tagged(hazard_ptr)) return;
      assert(ThreadsList::is_valid(hazard_ptr),
             "hazard_ptr=" INTPTR_FORMAT " for thread=" INTPTR_FORMAT " is not valid!",
             p2i(hazard_ptr), p2i(thread));
    }

so it actually does nothing except check an assertion! So we should only be using the closure in a debug build; in release builds this avoids a full linear scan of the active threads-list.

Running a simplified non-JMH version of the benchmark, we have before:

    [0] Thread.start took: 66567
    [1000] Thread.start took: 85998
    [10000] Thread.start took: 330554
    [20000] Thread.start took: 532377

and after:

    [0] Thread.start took: 60963
    [1000] Thread.start took: 81571
    [10000] Thread.start took: 220142
    [20000] Thread.start took: 367298
27-02-2025
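The debug-only idea can be illustrated by analogy in Java; this is not the HotSpot change, where the natural equivalent would be C++ code compiled only in debug builds (for example under #ifdef ASSERT). In the sketch, DebugOnlyValidation, validateAll and finishFreeList are hypothetical names, and a static final flag derived from Class.desiredAssertionStatus() stands in for the build-time switch. The point it shows: a loop whose body only asserts should itself be guarded, so the per-thread walk disappears entirely when checks are off.

```java
import java.util.List;

// Hypothetical analogy for a debug-only validation pass; not HotSpot code.
final class DebugOnlyValidation {

    /** Stand-in for "debug build": true when assertions are enabled for this class (-ea). */
    static final boolean ASSERTS_ENABLED = DebugOnlyValidation.class.desiredAssertionStatus();

    /** Assertion-only pass over every hazard pointer; returns how many entries it visited. */
    static int validateAll(List<Object> hazardPtrs) {
        int visited = 0;
        for (Object hp : hazardPtrs) {
            visited++;
            assert hp == null || isValid(hp) : "hazard ptr " + hp + " is not valid";
        }
        return visited;
    }

    /** Hypothetical validity check; always true in this sketch. */
    static boolean isValid(Object hp) {
        return true;
    }

    /** Tail of a free_list-like routine: the O(threads) validation runs only when checks are on. */
    static int finishFreeList(List<Object> hazardPtrs) {
        return ASSERTS_ENABLED ? validateAll(hazardPtrs) : 0;
    }
}
```

Because ASSERTS_ENABLED is a static final constant, the JIT can fold the branch away, which is roughly what a compile-time guard gives HotSpot for free.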

That low-hanging fruit was actually missed by JDK-8264624 even though it was called out by Rehn: https://bugs.openjdk.org/browse/JDK-8264624?focusedId=14413280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14413280
25-02-2025

I don't see how the linear scan can be avoided, as we are checking the set of potentially deletable threads-lists to see whether they actually are deletable, which requires examining each one and the threads that are using it. It may be possible to hand the work off to the service thread instead, allowing the starting thread to proceed more quickly, but then we risk overwhelming the service thread.
19-02-2025
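The hand-off idea from the comment above can be sketched with a single background worker. This is an illustration under assumptions, not a proposed HotSpot design: DeferredCleanup and retire are hypothetical names, and the bounded queue with ThreadPoolExecutor.CallerRunsPolicy is added here as one possible answer to the risk of overwhelming the service thread. Once the backlog is full, the starting thread runs the cleanup itself, so start latency falls back toward the current behavior instead of the backlog growing without bound.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: defer free_list-style work to one "service" thread with backpressure.
final class DeferredCleanup implements AutoCloseable {

    private final ThreadPoolExecutor service;

    DeferredCleanup(int queueCapacity) {
        service = new ThreadPoolExecutor(
                1, 1,                                       // exactly one worker thread
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),    // bounded backlog
                new ThreadPoolExecutor.CallerRunsPolicy()); // when full, the caller does the work
    }

    /** Called on the thread-start path; usually returns without running the scan itself. */
    void retire(Runnable freeListWork) {
        service.execute(freeListWork);
    }

    /** Stops accepting work and waits (up to 10 s) for queued cleanup to finish. */
    @Override
    public void close() {
        service.shutdown();
        try {
            service.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The trade-off is that retired lists stay alive longer while they wait in the queue, so memory for pending threads-lists grows with the backlog, up to the queue capacity.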