JDK-8185005 : Improve performance of ThreadMXBean.getThreadInfo(long ids[], int maxDepth)
  • Type: Enhancement
  • Component: core-svc
  • Sub-Component: java.lang.management
  • Affected Version: 8,10,11,14
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2017-07-20
  • Updated: 2022-06-10
  • Resolved: 2019-09-25
The Version table provides details of the release in which this issue/RFE will be addressed.

Unresolved: Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed: Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

JDK 11: 11.0.7-oracle (Fixed)
JDK 13: 13.0.10 (Fixed)
JDK 14: 14 b16 (Fixed)
Description
The implementation of getThreadInfo(long ids[], int maxDepth) iterates over the ids array and calls Threads::find_java_thread_from_java_tid() for each id. find_java_thread_from_java_tid() does a linear search over the thread list, so looking up m ids among n live threads costs O(m × n). When the ids array is large, finding the corresponding JavaThread*s can therefore take quite some time. One idea is to add find_java_threads_from_java_tids() to look them all up at once. It would put the contents of the ids array into a set (a hashtable, perhaps) when the array is "sufficiently large".
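
The batched lookup idea above can be illustrated outside HotSpot with a minimal sketch. It assumes a hypothetical FakeThread type in place of JavaThread* and uses std::unordered_multimap in place of a HotSpot hashtable. find_each_linear and find_all_batched are illustrative names, not the actual find_java_threads_from_java_tids() signature.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a JavaThread*; not a HotSpot type.
struct FakeThread {
  std::int64_t tid;  // the java.lang.Thread id
};

// Per-id lookup, mirroring the current loop: one linear scan of the thread
// list per requested id, i.e. O(m * n) for m ids and n live threads.
std::vector<const FakeThread*> find_each_linear(
    const std::vector<FakeThread>& threads, const std::vector<std::int64_t>& ids) {
  std::vector<const FakeThread*> result(ids.size(), nullptr);
  for (std::size_t i = 0; i < ids.size(); ++i) {
    for (const FakeThread& t : threads) {
      if (t.tid == ids[i]) {
        result[i] = &t;
        break;
      }
    }
  }
  return result;
}

// Batched lookup (hypothetical name): hash the requested ids once, then walk
// the thread list once, i.e. roughly O(m + n). A multimap keeps every position
// of an id that appears more than once in the request.
std::vector<const FakeThread*> find_all_batched(
    const std::vector<FakeThread>& threads, const std::vector<std::int64_t>& ids) {
  std::unordered_multimap<std::int64_t, std::size_t> wanted;  // tid -> index in ids
  wanted.reserve(ids.size());
  for (std::size_t i = 0; i < ids.size(); ++i) wanted.emplace(ids[i], i);

  std::vector<const FakeThread*> result(ids.size(), nullptr);  // nullptr = not found
  for (const FakeThread& t : threads) {
    auto range = wanted.equal_range(t.tid);
    for (auto it = range.first; it != range.second; ++it) result[it->second] = &t;
  }
  return result;
}
```

For small arrays, building the hash set may cost more than it saves. That is presumably why the description suggests switching strategies only for a "sufficiently large" ids array. The issue does not state a threshold.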
Comments
RFR (13u): I'd like to backport this for parity with jdk11/15. The patch requires some adaptation, as described in the jdk11 PR. Tested with tier1.
28-09-2021

Fix request (11u): I would like to downport this for parity with 11.0.7-oracle. I had to resolve some trivial conflicts, see: http://mail.openjdk.java.net/pipermail/jdk-updates-dev/2020-January/002333.html I also had to make further adaptations because ConcurrentHashTable differs between 11 and 14: http://mail.openjdk.java.net/pipermail/jdk-updates-dev/2020-January/002377.html Build and tests are green now.
16-01-2020

Yes, I mean ConcurrentHashTable. Corrected above. Thanks!
16-01-2020

Did you mean ConcurrentHashTable instead of ConcurrentHashMap ?
15-01-2020

URL: https://hg.openjdk.java.net/jdk/jdk/rev/f4abe950c3b0 User: dtitov Date: 2019-09-25 18:16:14 +0000
25-09-2019

The connection is that the authors say their patch is built on top of the attached jdk8u patch. That said, their patch could be done independently, and I'll split it out into its own issue if you like. If one builds one's system so that only a single method calls getThreadAllocatedMemory, one would use the array-argument version rather than complicate one's code by special-casing it.
24-05-2019

What's the connection between GetThreadAllocatedMemory and getThreadInfo? Optimising these list-of-threads commands for the special case of a list of one containing only the current thread obviously helps if they are used that way a lot. But these methods also have a single-thread-id version for that very case.
24-05-2019

Further contribution by Matt Bonner <matthew.bonnar@veeva.com> and Nathan Janken <nathan.janken@veeva.com>.

Change Overview

The Corretto-8 JDK includes an optimization for an internal JVM function that looks up threads by id (see https://github.com/corretto/corretto-8/commit/47886d111152b18428b9a6fc7df2b0d6081e02b9). That change yields significant performance improvements, including cases where thread-allocated memory needs to be checked. However, there is still room for further improvement in scenarios where threads check their own allocated bytes. Our initial benchmarks show that in multi-threaded scenarios the time complexity can be reduced from O(n) to O(1), where n is the number of threads performing concurrent memory allocation checks. Performance is often critical in these scenarios. For instance, these checks may run several times within a single application request in order to profile memory usage.

With the current Corretto-8 JDK implementation, checking a thread's allocated bytes requires acquiring the Threads_lock mutex in order to call Threads::find_java_thread_from_java_tid. This applies, for example, to com.sun.management.ThreadMXBean#getThreadAllocatedBytes(long). When a thread is checking its own allocated bytes, this lookup is unnecessary because the current thread is already at hand.
The logic for the proposed change is as follows. When a request is made to get a thread's allocated bytes:
  • check whether only one thread is being requested and whether that thread is the current thread;
  • if it is, simply return the allocated bytes for the current thread;
  • if it isn't, fall back to acquiring the lock and looking up the thread.

Git Diff

    diff --git a/src/hotspot/src/share/vm/services/management.cpp b/src/hotspot/src/share/vm/services/management.cpp
    index a8e6b0b2..02aff6d1 100644
    --- a/src/hotspot/src/share/vm/services/management.cpp
    +++ b/src/hotspot/src/share/vm/services/management.cpp
    @@ -2236,6 +2236,17 @@ JVM_ENTRY(void, jmm_GetThreadAllocatedMemory(JNIEnv *env, jlongArray ids,
                   "the given array of thread IDs");
       }
     
    +  if (num_threads == 1 && THREAD->is_Java_thread()) {
    +    JavaThread* current_thread = (JavaThread*)THREAD;
    +
    +    if (ids_ah->long_at(0) == java_lang_Thread::thread_id(current_thread->threadObj())) {
    +      // If the only thread being requested is the current java thread,
    +      // simply return its allocated bytes.
    +      sizeArray_h->long_at_put(0, current_thread->cooked_allocated_bytes());
    +      return;
    +    }
    +  }
    +
       MutexLockerEx ml(Threads_lock);
       for (int i = 0; i < num_threads; i++) {
         JavaThread* java_thread = Threads::find_java_thread_from_java_tid(ids_ah->long_at(i));

Performance Test Results

To test the performance impact of this change, we set up a few scenarios in which we could compare overall run time. We ran and timed two simple Java programs. One makes use of our proposed optimization (Self-Monitoring Threads), and one does not but still hits the affected code path (Global Monitoring Thread).

Self-Monitoring Threads

For this case we ran a Java program that sets up multiple "task" threads. Each calls com.sun.management.ThreadMXBean#getThreadAllocatedBytes(long) in a loop for a fixed number of iterations. We timed the execution with a varying number of task threads. See attached file self-monitoring-results.png.
Global Monitoring Thread

We also wanted to ensure that the proposed change does not negatively impact another common use case this API supports, in which one global monitoring thread performs memory allocation checks against several other threads. To do so, we ran a Java program that sets up multiple "task" threads that perform simple memory allocations (instantiating Strings) in a loop. Meanwhile, the originating thread calls com.sun.management.ThreadMXBean#getThreadAllocatedBytes(long) for each of the task threads in a loop for a fixed number of iterations. As in the previous case, we timed the execution with a varying number of task threads. See attached file global-monitoring-results.png.

Unit Testing

We decided not to add a new unit test. This change is solely intended to optimize performance, so a dedicated functional test did not seem practical. There should be no change in functional behavior. The existing functionality is already covered by a unit test (src/jdk/test/com/sun/management/ThreadMXBean/ThreadAllocatedMemory.java), which passes with our proposed patch applied.
23-05-2019
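
The self-thread fast path proposed in the contribution above can be modeled outside HotSpot. The following is a minimal sketch using hypothetical ModelThread and Registry types. A std::mutex stands in for Threads_lock, and a linear search stands in for Threads::find_java_thread_from_java_tid. Reporting -1 for ids that are not found is a convention chosen for this sketch only.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical model of a thread; not a HotSpot type.
struct ModelThread {
  std::int64_t tid;
  std::int64_t allocated_bytes;  // in this model, written only by the owning thread
};

// Hypothetical global registry.
struct Registry {
  std::mutex lock;                   // plays the role of Threads_lock
  std::vector<ModelThread*> threads; // plays the role of the thread list

  // Linear search; the caller must hold `lock`.
  ModelThread* find_locked(std::int64_t tid) {
    for (ModelThread* t : threads) {
      if (t->tid == tid) return t;
    }
    return nullptr;
  }
};

// Fills sizes[i] with the allocated bytes of thread ids[i], or -1 if not found.
void get_allocated_bytes(Registry& reg, const ModelThread* current,
                         const std::vector<std::int64_t>& ids,
                         std::vector<std::int64_t>& sizes) {
  sizes.assign(ids.size(), -1);
  // Fast path: exactly one id, and it is the caller's own thread.
  // No lock and no search are needed.
  if (ids.size() == 1 && current != nullptr && ids[0] == current->tid) {
    sizes[0] = current->allocated_bytes;
    return;
  }
  // Slow path: take the lock and look up every id.
  std::lock_guard<std::mutex> guard(reg.lock);
  for (std::size_t i = 0; i < ids.size(); ++i) {
    if (ModelThread* t = reg.find_locked(ids[i])) sizes[i] = t->allocated_bytes;
  }
}
```

Only a request for exactly one id that matches the caller skips the lock. Every other request takes the same locked path as before, including a global monitoring thread polling other threads. This matches the contributors' expectation of no change in functional behavior.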

Attached a jdk8 patch for future reference. It might be possible to use the open-addressed hash table in the patch as the thread list collection itself, to avoid adding as much memory footprint.
31-10-2018
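
The attached jdk8 patch's table is not reproduced here. To illustrate what an open-addressed hash table keyed by thread id looks like, here is a minimal linear-probing sketch. TidTable and its methods are hypothetical names. The multiplicative hashing constant and the 0.5 load-factor bound are assumptions of this sketch, not details taken from the patch.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal open-addressing (linear probing) map from thread id to a value.
// Hypothetical illustration; not the table from the attached jdk8 patch.
template <typename V>
class TidTable {
 public:
  // Capacity is a power of two so that `& mask_` can replace `%`,
  // and at least twice `expected` so the load factor stays <= 0.5.
  explicit TidTable(std::size_t expected) {
    std::size_t cap = 8;
    while (cap < expected * 2) cap <<= 1;
    slots_.assign(cap, Slot{});
    mask_ = cap - 1;
  }

  // Inserts or overwrites. Assumes at most `expected` distinct keys,
  // so an empty slot always exists and probing terminates.
  void put(std::int64_t tid, V value) {
    std::size_t i = index_for(tid);
    while (slots_[i].used && slots_[i].tid != tid) i = (i + 1) & mask_;
    slots_[i] = Slot{true, tid, value};
  }

  // Returns a pointer to the stored value, or nullptr if tid is absent.
  const V* get(std::int64_t tid) const {
    std::size_t i = index_for(tid);
    while (slots_[i].used) {
      if (slots_[i].tid == tid) return &slots_[i].value;
      i = (i + 1) & mask_;
    }
    return nullptr;
  }

 private:
  struct Slot {
    bool used = false;
    std::int64_t tid = 0;
    V value{};
  };

  // Multiplicative hashing with the 64-bit golden-ratio constant spreads
  // sequential thread ids across the table.
  std::size_t index_for(std::int64_t tid) const {
    std::uint64_t h = static_cast<std::uint64_t>(tid) * 0x9E3779B97F4A7C15ULL;
    return static_cast<std::size_t>(h >> 32) & mask_;
  }

  std::vector<Slot> slots_;
  std::size_t mask_ = 0;
};
```

Open addressing stores entries inline in a single array with no per-entry node allocation. That property makes it attractive when memory footprint matters. This sketch omits deletion, which in a linear-probing table needs tombstones or backward-shift deletion. It would need that extension before it could track threads that exit.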

This is not on our list of current priorities. If there are additional specific customer requirements, we will consider reopening this issue. Closing as WNF.
18-12-2017