JDK-8229391 : Improve performance for com.sun.management.ThreadMXBean#getThreadAllocatedBytes(long) for current thread
  • Type: Enhancement
  • Component: core-svc
  • Sub-Component: java.lang.management
  • Affected Version: 8
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • Submitted: 2019-08-11
  • Updated: 2019-09-17
Other: tbd, Unresolved
Description
This issue was created to provide an additional optimization for the com.sun.management.ThreadMXBean#getThreadAllocatedBytes(long) method, as suggested by Paul Hohensee in a comment on JDK-8185005.

"Paul Hohensee added a comment - 2019-05-23 14:09
Further contribution by Matt Bonner <matthew.bonnar@veeva.com> and Nathan Janken <nathan.janken@veeva.com>. 

Change Overview 

The Corretto-8 JDK includes an optimization for an internal JVM function that looks up threads by ID (see https://github.com/corretto/corretto-8/commit/47886d111152b18428b9a6fc7df2b0d6081e02b9). That change yields significant performance improvements, including in cases where thread-allocated memory needs to be checked. However, there is still room for improvement in scenarios where threads check their own allocated bytes. Our initial benchmarks show that in multi-threaded scenarios the time complexity can be reduced from O(n) to O(1), where n is the number of threads performing concurrent memory allocation checks. Performance is often critical in these scenarios. For instance, the checks may run several times within a single application request in order to profile memory usage.

With the current Corretto-8 JDK implementation, when a thread's allocated bytes are checked, e.g. by using com.sun.management.ThreadMXBean#getThreadAllocatedBytes(long), the Threads_lock mutex must be acquired in order to call Threads::find_java_thread_from_java_tid. When a thread is checking its own allocated bytes, this lookup is unnecessary because the current thread is readily available.
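
For reference, a thread querying its own counter through the public API might look roughly like the sketch below. The class and method names (SelfAllocationCheck, currentThreadAllocatedBytes) are made up for illustration, and the cast assumes a HotSpot-based JVM whose platform ThreadMXBean implements the com.sun.management extension.

```java
import java.lang.management.ManagementFactory;

// Illustrative only: SelfAllocationCheck is a hypothetical name, not part of the patch.
class SelfAllocationCheck {
    // Assumption: the platform bean implements com.sun.management.ThreadMXBean (true on HotSpot).
    private static final com.sun.management.ThreadMXBean THREADS =
            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

    /** Bytes allocated so far by the calling thread, or -1 if measurement is disabled. */
    static long currentThreadAllocatedBytes() {
        // The requested id is the caller's own id, which is the case this issue targets.
        return THREADS.getThreadAllocatedBytes(Thread.currentThread().getId());
    }

    public static void main(String[] args) {
        long before = currentThreadAllocatedBytes();
        byte[] payload = new byte[1 << 20]; // allocate roughly 1 MiB on this thread
        long after = currentThreadAllocatedBytes();
        System.out.println("payload length: " + payload.length);
        System.out.println("allocated in between (bytes): " + (after - before));
    }
}
```

Every such call currently goes through the Threads_lock path described above, even though the caller already is the thread being asked about.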

The logic for the proposed change is as follows: 

When a request is made to get a thread's allocated bytes, check if there is only one thread being requested and whether that thread is the current thread 
        If it is, simply return the allocated bytes for the current thread 
        If it isn't, fall back to the lock acquisition/thread look-up 
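
The steps above can be modeled in plain Java as a toy sketch. Nothing here is HotSpot code: THREADS_LOCK, REGISTRY, OWN and the other names are hypothetical stand-ins for Threads_lock, the thread list and the per-thread counter, chosen only to show the shape of the fast path and the fallback.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Toy model only: every name is hypothetical and none of this is HotSpot code.
class FastPathModel {
    static final ReentrantLock THREADS_LOCK = new ReentrantLock(); // stands in for Threads_lock
    static final Map<Long, AtomicLong> REGISTRY = new HashMap<>(); // guarded by THREADS_LOCK
    static final AtomicLong LOCK_ACQUISITIONS = new AtomicLong();  // counts fallback lookups only

    // Each thread owns its counter and registers it once so other threads can find it.
    static final ThreadLocal<AtomicLong> OWN = ThreadLocal.withInitial(() -> {
        AtomicLong counter = new AtomicLong();
        THREADS_LOCK.lock();
        try {
            REGISTRY.put(Thread.currentThread().getId(), counter);
        } finally {
            THREADS_LOCK.unlock();
        }
        return counter;
    });

    static void recordAllocation(long bytes) {
        OWN.get().addAndGet(bytes);
    }

    static long[] getAllocated(long[] ids) {
        long[] sizes = new long[ids.length];
        // Fast path: a single id that names the calling thread needs no lookup.
        if (ids.length == 1 && ids[0] == Thread.currentThread().getId()) {
            sizes[0] = OWN.get().get();
            return sizes;
        }
        // Fallback: take the lock and look up every requested id.
        THREADS_LOCK.lock();
        LOCK_ACQUISITIONS.incrementAndGet();
        try {
            for (int i = 0; i < ids.length; i++) {
                AtomicLong counter = REGISTRY.get(ids[i]);
                sizes[i] = (counter == null) ? -1 : counter.get();
            }
        } finally {
            THREADS_LOCK.unlock();
        }
        return sizes;
    }

    public static void main(String[] args) {
        recordAllocation(128);
        long self = Thread.currentThread().getId();
        System.out.println("own bytes: " + getAllocated(new long[] {self})[0]);
        System.out.println("fallback lookups: " + LOCK_ACQUISITIONS.get());
    }
}
```

In this model a self-query never touches the shared lock, so concurrent self-monitoring threads do not serialize on it; any other request behaves as before.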

Git Diff 

diff --git a/src/hotspot/src/share/vm/services/management.cpp b/src/hotspot/src/share/vm/services/management.cpp 
index a8e6b0b2..02aff6d1 100644 
--- a/src/hotspot/src/share/vm/services/management.cpp 
+++ b/src/hotspot/src/share/vm/services/management.cpp 
@@ -2236,6 +2236,17 @@ JVM_ENTRY(void, jmm_GetThreadAllocatedMemory(JNIEnv *env, jlongArray ids, 
              "the given array of thread IDs"); 
  } 
+ if (num_threads == 1 && THREAD->is_Java_thread()) {
+   JavaThread* current_thread = (JavaThread*)THREAD;
+
+   if (ids_ah->long_at(0) == java_lang_Thread::thread_id(current_thread->threadObj())) {
+     // If the only thread being requested is the current java thread,
+     // simply return its allocated bytes.
+     sizeArray_h->long_at_put(0, current_thread->cooked_allocated_bytes());
+     return;
+   }
+ }
+
  MutexLockerEx ml(Threads_lock); 
  for (int i = 0; i < num_threads; i++) { 
    JavaThread* java_thread = Threads::find_java_thread_from_java_tid(ids_ah->long_at(i)); 

Performance Test Results 

To test the performance impact of this change, we set up a few scenarios where we could compare the overall run time. Our approach involved running and timing two simple Java programs: one that makes use of our proposed optimization (Self-Monitoring Threads) and one that does not but hits the affected code path (Global Monitoring Thread). 

Self-Monitoring Threads

For this case we ran a Java program that sets up multiple "task" threads, each of which calls com.sun.management.ThreadMXBean#getThreadAllocatedBytes(long) in a loop with a fixed number of iterations, and we timed the execution with a varying number of task threads. See attached file self-monitoring-results.png.
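
The benchmark program itself is not attached to this issue. A minimal sketch of what such a self-monitoring test could look like is given below; the class name, the thread counts and the iteration count are assumptions, and plain System.nanoTime timing like this is only a rough measure.

```java
import java.lang.management.ManagementFactory;

// Hypothetical reconstruction of the scenario; not the program used for the attached results.
class SelfMonitoringBenchmark {
    // Assumption: HotSpot-based JVM, so the cast to the com.sun.management extension succeeds.
    private static final com.sun.management.ThreadMXBean THREADS =
            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

    static volatile long sink; // keeps the results observable so the loop is not dead code

    /** Starts taskThreads threads that each query their own allocated bytes; returns elapsed ns. */
    static long run(int taskThreads, int iterations) {
        Thread[] tasks = new Thread[taskThreads];
        for (int t = 0; t < taskThreads; t++) {
            tasks[t] = new Thread(() -> {
                long id = Thread.currentThread().getId();
                long total = 0;
                for (int i = 0; i < iterations; i++) {
                    total += THREADS.getThreadAllocatedBytes(id); // self-query
                }
                sink = total;
            });
        }
        long start = System.nanoTime();
        for (Thread task : tasks) {
            task.start();
        }
        try {
            for (Thread task : tasks) {
                task.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for task threads", e);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        for (int threads : new int[] {1, 2, 4, 8}) {
            System.out.println(threads + " task threads: " + run(threads, 100_000) + " ns");
        }
    }
}
```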

Global Monitoring Thread 

We also wanted to ensure that the proposed change does not negatively impact another common use case that this API supports: the case where one global monitoring thread performs memory allocation checks against several other threads. 

To do so, we ran a Java program that sets up multiple "task" threads that perform simple memory allocations (instantiating Strings) in a loop, while the originating thread calls com.sun.management.ThreadMXBean#getThreadAllocatedBytes(long) for each of the task threads in a loop with a fixed number of iterations. As in the previous case, we then timed the execution with a varying number of task threads. See attached file global-monitoring-results.png.
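
As with the previous case, the actual program is not attached. The sketch below shows one way the global-monitoring scenario could be set up. The class name is hypothetical, and having the task threads allocate until the monitor finishes is an assumption about how the test was run.

```java
import java.lang.management.ManagementFactory;

// Hypothetical reconstruction of the scenario; not the program used for the attached results.
class GlobalMonitoringBenchmark {
    // Assumption: HotSpot-based JVM, so the cast to the com.sun.management extension succeeds.
    private static final com.sun.management.ThreadMXBean THREADS =
            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

    static volatile boolean running;
    static volatile String sink;   // keeps the allocated Strings observable
    static volatile long checksum; // keeps the query results observable

    /** Times the originating thread querying every task thread `iterations` times; returns ns. */
    static long run(int taskThreads, int iterations) {
        running = true;
        Thread[] tasks = new Thread[taskThreads];
        long[] ids = new long[taskThreads];
        for (int t = 0; t < taskThreads; t++) {
            tasks[t] = new Thread(() -> {
                int n = 0;
                while (running) {
                    sink = new String("task-" + n++); // simple allocation work
                }
            });
            tasks[t].setDaemon(true);
            ids[t] = tasks[t].getId();
            tasks[t].start();
        }
        long total = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            for (long id : ids) {
                total += THREADS.getThreadAllocatedBytes(id); // never the caller's own id
            }
        }
        long elapsed = System.nanoTime() - start;
        checksum = total;
        running = false;
        try {
            for (Thread task : tasks) {
                task.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for task threads", e);
        }
        return elapsed;
    }

    public static void main(String[] args) {
        for (int threads : new int[] {1, 2, 4, 8}) {
            System.out.println(threads + " task threads: " + run(threads, 10_000) + " ns");
        }
    }
}
```

Because the monitoring thread never asks about itself, every call here takes the existing lock-and-lookup path, which is what this scenario is meant to exercise.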

Unit Testing 

We decided not to add a new unit test for this change: it is intended solely to optimize performance, so a dedicated test did not seem practical. There should be no change in functional behavior, and the existing functionality is already covered by a unit test (src/jdk/test/com/sun/management/ThreadMXBean/ThreadAllocatedMemory.java), which passes with our proposed patch applied. "