JDK-8207266 : ThreadMXBean::getThreadAllocatedBytes() can be quicker for self thread
  • Type: Enhancement
  • Component: core-svc
  • Sub-Component: java.lang.management
  • Affected Version: 10
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2018-07-13
  • Updated: 2020-02-18
  • Resolved: 2019-09-18
JDK 14: 14 b15 (Fixed)
Related Reports
CSR :  
Relates :  
Relates :  
Relates :  
Description
markus.gaisbauer@gmail.com posted this:

On 7/13/18 12:35 PM, Markus Gaisbauer wrote:
> Hello,
>
> I am trying to use ThreadMXBean::getThreadAllocatedBytes (com.sun.management) to get the amount of allocated memory of the current thread in some performance critical code.
>
> Unfortunately, the current implementation can be rather slow and the duration of each call unpredictable. I ran a test in a JVM with 500 threads. Depending on which thread was queried, getThreadAllocatedBytes took between 100 ns and 2500 ns.
>
> The root cause of the problem is ThreadsList::find_JavaThread_from_java_tid which performs a linear scan through all Java threads in the current process. The more threads a JVM has, the slower it gets. In the worst case, the thread with the given TID is found as the last entry in the list.
>
> Before Java 10, the oldest thread is the slowest one to query.
> Since Java 10, the youngest thread is the slowest one to query. I think this was a side effect of introducing "Thread Safe Memory Reclamation (Thread-SMR) support".
>
>              Oldest Thread   Youngest Thread
> Java 8             8740 ns             76 ns
> Java 10             109 ns           2485 ns
>
> A common use case is to query the metric for the current thread (e.g. before and after performing some operation). This case can be optimized by introducing a new method: getCurrentThreadAllocatedBytes.
>
> I created a patch for http://hg.openjdk.java.net/jdk/jdk/ and by using the new method I saw the following improvements in my test:
>  
>              Oldest Thread   Youngest Thread
> Proposal             37 ns             37 ns
>
> This is a 60x improvement over the worst case of the current API. In the best case of the current API, the new method is still 3 times faster.
>
> // based on JVM_SetNativeThreadName in jvm.cpp.
> JVM_ENTRY(jlong, jmm_GetCurrentThreadAllocatedMemory(JNIEnv *env, jobject currentThread))
>   // We don't use a ThreadsListHandle here because the current thread
>   // must be alive.
>   oop java_thread = JNIHandles::resolve_non_null(currentThread);
>   JavaThread* thr = java_lang_Thread::thread(java_thread);
>   if (thread == thr) {
>     // only supported for the current thread
>     return thr->cooked_allocated_bytes();
>   }
>   return -1;
> JVM_END
>
> The proposed method also fixes the problem that getThreadAllocatedBytes itself allocates some memory on the current thread (two long arrays, 24 bytes) and therefore can slightly skew measurements. The new method, getCurrentThreadAllocatedBytes, returns exactly the same value if it is called twice without allocating any memory between those calls.
>
> I also built a variation of this method that could be used to query allocated memory more efficiently for anyone who already has a java.lang.Thread object:
>
> JVM_ENTRY(jlong, jmm_GetThreadAllocatedMemory(JNIEnv *env, jobject threadObj))
>   // based on code proposed in threadSMR.hpp
>   ThreadsListHandle tlh;
>   JavaThread* thr = NULL;
>   bool is_alive = tlh.cv_internal_thread_to_JavaThread(threadObj, &thr, NULL);
>   if (is_alive) {
>     return thr->cooked_allocated_bytes();
>   }
>   return -1;
> JVM_END
>
> This method took 70 ns in my test, which is 85% slower than GetCurrentThreadAllocatedMemory but still 30% faster than the best case of the current API. I currently have no immediate need for this second method, but I think it would also be a valuable addition to the API.
>
> I attached a patch for getCurrentThreadAllocatedBytes. I can create a second patch for also adding getThreadAllocatedMemory(java.lang.Thread) to the API. 
>
> I am a first time contributor and I am not 100% sure what process I must follow to get a change like this into OpenJDK. Can someone have a look at my proposal and help me through the process?
>
> Best regards,
> Markus
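
Note: the before/after use case described in the message above can be sketched with the existing API as follows. This is a minimal illustration, not code from the patch: the class name AllocationProbe, the helper measure, and the 1 MiB workload are hypothetical, and it deliberately calls getThreadAllocatedBytes(long) rather than the proposed method, so the measured delta may include the small allocations made by the call itself.

```java
import java.lang.management.ManagementFactory;

// Sketch (hypothetical class): sample the current thread's allocated-byte
// counter before and after an operation, using the existing API.
final class AllocationProbe {
    // The platform bean is cast to the com.sun.management extension,
    // which declares getThreadAllocatedBytes(long).
    private static final com.sun.management.ThreadMXBean BEAN =
            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

    // Returns the current thread's counter, or -1 if the feature is
    // unsupported or disabled on this JVM.
    static long currentThreadAllocatedBytes() {
        if (!BEAN.isThreadAllocatedMemorySupported()
                || !BEAN.isThreadAllocatedMemoryEnabled()) {
            return -1;
        }
        return BEAN.getThreadAllocatedBytes(Thread.currentThread().getId());
    }

    // Runs the task and returns the difference between the two samples,
    // or -1 if either sample was unavailable.
    static long measure(Runnable task) {
        long before = currentThreadAllocatedBytes();
        task.run();
        long after = currentThreadAllocatedBytes();
        return (before < 0 || after < 0) ? -1 : after - before;
    }

    public static void main(String[] args) {
        long bytes = measure(() -> {
            byte[] buffer = new byte[1 << 20]; // hypothetical workload: 1 MiB
            buffer[0] = 1;
        });
        System.out.println("allocated (approx.): " + bytes + " bytes");
    }
}
```

Each sample here goes through the ID-based lookup, which is the path whose cost the report measures.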
Comments
URL: https://hg.openjdk.java.net/jdk/jdk/rev/0f3c23c374a4 User: phh Date: 2019-09-18 12:41:53 +0000
18-09-2019

Email threads: https://mail.openjdk.java.net/pipermail/serviceability-dev/2018-July/024441.html https://mail.openjdk.java.net/pipermail/serviceability-dev/2018-August/024763.html
16-08-2019

com.sun.management.ThreadMXBean.getThreadAllocatedBytes(long id) and getThreadAllocatedBytes(long[] ids) both call the same JMM entry point: jmm_interface->GetThreadAllocatedMemory(env, ids, sizeArray). The fast path can be implemented in Java in both methods: when the requested thread ID is that of the calling thread, call a new JMM entry point, jlong GetCurrentThreadAllocatedMemory(env). This fast path avoids creating two long arrays and performing various checks.
29-11-2018
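
Note: the Java-side fast path described in the comment above might look roughly like the sketch below. It is not the committed change. The class AllocatedBytesDispatch and its two functional-interface fields are hypothetical stand-ins for the native JMM entry points, the counters exist only to make the chosen path observable, and the long[] overload is omitted.

```java
import java.util.function.LongSupplier;
import java.util.function.LongUnaryOperator;

// Sketch (hypothetical class): choose between a "current thread" entry point
// and the general ID-based lookup, as suggested in the comment above.
final class AllocatedBytesDispatch {
    // Hypothetical stand-in for the proposed GetCurrentThreadAllocatedMemory.
    private final LongSupplier currentThreadEntry;
    // Hypothetical stand-in for the existing array-based GetThreadAllocatedMemory.
    private final LongUnaryOperator lookupByIdEntry;

    int fastPathHits; // illustration only: calls served by the fast path
    int slowPathHits; // illustration only: calls served by the ID lookup

    AllocatedBytesDispatch(LongSupplier currentThreadEntry,
                           LongUnaryOperator lookupByIdEntry) {
        this.currentThreadEntry = currentThreadEntry;
        this.lookupByIdEntry = lookupByIdEntry;
    }

    long getThreadAllocatedBytes(long id) {
        if (id <= 0) {
            throw new IllegalArgumentException("Invalid thread ID parameter: " + id);
        }
        // Fast path: the caller asks about itself, so no thread-list search
        // and no temporary long arrays are needed.
        if (id == Thread.currentThread().getId()) {
            fastPathHits++;
            return currentThreadEntry.getAsLong();
        }
        // Slow path: fall back to the lookup by thread ID.
        slowPathHits++;
        return lookupByIdEntry.applyAsLong(id);
    }
}
```

Keeping the check in Java means the existing method signatures stay unchanged while callers that query their own thread skip the array setup.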

I got a little more time and posted once more:

On 7/13/18 4:46 PM, Daniel D. Daugherty wrote:
>
> I believe this is the code that's causing you grief:
>
> open/src/hotspot/share/services/management.cpp:
>
> // Gets an array containing the amount of memory allocated on the Java
> // heap for a set of threads (in bytes). Each element of the array is
> // the amount of memory allocated for the thread ID specified in the
> // corresponding entry in the given array of thread IDs; or -1 if the
> // thread does not exist or has terminated.
> JVM_ENTRY(void, jmm_GetThreadAllocatedMemory(JNIEnv *env, jlongArray ids,
>                                              jlongArray sizeArray))
>   // Check if threads is null
>   if (ids == NULL || sizeArray == NULL) {
>     THROW(vmSymbols::java_lang_NullPointerException());
>   }
>
>   ResourceMark rm(THREAD);
>   typeArrayOop ta = typeArrayOop(JNIHandles::resolve_non_null(ids));
>   typeArrayHandle ids_ah(THREAD, ta);
>
>   typeArrayOop sa = typeArrayOop(JNIHandles::resolve_non_null(sizeArray));
>   typeArrayHandle sizeArray_h(THREAD, sa);
>
>   // validate the thread id array
>   validate_thread_id_array(ids_ah, CHECK);
>
>   // sizeArray must be of the same length as the given array of thread IDs
>   int num_threads = ids_ah->length();
>   if (num_threads != sizeArray_h->length()) {
>     THROW_MSG(vmSymbols::java_lang_IllegalArgumentException(),
>               "The length of the given long array does not match the length of "
>               "the given array of thread IDs");
>   }
>
>   ThreadsListHandle tlh;
>   for (int i = 0; i < num_threads; i++) {
>     JavaThread* java_thread = tlh.list()->find_JavaThread_from_java_tid(ids_ah->long_at(i));
>     if (java_thread != NULL) {
>       sizeArray_h->long_at_put(i, java_thread->cooked_allocated_bytes());
>     }
>   }
> JVM_END
>
>
> Perhaps something like this above the "ThreadsListHandle tlh;" line:
>
>   if (num_threads == 1 && THREAD->is_Java_thread()) {
>     // Only asking for 1 thread so if we're a JavaThread, then
>     // see if this request is for ourself.
>     JavaThread* jt = THREAD;
>     oop tobj = jt->threadObj();
>
>     if (ids_ah->long_at(0) == java_lang_Thread::thread_id(tobj)) {
>       // Return the info for ourself.
>       sizeArray_h->long_at_put(0, jt->cooked_allocated_bytes());
>       return;
>     }
>   }
>
> I haven't checked to see if this will even compile, but I
> think you'll get the idea.
>
> Dan
13-07-2018

I posted the following initial reply:

On 7/13/18 2:44 PM, Daniel D. Daugherty wrote:
> On 7/13/18 12:35 PM, Markus Gaisbauer wrote:
>> Hello,
>>
>> I am trying to use ThreadMXBean::getThreadAllocatedBytes (com.sun.management) to get the amount of allocated memory of the current thread in some performance critical code.
>>
>> Unfortunately, the current implementation can be rather slow and the duration of each call unpredictable. I ran a test in a JVM with 500 threads. Depending on which thread was queried, getThreadAllocatedBytes took between 100 ns and 2500 ns.
>>
>> The root cause of the problem is ThreadsList::find_JavaThread_from_java_tid which performs a linear scan through all Java threads in the current process. The more threads a JVM has, the slower it gets. In the worst case, the thread with the given TID is found as the last entry in the list.
>>
>> Before Java 10, the oldest thread is the slowest one to query.
>> Since Java 10, the youngest thread is the slowest one to query. I think this was a side effect of introducing "Thread Safe Memory Reclamation (Thread-SMR) support".
>>
>>              Oldest Thread   Youngest Thread
>> Java 8             8740 ns             76 ns
>> Java 10             109 ns           2485 ns
>
> It is good to see that longest search is much faster. Erik and Robbin
> will be pleased since speeding up traversal of the ThreadsList was one
> of the things that we tried to do during the Thread-SMR project.
>
> A first step is get a new bug filed that documents the issue with
> ThreadMXBean::getThreadAllocatedBytes(). Perhaps Gary or Serguei
> will take care of that.
>
> Dan
13-07-2018