JDK-8274298 : JFR Thread Sampler thread must not acquire malloc lock after suspending a thread because of possible deadlock
  • Type: Bug
  • Component: hotspot
  • Sub-Component: jfr
  • Affected Version: 15,16,17,18
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • Submitted: 2021-09-24
  • Updated: 2022-01-11
  • Resolved: 2021-12-09
Fixed in:
  • JDK 18: 18 b28
  • JDK 19: 19
Description
Suspendee thread (state "sigsuspended"):

    frame #0: 0x00007fff20413dde libsystem_kernel.dylib`__sigsuspend + 10
    frame #1: 0x00000001103c5f98 libjvm.dylib`SR_handler(int, __siginfo*, __darwin_ucontext*) + 248
    frame #2: 0x00007fff2046bd7d libsystem_platform.dylib`_sigtramp + 29
    frame #3: 0x00007fff2024f0f9 libsystem_malloc.dylib`small_malloc_from_free_list + 532
    frame #4: 0x00007fff2024e877 libsystem_malloc.dylib`small_malloc_should_clear + 259
    frame #5: 0x00007fff2024e692 libsystem_malloc.dylib`szone_malloc_should_clear + 109
    frame #6: 0x00007fff20267f3b libsystem_malloc.dylib`_malloc_zone_malloc + 118
    frame #7: 0x0000000110312e80 libjvm.dylib`os::malloc(unsigned long, MEMFLAGS, NativeCallStack const&) + 240
    frame #8: 0x000000010fbf5e32 libjvm.dylib`AllocateHeap(unsigned long, MEMFLAGS, AllocFailStrategy::AllocFailEnum) + 98
    frame #9: 0x00000001104def34 libjvm.dylib`vframeArray::allocate(JavaThread*, int, GrowableArray<compiledVFrame*>*, RegisterMap*, frame, frame, frame, bool) + 52
    frame #10: 0x000000010fdfba4c libjvm.dylib`Deoptimization::fetch_unroll_info_helper(JavaThread*, int) + 1356
    frame #11: 0x000000010fdfb4aa libjvm.dylib`Deoptimization::fetch_unroll_info(JavaThread*, int) + 42

JFR Thread Sampler thread:

    frame #5: 0x0000000110312e80 libjvm.dylib`os::malloc(unsigned long, MEMFLAGS, NativeCallStack const&) + 240
    frame #6: 0x000000010fbf5d84 libjvm.dylib`AllocateHeap(unsigned long, MEMFLAGS, NativeCallStack const&, AllocFailStrategy::AllocFailEnum) + 20
    frame #7: 0x000000010ffb7cea libjvm.dylib`JfrCHeapObj::allocate_array_noinline(unsigned long, unsigned long) + 106
    frame #8: 0x000000010fff346c libjvm.dylib`JfrEpochStorageHost<JfrBuffer, JfrMspaceRemoveRetrieval, false>::acquire(unsigned long, Thread*) + 220
    frame #9: 0x000000010fff2b70 libjvm.dylib`JfrEpochQueue<JfrEpochQueueKlassPolicy>::enqueue(Klass const*) + 160
    frame #10: 0x000000010ffe636f libjvm.dylib`JfrStackTrace::record_thread(JavaThread&, frame&) + 607
    frame #11: 0x000000010ffefaeb libjvm.dylib`OSThreadSampler::protected_task(os::SuspendedThreadTaskContext const&) + 139
    frame #12: 0x000000011031da3f libjvm.dylib`os::ThreadCrashProtection::call(os::CrashProtectionCallback&) + 79
    frame #13: 0x000000010ffef953 libjvm.dylib`OSThreadSampler::do_task(os::SuspendedThreadTaskContext const&) + 131
    frame #14: 0x00000001103c6203 libjvm.dylib`os::SuspendedThreadTask::internal_do_task() + 67
    frame #15: 0x000000011031505e libjvm.dylib`os::SuspendedThreadTask::run() + 14
    frame #16: 0x000000010fff02b1 libjvm.dylib`JfrThreadSampleClosure::do_sample_thread(JavaThread*, JfrStackFrame*, unsigned int, JfrSampleType) + 385
    frame #17: 0x000000010fff0c59 libjvm.dylib`JfrThreadSampler::task_stacktrace(JfrSampleType, JavaThread**) + 793
    frame #18: 0x000000010fff08d1 libjvm.dylib`JfrThreadSampler::run() + 353

By invariant, the JFR Thread Sampler thread does not take critical locks, because doing so can result in deadlocks like the one shown in the traces above.

JDK-8233705 introduced a means to iterate only the incrementally tagged set of klasses in the JVM, instead of all klasses, as part of serializing metadata information. Newly tagged klasses are enqueued onto a thread-local buffer using a load barrier. If the buffer runs out of space, another is taken from a free list or newly allocated. Unfortunately, this is problematic for the JfrThreadSampler because the suspended thread can hold the malloc lock, for example when it is in the middle of deoptimization, as in the suspendee trace above: the Deoptimization::UnrollBlock is a CHeapObj, and the vframe arrays use the NEW_C_HEAP_ARRAY macro. If the sampler then needs a new buffer, its own call to malloc blocks on a lock that cannot be released while the holder remains suspended.
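To make the enqueue path concrete, here is a minimal sketch of a buffer that is replaced from a free list when possible and from a fresh heap allocation otherwise. All names (Buffer, EpochQueueSketch, KlassId) are hypothetical and are not the HotSpot types; the real JfrEpochQueue and JfrEpochStorageHost are considerably more involved. The sketch only shows where the allocation that reaches malloc can occur.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a klass pointer; not a HotSpot type.
using KlassId = const void*;

// Fixed-capacity buffer of tagged klasses (illustrative only).
struct Buffer {
  explicit Buffer(std::size_t cap) : capacity(cap) { entries.reserve(cap); }
  bool full() const { return entries.size() == capacity; }
  std::size_t capacity;
  std::vector<KlassId> entries;
};

// Assumed simplification of "enqueue onto a thread-local buffer".
class EpochQueueSketch {
 public:
  explicit EpochQueueSketch(std::size_t buffer_capacity)
      : _capacity(buffer_capacity) {
    assert(buffer_capacity > 0);
  }

  // Makes a previously allocated, empty buffer available for reuse.
  void add_to_free_list(std::unique_ptr<Buffer> b) {
    _free_list.push_back(std::move(b));
  }

  void enqueue(KlassId k) {
    if (!_current || _current->full()) {
      renew();
    }
    _current->entries.push_back(k);
  }

  // Number of buffers obtained by a fresh heap allocation.
  std::size_t heap_allocations() const { return _heap_allocations; }

 private:
  void renew() {
    if (_current) {
      _retired.push_back(std::move(_current));
    }
    if (!_free_list.empty()) {
      // Preferred path: reuse a buffer from the free list.
      _current = std::move(_free_list.back());
      _free_list.pop_back();
      return;
    }
    // Fallback path: a new allocation. In the bug described here, this is
    // the kind of step that ends up in malloc on the sampler thread.
    _current = std::make_unique<Buffer>(_capacity);
    ++_heap_allocations;
  }

  std::size_t _capacity;
  std::unique_ptr<Buffer> _current;
  std::vector<std::unique_ptr<Buffer>> _free_list;
  std::vector<std::unique_ptr<Buffer>> _retired;
  std::size_t _heap_allocations = 0;
};
```

With a capacity of 2 and one buffer on the free list, the first two enqueues are served from the free list, and the third enqueue forces the fallback allocation.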

The enqueue mechanism introduced in JDK-8233705 therefore needs special handling for the JFR Thread Sampler thread.

 One solution to this problem is to explicitly monitor the size of the thread-local buffer of the JfrThreadSampler thread and pre-emptively renew it before thread suspension. 
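The idea of renewing the buffer before suspension can be sketched as follows. This is an illustrative model under stated assumptions, not the code from the actual changeset: the class SamplerQueueSketch, the min_free threshold, and the suspend_target/resume_target flags are all hypothetical. The point it shows is that every allocation happens while no thread is suspended, and that the enqueue used during suspension never allocates.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical model of the sampler thread's own buffer handling.
class SamplerQueueSketch {
 public:
  SamplerQueueSketch(std::size_t capacity, std::size_t min_free)
      : _capacity(capacity), _min_free(min_free) {
    assert(min_free <= capacity);
    _entries.reserve(capacity);
  }

  // Call BEFORE suspending a target thread. Allocation is safe here
  // because no suspended thread can be holding the malloc lock.
  void renew_if_low() {
    assert(!_target_suspended);
    if (free_slots() < _min_free) {
      std::vector<const void*> fresh;
      fresh.reserve(_capacity);                 // allocation happens here
      _retired.push_back(std::move(_entries));  // hand off the old buffer
      _entries = std::move(fresh);
      ++_renewals;
    }
  }

  // Stand-ins for the real suspend/resume of the sampled thread.
  void suspend_target() { _target_suspended = true; }
  void resume_target() { _target_suspended = false; }

  // Safe to call while the target is suspended: it never allocates,
  // since push_back stays within the reserved capacity. Returns false
  // (dropping the entry) if the buffer is unexpectedly full.
  bool enqueue(const void* klass) {
    if (_entries.size() == _capacity) {
      return false;
    }
    _entries.push_back(klass);
    return true;
  }

  std::size_t free_slots() const { return _capacity - _entries.size(); }
  std::size_t renewals() const { return _renewals; }

 private:
  std::size_t _capacity;
  std::size_t _min_free;
  std::vector<const void*> _entries;
  std::vector<std::vector<const void*>> _retired;
  std::size_t _renewals = 0;
  bool _target_suspended = false;
};
```

In this model the sampler loop would call renew_if_low(), then suspend the target, record the stack trace using enqueue(), and finally resume the target. Choosing min_free large enough to cover the klasses one stack trace can tag is an assumption of the sketch; dropping an entry when the buffer is full is likewise a design choice of the sketch, made so that the suspended path can never reach the allocator.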
Comments
Changeset: 965ea8d9
Author: Markus Grönlund <mgronlun@openjdk.org>
Date: 2021-12-09 09:29:59 +0000
URL: https://git.openjdk.java.net/jdk/commit/965ea8d9cd29aee41ba2b1b0b0c67bb67eca22dd
09-12-2021