JDK-8329109 : Threads::print_on() tries to print CPU time for terminated GC threads
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 17,23
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2024-03-26
  • Updated: 2024-10-24
  • Resolved: 2024-04-05
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 17 JDK 21 JDK 22 JDK 23
17.0.13Fixed 21.0.4Fixed 22.0.2Fixed 23 b18Fixed
Related Reports
Relates :  
Relates :  
Relates :  
Relates :  
Description
In Threads::print_on() PrintOnClosure is applied to the GC threads:

https://github.com/openjdk/jdk/blob/da8a095a19c90e7ee2b45fab9b533a1092887023/src/hotspot/share/runtime/threads.cpp#L1340

This is problematic if the GC thread has terminated because Thread::print_on() attempts to retrieve and print the CPU time. This can crash if the thread has terminated.

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fd987694580, pid=32024, tid=32074
#
# JRE version: OpenJDK Runtime Environment SapMachine (17.0.10+7) (build 17.0.10+7-LTS)
# Java VM: OpenJDK 64-Bit Server VM SapMachine (17.0.10+7-LTS, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C  [libpthread.so.0+0xf580]  pthread_getcpuclockid+0x0
#
# Core dump will be written. Default location: Core dumps may be processed with "/opt/dynatrace/oneagent/agent/rdp -p %p -P %P -c %c -e %e -s %s" (or dumping to /app/tomcat_trunk.20240320003323/core.32024)
#
# If you would like to submit a bug report, please visit:
#   https://github.com/SAP/SapMachine/issues/new
#

---------------  T H R E A D  ---------------
Current thread (0x00007fd98277b6a0):  VMThread "VM Thread" [stack: 0x00007fd948a9a000,0x00007fd948b9a000] [id=32074]
Stack: [0x00007fd948a9a000,0x00007fd948b9a000],  sp=0x00007fd948b98778,  free space=1017k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libpthread.so.0+0xf580]  pthread_getcpuclockid+0x0
V  [libjvm.so+0xe65d05]  Thread::print_on(outputStream*, bool) const+0x45
V  [libjvm.so+0xbf79f7]  NamedThread::print_on(outputStream*) const+0x57
V  [libjvm.so+0xe6e7d1]  PrintOnClosure::do_thread(Thread*)+0x21
V  [libjvm.so+0x6ebf96]  G1CollectedHeap::gc_threads_do(ThreadClosure*) const+0x26
V  [libjvm.so+0xe6a6e9]  Threads::print_on(outputStream*, bool, bool, bool, bool)+0x469
V  [libjvm.so+0xefa60c]  VM_Operation::evaluate()+0xec
V  [libjvm.so+0xefbee8]  VMThread::evaluate_operation(VM_Operation*)+0xb8
V  [libjvm.so+0xefc3d4]  VMThread::inner_execute(VM_Operation*)+0x1d4
V  [libjvm.so+0xefc6bf]  VMThread::run()+0xbf
V  [libjvm.so+0xe6b6c8]  Thread::call_run()+0xa8
V  [libjvm.so+0xc29461]  thread_native_entry(Thread*)+0xe1

Other Threads:
=>0x00007fd98277b6a0 VMThread "VM Thread" [stack: 0x00007fd948a9a000,0x00007fd948b9a000] [id=32074]
  0x00007fd85851c390 Thread "vitals sampler thread" [stack: 0x00007fd8ccc27000,0x00007fd8ccd27000] [id=32740]
  0x00007fd9816c48f0 GCTaskThread "GC Thread#0" [stack: 0x00007fd9490a0000,0x00007fd9491a0000] [id=32067]
  0x00007fd958005840 GCTaskThread "GC Thread#1" [stack: 0x00007fd94809a000,0x00007fd94819a000] [id=32120]
  0x00007fd958005ed0 GCTaskThread "GC Thread#2" [stack: 0x00007fd8cda3c000,0x00007fd8cdb3c000] [id=32121]
  0x00007fd9580066e0 GCTaskThread "GC Thread#3" [stack: 0x00007fd8cd93b000,0x00007fd8cda3b000] [id=32122]
  0x00007fd958006fe0 GCTaskThread "GC Thread#4" [stack: 0x00007fd8cd83a000,0x00007fd8cd93a000] [id=32123]
  0x00007fd9580078e0 GCTaskThread "GC Thread#5" [stack: 0x00007fd8cd739000,0x00007fd8cd839000] [id=32124]
  0x00007fd9580083a0 GCTaskThread "GC Thread#6" [stack: 0x00007fd8cd638000,0x00007fd8cd738000] [id=32125]
  0x00007fd958008e60 GCTaskThread "GC Thread#7" [stack: 0x00007fd8cd537000,0x00007fd8cd637000] [id=32126]
  0x00007fd9816cbc00 ConcurrentGCThread "G1 Main Marker" terminated
  0x00007fd9816ccbb0 ConcurrentGCThread "G1 Conc#0" [stack: 0x00007fd948e9e000,0x00007fd948f9e000] [id=32069]
  0x00007fd982535c50 ConcurrentGCThread "G1 Refine#0" terminated
  0x00007fd8b8001a10 ConcurrentGCThread "G1 Refine#1" terminated
  0x00007fd820002850 ConcurrentGCThread "G1 Refine#2" terminated
  0x00007fd81c000f40 ConcurrentGCThread "G1 Refine#3" terminated
  0x00007fd8a4008a00 ConcurrentGCThread "G1 Refine#4" terminated
  0x00007fd8b0021ee0 ConcurrentGCThread "G1 Refine#5" terminated
  0x00007fd8ac4f4ad0 ConcurrentGCThread "G1 Refine#6" terminated
  0x00007fd8bc001670 ConcurrentGCThread "G1 Refine#7" terminated
  0x00007fd982536b90 ConcurrentGCThread "G1 Service" terminated
  0x00007fd98276b8e0 ConcurrentGCThread "StringDedupProcessor" terminated

Comments
jdk17u-dev backport request I would like to have the fix in jdk17u-dev as well, because the issue (potential crash) is present there too. The backport is not fully clean but fits in well and is low risk.
10-07-2024

A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk17u-dev/pull/2698 Date: 2024-07-09 10:29:44 +0000
09-07-2024

[jdk22u-fix-request] Approval Request from Liang Mao Clean backport of fixing crash on Threads::print_on() . Risk is very low.
20-05-2024

A pull request was submitted for review. URL: https://git.openjdk.org/jdk22u/pull/215 Date: 2024-05-20 02:42:16 +0000
20-05-2024

[jdk21u-fix-request] Approval Request from Liang Mao Clean backport of fixing crash on Threads::print_on() . Risk is very low.
17-05-2024

A pull request was submitted for review. URL: https://git.openjdk.org/jdk21u-dev/pull/585 Date: 2024-05-17 09:05:54 +0000
17-05-2024

Changeset: c1cfb43d Author: Richard Reingruber <rrich@openjdk.org> Date: 2024-04-05 06:27:08 +0000 URL: https://git.openjdk.org/jdk/commit/c1cfb43d8de8d0eed01b9bb6aa1202dc7cbff007
05-04-2024

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/18518 Date: 2024-03-27 16:09:46 +0000
28-03-2024

> I don't think locked_or_safepoint works because GC threads run concurrently with safepoints and so may still terminate. That's clear. I didn't assume that the safepoint synchronizes with GC threads. It is the NonJavaThread list and its iterators. Non-java threads wait for in progress iterations when about to terminate [1]. Let's use the PR for discussions of the proposed fix. [1] NonJavaThread::remove_from_the_list() https://github.com/openjdk/jdk/blob/47f33a59eaaffc74881fcc9e29d13ff9b2538c2a/src/hotspot/share/runtime/nonJavaThread.cpp#L97-L102
28-03-2024

I would have expected the solution to "simply" be that in: void G1CollectedHeap::gc_threads_do(ThreadClosure* tc) const { workers()->threads_do(tc); tc->do_thread(_cm_thread); _cm->threads_do(tc); _cr->threads_do(tc); tc->do_thread(_service_thread); } which ever `threads_do` involves a potentially terminating thread would have some kind of guard to prevent that e.g. acquiring Threads_lock. I don't think locked_or_safepoint works because GC threads run concurrently with safepoints and so may still terminate. Of course other GCs may have the same issue.
28-03-2024

This RFE is relevant to this bug: JDK-8202090 evolve ThreadList to protect non-JavaThreads in addition to JavaThreads
27-03-2024

Would it be ok to delegate to Threads::threads_do() if at a safepoint? Terminated threads wouldn't be printed then (thread list iterators prevent this). Draft: https://github.com/openjdk/jdk/pull/18518
27-03-2024

I'm not familiar with the details of GC thread lifecycle (I didn't think they terminated), but it seems G1CollectedHeap::gc_threads_do needs to make allowances for this.
26-03-2024

RT Triage: transferring this to GC because GC threads are involved in the crash
26-03-2024

Could we retrieve the CPU time iff we find the thread with a NonJavaThread::Iterator?
26-03-2024