Bug ID: JDK-8258027 [linux] SIGSEGV pthread

JDK-8258027 : [linux] SIGSEGV pthread_getcpuclockid crash

Type: Bug
Component: hotspot
Sub-Component: runtime
Affected Version: 16

Priority: P2
Status: Closed
Resolution: External
OS: linux_suse_sles_11,linux_ubuntu
CPU: generic

Submitted: 2020-12-10
Updated: 2024-12-11
Resolved: 2020-12-18

Related Reports

Blocks :	JMC-6980 - [Linux] (crash) SIGSEGV pthread_getcpuclockid while JFR Recording of a running JVM instance
Duplicate :	JDK-8258031 - [Linux] pthread_getcpuclockid from JfrThreadCPULoadEvent::send_events()
Relates :	JDK-8329109 - Threads::print_on() tries to print CPU time for terminated GC threads
Relates :	JDK-8345970 - pthread_getcpuclockid related crashes in shenandoah tests

Description

Crash while using JMC 8.0.0 with the latest JDK 16 or with JDK 11.0.9 b07.

Steps to Reproduce:
1. Download the latest JMC 8.0.0 build and extract it.
2. Launch JMC with additional arguments: "~/pathtojmc/jmc -vm $JDK16_HOME/bin -consoleLog -debug"
3. In the "JVM browser", right-click a running JVM instance and select "Start JMX Console".
4. In the "JVM browser", right-click and select "Start Flight Recording". Optionally reduce the "Recording time" to "10 s" instead of the default 1 m, click "Next" and then "Finish", and wait for the recording to complete.
5. Close the JMC application. (The crash produces the call stack attached to this bug.)
6. If JMC does not crash on close, keep it open and repeat step 4 many times to reproduce the crash. (This leads to the crash described in JDK-8258031.)

Note: The crash occurs only on Ubuntu 18.04 / 20.04, OEL 7.6 and SUSE Linux; it does not occur on Windows or macOS.

Part of the call stack is shown below; the complete call stack is attached.
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007ff09219d905, pid=15665, tid=15672
#
# JRE version: Java(TM) SE Runtime Environment (16.0+27) (build 16-ea+27-1884)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (16-ea+27-1884, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# C  [libpthread.so.0+0xf905]  pthread_getcpuclockid+0x5
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /home/guruhb/ade/temp/8b05/core.15665)
#
# JFR recording file will be written. Location: /home/guruhb/ade/temp/8b05/hs_err_pid15665.jfr
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#

---------------  S U M M A R Y ------------

...

---------------  T H R E A D  ---------------

Current thread (0x00007ff08c170860):  VMThread "VM Thread" [stack: 0x00007ff057170000,0x00007ff057270000] [id=15672]

Stack: [0x00007ff057170000,0x00007ff057270000],  sp=0x00007ff05726e908,  free space=1018k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libpthread.so.0+0xf905]  pthread_getcpuclockid+0x5
V  [libjvm.so+0xd71571]  Thread::print_on(outputStream*, bool) const+0x41
V  [libjvm.so+0xd76ea0]  Threads::print_on(outputStream*, bool, bool, bool, bool)+0x190
V  [libjvm.so+0xdf9e6a]  VM_Operation::evaluate()+0xea
V  [libjvm.so+0xdfb745]  VMThread::evaluate_operation(VM_Operation*)+0xb5
V  [libjvm.so+0xdfbb68]  VMThread::inner_execute(VM_Operation*)+0x1c8
V  [libjvm.so+0xdfbe2f]  VMThread::run()+0xbf
V  [libjvm.so+0xd7801d]  Thread::call_run()+0xfd
V  [libjvm.so+0xbd0347]  thread_native_entry(Thread*)+0xe7

Comments

> please file a new issue - this issue was a bug in GTK+
Hi David, I created JDK-8345970; we have seen this 5 times since the end of November, always on Alpine. According to https://bugs.openjdk.org/browse/JDK-8240187 ("However, this is not the case for musl libc which does not check the passed thread id"), it is no wonder that Alpine does badly when pthread_getcpuclockid is called with an invalid thread id.
11-12-2024
[~mbaesken] please file a new issue - this issue was a bug in GTK+
11-12-2024
We have recently seen a few similar crashes, mostly in shenandoah jtreg jdk24 tests. These crashes occur on Alpine Linux. Maybe shenandoah calls into pthread_getcpuclockid more often, and sometimes we call it on "bad" (already terminated?) threads. It looks like Alpine is even more sensitive to this, and pthread_getcpuclockid crashes on such threads. Can we add a little check or assert for 'good' threads? Example:
# SIGSEGV (0xb) at pc=0x00007fd79548e234, pid=24021, tid=24114
#
# JRE version: OpenJDK Runtime Environment (24.0) (fastdebug build 24-internal-adhoc.jenkinsi.jdk)
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 24-internal-adhoc.jenkinsi.jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, shenandoah gc, linux-amd64)
# Problematic frame:
# C [ld-musl-x86_64.so.1+0x56234] pthread_getcpuclockid+0x0
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [ld-musl-x86_64.so.1+0x56234] pthread_getcpuclockid+0x0
V [libjvm.so+0x1889bf4] ThreadTimeAccumulator::do_thread(Thread)+0x14 (shenandoahMmuTracker.cpp:51)
V [libjvm.so+0x18890c0] ShenandoahMmuTracker::fetch_cpu_times(double&, double&)+0x50 (shenandoahMmuTracker.cpp:76)
V [libjvm.so+0x18895ce] ShenandoahMmuTracker::record_young(unsigned long)+0x6e (shenandoahMmuTracker.cpp:100)
V [libjvm.so+0x17db715] ShenandoahGenerationalControlThread::service_concurrent_cycle(ShenandoahHeap, ShenandoahGeneration, GCCause::Cause&, bool)+0x1e5 (shenandoahGenerationalControlThread.cpp:618)
V [libjvm.so+0x17dc0c8] ShenandoahGenerationalControlThread::service_concurrent_normal_cycle(ShenandoahGenerationalHeap, ShenandoahGenerationType, GCCause::Cause)+0x128 (shenandoahGenerationalControlThread.cpp:581)
V [libjvm.so+0x17dcde2] ShenandoahGenerationalControlThread::run_service()+0x642 (shenandoahGenerationalControlThread.cpp:229)
V [libjvm.so+0xabca5b] ConcurrentGCThread::run()+0x1b (concurrentGCThread.cpp:48)
V [libjvm.so+0x1a8b2d6] Thread::call_run()+0xb6 (thread.cpp:232)
V [libjvm.so+0x15aa58a] thread_native_entry(Thread*)+0x17a (os_linux.cpp:849)
siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x00007fd7792b8b68
Should I create a new JBS issue?
10-12-2024
Based on the analysis I am closing this as an "External" issue. The SWT callback code calls jni_AttachAsDaemon, performs a Java upcall and then detaches from the VM again. The suspicion is that a GTK error causes the attached thread to terminate abruptly without ever detaching from the VM.
18-12-2020
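The attach, upcall, detach sequence described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyVM`, `fragile_callback`, `guarded_callback` and `stale_entries_after` are hypothetical names invented for illustration, not JNI, SWT or HotSpot APIs, and Python's `SystemExit` inside a thread is used as a loose stand-in for a native thread ending via pthread_exit.

```python
import threading


class ToyVM:
    """Hypothetical model of a VM's table of attached native threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._attached = set()

    def attach_current_thread(self):
        with self._lock:
            self._attached.add(threading.get_ident())

    def detach_current_thread(self):
        with self._lock:
            self._attached.discard(threading.get_ident())

    def attached_count(self):
        with self._lock:
            return len(self._attached)


def failing_upcall():
    # SystemExit ends only the current Python thread; used here as a loose
    # analogue of a native thread terminating abruptly.
    raise SystemExit


def fragile_callback(vm):
    vm.attach_current_thread()
    failing_upcall()
    vm.detach_current_thread()  # never reached when the upcall bails out


def guarded_callback(vm):
    vm.attach_current_thread()
    try:
        failing_upcall()
    finally:
        vm.detach_current_thread()  # runs even when the thread is exiting


def stale_entries_after(callback):
    """Run callback on a fresh thread; count entries left once it is gone."""
    vm = ToyVM()
    worker = threading.Thread(target=callback, args=(vm,))
    worker.start()
    worker.join()
    return vm.attached_count()
```

In this model the fragile variant leaves one registration behind for a thread that no longer exists, which is the state the VM is suspected to be in here. In native code the rough counterpart of the `finally` clause would be a thread-specific-data destructor registered with pthread_key_create, since those destructors run when a thread exits.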
ILW = HMM = P2
15-12-2020
Thanks [~pchilanomate] for the additional investigation and analysis.
14-12-2020
I suspect this may be a long-standing potential bug in GTK+/SWT whereby an error in the thread attached to the VM causes it to abort via pthread_exit, failing to detach from the VM in the process. If that happens the VM has no way to detect, or correct for, an invalid pthread id. If we're lucky then pthread_getcpuclockid returns ESRCH; if unlucky, it crashes.
14-12-2020
This is looking to me like a JMC issue. I downloaded JMC as directed and simply started it with no arguments and after a few seconds it just crashed with the same fault, but this is with 8u262!
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007fe579fb8f10, pid=25939, tid=0x00007fe50511c700
#
# JRE version: OpenJDK Runtime Environment (8.0_262-b10) (build 1.8.0_262-b10)
# Java VM: OpenJDK 64-Bit Server VM (25.262-b10 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C [libpthread.so.0+0xcf10] pthread_getcpuclockid+0x0
11-12-2020
If pthread_getcpuclockid is crashing, that suggests a bug in the pthreads implementation. We know the VM can sometimes encounter a terminated thread that failed to detach (which is an application bug), in which case we pass an "invalid" pthread_t to pthread_getcpuclockid, but that should result in an ESRCH error at worst, not a crash. Update: the validity check is very basic and just involves assuming the value is a pointer to a pthread struct and then checking that one field is >= 0. So if the value is completely bogus, the attempt to read the field can fault.
11-12-2020
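The caller-side contract discussed above can be sketched with Python's `time.pthread_getcpuclockid`, which wraps the same libc call and whose documentation warns that an invalid or expired thread id may cause a segmentation fault. This is a minimal illustration assuming Linux and Python 3.7 or later, not HotSpot code; `thread_cpu_seconds` and `demo` are hypothetical helper names.

```python
import threading
import time


def thread_cpu_seconds(thread):
    """Return CPU seconds used by `thread`, or None if it is not running.

    The liveness check is the whole point: the id of a thread that has
    already exited must never reach pthread_getcpuclockid. The check is
    still racy in general (the thread may exit right after it), so a real
    caller has to keep the thread alive for the duration of the query.
    """
    if thread.ident is None or not thread.is_alive():
        return None
    clock_id = time.pthread_getcpuclockid(thread.ident)
    return time.clock_gettime(clock_id)


def demo():
    """Query a worker that is parked on an event, then again after it exits."""
    release = threading.Event()
    worker = threading.Thread(target=release.wait)
    worker.start()
    while_alive = thread_cpu_seconds(worker)  # safe: worker is still blocked
    release.set()
    worker.join()
    after_exit = thread_cpu_seconds(worker)   # guard refuses the stale id
    return while_alive, after_exit
```

Calling `demo()` yields a CPU time for the parked worker and `None` once it has been joined. The guard only helps when the runtime knows a thread has ended; a thread that terminates without detaching, as suspected in this bug, leaves no such signal behind.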
Looking at more of the hs_err logs I see this problem in a number of them e.g. https://bugs.openjdk.java.net/secure/attachment/91488/hs_err_pid145310.log
0x00007f80f0027990 JavaThread "Thread-5" daemon [_thread_in_native, id=145396, stack(0x00007f807bbff000,0x00007f807c3fe000)]
0x00007f818802eca0 JavaThread "Thread-23" daemon [_thread_in_native, id=145525, stack(0x00007f807bbff000,0x00007f807c3fe000)]
0x00007f8144609bb0 JavaThread "Thread-31" daemon [_thread_in_native, id=146142, stack(0x00007f807bbff000,0x00007f807c3fe000)]
What are these nondescript daemon threads? Are they native threads that have attached to the VM? How can they have the same stack!
11-12-2020
I took a look at the possibly related Eclipse crash, and the failure modes do all seem to be the same. In each case the faulting address is close to the start of a thread's stack (stacks grow down). What I noticed with the hs_err log from the Eclipse crash was very interesting:
0x00007fcc80953000 JavaThread "Thread-92" daemon [_thread_in_native, id=26348, stack(0x00007fcb1b501000,0x00007fcb1bd00000)]
0x00007fcbc408c000 JavaThread "Thread-99" daemon [_thread_in_native, id=26381, stack(0x00007fcb1b501000,0x00007fcb1bd00000)]
0x00007fcbf8406800 JavaThread "Thread-105" daemon [_thread_in_native, id=26505, stack(0x00007fcb1b501000,0x00007fcb1bd00000)]
0x00007fcc14048000 JavaThread "Thread-109" daemon [_thread_in_native, id=26562, stack(0x00007fcb1b501000,0x00007fcb1bd00000)]
0x00007fcc5893e800 JavaThread "Thread-116" daemon [_thread_in_native, id=26592, stack(0x00007fcb1b501000,0x00007fcb1bd00000)]
We have 5 threads all claiming to have the same stack! And the faulting address is near the start of that stack (0x00007fcb1bcff9d0).
https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=954822;filename=hs_err_pid25471.log;msg=5
11-12-2020
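The "several threads, one stack" observation is easy to check mechanically when triaging further hs_err logs. The following is a rough sketch, assuming thread lines have the `JavaThread "name" ... stack(low,high)` shape quoted in this report; `shared_stacks` is a hypothetical helper name, and the sample lines and faulting address are the ones from the Eclipse log.

```python
import re
from collections import defaultdict

# One thread entry as it appears in the thread list of an hs_err log.
THREAD_LINE = re.compile(
    r'JavaThread "(?P<name>[^"]+)".*?'
    r'stack\((?P<low>0x[0-9a-fA-F]+),(?P<high>0x[0-9a-fA-F]+)\)'
)


def shared_stacks(log_text):
    """Map every (low, high) stack range claimed by 2+ threads to their names."""
    by_range = defaultdict(list)
    for match in THREAD_LINE.finditer(log_text):
        key = (int(match["low"], 16), int(match["high"], 16))
        by_range[key].append(match["name"])
    return {rng: names for rng, names in by_range.items() if len(names) > 1}


SAMPLE = """\
0x00007fcc80953000 JavaThread "Thread-92" daemon [_thread_in_native, id=26348, stack(0x00007fcb1b501000,0x00007fcb1bd00000)]
0x00007fcbc408c000 JavaThread "Thread-99" daemon [_thread_in_native, id=26381, stack(0x00007fcb1b501000,0x00007fcb1bd00000)]
0x00007fcbf8406800 JavaThread "Thread-105" daemon [_thread_in_native, id=26505, stack(0x00007fcb1b501000,0x00007fcb1bd00000)]
0x00007fcc14048000 JavaThread "Thread-109" daemon [_thread_in_native, id=26562, stack(0x00007fcb1b501000,0x00007fcb1bd00000)]
0x00007fcc5893e800 JavaThread "Thread-116" daemon [_thread_in_native, id=26592, stack(0x00007fcb1b501000,0x00007fcb1bd00000)]
"""

FAULT_ADDR = 0x00007FCB1BCFF9D0  # faulting address quoted in the comment

for (low, high), names in shared_stacks(SAMPLE).items():
    print(f"stack({low:#x},{high:#x}) is claimed by {len(names)} threads: {names}")
    if low <= FAULT_ADDR < high:
        # Stacks grow down, so the high end is where the stack starts.
        print(f"fault address lies {high - FAULT_ADDR:#x} bytes below the stack start")
```

For the sample, all five threads land in one group, and the faulting address sits 0x630 bytes below the high end of the shared range. That is consistent with a read into the thread descriptor of a thread whose stack has since been handed to another thread.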
Unfortunately I am unable to reproduce the crash with my fastdebug build.
11-12-2020