JDK-8205878 : pthread_getcpuclockid is expected to return 0 code
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 11
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: linux
  • Submitted: 2018-06-26
  • Updated: 2024-03-26
  • Resolved: 2018-07-10
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 11 JDK 12
11 b22Fixed 12Fixed
Related Reports
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Description
jib > #
jib > # A fatal error has been detected by the Java Runtime Environment:
jib > #
jib > #  Internal Error (/scratch/opt/mach5/mesos/work_dir/slaves/c4ee7e63-1ded-4e8c-9581-ce26f27e3af4-S143/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/3d526e9e-65a2-4897-b530-b74b0f51029c/runs/3cfed237-5d06-41ef-be56-e53ef43b32c3/workspace/open/src/hotspot/os/linux/os_linux.cpp:5581), pid=4209, tid=4219
jib > #  assert(rc == 0) failed: pthread_getcpuclockid is expected to return 0 code

Sample stack:

Stack: [0x00007fd1c432e000,0x00007fd1c442e000],  sp=0x00007fd1c442c090,  free space=1016k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x1878277]  VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x2c7
V  [libjvm.so+0x18790df]  VMError::report_and_die(Thread*, void*, char const*, int, char const*, char const*, __va_list_tag*)+0x2f
V  [libjvm.so+0xb4e320]  report_vm_error(char const*, int, char const*, char const*, ...)+0x100
V  [libjvm.so+0xf87ecf]  JfrTypeManager::write_type_set()+0x19f
V  [libjvm.so+0xf50099]  JfrRecorderService::finalize_current_chunk_on_vm_error()+0x39
V  [libjvm.so+0xf501fe]  JfrRecorderService::rotate(int)+0x7e
V  [libjvm.so+0xefe928]  JfrEmergencyDump::on_vm_shutdown(bool)+0x278
V  [libjvm.so+0x1878535]  VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x585
V  [libjvm.so+0x18790df]  VMError::report_and_die(Thread*, void*, char const*, int, char const*, char const*, __va_list_tag*)+0x2f
V  [libjvm.so+0xb4e320]  report_vm_error(char const*, int, char const*, char const*, ...)+0x100
V  [libjvm.so+0x14c6360]  os::thread_cpu_time(Thread*, bool)+0x80
V  [libjvm.so+0x17ad026]  Thread::print_on(outputStream*, bool) const+0x46
V  [libjvm.so+0x17b23be]  JavaThread::print_on(outputStream*, bool) const+0x13e
V  [libjvm.so+0x17b65e9]  Threads::print_on(outputStream*, bool, bool, bool, bool)+0x1c9
V  [libjvm.so+0x18bd1e2]  VM_Operation::evaluate()+0x132
V  [libjvm.so+0x18b884e]  VMThread::evaluate_operation(VM_Operation*) [clone .constprop.51]+0x18e
V  [libjvm.so+0x18b9107]  VMThread::loop()+0x4d7
V  [libjvm.so+0x18b9713]  VMThread::run()+0xd3
V  [libjvm.so+0x14c7a60]  thread_native_entry(Thread*)+0x100

VM_Operation (0x00007fd1f2597a00): PrintThreads, mode: safepoint, requested by thread 0x00007fd1e803f000

Comments
URL: http://hg.openjdk.java.net/jdk/jdk11/rev/1835f9fca157 User: dholmes Date: 2018-07-10 00:19:32 +0000
10-07-2018

So far the only API that seems to have a potential problem here is os::thread_cpu_time() which relies on using the native thread ID to call a native function to get the thread CPU time. On Linux only the fast_thread_cpu_time appears to be affected as it calls pthread_getcpuclockid, which is the failure observed here. The slow path using /proc seems to accommodate the invalid ID as the /proc/self/task/<id> will either not exist (returns -1) or else it will read a valid value for the wrong thread. Both AIX and OS X call OS specific functions but upon error (which is hopefully what will happen with an invalid id rather than a crash) return -1. Solaris seems immune as it also reads from /proc (using the lwp id) and will return -1. So the problem does seem limited to linux and only because the fast_thread_cpu_time logic can't propagate an error from thread_cpu_clockid - something we can rectify.
04-07-2018

I experimented to see what happens if a thread dump encounters one of these terminated-but-attached threads and it seems okay: "Thread-1" #22 prio=5 os_prio=0 cpu=2.88ms elapsed=20.77s tid=0x00007f9368001000 nid=0x68bf runnable [0x0000000000000000] java.lang.Thread.State: RUNNABLE JavaThread state: _thread_in_native Thread: 0x00007f9368001000 [0x68bf] State: _at_safepoint _has_called_back 0 _at_poll_safepoint 0 JavaThread state: _thread_in_native This just shows a JNI attache dthread that is currently back outside the VM - _thread_in_native and the stack is zero [0x000...]
04-07-2018

Raising to P3 and re-targeting to 11. If this is a concern I can split the fixing of the tests from making the VM more robust. The tests can be fixed in 11 while the VM changes can be done in 12.
03-07-2018

BTW the trigger here is DumpOnExit as that is when we encounter the terminated threads.
03-07-2018

The problem is that the test creates native threads that attach to the VM but which then terminate without detaching! The tests need to be fixed and we need to make the VM thread management logic more resilient - though there's no real way to determine that we have a "zombie" JavaThread. It will be _thread_in_native, but that isn't sufficient to identify the problem.
03-07-2018

This reproduces locally when run with JFR enabled. The error is ESRCH as suspected, which indicates that somehow a thread on the ThreadsList has terminated since the snapshot was acquired.
03-07-2018

Submitted two tier-5 runs and no failures.
27-06-2018

Most likely this is using a thread that has terminated - which should not be possible of course. Instrumenting a build to find out.
27-06-2018