JDK-8218483 : Crash in "assert(_daemon_threads_count->get_value() > daemon_count) failed: thread count mismatch 5 : 5"
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 12
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2019-02-06
  • Updated: 2021-09-16
  • Resolved: 2019-04-03
  • JDK 11: 11.0.14-oracle (Fixed)
  • JDK 13: 13 b16 (Fixed)
Description
Latest jdk12 bits crashed with "Internal Error" in src/hotspot/share/services/threadService.cpp:177

#
#  Internal Error (workspace/open/src/hotspot/share/services/threadService.cpp:167), pid=1616, tid=1695
#  assert(_daemon_threads_count->get_value() > daemon_count) failed: thread count mismatch 5 : 5
#
# JRE version: Java(TM) SE Runtime Environment (12.0) (fastdebug build 12-internal+0-2019-02-05-2047083.ekaterina.pavlova.jdk.jdk12)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 12-internal+0-2019-02-05-2047083.ekaterina.pavlova.jdk.jdk12, mixed mode, aot, sharing, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x17ca632]  ThreadService::remove_thread(JavaThread*, bool)+0x482
#

---------------  T H R E A D  ---------------

Current thread (0x00007fd5e86b1800):  JavaThread "Thread-1" daemon [_thread_in_vm, id=1695, stack(0x00007fd5a8078000,0x00007fd5a8179000)]

Stack: [0x00007fd5a8078000,0x00007fd5a8179000],  sp=0x00007fd5a8177ae0,  free space=1022k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x17ca632]  ThreadService::remove_thread(JavaThread*, bool)+0x482
V  [libjvm.so+0x17b68d3]  Threads::remove(JavaThread*)+0xf3
V  [libjvm.so+0x17be128]  JavaThread::exit(bool, JavaThread::ExitType)+0x7d8
V  [libjvm.so+0x17be6a9]  JavaThread::thread_main_inner()+0x149
V  [libjvm.so+0x17beafc]  JavaThread::run()+0x1cc
V  [libjvm.so+0x17ba715]  Thread::call_run()+0x75
V  [libjvm.so+0x14a41e6]  thread_native_entry(Thread*)+0x106

Comments
Fix request (11u): Requesting backport of this race condition fix. The assertion crash happens in our 11u test infra once in a while, especially with TCK test api/java_lang/ref/Cleaner/index.html and debug VM. Testing: GHA and SAP internal test system. Patch did not apply cleanly, backport approved by Thomas Stuefe.
12-09-2021

URL: http://hg.openjdk.java.net/jdk/jdk/rev/b788c494aa46 User: dholmes Date: 2019-04-03 22:04:55 +0000
03-04-2019

Tested by adding an artificial delay before Threads::remove(this) and a simple test case: Thread t = new Thread(); t.start(); t.join(); t.setDaemon(true); which triggered the assertion before the fix and not after.
01-04-2019
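
The Java side of the test case above can be sketched as follows. This is an illustration only: the class and method names are hypothetical, and the artificial delay before Threads::remove(this) lives inside the VM and cannot be reproduced from Java code. The sketch only shows that setDaemon(true) is accepted once the thread is no longer alive.

```java
// Sketch only: class and method names are illustrative, not taken from the JDK tests.
final class SetDaemonAfterJoin {

    /** Returns true if setDaemon(true) on a terminated thread is accepted without an exception. */
    static boolean setDaemonAcceptedAfterJoin() {
        Thread t = new Thread(() -> { });
        t.start();
        try {
            t.join();                       // once join() returns, isAlive() is false
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        }
        try {
            t.setDaemon(true);              // guarded only by isAlive(), so no exception is expected
            return true;
        } catch (IllegalThreadStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("setDaemon accepted after join: " + setDaemonAcceptedAfterJoin());
    }
}
```

On a debug VM without the fix, the assertion would only be expected to fire if the setDaemon call landed in the window before the thread was removed, which is why the artificial delay was needed.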

The test is extremely simple and exposes a race condition. The test does:

    Thread t = new Thread();
    t.start();
    Cleaner.create(r -> t);

which initially creates a non-daemon thread and constructs a ThreadFactory whose newThread method returns that started thread. Note that this thread may have terminated before we even get to call Cleaner.create(). The ThreadFactory is used as follows:

    Thread thread = threadFactory.newThread(this);
    thread.setDaemon(true);
    thread.start();

As t was already started we will get IllegalThreadStateException from thread.start(), which is what this test is checking for. But let's look at setDaemon:

    if (isAlive()) {
        throw new IllegalThreadStateException();
    }
    daemon = on;

so the IllegalThreadStateException may even come from here. But suppose the thread has in fact already terminated, i.e. it has reached the point in its termination sequence where isAlive returns false. We then proceed to set the daemon field to "on" (true in this case). Meanwhile, inside the VM, as the thread terminates we've executed:

    JavaThread::exit(...) {
      ...
      ensure_join(this);      // <= thread is no longer seen as alive
      ...
      Threads::remove(this);  // <= this is where we decrement the thread counts
      ...
    }

so if setDaemon is called between ensure_join() and Threads::remove(), this non-daemon thread will appear to be a daemon thread! Hence the counts get out of sync and the assertion fails.

We need to know the daemon state while the thread was still alive, and not read it from the threadObj. We can do that simply by capturing the daemon state at the start of JavaThread::exit and passing it as a parameter to Threads::remove.

Also note there is a bug in java.lang.Thread.setDaemon, which I will file separately, though it may not be considered worth fixing.
29-03-2019
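
A minimal, self-contained sketch of what the test described above checks at the Java level. The class and method names are hypothetical and this is not the JCK test source. It relies only on Cleaner.create(ThreadFactory) and assumes, as described above, that the factory's thread has setDaemon(true) and start() called on it.

```java
import java.lang.ref.Cleaner;

// Sketch only: names are illustrative and this is not the JCK test itself.
final class CleanerWithStartedThread {

    /** Returns true if Cleaner.create rejects a factory that hands back an already-started thread. */
    static boolean createRejectsStartedThread() {
        Thread t = new Thread(() -> { });
        t.start();                      // t may or may not have terminated by the next line
        try {
            Cleaner.create(r -> t);     // the factory ignores r and returns the started thread
            return false;
        } catch (IllegalThreadStateException expected) {
            // Thrown by setDaemon if t is still alive, or by start() if t has already terminated.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected: " + createRejectsStartedThread());
    }
}
```

Either path yields the exception the test expects, so the Java-level result does not reveal which one was taken. The race only becomes visible inside a debug VM.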

We now have two cases in ~7 weeks. Both the same testcase and both with Graal. Correction: 3 cases. The first (that we still have records of) was actually: api/java_lang/ref/Cleaner/CreateWithThreadFactory.html Dec 22nd 2018, 7:07:40 am and it was not using Graal.
28-03-2019

[~epavlova] can you please provide additional information. Thanks.
21-02-2019

Current thread (0x00007fd5e86b1800): JavaThread "Thread-1" daemon [_thread_in_vm, id=1695, stack(0x00007fd5a8078000,0x00007fd5a8179000)]

Stack: [0x00007fd5a8078000,0x00007fd5a8179000], sp=0x00007fd5a8177ae0, free space=1022k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x17ca632] ThreadService::remove_thread(JavaThread*, bool)+0x482
V [libjvm.so+0x17b68d3] Threads::remove(JavaThread*)+0xf3
V [libjvm.so+0x17be128] JavaThread::exit(bool, JavaThread::ExitType)+0x7d8
V [libjvm.so+0x17be6a9] JavaThread::thread_main_inner()+0x149
V [libjvm.so+0x17beafc] JavaThread::run()+0x1cc
V [libjvm.so+0x17ba715] Thread::call_run()+0x75
V [libjvm.so+0x14a41e6] thread_native_entry(Thread*)+0x106

Failing code:

    // Counts are incremented at the same time, but atomic counts are
    // decremented earlier than perf counts.
    assert(_live_threads_count->get_value() > count,
           "thread count mismatch %d : %d",
           (int)_live_threads_count->get_value(), count);

    _live_threads_count->dec(1);
    if (daemon) {
      assert(_daemon_threads_count->get_value() > daemon_count,
             "thread count mismatch %d : %d",
             (int)_daemon_threads_count->get_value(), daemon_count);

      _daemon_threads_count->dec(1);
    }

This seems more likely related to JDK-8021335: "Missing synchronization when reading counters for live threads and peak thread count". That said, we worked through that logic very carefully. The atomic counts and perf counters are always incremented together under the Threads_lock. The atomic counters are decremented first, normally without the Threads_lock, hence the perf counter should always be > atomic counter at the point of the assert (where the Threads_lock is again held).

Does this ever fail without AOT?
13-02-2019
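
The counter invariant described above can be modeled in a few lines of Java. This is a simplified sketch, not HotSpot code: the class, fields, and methods are hypothetical stand-ins for the perf and atomic daemon counts, and the model assumes each of the two decrement steps reads the thread's daemon flag independently. It shows the invariant holding when both steps see the same flag, and the check failing with equal counts (5 : 5) when the flag flips to true between them.

```java
// Simplified model only: all names are hypothetical and do not mirror HotSpot's implementation.
final class ThreadCountModel {
    private int perfDaemonCount;    // stands in for the perf counter of daemon threads
    private int atomicDaemonCount;  // stands in for the atomic count of daemon threads

    /** Both counts are incremented together when a daemon thread is added. */
    void addDaemonThread() {
        perfDaemonCount++;
        atomicDaemonCount++;
    }

    /** Early exit step: the atomic count is decremented first, if the thread is seen as a daemon. */
    void decrementAtomic(boolean daemon) {
        if (daemon) {
            atomicDaemonCount--;
        }
    }

    /** Final removal step: returns false where the VM assertion would fail. */
    boolean removeThread(boolean daemon) {
        if (daemon) {
            if (!(perfDaemonCount > atomicDaemonCount)) {
                return false;       // corresponds to "thread count mismatch"
            }
            perfDaemonCount--;
        }
        return true;
    }

    /** Runs one thread exit with five other daemon threads present; returns whether the check passed. */
    static boolean run(boolean startedAsDaemon, boolean flagAtFirstStep, boolean flagAtSecondStep) {
        ThreadCountModel m = new ThreadCountModel();
        for (int i = 0; i < 5; i++) {
            m.addDaemonThread();
        }
        if (startedAsDaemon) {
            m.addDaemonThread();
        }
        m.decrementAtomic(flagAtFirstStep);
        return m.removeThread(flagAtSecondStep);
    }

    public static void main(String[] args) {
        System.out.println("daemon thread, consistent flag:     " + run(true, true, true));
        System.out.println("non-daemon thread, consistent flag: " + run(false, false, false));
        System.out.println("non-daemon thread, flag flipped:    " + run(false, false, true));
    }
}
```

In this model, passing one captured value of the flag to both steps is equivalent to the first two calls in main, which is why reading the daemon state once keeps the counts consistent.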

I've rechecked the logic and it seems impossible for the assertion to fire.
13-02-2019

[~epavlova] is it the case that you have only seen this failure once? Otherwise can you point me to where the failures were seen please.
13-02-2019

As I am mostly testing AOT, I haven't seen this failure without AOT. If this is more related to JDK-8021335, then I would say the issue is really hard to reproduce, as I have done a lot of AOT testing since JDK-8021335 was fixed. I tried to reproduce it with and without AOT, with no luck.
06-02-2019

While the crash occurred during execution of api/java_lang/ref/Cleaner/CreateWithThreadFactory.html with the AOTed java.base module and AOTed JCK classes, I am fairly sure this failure is not JCK-specific. First, it seems very intermittent; second, I was not able to reproduce it on my Linux box. I wonder whether this failure could be related to the recently fixed JDK-8213231. [~ehelin] could you please have a look?
06-02-2019