JDK-8021335 : Missing synchronization when reading counters for live threads and peak thread count
  • Type: Bug
  • Component: hotspot
  • Sub-Component: svc
  • Affected Version: 5.0,6,7,8,11,12
  • Priority: P5
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2013-07-25
  • Updated: 2020-02-13
  • Resolved: 2018-10-26
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 11 JDK 12
11.0.3Fixed 12 b18Fixed
Related Reports
Duplicate :  
Relates :  
Relates :  
Relates :  
Description
While analysing the reasons for the intermittent failures of java/lang/management/ThreadMXBean/ResetPeakThreadCount.java test it turned out that there are, indeed, places where the counters for live threads and peak thread count are being read without any synchronization in place - at the same time all the writes are synchronized by Threads_lock. 

See the discussion thread (http://mail.openjdk.java.net/pipermail/serviceability-dev/2013-July/010709.html) for more details.
Comments
The Graal failures are probably JDK-8238940.
13-02-2020

We reenabled this test in our nightly testing now that it has been backported to 11.0.3 and it passes with C2 but still fails with Graal on all configurations (this is 11.0.5): java.lang.RuntimeException: ***** Unexpected thread count: previous = 11 current = 25 delta = 8***** at ResetPeakThreadCount.checkThreadCount(ResetPeakThreadCount.java:228) at ResetPeakThreadCount.main(ResetPeakThreadCount.java:83) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127) at java.base/java.lang.Thread.run(Thread.java:834)
30-01-2020

Thanks. The fact that this bug affects all versions since 5 is embarrassing.
05-11-2018

Fix Request This is a fix to an old and intermittent "mainttainer pain" issue. The fix removes a race condition, which is covered by the ResetPeakThreadCount test in the description. Risk should be low because the values of these information counters are not critical to JVM correctness. Patch applies cleanly.
05-11-2018

> Any plans to backport this to 11? Seems like a good idea. The patch applies cleanly to jdk11u and the test hasn't failed again.
05-11-2018

Any plans to backport this to 11?
05-11-2018

Here are the logs for my jdk-12+16 Solaris-X64 sightings: $ unzip -l jdk-12+16_solx64.8021335.zip Archive: jdk-12+16_solx64.8021335.zip Length Date Time Name --------- ---------- ----- ---- 17312 10-20-2018 00:29 jdk-12+16_1/ResetPeakThreadCount.jtr.fastdebug 17251 10-20-2018 01:57 jdk-12+16_1/ResetPeakThreadCount.jtr.release 16589 10-19-2018 20:58 jdk-12+16_1/ResetPeakThreadCount.jtr.slowdebug 16597 10-20-2018 18:24 jdk-12+16_2/ResetPeakThreadCount.jtr.slowdebug --------- ------- 67749 4 files
22-10-2018

Need a better name for current_thread_exiting_helper - why not just ThreadService::decrement_thread_counts() ? Unclear if the workarounds in the test were all related to this issue - but I guess we will just see that over time. Probably ready for review. Thanks.
16-10-2018

I think this version solves the problem with current_thread_exiting and the asserts: http://cr.openjdk.java.net/~dlong/8021335/webrev.2/
16-10-2018

JavaThread::cleanup_failed_attach_current_thread also calls Threads::remove() without calling ThreadService::current_thread_exiting() first, so it triggers the asserts. I guess I need to make the call to ThreadService::current_thread_exiting() optional somehow.
16-10-2018

It's not completely arbitrary. ThreadService::current_thread_exiting is the hook that is required to make the count consistent after Thread.join, so it makes some sense to pair current_thread_exiting() with ensure_join(). However, I could remove the new asserts or disable them during destroy_vm.
16-10-2018

> For every ThreadService::add_thread call, there should be a call to current_thread_exiting and remove_thread. I If terminating normally sure. For the destroy_vm case the whole VM is going away so the thread counts are of secondary concern - and we skip a bunch of normal termination logic that might itself result in an inconsistency between the observed thread state and the thread counts. current_thread_exiting() seems harmless enough to call but it seems an arbitrary change to make.
16-10-2018

If the atomic counts are incremented but not decremented, (or decremented without first being incremented), then the live counts will be wrong. I don't think the reported bugs were specific enough to identify the broken code responsible, but my new asserts will fail if a ThreadService::current_thread_exiting call is missed. In the old implementation, missing a current_thread_exiting call could result in the live count being too big and _exiting_threads_count going negative. For every ThreadService::add_thread call, there should be a call to current_thread_exiting and remove_thread.
16-10-2018

The preceding comment has changed so ... > The reason I moved ThreadService::current_thread_exiting was so that it would be called even if "destroy_vm" is true. Why? There's no reported bug in this area so why make a potentially destabilising change? It seems ThreadService should see all JavaThreads unless is_hidden_from_external_view() is true.
16-10-2018

I moved the call to ThreadService::current_thread_exiting that was inside JavaThread::exit, and ThreadService::add_thread takes a JavaThread* as its argument, so ThreadService has no way to count non-Java threads, including non-Java compiler threads. The reason I moved ThreadService::current_thread_exiting was so that it would be called even if "destroy_vm" is true.
16-10-2018

There are definitely bugs with the exit count handling - so much so that I'm surprised this really works at all. I can only assume most hidden threads don't actually terminate. It's hard to see exactly how all the counts are managed. I'm unclear on how compiler threads are handled now that you moved ThreadService::current_thread_exiting. In general I need a better understanding of exactly which threads are intended to be tracked via ThreadService ?
16-10-2018

Here's my suggested fix: http://cr.openjdk.java.net/~dlong/8021335/webrev/
16-10-2018

I don't think grabbing the threads lock will help here. These variables are incremented without the threads lock: volatile int ThreadService::_exiting_threads_count = 0; volatile int ThreadService::_exiting_daemon_threads_count = 0; Besides the race here: static jlong get_live_thread_count() { return _live_threads_count->get_value() - _exiting_threads_count; } 93 static jlong get_daemon_thread_count() { return _daemon_threads_count->get_value() - _exiting_daemon_threads_count; } these variables are incorrectly incremented/decremented in the case of thread->is_hidden_from_external_view() and thread->is_jvmti_agent_thread(). It appears that the reason we need to decrement the counts before Thread::exit() returns is because Thread.join() can return as soon as we reach ensure_join(). I suggest replacing these atomic "exiting" counts with atomic "live" counts instead.
15-10-2018

Here are the logs for my jdk-12+11 Linux-X64 sightings: $ unzip -l jdk-12+11_linux.8021335.zip Archive: jdk-12+11_linux.8021335.zip Length Date Time Name --------- ---------- ----- ---- 13271 2018-09-14 01:42 jdk-12+11_1/ResetPeakThreadCount.jtr.release --------- ------- 13271 1 file
17-09-2018

True. This test just failed last night for us with 11.
28-08-2018

It has been a long time between reported failures linked to this bug. It's unclear whether a new issue has arisen or whether the connection here was just overlooked. It's a real shame this has languished for 5 years!
11-07-2018