Bug ID: JDK-8132849 Increased stop time in cleanup phase because of single-threaded walk of thread stacks in NMethodSweeper::mark_active

JDK-8132849 : Increased stop time in cleanup phase because of single-threaded walk of thread stacks in NMethodSweeper::mark_active_nmethods()

Type: Bug
Component: hotspot
Sub-Component: compiler
Affected Version: 8u45,9,10,11

Priority: P2
Status: Resolved
Resolution: Fixed
OS: generic
CPU: generic

Submitted: 2015-08-02
Updated: 2021-09-24
Resolved: 2018-09-25

Versions (Unresolved/Resolved/Fixed)

The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.

JDK 11	JDK 12
11.0.8-oracleFixed	12 b13Fixed

Related Reports

Blocks :	JDK-8184751 - Provide thread pool for parallel safepoint cleanup
Relates :	JDK-8211129 - compiler/whitebox/ForceNMethodSweepTest.java fails after JDK-8132849
Relates :	JDK-8211392 - compiler/profiling/spectrapredefineclass_classloaders/Launcher.java times out in JDK12 CI
Relates :	JDK-8184751 - Provide thread pool for parallel safepoint cleanup
Relates :	JDK-8215555 - TieredCompilation C2 threads can excessively block handshakes
Relates :	JDK-8180932 - Parallelize safepoint cleanup
Relates :	JDK-8211141 - jdk/jfr/event/compiler/TestCodeSweeperStats.java fails with Error: Value not atLeast1, field='sweepCount'

Description

One of our services that runs with a few thousand threads recently noticed an increase in safepoint stop times, but not gc times, upon transitioning to JDK 8 from JDK 7.

Further investigation revealed that most of the delta was related to the so-called pre-gc/vm-op "cleanup" phase when various book-keeping activities are performed, and more specifically in the portion that walks java thread stacks single-threaded (!) and updates the hotness counters for the active nmethods. This code appears to
be new to JDK 8 (in jdk 7 one would walk the stacks only during code cache sweeps).

This code should be fixed by either:
(a) doing these updates by walking thread stacks in multiple worker threads in parallel, or best of all:
(b) doing these updates when we walk the thread stacks during GC, and skipping this phase entirely
      for non-GC safepoints (with attendant loss in frequency of this update in low GC frequency
      scenarios).

I also wonder how code cache sweep/eviction quality might be compromised if one were to do these hotness updates at a much lower frequency, as a temporary workaround to the performance problem.

This issue was discussed at some length on the hotspot-compiler-dev@openjdk.java.net mailing list in the thread starting at:

       http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-July/018618.html

where further detail can be obtained.

The same issue exists in JDK 9, but not in JDK 7.

More specifically, as stated in one of the email messages in that thread:

''
I noticed the increase even with Initial and Reserved set to the default of 240 MB, but actual usage much lower (less than a quarter).

Look at this code path. Note that this is invoked at every safepoint (although it says "periodically" in the comment). In the mark_active_nmethods() method, there's a thread iteration in both branches of the if.
....
PS: for comparison, here's data with +TraceSafepointCleanup from JDK 7 (first, where this isn't done)
vs JDK 8 (where this is done) with a program that has a few thousands of threads:



JDK 7:
...
2827.308: [sweeping nmethods, 0.0000020 secs]
2828.679: [sweeping nmethods, 0.0000030 secs]
2829.984: [sweeping nmethods, 0.0000030 secs]
2830.956: [sweeping nmethods, 0.0000030 secs]
...

JDK 8:
...
7368.634: [mark nmethods, 0.0177030 secs]
7369.587: [mark nmethods, 0.0178305 secs]
7370.479: [mark nmethods, 0.0180260 secs]
7371.503: [mark nmethods, 0.0186494 secs]
...
Here's a snapshot of the code cache related counters:

sun.ci.codeCacheCapacity=251658240
sun.ci.codeCacheMaxCapacity=251658240
sun.ci.codeCacheMethodsReclaimedNum=3450
sun.ci.codeCacheSweepsTotalNum=58
sun.ci.codeCacheSweepsTotalTimeMillis=1111
sun.ci.codeCacheUsed=35888704

Notice that the code cache usage is less that 35 MB, for the 240 MB capacity, yet it seems we have had 58 sweeps already, and safepoint cleanup says:

[mark nmethods, 0.0165062 secs]

Even if the two closures do little or no work, the single-threaded walk over deep stacks of a thousand threads will cost time for applications with many threads, and this is now done at each safepoint irrespective of the sweeper activity as far as I can tell. It seems as if this work should be somehow rolled up (via a suitable injection) into GC's thread walks that are done in parallel, rather than doing this in a pre-GC phase (unless I am mssing some reason that the sequencing is necessary, which it doesn't seem to be here).
''


// Scans the stacks of all Java threads and marks activations of not-entrant methods.
// No need to synchronize access, since 'mark_active_nmethods' is always executed at a
// safepoint.
void NMethodSweeper::mark_active_nmethods() {
  assert(SafepointSynchronize::is_at_safepoint(), "must be executed at a safepoint");
  // If we do not want to reclaim not-entrant or zombie methods there is no need
  // to scan stacks
  if (!MethodFlushing) {
    return;
  }

  // Increase time so that we can estimate when to invoke the sweeper again.
  _time_counter++;

  // Check for restart
  assert(CodeCache::find_blob_unsafe(_current) == _current, "Sweeper nmethod cached state invalid");
  if (!sweep_in_progress()) {
    _seen = 0;
    _sweep_fractions_left = NmethodSweepFraction;
    _current = CodeCache::first_nmethod();
    _traversals += 1;
    _total_time_this_sweep = Tickspan();

    if (PrintMethodFlushing) {
      tty->print_cr("### Sweep: stack traversal %d", _traversals);
    }
    Threads::nmethods_do(&mark_activation_closure);

  } else {
    // Only set hotness counter
    Threads::nmethods_do(&set_hotness_closure);
  }

  OrderAccess::storestore();
}

Comments

Fix request (11u) -- Don't downport for now, was backed out from 11.0.8-oracle.
28-04-2020
URL: http://hg.openjdk.java.net/jdk/jdk/rev/eb954a4b6083 User: rkennke Date: 2018-09-25 13:38:56 +0000
25-09-2018
Since JDK11 G1 and ZGC added the use of the gc worker thread pool for the cleanup tasks; JDK-8184751 has been withdrawn because of these changes. [~twisti]: Could you please retest whether this is still an issue?
27-07-2018
Any updates on this one?
19-07-2018
Hi [~rkennke], please note if you are planning to defer this 8132849 from jdk10 as it is blocked by 8184751, request to change the 'Fix Version' here to 'tbd_major'.
18-12-2017
Dean: I suppose we could. But hi-jacking the contexts of those threads seems complicated and risky and not worth, it should be cheap to keep a dedicated small thread pool for this purpose instead. Depending on how the GC works, we may use the existing GC thread pool. In my experiments it didn't seem worth to use around 4-8 threads for this.
01-12-2017
Can we also take advantage of the pool of JavaThreads waiting at safepoints doing nothing?
01-12-2017
Nils: very likely not. The linked bug implemented the "framework" for parallelizing safepoint cleanup, but it asks the GC for parallel worker threads to do that. No GC in OpenJDK currently provides a thread pool. There is JDK-8184751 to implement this missing piece, and I have a patch ( http://cr.openjdk.java.net/~rkennke/8184751/webrev.00/ ) but I was told (in private) by some guys that this is not going to get accepted. I'll be happy to re-open the discussion though. The patch seems simple enough, and it's present and active in Shenandoah since half a year without issues.
01-12-2017
EDIT: The linked bug is already pushed. We should verify if this regression is gone.
01-12-2017
Yes, that's a good question. It may be worth adding a flag to switch between scanning the entire thread stacks and only looking at the top method and perform some benchmarking.
16-11-2017
I wonder if set_hotness_closure really needs to walk thread stacks. Shouldn't "hot" methods tend to be in the top frame(s)? Can we really say that an nmethod N frames deep is still hot?
16-11-2017
Re-assigning to Roman who will work on this as a follow up on JDK-8180932. Thanks!
06-07-2017
Justification for deferral to JDK 10: Performance problem for applications with a large number of threads due to increased safepoint pause time. This was introduced in JDK 8 by JDK-8020151 (see detailed analysis in my last comment). Fixing this is complex and requires a careful analysis to not re-introduce the original regressions. The fix potentially affects multiple components (compiler, runtime and GC) with a high risk of introducing a bug tail. Such nmethod sweeper bugs are hard to detect and fix. We should defer this high-risk fix to JDK 10 and backport the fix when it proved to be stable.
19-08-2016
The code cache sweeper scans thread stacks and performs one of the following two tasks at safepoints in NMethodSweeper::mark_active_nmethods(): 1) Searching for activations of non-entrant methods (see MarkActivationClosure) to avoid converting active nmethods to zombie state 2) Resetting the hotness counter of methods being active on the stack (see SetHotnessClosure) Task 1) is only performed between sweeper cycles and is already included in JDK 7. The second task was introduced in JDK 8 by JDK-8020151 [1] which added a method hotness counter (nmethod::_hotness_counter) that is decremented if the nmethod is not found on the stack during scanning. This was necessary to fix severe performance regressions due to inefficient sweeping when the code cache fills up. See old discussion on hotspot-compiler-dev [2]: "There can be large performance regressions when the code cache fills up. There are several reasons for the performance regression: First (1), when the code cache is full and methods are speculatively disconnected, the oldest methods (based on compilation ID) are scheduled for flushing. This can result in flushing hot methods. Second (2), when compilation is disabled due to a full code cache, the number of sweeps can go down. A lower number of sweep operations results in slower method flushing." Removing task 2) will re-introduce these regressions but, as mentioned in the bug description, there are some possible solutions that would reduce the impact on applications with a large number of threads: (a) parallelize stack scanning (b) move stack scanning into GC (merge with existing stack scanning) (c) don't scan stacks of all the threads at a single safepoint but distribute work over multiple safepoints (d) only use hotness counters if the code cache fills up All of these need careful investigation to not re-introduce the regressions fixed by JDK-8020151. Especially, (b) may not scan the stacks often enough in low GC frequency scenarios causing hot nmethods to be flushed. [1] http://hg.openjdk.java.net/hsx/hsx25/hotspot/rev/510fbd28919c [2] http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2013-September/011588.html
19-08-2016
ILW = performance regression; always; none = MHH = P2
15-06-2016