Bug ID: JDK-8318986 Improve GenericWaitBarrier performance

Type: Enhancement
Component: hotspot
Sub-Component: runtime
Affected Version: 17,21,22

Priority: P3
Status: Resolved
Resolution: Fixed

Submitted: 2023-10-27
Updated: 2024-05-30
Resolved: 2023-11-22

JDK 17	JDK 21	JDK 22
17.0.13 Fixed	21.0.4 Fixed	22 b26 Fixed

While running simple benchmarks for safepoints, I was surprised to see impressively bad performance on my Mac M1 with a simple workload like this:

```
public class LotsRunnable {
   static final int THREAD_COUNT = Integer.getInteger("threads", Runtime.getRuntime().availableProcessors() * 4);
   static Object sink;

   public static void main(String... args) throws Exception {
     for (int c = 0; c < THREAD_COUNT; c++) {
       Thread t = new Thread(() -> {
         while (true) {
            Thread.onSpinWait();
         }
       });
       t.setDaemon(true);
       t.start();
     }

     System.out.println("Started");

     long stop = System.nanoTime() + 10_000_000_000L;
     while (System.nanoTime() < stop) {
       sink = new byte[100_000];
     }
   }
} 
```

If you run it with -Xlog:safepoint -Xlog:gc, you will notice that the GC pause times and the actual VM operation times are completely out of whack. For example:

```
$ java -Xlog:safepoint -Xlog:gc -Xmx2g LotsRunnable.java
[3.188s][info][gc       ] GC(19) Pause Young (Normal) (G1 Evacuation Pause) 308M->2M(514M) 0.878ms
[3.326s][info][safepoint] Safepoint "G1CollectForAllocation", Time since last: 4963375 ns, Reaching safepoint: 349292 ns, Cleanup: 2000 ns, At safepoint: 138700375 ns, Total: 139051667 ns
```
Note how the pause is <1 ms, but the "At safepoint" time is a whole 138 ms (!!!).
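To put a number on that gap, here is a minimal sketch that pulls the "At safepoint" field out of the log line above and subtracts the reported GC pause. The class and method names (`SafepointLogMath`, `atSafepointNanos`, `unaccountedNanos`) are made up for illustration, and the regular expression assumes the field layout shown in this particular log line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the JDK: compares the "At safepoint" time
// from an -Xlog:safepoint line with the GC pause reported by -Xlog:gc.
final class SafepointLogMath {
    // Assumes the field is printed as "At safepoint: <digits> ns", as in the log above.
    private static final Pattern AT_SAFEPOINT = Pattern.compile("At safepoint: (\\d+) ns");

    static long atSafepointNanos(String line) {
        Matcher m = AT_SAFEPOINT.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("no 'At safepoint' field in: " + line);
        }
        return Long.parseLong(m.group(1));
    }

    // Time spent "at safepoint" that the GC pause itself does not explain.
    static long unaccountedNanos(long atSafepointNanos, double gcPauseMillis) {
        return atSafepointNanos - Math.round(gcPauseMillis * 1_000_000.0);
    }

    public static void main(String[] args) {
        String line = "[3.326s][info][safepoint] Safepoint \"G1CollectForAllocation\", "
                + "Time since last: 4963375 ns, Reaching safepoint: 349292 ns, "
                + "Cleanup: 2000 ns, At safepoint: 138700375 ns, Total: 139051667 ns";
        long at = atSafepointNanos(line);
        long extra = unaccountedNanos(at, 0.878); // 0.878 ms is the GC(19) pause above
        System.out.println("At safepoint: " + at + " ns, not explained by GC pause: " + extra + " ns");
    }
}
```

With the numbers from the log, roughly 137.8 of the 138.7 ms spent at the safepoint are not explained by the GC pause.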

Deeper profiling shows that the problem is on the path where we wake up the threads from the safepoint:
 https://github.com/openjdk/jdk/blob/4f9f1955ab2737880158c57d4891d90e2fd2f5d7/src/hotspot/share/runtime/safepoint.cpp#L494-L495

JDK-8214271 ("Fast primitive to wake many threads") added the WaitBarrier to serve on that path. In JDK 11, before that change, the performance is okay, which makes this a regression between JDK 11 and JDK 17.

WaitBarrier has two implementations: one for Linux that uses futexes, and a generic one that uses semaphores. For implementation reasons, the generic version has to wait for all threads to leave the barrier before it unblocks from disarm(). This means that all threads currently blocked for the safepoint need to roll out of wait() before we unblock from the safepoint! This effectively runs into the same problem as TTSP (time to safepoint), only worse: all those threads are blocked and need to be woken up, scheduled, etc.

This is not what the Linux futex-based implementation does: it just notifies the futex, and leaves.
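The difference between the two disarm styles can be illustrated with a toy Java model. This is not HotSpot code and does not mirror the real GenericWaitBarrier internals; `ToyBarrierDemo`, `WAITERS` and `WAKEUP_COST_MS` are invented for the sketch, and the sleep is only an assumed stand-in for wake-up and scheduling latency. It times how long the disarming thread is held up when it waits for every waiter to leave, versus when it only releases them and moves on.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;

// Toy model only: contrasts "wake everyone and wait until they have all left"
// with "wake everyone and return immediately".
final class ToyBarrierDemo {
    static final int WAITERS = 8;
    static final long WAKEUP_COST_MS = 50; // assumption: simulated wake-up/scheduling cost

    // Returns how long the disarming thread was held up, in nanoseconds.
    static long timeDisarm(boolean waitForDrain) {
        Semaphore wake = new Semaphore(0);
        CountDownLatch left = new CountDownLatch(WAITERS);
        Thread[] threads = new Thread[WAITERS];
        for (int i = 0; i < WAITERS; i++) {
            threads[i] = new Thread(() -> {
                try {
                    wake.acquire();               // wait until the barrier is disarmed
                    Thread.sleep(WAKEUP_COST_MS); // simulated cost of getting going again
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    left.countDown();             // this waiter has left the barrier
                }
            });
            threads[i].start();
        }
        try {
            long start = System.nanoTime();
            wake.release(WAITERS);                // wake all waiters
            if (waitForDrain) {
                left.await();                     // additionally wait for all of them to leave
            }
            long elapsed = System.nanoTime() - start;
            for (Thread t : threads) {
                t.join();                         // cleanup, outside the timed region
            }
            return elapsed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while timing disarm", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("wait for drain:   " + timeDisarm(true) / 1_000 + " us");
        System.out.println("notify and leave: " + timeDisarm(false) / 1_000 + " us");
    }
}
```

In this model the first variant cannot return sooner than the slowest waiter takes to get going, while the second returns as soon as the permits are released. The exact numbers depend on the machine and scheduler.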

The unblocked threads do start executing, so we are not completely stalled while waiting in disarm(), but this definitely:
 a) trips the safepoint timings;
 b) delays any further actions of VMThread;
 c) delays resuming GC from STS, as `Universe::heap()->safepoint_synchronize_end()` comes after this;
 d) places a limit on the safepoint frequency we can have;
 e) maybe something else I cannot see right away.
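As a rough worked example for item d), the "Total" figure from the log above gives an upper bound on how often safepoints could happen back to back. The class and method names are hypothetical, and the bound optimistically assumes no time at all passes between safepoints.

```java
// Hypothetical helper: upper bound on the safepoint rate if every safepoint
// took as long as the one in the log and they ran back to back.
final class SafepointRateBound {
    static double maxSafepointsPerSecond(long totalNanosPerSafepoint) {
        return 1_000_000_000.0 / totalNanosPerSafepoint;
    }

    public static void main(String[] args) {
        long totalNanos = 139_051_667L; // "Total: 139051667 ns" from the log above
        System.out.println("max safepoints per second: " + maxSafepointsPerSecond(totalNanos));
    }
}
```

For this log line the bound comes out at a little over 7 safepoints per second.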

I think the intent for the safepoint end code is to be fast to avoid any of these surprises. To that end, I think we can improve GenericWaitBarrier to avoid most of the performance cliff.

WIP: https://github.com/openjdk/jdk/pull/16404

Would you mind approving for JDK 21u then?
12-04-2024
This plan looks fine to me, thanks.
10-04-2024
Yes, I mean to have the change in 21 in July, and in 17 in October.
10-04-2024
That's fine. When you say "one release after 21", does that mean we want to release it in 21.0.4 (July), or something else?
10-04-2024
[17u] [~shade], I would like to postpone this to the October update, to be one release after 21.
10-04-2024
A pull request was submitted for review. URL: https://git.openjdk.org/jdk17u-dev/pull/2041 Date: 2023-12-11 17:39:49 +0000
18-03-2024
[jdk17u-fix-request] Approval Request from Aleksey Shipilëv: Clean backport to drastically improve safepoint performance under heavy load, fixing regression between JDK 11 and JDK 17. Applies cleanly. There is no bugtail in mainline since integration in Nov 2023. tier{1,2,3} tests pass. Risk is medium, as it touches the common code path for non-Linux platforms, but it is also frequently exercised, and thus a breakage would manifest often.
18-03-2024
[jdk21u-fix-request] Approval Request from Aleksey Shipilëv: Clean backport to drastically improve safepoint performance under heavy load, fixing regression between JDK 11 and JDK 17. Applies cleanly. There is no bugtail in mainline since integration in Nov 2023. All tests pass. Risk is medium, as it touches the common code path for non-Linux platforms, but it is also frequently exercised, and thus a breakage would manifest often.
29-02-2024
A pull request was submitted for review. URL: https://git.openjdk.org/jdk21u-dev/pull/70 Date: 2023-12-19 13:09:39 +0000
01-02-2024
Changeset: 30462f9d Author: Aleksey Shipilev <shade@openjdk.org> Date: 2023-11-22 17:55:17 +0000 URL: https://git.openjdk.org/jdk/commit/30462f9da40d3a7ec18fcf46e2154fabb5fd4753
22-11-2023
A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/16404 Date: 2023-10-27 15:40:11 +0000
01-11-2023