JDK-8267502 : JDK-8246677 caused 16x performance regression in SynchronousQueue
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util.concurrent
  • Affected Version: 17,21,22
  • Priority: P3
  • Status: Closed
  • Resolution: Fixed
  • OS: linux
  • Submitted: 2021-05-20
  • Updated: 2024-05-17
  • Resolved: 2023-07-22
Fix versions:
  • JDK 21: 21.0.2 (Fixed)
  • JDK 22: 22 b08 (Fixed)
Description
While benchmarking SynchronousQueue from the OpenJDK tip, I found that its throughput is about 16 times lower than that of SynchronousQueue from OpenJDK 11. Comparing the two implementations, I found that the tip version was changed by JDK-8246677. Reverting SynchronousQueue to the version before JDK-8246677 removed the regression.

The regression reproduces on Amazon Linux 2 on both AArch64 and x86_64.
To reproduce:

$ build/linux-x86_64-server-release/images/jdk/bin/javac SQBench.java
$ build/linux-x86_64-server-release/images/jdk/bin/java SQBench
500000
500000
500000
500000
500000
# Warmup done. Restarting threads.
# duration = 1013243934
# duration (ns) per round trip op = 506.621967
# round trip ops/sec = 1973858

$ cd src/java.base/share/classes/java/util/concurrent
$ wget https://raw.githubusercontent.com/openjdk/jdk/5cfa8c94d6765b93f910fc01cd1ac2a56b16d78a/src/java.base/share/classes/java/util/concurrent/SynchronousQueue.java
$ cd -
$ make images
$ build/linux-x86_64-server-release/images/jdk/bin/java SQBench
500000
500000
500000
500000
500000
# Warmup done. Restarting threads.
# duration = 15841810490
# duration (ns) per round trip op = 7920.905245
# round trip ops/sec = 126248
$ uname -o -r
5.4.109-57.183.amzn2int.x86_64 GNU/Linux
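
As a quick check on the "16x" figure, the ratio can be recomputed from the per-operation latencies printed by the two runs above. This is a minimal sketch: the class and method names are illustrative and not part of the original report, and the two constants are copied verbatim from the output.

```java
import java.util.Locale;

public class SpeedupCheck {
  // Ratio of two per-operation latencies (slower run divided by faster run).
  static double ratio(double slowerNsPerOp, double fasterNsPerOp) {
    return slowerNsPerOp / fasterNsPerOp;
  }

  public static void main(String[] args) {
    // "duration (ns) per round trip op" values from the two SQBench runs.
    double fasterNsPerOp = 506.621967;
    double slowerNsPerOp = 7920.905245;

    // Locale.ROOT keeps the decimal separator independent of the host locale.
    System.out.println(
        String.format(Locale.ROOT, "slowdown = %.1fx", ratio(slowerNsPerOp, fasterNsPerOp)));
  }
}
```

The result is roughly 15.6, which the title rounds to 16x. The same ratio follows from the two "round trip ops/sec" values.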

----------- SQBench.java
import java.util.Random;
import java.util.concurrent.*;

public class SQBench {
  public static final long WARMUP_PASS_COUNT = 5;
  public static final long WARMUP_ITERATIONS = 500L * 1000L;
  public static final long ITERATIONS = 2 * 1000L * 1000L;

  static class Producer extends java.lang.Thread {
    final long iterations;
    Random rand;
    BlockingQueue<Integer> queue;

    Producer(final long terminatingIterationCount, BlockingQueue<Integer> queue) {
      this.iterations = terminatingIterationCount;
      this.queue = queue;
      rand = new Random(100);
    }

    public void run() {
      try {
        for (long i = 0; i < iterations; i++) {
          queue.put(rand.nextInt(10000));
        }
        queue.put(-1);
      } catch (InterruptedException ie) {
        // Interrupted: stop producing and let the thread exit.
      }
    }
  }

  static class Consumer extends java.lang.Thread {
    public int bits = 0;
    public int count = 0;
    BlockingQueue<Integer> queue;

    Consumer(BlockingQueue<Integer> queue) {
      this.queue = queue;
    }

    public void run() {
      while (true) {
        try {
          Integer number = queue.take();
          if (number == -1) return;
          ++count;
          bits += Integer.bitCount(number.intValue());
        } catch (InterruptedException ie) {
          // Ignore the interrupt and keep consuming until the -1 sentinel arrives.
        }
      }
    }
  }

  public static void main(final String[] args) {
    BlockingQueue<Integer> queue = new SynchronousQueue<>();
    try {
      Producer producer;
      Consumer consumer;

      for (int i = 0; i < WARMUP_PASS_COUNT; i++) {
        consumer = new Consumer(queue);
        producer = new Producer(WARMUP_ITERATIONS, queue);
        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
        System.out.println(consumer.count);
      }

      java.lang.Thread.sleep(1000); // Let things (like JIT compilations) settle down.
      System.out.println("# Warmup done. Restarting threads.");

      consumer = new Consumer(queue);
      producer = new Producer(ITERATIONS, queue);

      long start = System.nanoTime();
      producer.start();
      consumer.start();
      producer.join();
      consumer.join();
      long duration = System.nanoTime() - start;

      System.out.println("# duration = " + duration);
      System.out.println("# duration (ns) per round trip op = " + duration / (ITERATIONS * 1.0));
      System.out.println(
          "# round trip ops/sec = " + (ITERATIONS * 1000L * 1000L * 1000L) / duration);
    } catch (InterruptedException ex) {
      System.err.println("SQBench interrupted.");
    }
  }
}


Comments
A pull request was submitted for review. URL: https://git.openjdk.org/jdk21u/pull/168 Date: 2023-09-15 18:39:02 +0000
15-09-2023

Changeset: 8d1ab570 Author: Doug Lea <dl@openjdk.org> Date: 2023-07-22 10:41:42 +0000 URL: https://git.openjdk.org/jdk/commit/8d1ab57065c7ebcc650b5fb4ae098f8b0a35f112
22-07-2023

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/14317 Date: 2023-06-05 18:52:00 +0000
20-07-2023

This update changed spin versus context-switch behavior, which improves performance in some use cases (including most within Loom) but worsens it in others (like this benchmark). Overall it is a net improvement. As Loom and related JVM updates progress, there will be opportunities to find a better balance. For now, this is a known performance side effect without a simple solution.
24-05-2021
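
The spin versus context-switch trade-off mentioned in the comment above can be illustrated with a small sketch. This is not the SynchronousQueue implementation: `SpinThenPark`, `Slot`, and the `spins` parameter are hypothetical names introduced only for illustration. A waiter that spins avoids the cost of parking when the partner arrives quickly (as in a tight producer/consumer benchmark), while a waiter that parks early frees the CPU for other threads.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.LockSupport;

public class SpinThenPark {
  // Hypothetical single-producer, single-consumer slot (illustration only).
  // It ignores interrupts and does not make the producer wait for the consumer.
  static final class Slot<T> {
    private final AtomicReference<T> item = new AtomicReference<>();
    private volatile Thread waiter;
    private final int spins; // assumed tuning knob: spin attempts before parking

    Slot(int spins) {
      this.spins = spins;
    }

    void put(T value) {
      item.set(value);
      Thread w = waiter;
      if (w != null) {
        LockSupport.unpark(w); // wake the consumer if it has parked (or is about to)
      }
    }

    T take() {
      // Phase 1: busy-wait briefly; cheap if the producer is already running.
      for (int i = 0; i < spins; i++) {
        T v = item.getAndSet(null);
        if (v != null) {
          return v;
        }
        Thread.onSpinWait();
      }
      // Phase 2: publish ourselves as the waiter, then park until an item arrives.
      waiter = Thread.currentThread();
      try {
        T v;
        while ((v = item.getAndSet(null)) == null) {
          LockSupport.park(this); // may return spuriously, hence the loop
        }
        return v;
      } finally {
        waiter = null;
      }
    }
  }

  public static void main(String[] args) {
    Slot<Integer> slot = new Slot<>(1000);
    new Thread(() -> slot.put(42)).start();
    System.out.println(slot.take());
  }
}
```

With a large `spins` value the consumer rarely parks, which favors workloads like the benchmark in this report; with a small value it yields the CPU sooner, which favors workloads with many more threads than cores.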

Assigned to the author of the PR.
20-05-2021