ADDITIONAL SYSTEM INFORMATION :
$ java -version
openjdk version "11.0.21" 2023-10-17
OpenJDK Runtime Environment (build 11.0.21+9-post-Ubuntu-0ubuntu122.04)
OpenJDK 64-Bit Server VM (build 11.0.21+9-post-Ubuntu-0ubuntu122.04, mixed mode)
$ uname -a
Linux jitsi-net-jvb-74-72-234.chaos.jitsi.net 5.15.0-1049-oracle #55-Ubuntu SMP Mon Nov 20 19:53:49 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux
$ cat /etc/issue
Ubuntu 22.04.3 LTS \n \l
A DESCRIPTION OF THE PROBLEM :
The old (PlainDatagramSocketImpl) implementation of DatagramSocket allowed multiple threads to send on the same socket simultaneously. On Linux, at least, the kernel-level socket implementation also supports concurrent sends, so sending from several threads improves throughput.
The new (NIO-based) implementation, by contrast, synchronizes sending on a DatagramSocket: only one thread at a time can send a datagram, and all other sending threads block.
This causes a significant performance regression in our software (the Jitsi Videobridge).
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Execute the attached Java code on a multi-core Linux machine.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
On every Java version, running with more threads reduces the total send time, up to the number of cores on the machine (8 on the test machine).
ACTUAL -
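Running with more threads reduces the total send time on Java 11, and on Java 17 if and only if -Djdk.net.usePlainDatagramSocketImpl=true is set. On Java 21 (even with that flag), and on Java 17 without it, every multi-threaded run takes about twice as long as the single-threaded run (roughly 16.4 to 17.3 seconds versus roughly 8 seconds).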
$ /usr/lib/jvm/java-11-openjdk-arm64/bin/java -version
openjdk version "11.0.21" 2023-10-17
OpenJDK Runtime Environment (build 11.0.21+9-post-Ubuntu-0ubuntu122.04)
OpenJDK 64-Bit Server VM (build 11.0.21+9-post-Ubuntu-0ubuntu122.04, mixed mode)
$ /usr/lib/jvm/java-11-openjdk-arm64/bin/java -cp target/classes/ org.example.Main
Sending 1048576 packets on 1 threads took PT12.944786S
Sending 1048576 packets on 2 threads took PT7.062286S
Sending 1048576 packets on 4 threads took PT3.64616S
Sending 1048576 packets on 8 threads took PT1.950277S
Sending 1048576 packets on 16 threads took PT1.870244S
Sending 1048576 packets on 32 threads took PT1.910114S
$ /usr/lib/jvm/java-17-openjdk-arm64/bin/java -version
openjdk version "17.0.10" 2024-01-16
OpenJDK Runtime Environment (build 17.0.10+7-Ubuntu-122.04.1)
OpenJDK 64-Bit Server VM (build 17.0.10+7-Ubuntu-122.04.1, mixed mode, sharing)
$ /usr/lib/jvm/java-17-openjdk-arm64/bin/java -cp target/classes/ org.example.Main
Sending 1048576 packets on 1 threads took PT8.240958667S
Sending 1048576 packets on 2 threads took PT16.665759865S
Sending 1048576 packets on 4 threads took PT17.100608791S
Sending 1048576 packets on 8 threads took PT17.292098818S
Sending 1048576 packets on 16 threads took PT16.406835543S
Sending 1048576 packets on 32 threads took PT16.789614874S
$ /usr/lib/jvm/java-17-openjdk-arm64/bin/java -Djdk.net.usePlainDatagramSocketImpl=true -cp target/classes/ org.example.Main
Sending 1048576 packets on 1 threads took PT11.921313104S
Sending 1048576 packets on 2 threads took PT6.696453422S
Sending 1048576 packets on 4 threads took PT3.482805109S
Sending 1048576 packets on 8 threads took PT1.831249134S
Sending 1048576 packets on 16 threads took PT1.767044991S
Sending 1048576 packets on 32 threads took PT1.769080475S
$ /usr/lib/jvm/java-21-openjdk-arm64/bin/java -version
openjdk version "21.0.2" 2024-01-16
OpenJDK Runtime Environment (build 21.0.2+13-Ubuntu-122.04.1)
OpenJDK 64-Bit Server VM (build 21.0.2+13-Ubuntu-122.04.1, mixed mode, sharing)
$ /usr/lib/jvm/java-21-openjdk-arm64/bin/java -Djdk.net.usePlainDatagramSocketImpl=true -cp target/classes/ org.example.Main
Sending 1048576 packets on 1 threads took PT7.969858442S
Sending 1048576 packets on 2 threads took PT16.498898604S
Sending 1048576 packets on 4 threads took PT16.886177625S
Sending 1048576 packets on 8 threads took PT17.045601581S
Sending 1048576 packets on 16 threads took PT16.923914076S
Sending 1048576 packets on 32 threads took PT16.771734502S
$ nproc
8
$ uname -a
Linux jitsi-net-jvb-74-72-234.chaos.jitsi.net 5.15.0-1049-oracle #55-Ubuntu SMP Mon Nov 20 19:53:49 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux
$ cat /etc/issue
Ubuntu 22.04.3 LTS \n \l
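To make the scaling in the runs above easier to compare, here is a minimal sketch (not part of the attached reproducer) that converts the printed ISO-8601 durations into speedup ratios relative to the single-thread run. The class name SpeedupTable and its methods are illustrative only; the timings are copied from the Java 11 run and the Java 17 run without the flag.

```java
import java.time.Duration;
import java.util.Locale;

public class SpeedupTable {
    // Timings copied from the runs above, for 1, 2, 4, 8, 16 and 32 threads.
    static final String[] JAVA_11 = {
        "PT12.944786S", "PT7.062286S", "PT3.64616S",
        "PT1.950277S", "PT1.870244S", "PT1.910114S"
    };
    static final String[] JAVA_17_DEFAULT = {
        "PT8.240958667S", "PT16.665759865S", "PT17.100608791S",
        "PT17.292098818S", "PT16.406835543S", "PT16.789614874S"
    };

    /** Returns baseline / measured; a value above 1.0 means the measured run was faster. */
    static double speedup(String baselineIso, String measuredIso) {
        double baseline = Duration.parse(baselineIso).toNanos();
        double measured = Duration.parse(measuredIso).toNanos();
        return baseline / measured;
    }

    /** Prints one row of ratios, each relative to the first (single-thread) timing. */
    static void printRow(String label, String[] timings) {
        StringBuilder row = new StringBuilder(label);
        for (String timing : timings) {
            row.append(String.format(Locale.ROOT, " %.2fx", speedup(timings[0], timing)));
        }
        System.out.println(row);
    }

    public static void main(String[] args) {
        printRow("Java 11:", JAVA_11);
        printRow("Java 17 (default):", JAVA_17_DEFAULT);
    }
}
```

With these inputs, the 8-thread run is roughly 6.6 times faster than the single-thread run on Java 11, while on Java 17 without the flag it runs at roughly 0.48 times the single-thread speed.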
---------- BEGIN SOURCE ----------
// Main.java
package org.example;

public class Main
{
    private static final int NUM_PACKETS = 0x100000;
    private static final int MAX_THREADS = 32;

    public static void main(String[] args)
    {
        // Double the thread count each round: 1, 2, 4, ..., MAX_THREADS.
        for (int i = 1; i <= MAX_THREADS; i *= 2) {
            ParallelSocketSender sender = new ParallelSocketSender(NUM_PACKETS, i);
            sender.run();
        }
    }
}

// ParallelSocketSender.java
package org.example;

import java.net.*;
import java.time.*;
import java.util.concurrent.*;

public class ParallelSocketSender
{
    private final int numPackets;
    private final int numThreads;
    private static final int PACKET_SIZE = 1500;
    private final Thread[] threads;
    // A single socket shared by all sender threads.
    private final DatagramSocket socket;
    private final CountDownLatch startSignal;
    private final CountDownLatch doneSignal;
    private final byte[] buf = new byte[PACKET_SIZE];
    // Port 9 is the "discard" port.
    private final InetSocketAddress dest = new InetSocketAddress("localhost", 9);
    private static final Clock clock = Clock.systemUTC();

    public ParallelSocketSender(int numPackets, int numThreads)
    {
        this.numPackets = numPackets;
        this.numThreads = numThreads;
        startSignal = new CountDownLatch(1);
        doneSignal = new CountDownLatch(numThreads);
        try
        {
            socket = new DatagramSocket();
        } catch (Exception e)
        {
            System.err.println("Error creating datagram socket: " + e.getMessage());
            throw new RuntimeException(e);
        }
        threads = new Thread[numThreads];
    }

    public void run()
    {
        try {
            for (int i = 0; i < numThreads; i++)
            {
                SocketSender sender = new SocketSender(i, numPackets / numThreads);
                threads[i] = new Thread(sender);
                threads[i].start();
            }
            // Release all senders at once and time until the last one finishes.
            Instant startTime = clock.instant();
            startSignal.countDown();
            doneSignal.await();
            Instant endTime = clock.instant();
            for (int i = 0; i < numThreads; i++)
            {
                threads[i].join();
            }
            Duration runTime = Duration.between(startTime, endTime);
            System.out.println("Sending " + numPackets + " packets on " +
                numThreads + " threads took " + runTime);
        } catch (Exception e) {
            System.err.println("Error running test for " + numThreads + " threads: " + e.getMessage());
        } finally {
            // Release the socket once this round is finished.
            socket.close();
        }
    }

    private class SocketSender implements Runnable {
        int threadNum;
        int numThreadPackets;
        // Each thread has its own packet object; all of them share buf and dest.
        private final DatagramPacket packet = new DatagramPacket(buf, PACKET_SIZE, dest);

        private SocketSender(int threadNum, int numThreadPackets)
        {
            this.threadNum = threadNum;
            this.numThreadPackets = numThreadPackets;
        }

        @Override
        public void run()
        {
            int i = 0;
            try
            {
                // Wait until every sender thread has been started.
                startSignal.await();
                for (i = 0; i < numThreadPackets; i++)
                {
                    socket.send(packet);
                }
            } catch (Exception e) {
                System.err.println("Error sending packet " + i + " on thread " +
                    threadNum + ":" + e.getMessage());
            }
            doneSignal.countDown();
        }
    }
}
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
On Java 17, running with -Djdk.net.usePlainDatagramSocketImpl=true works around the bug. No workaround has been found for Java 21, where setting that property has no effect (see the Java 21 run above).
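For deployments that run on several Java versions, a startup check along the following lines could report whether the property can have any effect. This is a sketch, not part of the attached reproducer. The class name LegacyDatagramFlag and the Effect enum are hypothetical. The version boundaries (property introduced in JDK 15, legacy implementation removed in JDK 18) are an assumption based on the JDK release notes as I understand them; the runs above only confirm the behavior on Java 11, 17 and 21.

```java
public class LegacyDatagramFlag {
    /** What setting jdk.net.usePlainDatagramSocketImpl=true is expected to do. */
    enum Effect { LEGACY_IS_DEFAULT, SELECTS_LEGACY_IMPL, NO_EFFECT }

    /**
     * Maps a Java feature version (11, 17, 21, ...) to the expected effect of the flag.
     * Assumption: the property exists only in JDK 15 through 17.
     */
    static Effect effectOn(int featureVersion) {
        if (featureVersion < 15) {
            return Effect.LEGACY_IS_DEFAULT;   // e.g. Java 11: old implementation is used anyway
        }
        if (featureVersion < 18) {
            return Effect.SELECTS_LEGACY_IMPL; // e.g. Java 17: the flag restores concurrent sends
        }
        return Effect.NO_EFFECT;               // e.g. Java 21: the flag is ignored
    }

    public static void main(String[] args) {
        int feature = Runtime.version().feature();
        String value = System.getProperty("jdk.net.usePlainDatagramSocketImpl");
        System.out.println("Java feature version: " + feature);
        System.out.println("jdk.net.usePlainDatagramSocketImpl = " + value);
        System.out.println("Expected effect of the flag: " + effectOn(feature));
    }
}
```

Such a check only reports the situation; it does not restore concurrent sends on Java 21.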