JDK-8235674 : JEP 373: Reimplement the Legacy DatagramSocket API
  • Type: JEP
  • Component: core-libs
  • Sub-Component: java.net
  • Priority: P3
  • Status: Closed
  • Resolution: Delivered
  • Fix Versions: 15
  • Submitted: 2019-12-10
  • Updated: 2021-08-16
  • Resolved: 2020-08-04
Description
Summary
-------

Replace the underlying implementations of the `java.net.DatagramSocket` and `java.net.MulticastSocket` APIs with simpler and more modern implementations that are easy to maintain and debug. The new implementations will be easy to adapt to work with virtual threads, currently being explored in [Project Loom](https://openjdk.java.net/projects/loom).
This is a follow-on to [JEP 353][1], which already reimplemented the legacy Socket API.

Motivation
----------

The code base of the `java.net.DatagramSocket` and `java.net.MulticastSocket` APIs, and their underlying implementations, is old and brittle:
 
 - The implementations date back to JDK 1.0. They are a mix of legacy Java and C code that is difficult to maintain and debug.

 - The implementation of `MulticastSocket` is particularly problematic since it dates back to a time when IPv6 was still under development. Much of the underlying native implementation tries to reconcile IPv4 and IPv6 in ways that are difficult to maintain.

 - The implementation also has several concurrency issues (_e.g._, with asynchronous close) that require an overhaul to address properly.

In addition, in the context of virtual threads that park rather than block underlying kernel threads in system calls, the current implementation is not fit for purpose. As datagram-based transports gain traction again (_e.g._ [QUIC](https://en.wikipedia.org/wiki/QUIC)), a simpler and more maintainable implementation is needed.

Description
-----------

Currently, the `DatagramSocket` and `MulticastSocket` classes delegate all socket calls to a `java.net.DatagramSocketImpl` implementation, for which different platform-specific concrete implementations exist: `PlainDatagramSocketImpl` on Unix platforms, and `TwoStackPlainDatagramSocketImpl` and `DualPlainDatagramSocketImpl` on Windows platforms. The abstract `DatagramSocketImpl` class, which dates back to JDK 1.1, is very under-specified and contains several obsolete methods that are an impediment to providing an implementation of this class based on NIO (see alternatives, discussed below).

Rather than provide a drop-in replacement for implementations of `DatagramSocketImpl`, similar to what was done in [JEP 353][1] for `SocketImpl`, this JEP proposes to make `DatagramSocket` internally wrap another instance of `DatagramSocket` to which it delegates all calls directly. The wrapped instance is either a socket adapter obtained from NIO via `DatagramChannel::socket` (the new implementation), or else a clone of the legacy `DatagramSocket` class which then delegates to the legacy `DatagramSocketImpl` implementation (for the purpose of implementing a backward compatibility switch). If a `DatagramSocketImplFactory` is installed by an application, the legacy implementation is selected. Otherwise, the new implementation is selected and used by default.
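As a rough illustration of the adapter mentioned above, the following sketch obtains a `DatagramSocket` view of a channel through the public `DatagramChannel::socket` method and compares it with a directly constructed socket. The class `SocketAdapterSketch` and its helper are hypothetical names used only for this example. The delegate that `DatagramSocket` wraps internally is not accessible to applications, so this shows only the public adapter, not the JDK's internal wiring.

```java
import java.io.IOException;
import java.net.DatagramSocket;
import java.net.SocketAddress;
import java.nio.channels.DatagramChannel;

// Hypothetical demo class; not part of the JDK.
final class SocketAdapterSketch {

    // True if the socket is an adapter view of a DatagramChannel.
    // getChannel() is documented to return null for sockets that were
    // not created from a channel.
    static boolean isChannelAdapter(DatagramSocket socket) {
        return socket.getChannel() != null;
    }

    public static void main(String[] args) throws IOException {
        // The null address creates an unbound socket, so no port is taken.
        try (DatagramChannel channel = DatagramChannel.open();
             DatagramSocket direct = new DatagramSocket((SocketAddress) null)) {
            DatagramSocket adapter = channel.socket();
            System.out.println("adapter backed by channel: " + isChannelAdapter(adapter));
            System.out.println("direct backed by channel:  " + isChannelAdapter(direct));
        }
    }
}
```

Application code sees the same `DatagramSocket` API in both cases, which is what lets the wrapping approach delegate calls without changing the public type.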

To reduce the risk of switching the implementation after more than twenty years, the legacy implementation will not be removed. A JDK-specific system property, `jdk.net.usePlainDatagramSocketImpl`, is introduced to configure the JDK to use the legacy implementation (see risks and assumptions, below). If set with no value or set to the value `"true"` at startup, the legacy implementation is used. Otherwise, the new (NIO-based) implementation is used. In some future release we will remove the legacy implementation and the system property. At some point we may also deprecate and remove `DatagramSocketImpl` and `DatagramSocketImplFactory`.
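The selection rules described above can be summarized in a small sketch. This is not the JDK's actual code: the class `ImplSelectionSketch`, the enum `Impl`, and both methods are hypothetical, and the exact-match comparison against `"true"` follows the wording above (whether other spellings such as `"TRUE"` are accepted is not stated here). A property passed as `-Dname` with no value is visible to Java code as the empty string.

```java
// Hypothetical sketch of the selection rules; not the JDK's implementation.
final class ImplSelectionSketch {

    enum Impl { LEGACY, NIO }

    // The legacy implementation is requested when the property is set with
    // no value (empty string) or set to "true". Case handling is an assumption.
    static boolean legacyRequested(String propertyValue) {
        return propertyValue != null
                && (propertyValue.isEmpty() || propertyValue.equals("true"));
    }

    // An installed DatagramSocketImplFactory also selects the legacy path.
    static Impl select(boolean factoryInstalled, String propertyValue) {
        if (factoryInstalled || legacyRequested(propertyValue)) {
            return Impl.LEGACY;
        }
        return Impl.NIO;
    }

    public static void main(String[] args) {
        String value = System.getProperty("jdk.net.usePlainDatagramSocketImpl");
        // Assumes no factory has been installed in this demo.
        System.out.println("selected: " + select(false, value));
    }
}
```

Running the sketch with `-Djdk.net.usePlainDatagramSocketImpl` on the command line exercises the "set with no value" case.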

> <a href="https://bugs.openjdk.java.net/secure/attachment/87038/ReimplementDS.png"><img src="https://bugs.openjdk.java.net/secure/attachment/87038/ReimplementDS.png" width="500"/></a>

The new implementation is enabled by default. It provides non-interruptible behavior for datagram and multicast sockets by directly using the platform-default implementation of the selector provider (`sun.nio.ch.SelectorProviderImpl` and `sun.nio.ch.DatagramChannelImpl`).  Installing a custom selector provider will thus have no effect on `DatagramSocket` and  `MulticastSocket`.


Alternatives
------------

We investigated, prototyped, and discarded two alternative approaches.

### Alternative 1

Create an implementation of `DatagramSocketImpl` that delegates all its calls to a wrapped `DatagramChannel` and `sun.nio.ch.DatagramSocketAdaptor`. Upgrade `sun.nio.ch.DatagramSocketAdaptor` to extend `java.net.MulticastSocket`.

This approach showed that it would be relatively easy to provide an implementation of `DatagramSocketImpl` based on `DatagramChannel`. Tests were passing, but it also highlighted several limitations:

 - The security checks were performed twice, once in `DatagramSocket`, once more in `DatagramChannel` (or its socket adapter). There were ways to avoid the double security checks but they would have been cumbersome.

 - The connect emulation implemented at the `DatagramSocket` level also got in the way, since we didn't want to perform this emulation with the NIO-based implementation.

 - As with the solution proposed above, the main advantage of this alternative compared to the second alternative below was that no new native code was necessary, since every call could be delegated to `DatagramChannel`.

 - While evaluating this alternative, it quickly became apparent that overriding methods at the `DatagramSocket` level, rather than at the `DatagramSocketImpl` level, would be simpler and more straightforward, which led to the solution proposed in this JEP.

### Alternative 2

Create an implementation of `DatagramSocketImpl` in the `sun.nio.ch` package that invokes low-level `sun.nio.ch.Net` primitives. This allowed the implementation to directly access lower-level NIO primitives instead of relying on `DatagramChannel`. This was somewhat analogous to what was done for reimplementing `Socket` and `ServerSocket` in  [JEP 353][1].

 - The main advantage of this alternative over the first alternative was that it avoided the double security checks, since the implementation could access lower-level NIO primitives directly.

 - However, the new implementation had to replicate the non-trivial state and lock management that `DatagramChannel` already implements.

 - It also required the addition of new native code to match the `DatagramSocketImpl` interface.

 - The solution proposed in this JEP thus appeared much simpler, less risky, and easier to maintain.

Testing
-------

The existing tests in the `jdk/jdk` repository will be used to test the new implementation. To ensure a smooth transition, the new implementation should pass the tier2 (`jdk_net` and `jdk_nio`) regression-test suite and the JCK for `java_net/api`. The `jdk_net` test group has accumulated many tests for networking corner case scenarios over the years. Some of the tests in this test group will be modified to run twice, the second time with `-Djdk.net.usePlainDatagramSocketImpl` to ensure that the old implementation does not decay during the time that the JDK includes both implementations. New tests will be added as required, to expand code coverage and increase confidence in the new implementation.

Every effort will be made to publicize the proposal and encourage developers who have code using `DatagramSocket` and `MulticastSocket` to test their code with the early-access builds that are published on [jdk.java.net](https://jdk.java.net).

The microbenchmarks in the `jdk/jdk` repository include benchmarks for `DatagramChannel`. Similar benchmarks for datagram socket will be created if missing, or updated if they already exist, in a way that makes it easy to compare the old and new implementations.


Risks and Assumptions
---------------------

The primary risk of this proposal is that existing code may depend upon unspecified behavior in corner cases where the old and new implementations behave differently. To minimize this risk, some preparatory work to clarify the specifications of `DatagramSocket` and `MulticastSocket`, and to minimize the behavioral differences between these classes and the `DatagramChannel::socket` adapter, has already been done in JDK 14 and JDK 15. Some small differences might nevertheless persist. They might be observable in corner-case situations but should be transparent to the vast majority of API users. The differences we have identified so far are listed below; all but the first two can be mitigated by running with either `-Djdk.net.usePlainDatagramSocketImpl` or `-Djdk.net.usePlainDatagramSocketImpl=true`.

 - Custom APIs or subclasses of `DatagramSocket` and `MulticastSocket` that synchronize on instances of these classes may need revisiting, since `DatagramSocket` and `MulticastSocket` no longer synchronize on `this`. Any locking or synchronization is left up to the delegate, which is not accessible outside of the `java.net` package, and is free to use any mechanism it sees fit.

 - Similarly, custom classes that extend `DatagramSocket` or `MulticastSocket` and override methods such as `bind` and `setReuseAddress` won’t have the overridden methods invoked during construction. Anyone doing this is depending on undocumented and implementation-specific behavior.

 - The new implementation uses the native `connect` method on all platforms. The legacy implementation still uses an emulation on macOS. This means, in particular, that port-unreachable conditions cannot be detected with the old implementation while they should be detected with the new one. Also, the old implementation will fall back to using the emulation if the native connect fails; the new implementation will report an error instead. In addition, the new implementation will flush the receive buffer at the time of connect, ensuring that any datagrams buffered before `connect` was invoked are discarded. The old implementation preserved datagrams that were sent by the connected peer and buffered before the kernel performed the association; the new implementation discards them as well.

 - On macOS and Linux, invoking `disconnect` on the new implementation might require rebinding the underlying socket. This introduces the possibility that the rebinding might fail, and the underlying implementation might throw an exception, leaving the underlying socket in an unspecified state. Whereas the legacy implementation might have silently left the socket in an unspecified state, the new implementation will throw an `UncheckedIOException` instead.

 - When joining a multicast group on macOS, if no default outgoing interface has been set, and no outgoing network interface is provided, the old implementation of `MulticastSocket::joinGroup` will pick up a default network interface and incorrectly attempt to set it as default before joining, by silently setting the `IP_MULTICAST_IF` option. The new implementation based on NIO will not do this, so the `IP_MULTICAST_IF` option will never be silently set as a side effect of joining.

 - The `java.net` package defines many sub-classes of `SocketException`. The new implementation will attempt to throw the same exceptions in the same situations as the old implementation, but there may be cases where they are not the same. Furthermore, there may be cases where the exception messages differ. On Windows, for example, the old implementation maps Windows Socket error codes to English-only messages, while the new NIO-based implementation uses the system messages.
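For the `disconnect` difference described in the list above, a caller might guard the call roughly as follows. This is a minimal sketch: the class `DisconnectSketch` and the helper `disconnectOrClose` are hypothetical, and closing the socket after a failed rebind is one possible policy chosen for illustration, not something the JEP prescribes. The peer port 9 is an arbitrary illustrative value; nothing is sent to it.

```java
import java.io.UncheckedIOException;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketException;

// Hypothetical demo class; not part of the JDK.
final class DisconnectSketch {

    // Disconnects the socket and returns true if it remains usable.
    // If the implementation reports a failure, the socket state is
    // unspecified, so this sketch closes it (an assumed policy).
    static boolean disconnectOrClose(DatagramSocket socket) {
        try {
            socket.disconnect();
            return true;
        } catch (UncheckedIOException e) {
            socket.close();
            return false;
        }
    }

    public static void main(String[] args) throws SocketException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the system for any free local port.
        try (DatagramSocket socket = new DatagramSocket(0, loopback)) {
            socket.connect(loopback, 9); // arbitrary peer port for illustration
            System.out.println("connected: " + socket.isConnected());
            boolean usable = disconnectOrClose(socket);
            System.out.println("usable after disconnect: " + usable);
        }
    }
}
```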

Other observable behavioral differences:

 - A `DatagramSocket` created via one of its public constructors supports setting options for sending multicast datagrams. The new implementation allows you to configure multicast socket options on base instances of `DatagramSocket` on all platforms. The old implementation still uses a dual-stack implementation on Windows which doesn't support multicast socket options on base `DatagramSocket` instances.  In that case an instance of `MulticastSocket` must be used if such options need to be configured.

 - The new implementation fixes a number of issues, such as [8165653][2], simply by virtue of delegating to the NIO implementation, where these issues are not present.
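The multicast-option difference noted above can be probed with the public socket-option API. The sketch below is illustrative only: the class `MulticastOptionSketch` and the helper `trySetMulticastTtl` are hypothetical, and the TTL value 4 is arbitrary. It checks `supportedOptions()` first so that it does not throw on an implementation that lacks the option for base `DatagramSocket` instances.

```java
import java.io.IOException;
import java.net.DatagramSocket;
import java.net.SocketAddress;
import java.net.StandardSocketOptions;

// Hypothetical demo class; not part of the JDK.
final class MulticastOptionSketch {

    // Sets the multicast TTL if the socket supports the option and returns
    // the value read back, or -1 if the option is not supported.
    static int trySetMulticastTtl(DatagramSocket socket, int ttl) throws IOException {
        if (!socket.supportedOptions().contains(StandardSocketOptions.IP_MULTICAST_TTL)) {
            return -1;
        }
        socket.setOption(StandardSocketOptions.IP_MULTICAST_TTL, ttl);
        return socket.getOption(StandardSocketOptions.IP_MULTICAST_TTL);
    }

    public static void main(String[] args) throws IOException {
        // A base DatagramSocket, left unbound; not a MulticastSocket.
        try (DatagramSocket socket = new DatagramSocket((SocketAddress) null)) {
            int ttl = trySetMulticastTtl(socket, 4);
            System.out.println(ttl < 0
                    ? "IP_MULTICAST_TTL not supported on this socket"
                    : "IP_MULTICAST_TTL set to " + ttl);
        }
    }
}
```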

Aside from behavioral differences, the performance of the new implementation may differ compared to the old when running certain workloads. This JEP will endeavor to provide some performance benchmarks to gauge the difference.

Dependencies
---------------------

 - Replacing the underlying implementation of `DatagramSocket` and `MulticastSocket` is a prerequisite for [Project Loom](https://openjdk.java.net/projects/loom).


[1]: https://openjdk.java.net/jeps/353
[2]: https://bugs.openjdk.java.net/browse/JDK-8165653

Comments
Nicely written! I've done a light copy-editing pass, to tighten up the wording and fix some formatting. If this looks okay to you then assign the issue to me and I'll move this JEP to Candidate.
27-02-2020

The prototype implementation is in the `JDK-8230211-branch` of the JDK Sandbox. To fetch it, run `hg clone http://hg.openjdk.java.net/jdk/sandbox` followed by `hg update -r JDK-8230211-branch`.
26-02-2020