JDK-8170568 : Improve address selection for network clients
  • Type: Sub-task
  • Component: core-libs
  • Sub-Component: java.net
  • Priority: P3
  • Status: Open
  • Resolution: Unresolved
  • Submitted: 2016-12-01
  • Updated: 2023-08-15
Description
Colleague Paul Marks reports:

The canonical algorithm for a TCP client is as follows:
- Call getaddrinfo(hostname), which returns all IP addresses sorted by RFC6724, or some approximation thereof.
- Loop over the addresses, connect() to each, and break on success.

For example: http://beej.us/guide/bgnet/output/html/singlepage/bgnet.html#simpleclient
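
A minimal Java sketch of the same loop, assuming InetAddress.getAllByName() stands in for getaddrinfo(). The names ConnectLoop and connectAny, and the per-attempt timeout parameter, are illustrative and not part of any JDK API.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

final class ConnectLoop {
    /**
     * Tries every address the resolver returns, in the order returned,
     * and hands back the first socket that connects.
     */
    static Socket connectAny(String host, int port, int timeoutMillis) throws IOException {
        IOException failure = null;
        for (InetAddress address : InetAddress.getAllByName(host)) {
            Socket socket = new Socket();
            try {
                socket.connect(new InetSocketAddress(address, port), timeoutMillis);
                return socket; // break on first success
            } catch (IOException e) {
                socket.close();
                // Keep the first failure and attach later ones as suppressed.
                if (failure == null) {
                    failure = e;
                } else {
                    failure.addSuppressed(e);
                }
            }
        }
        if (failure == null) {
            failure = new IOException("no addresses for " + host);
        }
        throw failure;
    }
}
```

A caller would use ConnectLoop.connectAny(host, port, timeout) where it would otherwise write new Socket(host, port). The attempts here are sequential, so a slow failure on an early address delays the fallback; this sketch does not attempt RFC6555-style parallel connects.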

However, Java clients typically use this over-simplified algorithm:
- Call getaddrinfo(hostname), and use -Djava.net.preferIPv6Addresses to place either IPv4 or IPv6 first in line.
- Pick the [0]th address, and connect() or fail.

These are the primary APIs that encourage this behavior:
- Socket(String host, int port)
- InetSocketAddress(String hostname, int port)
- InetAddress.getByName(String host)

The Socket() constructor can be fixed by adding a loop around connect(), as Android did a few years ago.  But the latter two purport to resolve a hostname to exactly one IP address, which is an ill-defined operation that should be deprecated.

One case where the algorithm breaks down is when running a client with IPv6-only connectivity using the default JVM flags.  Connecting to a dual-stack hostname picks only the IPv4 address, and the connection fails immediately.  Setting preferIPv6Addresses=true yields the opposite problem for IPv4-only clients.

The direct solution would be to make Inet6AddressImpl.c preserve getaddrinfo()'s ordering, and ignore "preferIPv6Addresses".  This was partially implemented by JDK-8016521, but it did not change the default to "preferIPv6Addresses=system", and in fact such a default would be risky because getaddrinfo() is not always right.  There are networks in the wild where the first TCP connect() fails, and fallback to a later address is required.  (In the extreme case, some pathological networks only work because of RFC6555 Happy Eyeballs, but let's ignore those for now.)  If clients simply switch from "try IPv4 or die" to "try IPv6 or die", then some people are bound to encounter IPv6 connect() failures and complain.

So, what actions do I propose?

1) Remove callers of the above "hostname to one IP address" APIs from OpenJDK, replacing them with getAllByName() and a connect() loop.  The Socket(...host...) and SSLSocketImpl(...host...) constructors are good places to begin.

2) Make InetAddress.getAllByName() never modify getaddrinfo()'s ordering.

3) The "hostname to one IP address" methods are forever cursed, but they could be changed to directly follow the preferIPv6Addresses flag, instead of returning getAllByName()[0].  This would maintain compatibility for most legacy code, while allowing getAllByName() to expose proper address selection by default.
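
One way to read proposal 3 is sketched below in Java, under stated assumptions: the caller already has the full list in resolver order, and the flag is modelled as a plain boolean (the "system" setting would simply return the first entry). SingleAddressPicker and pickOne are hypothetical names, not existing JDK code.

```java
import java.net.Inet6Address;
import java.net.InetAddress;

final class SingleAddressPicker {
    /**
     * Returns the first address of the preferred family, leaving the
     * input order untouched; falls back to the first address overall
     * when the preferred family is absent.
     */
    static InetAddress pickOne(InetAddress[] resolverOrder, boolean preferIPv6) {
        if (resolverOrder.length == 0) {
            throw new IllegalArgumentException("no addresses");
        }
        for (InetAddress address : resolverOrder) {
            boolean isIPv6 = address instanceof Inet6Address;
            if (isIPv6 == preferIPv6) {
                return address;
            }
        }
        return resolverOrder[0];
    }
}
```

The point of the design is that the family preference is applied only when a single address is demanded, so the array returned by getAllByName() can keep getaddrinfo()'s ordering for callers that loop over it.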
Comments
I still think this would be a valuable change. I've been investigating dropping some custom network client logic that tries to pick which of IPv4 and IPv6 to use. Using -Djava.net.preferIPv6Addresses=system works in the vast majority of situations I've seen, but as this bug describes, there are edge cases where the first address fails and retrying with a later address would have succeeded.

I started looking at fixing one occurrence of this in sun.net.NetworkClient, and Alan provided some helpful notes in the review thread [1]. I think these are all excellent points; I wasn't aware of all of the history here when I opened the PR. I also found a couple of net-dev threads with related discussions [2][3].

I'm planning to drop JDK-8313356 for now, and leave this bug as the canonical issue for reaching consensus on what the plan for this should be.

[1] https://mail.openjdk.org/pipermail/net-dev/2023-July/021745.html
[2] https://mail.openjdk.org/pipermail/net-dev/2017-March/010671.html
[3] https://mail.openjdk.org/pipermail/net-dev/2019-April/012371.html
15-08-2023

This sounds very reasonable to me. I don't know whether this is still something for the JDK 9 timeframe?
01-12-2016