JDK-8221675 : URI does not allow underscore in host name, RFC 3986 allows them
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.net
  • Affected Version: 8,11,12,13
  • Priority: P4
  • Status: Closed
  • Resolution: Duplicate
  • OS: linux
  • CPU: x86_64
  • Submitted: 2019-03-29
  • Updated: 2019-03-29
  • Resolved: 2019-03-29
Related Reports
Duplicate :  
Description
ADDITIONAL SYSTEM INFORMATION :
generic

A DESCRIPTION OF THE PROBLEM :
Other bug reports claim that URI does not allow underscores in host names, citing older RFCs (e.g. RFC 952). RFC 3986, however, explicitly allows the underscore:

```
3.2.2.  Host
  host        = IP-literal / IPv4address / reg-name
  reg-name    = *( unreserved / pct-encoded / sub-delims )
  
2.3.  Unreserved Characters
  unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~" 
```
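
The grammar quoted above can be checked mechanically. The following is a minimal sketch, not JDK code: the class name `RegNameCheck` and the method `isRegName` are made up for illustration, and the `sub-delims` set is taken from RFC 3986 section 2.2.

```java
import java.util.regex.Pattern;

final class RegNameCheck {
    // reg-name = *( unreserved / pct-encoded / sub-delims ), RFC 3986 section 3.2.2
    private static final Pattern REG_NAME = Pattern.compile(
        "(?:[A-Za-z0-9._~-]"       // unreserved (RFC 3986 section 2.3)
        + "|%[0-9A-Fa-f]{2}"       // pct-encoded
        + "|[!$&'()*+,;=])*");     // sub-delims (RFC 3986 section 2.2)

    // Hypothetical helper: true if the whole string matches the reg-name rule.
    static boolean isRegName(String host) {
        return REG_NAME.matcher(host).matches();
    }

    public static void main(String[] args) {
        System.out.println(isRegName("my_host")); // true: "_" is an unreserved character
        System.out.println(isRegName("my host")); // false: a space is not allowed
    }
}
```

This only tests the `reg-name` alternative of `host`; it does not say anything about what DNS or RFC 952 style host name rules accept.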

Related to: JDK-8180809
URI and URL conflicts with dealing with hostname contains underscore

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
```
import java.net.URI;
import java.net.URL;

import org.junit.Assert;
import org.junit.Test;

public class UriTest {

  private static final String UNDERSCORE_IN_HOST_NAME = "http://my_host:8081";

  @Test
  public void failingTestUriCreate() {
    URI uri = URI.create(UNDERSCORE_IN_HOST_NAME);
    Assert.assertNull(uri.getHost()); // NOK
  }

  @Test
  public void failingTestNewUri() throws Exception {
    URI uri = new URI(UNDERSCORE_IN_HOST_NAME);
    Assert.assertNull(uri.getHost()); // NOK
  }

  @Test
  public void passingTestNewUrl() throws Exception {
    URL url = new URL(UNDERSCORE_IN_HOST_NAME);
    Assert.assertNotNull(url.getHost()); // OK
  }

  @Test
  public void passingTestUriToUrl() throws Exception {
    URI uri = new URI(UNDERSCORE_IN_HOST_NAME);
    URL url = uri.toURL();
    Assert.assertNull(uri.getHost()); // NOK
    Assert.assertNotNull(url.getHost()); // OK
  }

}
```

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The URI parsing logic, like the URL parsing logic, should accept an underscore ('_') in the host name.
ACTUAL -
URL accepts underscores in host names; URI does not. See the assertions in Steps to Reproduce.
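
The same behavior can be observed without JUnit. This is a small sketch assuming one of the affected versions listed above; the class name `UnderscoreHostDemo` and the helper `isServerBased` are illustrative only. It relies on the documented `URI.parseServerAuthority()` method, which throws `URISyntaxException` when the authority cannot be parsed as a server-based one.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class UnderscoreHostDemo {
    // Hypothetical helper: true if the authority parses as user-info, host and port.
    static boolean isServerBased(URI uri) {
        try {
            uri.parseServerAuthority();
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        URI uri = URI.create("http://my_host:8081");
        System.out.println(uri.getAuthority());  // my_host:8081 (authority is kept as a whole)
        System.out.println(uri.getHost());       // null (host was not extracted)
        System.out.println(uri.getPort());       // -1 (port was not extracted either)
        System.out.println(isServerBased(uri));  // false
    }
}
```

Note that the URI is not rejected outright: the authority is treated as registry-based, so the host and port components stay undefined while `getAuthority()` still returns the full string.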

CUSTOMER SUBMITTED WORKAROUND :
The workaround is shown in the last test case: converting the URI with toURL() and calling getHost() on the resulting URL returns the correct host name:

    URI uri = new URI(UNDERSCORE_IN_HOST_NAME);
    URL url = uri.toURL();
    Assert.assertNull(uri.getHost()); // NOK
    Assert.assertNotNull(url.getHost()); // OK
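
Outside of a test, the workaround above could be wrapped in a small helper. This is only a sketch: `HostWorkaround` and `hostOf` are hypothetical names, and the fallback assumes the URI is absolute and uses a scheme that `java.net.URL` supports, such as http.

```java
import java.net.MalformedURLException;
import java.net.URI;

final class HostWorkaround {
    // Hypothetical helper: prefer URI.getHost(), fall back to URL parsing when it is null.
    static String hostOf(URI uri) {
        String host = uri.getHost();
        if (host != null) {
            return host;
        }
        try {
            // toURL() itself throws IllegalArgumentException if the URI is not absolute.
            return uri.toURL().getHost();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("No URL handler for: " + uri, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hostOf(URI.create("http://my_host:8081"))); // my_host
    }
}
```

The helper leaves ordinary URIs untouched and only pays for the URL conversion when `getHost()` returns null.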


FREQUENCY : always