JDK-8170265 : underscore is allowed in java.net.URL while not in java.net.URI
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.net
  • Affected Version: 8,9
  • Priority: P4
  • Status: Resolved
  • Resolution: Not an Issue
  • OS: generic
  • CPU: generic
  • Submitted: 2016-11-23
  • Updated: 2016-11-23
  • Resolved: 2016-11-23
Description
FULL PRODUCT VERSION :
java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)

ADDITIONAL OS VERSION INFORMATION :
Darwin leitang-MacBook-Pro.local 14.5.0 Darwin Kernel Version 14.5.0: Sun Sep 25 22:07:15 PDT 2016; root:xnu-2782.50.9~1/RELEASE_X86_64 x86_64

A DESCRIPTION OF THE PROBLEM :
public class UnderscoreHost   // class name is arbitrary; added so the snippet compiles
{
  public static void main(String[] args)
  {
    try {
      String xx = "http://coder_1.tanglei.name";
      java.net.URI u = new java.net.URI(xx);
      System.out.println(u);
      System.out.println(u.getHost()); // null
      java.net.URL url = new java.net.URL(xx);
      System.out.println(url);
      System.out.println(url.getHost()); // coder_1.tanglei.name
      // (scheme, host, path, fragment): the empty fragment accounts for the trailing "#"
      java.net.URI u2 = new java.net.URI(url.getProtocol(), url.getHost(), url.getPath(), "");
      System.out.println(u2);
    } catch (Exception e) {
      System.out.println("exception got: " + e.getMessage());
    }
  }
}


An underscore ("_") in the hostname is rejected by java.net.URI but accepted by java.net.URL.

output:

http://coder_1.tanglei.name
null
http://coder_1.tanglei.name
coder_1.tanglei.name
exception got: Illegal character in hostname at index 12: http://coder_1.tanglei.name#


REGRESSION.  Last worked in version 8u101

ADDITIONAL REGRESSION INFORMATION: 
java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
public class UnderscoreHost   // class name is arbitrary; added so the snippet compiles
{
  public static void main(String[] args)
  {
    try {
      String xx = "http://coder_1.tanglei.name";
      java.net.URI u = new java.net.URI(xx);
      System.out.println(u);
      System.out.println(u.getHost()); // null
      java.net.URL url = new java.net.URL(xx);
      System.out.println(url);
      System.out.println(url.getHost()); // coder_1.tanglei.name
      // (scheme, host, path, fragment): the empty fragment accounts for the trailing "#"
      java.net.URI u2 = new java.net.URI(url.getProtocol(), url.getHost(), url.getPath(), "");
      System.out.println(u2);
    } catch (Exception e) {
      System.out.println("exception got: " + e.getMessage());
    }
  }
}


EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
java.net.URI should accept the hostname, the same as java.net.URL does.
ACTUAL -
http://coder_1.tanglei.name
null
http://coder_1.tanglei.name
coder_1.tanglei.name
exception got: Illegal character in hostname at index 12: http://coder_1.tanglei.name#

REPRODUCIBILITY :
This bug can be reproduced always.
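
The null from u.getHost() in the output above can be examined more closely. The following is a minimal sketch, assuming the behavior described in the java.net.URI documentation: the single-argument parser falls back to a registry-based authority when the host part is not a valid server-based hostname, so getAuthority() still returns the text while getHost() returns null. The class name UriHostProbe and its helper methods are hypothetical and are not part of the original test case.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriHostProbe {
    // Host as parsed by java.net.URI; null when the authority is not server-based.
    static String hostOf(String s) {
        return URI.create(s).getHost();
    }

    // Raw authority component, kept even when no host could be parsed from it.
    static String authorityOf(String s) {
        return URI.create(s).getAuthority();
    }

    // Forces a strict server-based parse; returns the parser's reason, or null on success.
    static String serverAuthorityError(String s) {
        try {
            URI.create(s).parseServerAuthority();
            return null;
        } catch (URISyntaxException e) {
            return e.getReason();
        }
    }

    public static void main(String[] args) {
        String s = "http://coder_1.tanglei.name";
        System.out.println(authorityOf(s));          // coder_1.tanglei.name
        System.out.println(hostOf(s));               // null
        System.out.println(serverAuthorityError(s)); // Illegal character in hostname
    }
}
```

Calling parseServerAuthority() is one way to surface the same URISyntaxException that the four-argument constructor throws in the test case, without having to rebuild the URI from its parts.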


Comments
The documentation for the URL class constructors (e.g., http://download.java.net/java/jdk9/docs/api/java/net/URL.html#URL-java.lang.String-java.lang.String-java.lang.String-) explicitly addresses validation: "No validation of the inputs is performed by this constructor." So the fact that URL does not throw an exception is not an issue here.
23-11-2016
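
Since URL performs no hostname validation, code that needs a host acceptable to both classes has to check it separately. The sketch below is one possible approach, not a recommended JDK idiom: it reads the host from java.net.URL and then asks java.net.URI whether it can parse a server-based host from the same string. The class UrlHostCheck and both helper methods are hypothetical names introduced for illustration.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

class UrlHostCheck {
    // Host as java.net.URL reports it; no hostname syntax check is applied.
    @SuppressWarnings("deprecation") // URL(String) is deprecated in recent JDKs; used to match the report
    static String urlHost(String spec) {
        try {
            return new URL(spec).getHost();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // True only when java.net.URI also parses a server-based host from the same string.
    static boolean uriAcceptsHost(String spec) {
        try {
            return URI.create(spec).getHost() != null;
        } catch (IllegalArgumentException e) {
            return false; // not parseable as a URI at all
        }
    }

    public static void main(String[] args) {
        String spec = "http://coder_1.tanglei.name";
        System.out.println(urlHost(spec));        // coder_1.tanglei.name
        System.out.println(uriAcceptsHost(spec)); // false
    }
}
```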

As per RFC 2396:

"Hostnames take the form described in Section 3 of [RFC1034] and Section 2.1 of [RFC1123]: a sequence of domain labels separated by ".", each domain label starting and ending with an alphanumeric character and possibly also containing "-" characters. The rightmost domain label of a fully qualified domain name will never start with a digit, thus syntactically distinguishing domain names from IPv4 addresses, and may be followed by a single "." if it is necessary to distinguish between the complete domain name and any local domain. To actually be "Uniform" as a resource locator, a URL hostname should be a fully qualified domain name. In practice, however, the host component may be a local domain literal."

URI class is following the above, but URL class doesn't seem to follow the same rules.

To reproduce the issue, run the attached test case.

Following is the output on various JDK versions:
JDK 8 - Fail
JDK 8u112 - Fail
JDK 8u122-ea - Fail
JDK 9-ea + 141 - Fail

Following is the output:
http://coder_1.tanglei.name
null
http://coder_1.tanglei.name
coder_1.tanglei.name
exception got: Illegal character in hostname at index 12: http://coder_1.tanglei.name#
java.net.URISyntaxException: Illegal character in hostname at index 12: http://coder_1.tanglei.name#
	at java.net.URI$Parser.fail(java.base@9-ea/URI.java:2882)
	at java.net.URI$Parser.parseHostname(java.base@9-ea/URI.java:3417)
	at java.net.URI$Parser.parseServer(java.base@9-ea/URI.java:3266)
	at java.net.URI$Parser.parseAuthority(java.base@9-ea/URI.java:3185)
	at java.net.URI$Parser.parseHierarchical(java.base@9-ea/URI.java:3127)
	at java.net.URI$Parser.parse(java.base@9-ea/URI.java:3083)
	at java.net.URI.<init>(java.base@9-ea/URI.java:672)
	at java.net.URI.<init>(java.base@9-ea/URI.java:773)
	at JI9045595.main(JI9045595.java:13)
23-11-2016
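
The hostname rule quoted from RFC 2396 can be written down as a regular expression. This is only a sketch of the grammar in the RFC text (labels of alphanumerics and "-", each starting and ending with an alphanumeric, the rightmost label starting with a letter, and an optional trailing "."); it is not the JDK's own parser, which may differ in detail. The class name Rfc2396Hostname and the method matches are hypothetical.

```java
import java.util.regex.Pattern;

class Rfc2396Hostname {
    // domainlabel: starts and ends with an alphanumeric, "-" allowed in between.
    private static final String LABEL = "[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?";
    // toplabel: same shape, but must start with a letter.
    private static final String TOP_LABEL = "[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?";
    // hostname: zero or more "label." prefixes, the top label, then an optional trailing dot.
    private static final Pattern HOSTNAME =
            Pattern.compile("(?:" + LABEL + "\\.)*" + TOP_LABEL + "\\.?");

    static boolean matches(String host) {
        return HOSTNAME.matcher(host).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("coder_1.tanglei.name")); // false: "_" is not allowed in a label
        System.out.println(matches("coder-1.tanglei.name")); // true
    }
}
```

Under this grammar the underscore is the only character in the reported hostname that fails the check, which is consistent with the "Illegal character in hostname at index 12" message above.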