JDK-5049974 : java.net.URI parsing not allowing some "unreserved" chars in hostname
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.net
  • Affected Version: 1.4.2
  • Priority: P4
  • Status: Closed
  • Resolution: Not an Issue
  • OS: windows_2000,windows_xp
  • CPU: x86
  • Submitted: 2004-05-19
  • Updated: 2005-01-01
  • Resolved: 2004-05-26
Description

Name: gm110360			Date: 05/19/2004


FULL PRODUCT VERSION :
java version "1.5.0-beta"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0-beta-b32c)
Java HotSpot(TM) Client VM (build 1.5.0-beta-b32c, mixed mode)

ADDITIONAL OS VERSION INFORMATION :
Microsoft Windows 2000 [Version 5.00.2195]

A DESCRIPTION OF THE PROBLEM :
The javadoc for java.net.URI says (under "Character category") "The set of all legal URI characters consists of the unreserved, reserved, escaped, and other characters."
The unreserved category contains the characters "_-!.~'()*".
Yet if the hostname portion of a URI contains any unreserved character other than '-' and '.', the host and port are not parsed: getHost() returns null and getPort() returns -1.
We discovered this problem via java.rmi.Naming.lookup() and Naming.list(). When the URL string passed to these methods names the server by hostname and that hostname contains '_', the methods try to contact rmiregistry on the local host at the default port 1099. The reason is that URI fails to parse the authority, so Naming gets back null and -1 as host and port and defaults to localhost:1099. The use of the URI class, new in Java 1.4, therefore introduces a regression in the methods of java.rmi.Naming: '_' is accepted in hostnames passed to Naming.lookup() in Java 1.3.
For the fix, please test that all characters in the unreserved category are accepted in the hostname.
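
The silent fallback described above can be made visible by the caller. The following is a minimal sketch, not code from this report: the class name AuthorityCheck and the method isServerBased are hypothetical, and the only JDK calls assumed are URI.getAuthority(), URI.getHost(), URI.getPort(), and URI.parseServerAuthority(), which throws URISyntaxException when the authority cannot be parsed as server-based.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper class, not part of the JDK or of this report.
final class AuthorityCheck {

    // Returns false if the authority cannot be parsed as a server-based authority.
    static boolean isServerBased(String authority) {
        try {
            // parseServerAuthority() throws instead of silently treating
            // the authority as registry-based.
            new URI("//" + authority + "/").parseServerAuthority();
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) throws URISyntaxException {
        URI lenient = new URI("//abc_def:11099/");
        // The authority text is kept even when host and port are not parsed.
        System.out.println(lenient.getAuthority());
        System.out.println(lenient.getHost());
        System.out.println(lenient.getPort());
        // Compare a plain hostname with one containing an underscore.
        System.out.println(isServerBased("abcdef:11099"));
        System.out.println(isServerBased("abc_def:11099"));
    }
}
```

A caller that needs a host and port could run such a check before falling back to a default, so that a hostname like "abc_def" is reported as an error rather than silently replaced by localhost:1099.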

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
The program takes a hostname from the command line, constructs a URI object, and prints the host and port as parsed from that object. If parsing succeeds, the output should be the hostname passed in and the port 11099.

Test 1: pass a hostname containing '_', e.g. "abc_def".
Test 2: pass a hostname containing alphanumerics only, e.g. "abcdef".

Compare results.


EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Test 1:
abc_def
11099

Test 2:
abcdef
11099
ACTUAL -
Test 1:
null
-1

Test 2: (OK testcase)
abcdef
11099

ERROR MESSAGES/STACK TRACES THAT OCCUR :
none; no exceptions.

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
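import java.net.URI;

public class UriTest {
        public static void main(String[] args) throws Exception {
                // args[0] is the hostname under test, e.g. "abc_def" or "abcdef"
                URI uri = new URI("//" + args[0] + ":11099/");
                // Prints null and -1 when the authority is not parsed as server-based
                System.out.println(uri.getHost());
                System.out.println(uri.getPort());
        }
}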
---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
None, if one has to use a hostname instead of an IP address.
(Incident Review ID: 270431) 
======================================================================

Comments
EVALUATION

The spec is clear that if the authority component cannot be parsed as a server-based authority then it is considered to be registry-based. Underscore is not a valid character in a hostname according to RFC 2396, RFC 952, and RFC 1123.

Indeed, RFC 2396 specifies:

    hostname    = *( domainlabel "." ) toplabel [ "." ]
    domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
    toplabel    = alpha | alpha *( alphanum | "-" ) alphanum

This clearly states that, of the unreserved characters, only '-' is legal within a label ('.' appears only as the label separator). Closing as not a bug.

###@###.### 2004-05-26
26-05-2004
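
The RFC 2396 hostname grammar quoted in the evaluation can be transcribed into a regular expression to check a hostname before building a URI. This is a sketch under stated assumptions: the class Rfc2396Hostname and its method isValid are hypothetical names, and the pattern is a direct reading of the three grammar rules, not the parser that java.net.URI uses internally.

```java
import java.util.regex.Pattern;

// Hypothetical helper class: a transcription of the RFC 2396 hostname grammar.
final class Rfc2396Hostname {

    // domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
    private static final String DOMAIN_LABEL =
            "[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?";

    // toplabel = alpha | alpha *( alphanum | "-" ) alphanum
    private static final String TOP_LABEL =
            "[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?";

    // hostname = *( domainlabel "." ) toplabel [ "." ]
    private static final Pattern HOSTNAME =
            Pattern.compile("(?:" + DOMAIN_LABEL + "\\.)*" + TOP_LABEL + "\\.?");

    // Returns true if the whole string matches the hostname rule.
    static boolean isValid(String hostname) {
        return HOSTNAME.matcher(hostname).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"abcdef", "abc-def", "abc_def", "www.example.com"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

Under this reading, "abc_def" is rejected because '_' appears in neither label rule, while "abc-def" is accepted because '-' is allowed between alphanumerics.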