JDK-6810437 : URL.hashCode() and URL.equals() are greedy with DNS lookups
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.net
  • Affected Version: 6
  • Priority: P2
  • Status: Closed
  • Resolution: Not an Issue
  • OS: generic
  • CPU: generic
  • Submitted: 2009-02-26
  • Updated: 2010-04-04
  • Resolved: 2009-03-02
Description
URLs are often used as keys in hash tables (see, e.g., 6754990).
The same applies to the deployment cache, etc.

If nothing is actually downloaded from the network, DNS lookups are among the most expensive operations that can occur, especially when DNS access is slow.

The proposed solution is:
  1) for hashCode(), use only file/protocol/port unless the file portion is really short.
     In the latter case, use the host's inet address too
  2) for hostsEqual(), first compare the string representations, and if they match, do not perform a DNS lookup

This version of hashCode() will still obey the hashCode()/equals() contract. It will increase hash collisions in some cases (when the file paths are the same), but there are only a few patterns where this may happen
(e.g. the root page of a site), and we can work around some of them by requiring a minimum length for the file portion.

See suggested fix for details.
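
The hashing rule proposed above can be sketched outside the JDK as follows. This is only an illustration under stated assumptions: the class ProposedUrlHash and the method proposedHash are hypothetical names, the URL parts are passed in as plain values, the port is added as given (no default-port handling), and the lower-cased host name stands in for the resolved InetAddress so that the sketch never touches DNS.

```java
final class ProposedUrlHash {
    // Threshold taken from the suggested fix (file.length() < 3).
    static final int MIN_FILE_LENGTH = 3;

    // Hypothetical helper: hashes protocol, file and port, and mixes in the
    // host only when the file part is missing or very short.
    static int proposedHash(String protocol, String host, int port, String file) {
        int h = 0;
        if (protocol != null) {
            h += protocol.hashCode();
        }
        if (file != null) {
            h += file.hashCode();
        }
        if (file == null || file.length() < MIN_FILE_LENGTH) {
            // Assumption: the host name replaces the InetAddress used by the
            // real code, so no name service lookup happens in this sketch.
            if (host != null) {
                h += host.toLowerCase().hashCode();
            }
        }
        h += port;
        return h;
    }

    public static void main(String[] args) {
        // Same file part on different hosts: the host is ignored.
        System.out.println(proposedHash("http", "foo.bar", 80, "/index.html")
                == proposedHash("http", "bar.foo", 80, "/index.html"));
        // Short file part: the host is mixed in.
        System.out.println(proposedHash("http", "foo.bar", 80, "/")
                == proposedHash("http", "bar.foo", 80, "/"));
    }
}
```

With this rule, URLs that differ only in host share a hash code whenever the file part is three characters or longer, which is the collision pattern the description acknowledges.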

Comments
EVALUATION The violation of the specification is that the same host can resolve to different IP addresses (DNS round robin), in which case two URLs that look equal (their strings are equal) may not actually be equal according to URL.equals. The suggested changes to URLStreamHandler.hashCode(URL) would lead to many URLs generating the same hashCode. For example, http://foo.bar/index.html, http://bar.foo/index.html and http://xxx.yyy/index.html would all generate the same hashCode. This is not against the spec, but it would look like a poor hash implementation. The fact that equals and hashCode can trigger name service lookups is problematic, but this has been the behavior of URL since JDK 1.0, and changing it would break backward compatibility. Given this dependency, URLs should not be used in HashMaps or HashSets. If possible, the application should use URIs and convert to a URL with URI.toURL when it needs a URL reference.
02-03-2009

WORK AROUND URLs should not be used as keys in a HashMap or put into a HashSet. Use a URI or the String representation instead.
02-03-2009
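
A minimal sketch of this workaround, assuming a simple string-valued cache: the class UriKeyedCache and its methods are hypothetical names used only for illustration. The map is keyed on java.net.URI, whose equals and hashCode compare the parsed components without consulting a name service, and URI.toURL is called only at the point where a URL is actually required.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.HashMap;
import java.util.Map;

final class UriKeyedCache {
    // Keys are URIs, so lookups never trigger host name resolution.
    private final Map<URI, String> entries = new HashMap<>();

    void put(URI key, String value) {
        entries.put(key, value);
    }

    String get(URI key) {
        return entries.get(key);
    }

    // Convert to a URL only where an API really needs one.
    static URL toUrl(URI key) {
        try {
            return key.toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not convertible to a URL: " + key, e);
        }
    }

    public static void main(String[] args) {
        UriKeyedCache cache = new UriKeyedCache();
        URI key = URI.create("http://foo.bar/index.html");
        cache.put(key, "cached entry");
        System.out.println(cache.get(URI.create("http://foo.bar/index.html")));
        System.out.println(toUrl(key).getHost());
    }
}
```

Note that two URIs with different host names are never equal, even if both names resolve to the same address, so the map's notion of equality differs from that of URL.equals.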

EVALUATION The description section says, "This version of hashcode() will still obey hashcode()/equals() contract", but how can this be? From the URL.equals specification: "Two hosts are considered equivalent if both host names can be resolved into the same IP addresses; else if either host name can't be resolved, the host names must be equal without regard to case; or both host names equal to null." The suggested changes will violate the URL.equals specification.
26-02-2009

SUGGESTED FIX
diff -r af84cae36e3c addon/java/net/URLStreamHandler.java
--- a/addon/java/net/URLStreamHandler.java Sun Feb 15 15:25:37 2009 +0300
+++ b/addon/java/net/URLStreamHandler.java Thu Feb 26 13:46:40 2009 +0300
@@ -333,20 +333,24 @@ public abstract class URLStreamHandler {
         if (protocol != null)
             h += protocol.hashCode();
 
-        // Generate the host part.
-        InetAddress addr = getHostAddress(u);
-        if (addr != null) {
-            h += addr.hashCode();
-        } else {
-            String host = u.getHost();
-            if (host != null)
-                h += host.toLowerCase().hashCode();
+        // Generate the file part.
+        String file = u.getFile();
+        if (file != null) {
+            h += file.hashCode();
         }
 
-        // Generate the file part.
-        String file = u.getFile();
-        if (file != null)
-            h += file.hashCode();
+        if (file == null || file.length() < 3) {
+            // Generate the host part.
+            InetAddress addr = getHostAddress(u);
+            if (addr != null) {
+                h += addr.hashCode();
+            } else {
+                String host = u.getHost();
+                if (host != null) {
+                    h += host.toLowerCase().hashCode();
+                }
+            }
+        }
 
         // Generate the port part.
         if (u.getPort() == -1)
@@ -436,16 +440,22 @@ public abstract class URLStreamHandler {
      * @since 1.3
      */
     protected boolean hostsEqual(URL u1, URL u2) {
-        InetAddress a1 = getHostAddress(u1);
-        InetAddress a2 = getHostAddress(u2);
-        // if we have internet address for both, compare them
-        if (a1 != null && a2 != null) {
-            return a1.equals(a2);
-        // else, if both have host names, compare them
-        } else if (u1.getHost() != null && u2.getHost() != null)
-            return u1.getHost().equalsIgnoreCase(u2.getHost());
-        else
-            return u1.getHost() == null && u2.getHost() == null;
+        String h1 = u1.getHost();
+        String h2 = u2.getHost();
+        if (h1 != null && h2 != null) {
+            if (h1.equalsIgnoreCase(h2)) {
+                return true;
+            }
+            InetAddress a1 = getHostAddress(u1);
+            InetAddress a2 = getHostAddress(u2);
+            // if we have internet address for both, compare them
+            if (a1 != null && a2 != null) {
+                return a1.equals(a2);
+            }
+            return false;
+        } else {
+            return h1 == null && h2 == null;
+        }
     }
 
     /**
26-02-2009

EVALUATION I do not see a violation here. Regarding equals: we do not change the check itself. The suggested change only alters the order in which the checks are performed. If the host names are equal as strings, we can safely assume that their IP addresses are the same, which conforms to the cited definition. Regarding hashCode(): it must be the same for equal URLs, and this still holds. The decision whether to take the IP address into account is based on the file part of the URL, which must be identical for identical URLs. When the IP address is not used in the hash code calculation, equal URLs still get identical hash codes, because their file, protocol and port parts must be identical.
26-02-2009
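
The reordered host comparison argued for above can be sketched with a pluggable resolver, which makes it visible when a lookup would occur. Everything here is an assumption made for illustration: HostComparison, CountingResolver and sampleResolver are hypothetical names, the resolver is a stand-in for getHostAddress that returns null for unresolvable names, and the addresses are documentation-range placeholders rather than real DNS data.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class HostComparison {
    // Hypothetical stand-in for getHostAddress: a fixed table plus a call counter.
    static final class CountingResolver implements Function<String, String> {
        private final Map<String, String> table;
        int lookups = 0;

        CountingResolver(Map<String, String> table) {
            this.table = table;
        }

        @Override
        public String apply(String host) {
            lookups++;
            return table.get(host.toLowerCase());
        }
    }

    // String comparison first; the resolver is consulted only if the names differ.
    static boolean hostsEqual(String h1, String h2, Function<String, String> resolver) {
        if (h1 != null && h2 != null) {
            if (h1.equalsIgnoreCase(h2)) {
                return true;
            }
            String a1 = resolver.apply(h1);
            String a2 = resolver.apply(h2);
            if (a1 != null && a2 != null) {
                return a1.equals(a2);
            }
            return false;
        }
        return h1 == null && h2 == null;
    }

    // Placeholder name-to-address table used only by this sketch.
    static CountingResolver sampleResolver() {
        Map<String, String> table = new HashMap<>();
        table.put("foo.bar", "192.0.2.1");
        table.put("www.foo.bar", "192.0.2.1");
        table.put("bar.foo", "192.0.2.2");
        return new CountingResolver(table);
    }

    public static void main(String[] args) {
        CountingResolver resolver = sampleResolver();
        System.out.println(hostsEqual("foo.bar", "FOO.BAR", resolver) + ", lookups=" + resolver.lookups);
        System.out.println(hostsEqual("foo.bar", "www.foo.bar", resolver) + ", lookups=" + resolver.lookups);
    }
}
```

The sketch also shows where the two evaluations disagree: when the names match as strings, the resolver is never asked, so a host whose name resolves to different addresses on different lookups (DNS round robin) is always treated as equal to itself.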