JDK-4434494 : The URL class treats as equal different URLs
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.net
  • Affected Version: 1.3.0,1.3.1
  • Priority: P4
  • Status: Closed
  • Resolution: Won't Fix
  • OS: generic,linux,windows_2000
  • CPU: generic,x86
  • Submitted: 2001-04-05
  • Updated: 2001-10-13
  • Resolved: 2001-10-13
Description

Name: ssT124754			Date: 04/05/2001


java version "1.3.0_02"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.0_02)
Java HotSpot(TM) Client VM (build 1.3.0_02, mixed mode)



Run this code:

import java.net.*;
import java.util.*;

public class prova {
        private static Set URLSet;

        public static void main( String arg[] ) {
                URLSet = new HashSet();
                URL a = null, b = null;
                try {
                        // Two different host names that resolve to the same machine.
                        URLSet.add(a = new URL("http://ioi.dsi.unimi.it"));
                        URLSet.add(b = new URL("http://gongolo.usr.dsi.unimi.it"));
                } catch(MalformedURLException e) {
                        System.err.println(e);
                        return;
                }

                // A character-by-character comparison would give a set of size 2.
                System.err.println("Sizeof URLSet :"+URLSet.size());
                // A character-by-character comparison would give false.
                System.err.println(b.equals(a));
                System.err.println(a);
                System.err.println(b);
        }
}


The two URLs above refer to the same host, but they point to completely
different web pages. This is very common with providers that run
hundreds of virtual hosts on one machine. According to the W3C specs, a URL
is just a string of characters, and two URLs are the same URL if and only
if they match character by character. The behaviour of equals() and hashCode()
on URLs should definitely be revised. The host name _IS_ meaningful: the
HTTP server, for instance, can use it to select completely different
resources. Treating different resources as equal is therefore not sensible,
and it is going to cause major headaches.
This report was submitted four months ago and closed as fixed in JDK 1.3,
but the problem was not fixed.


So let me be even more precise. I can do the following:

1) Compile
import java.net.*;

public class Test {
        public static void main(String args[]) {
                try {
                        System.out.println((new URL("http://ioi.dsi.unimi.it")).equals(new URL("http://ne.dsi.unimi.it")));
                }
                catch(MalformedURLException e) {
                        // Report a malformed URL instead of silently ignoring it.
                        System.err.println(e);
                }
        }
}

2) "java Test" prints "true"

3) "ifdown eth0" (I turn off my ethernet card, and thus my network connection)

4) "java Test" now prints "false"

If I now built a set containing these two elements, it would become inconsistent,
and all Set methods would work incorrectly, as soon as I turned my ethernet
card back on (because two equal objects would then be in the same Set).
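
The inconsistency described above can be reproduced without touching the network. The following is only a simulation under stated assumptions: the Host class and the networkUp flag are hypothetical stand-ins for URL and for DNS availability, and the sketch uses generics, so it needs a JDK newer than the 1.3 release the report was filed against.

```java
import java.util.HashSet;
import java.util.Set;

public class UnstableEquality {
    // Hypothetical stand-in for "can host names be resolved right now?".
    static boolean networkUp = false;

    // Hypothetical class mimicking the reported behaviour: while the network
    // is up, both hosts "resolve" to the same address and compare equal;
    // while it is down, they are compared by name.
    static final class Host {
        private final String name;

        Host(String name) { this.name = name; }

        private String key() { return networkUp ? "<shared address>" : name; }

        @Override public boolean equals(Object o) {
            return o instanceof Host && key().equals(((Host) o).key());
        }

        @Override public int hashCode() { return key().hashCode(); }
    }

    static String run() {
        networkUp = false;                       // "ifdown eth0"
        Host a = new Host("ioi.dsi.unimi.it");
        Host b = new Host("ne.dsi.unimi.it");
        Set<Host> hosts = new HashSet<>();
        hosts.add(a);                            // both are accepted, because
        hosts.add(b);                            // they differ while offline
        networkUp = true;                        // network comes back
        // The set now holds two elements that are equal to each other.
        return "size=" + hosts.size() + " equal=" + a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "size=2 equal=true"
    }
}
```

Any object whose equals() result depends on external state can break a Set this way, which is why the contract of Set assumes that equality stays stable while an element is stored.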

Does anyone at Sun think this is reasonable? I cannot believe it...

Ciao,

(Review ID: 118791) 
======================================================================

Comments
WORK AROUND

Name: ssT124754 Date: 04/05/2001

URL is final, so this is hopeless.
======================================================================
11-06-2004
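
Although URL cannot be subclassed, collections can be keyed on a wrapper instead. The sketch below is one possible approach and not something proposed in this report: UrlKey and its of() factory are hypothetical names, equality is assumed to mean "same external form, character by character", and the code assumes a JDK recent enough to provide java.net.URI.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.HashSet;
import java.util.Set;

// Hypothetical wrapper: compares URLs by their string form, never by address.
public final class UrlKey {
    private final URL url;
    private final String form;

    public UrlKey(URL url) {
        this.url = url;
        this.form = url.toExternalForm(); // plain string, no host resolution
    }

    // Convenience factory (hypothetical) that hides the checked exception.
    public static UrlKey of(String spec) {
        try {
            return new UrlKey(URI.create(spec).toURL());
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(spec, e);
        }
    }

    public URL url() { return url; }

    @Override public boolean equals(Object o) {
        return o instanceof UrlKey && form.equals(((UrlKey) o).form);
    }

    @Override public int hashCode() { return form.hashCode(); }

    @Override public String toString() { return form; }

    public static void main(String[] args) {
        Set<UrlKey> keys = new HashSet<>();
        keys.add(UrlKey.of("http://ioi.dsi.unimi.it"));
        keys.add(UrlKey.of("http://gongolo.usr.dsi.unimi.it"));
        System.out.println(keys.size()); // prints 2
    }
}
```

The wrapped URL stays available through url() for opening connections, so only the comparison semantics change.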

EVALUATION

We are well aware of the problem with URL.equals and URL.hashCode. The problem stems from the existing spec and implementation, which compare two URLs by resolving their hosts to IP addresses instead of just doing a string comparison. Because hashCode has to stay consistent with equals (if two objects are equal, they must have the same hashCode), the implementation of hashCode also tries to resolve the host in the URL to an IP address. As a result, we face problems with HTTP virtual hosting, as described in the Description, and a performance hit from DNS name resolution.

Unfortunately, changing the behavior now would break backward compatibility in a serious way, and parts of the Java security mechanism depend on it. We can't change it now.

However, to address URI parsing in general, we introduced a new class called URI in Merlin (jdk1.4). People are encouraged to use URI for parsing and URI comparison, and to leave the URL class for accessing the URI itself, getting at the protocol handler, interacting with the protocol etc.

So, at present, we don't plan on changing the URL.equals/hashCode behavior and we will leave the bug open until Tiger, when we re-investigate our options. ###@###.### 2001-04-19
19-04-2001
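
As a rough illustration of the recommendation above, the following sketch repeats the reporter's test with java.net.URI in place of URL. The class name UriCompare and the helper distinctCount are made up for this example, and the code uses generics and the diamond operator, so it assumes a JDK considerably newer than the 1.4 release mentioned in the evaluation.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

public class UriCompare {
    // Hypothetical helper: number of distinct URIs among the given strings.
    static int distinctCount(String... specs) {
        Set<URI> uris = new HashSet<>();
        for (String spec : specs) {
            uris.add(URI.create(spec)); // parsed only, the host is never resolved
        }
        return uris.size();
    }

    public static void main(String[] args) {
        String a = "http://ioi.dsi.unimi.it";
        String b = "http://gongolo.usr.dsi.unimi.it";
        System.out.println(distinctCount(a, b));                   // prints 2
        System.out.println(URI.create(a).equals(URI.create(b)));   // prints false
    }
}
```

URI.equals compares the parsed components of the two identifiers and does not consult DNS, so the result is the same whether or not the network is available. A URI can be converted with toURL() at the point where a connection is actually needed.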