JDK-4426753 : URL insert into MAP as key performance bug
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.net
  • Affected Version: 1.3.0
  • Priority: P5
  • Status: Closed
  • Resolution: Duplicate
  • OS: generic
  • CPU: generic
  • Submitted: 2001-03-16
  • Updated: 2001-04-19
  • Resolved: 2001-04-19
Related Reports
Duplicate :  
Description

Name: boT120536			Date: 03/16/2001


fog1: {9} % java -version
java version "1.3.0"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.0)
Java HotSpot(TM) Client VM (build 1.3.0, mixed mode)

When I insert URLs into a Map, it runs VERY slowly, yet uses no CPU.
When I insert URLs with the same host, it runs fast, with no problem.
When the URLs have different hosts, the bug occurs.
URL is a final class, so I couldn't subclass it to work around the problem
by changing the hash function, etc.
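URL being final rules out subclassing, but it does not rule out composition. The sketch below wraps each URL in a hypothetical `UrlKey` class whose `equals()` and `hashCode()` look only at the protocol, the lower-cased host name, the port and the file, so `HashMap` never triggers a host lookup. Ignoring host case and treating an omitted port as the protocol default are assumptions of this sketch. Unlike `URL.equals()`, two different host names that resolve to the same address are not considered equal here.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Objects;

// Hypothetical wrapper: URL is final, so it is wrapped (composition) rather
// than subclassed. equals()/hashCode() use only the URL's text components,
// so no DNS lookup is ever triggered.
final class UrlKey {
    private final URL url;

    UrlKey(URL url) {
        this.url = Objects.requireNonNull(url, "url");
    }

    URL url() {
        return url;
    }

    // Assumption: host names match case-insensitively.
    private String host() {
        return url.getHost().toLowerCase(Locale.ROOT);
    }

    // Assumption: a missing port means the protocol's default port.
    private int port() {
        return url.getPort() != -1 ? url.getPort() : url.getDefaultPort();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof UrlKey)) return false;
        UrlKey other = (UrlKey) o;
        return url.getProtocol().equals(other.url.getProtocol())
                && host().equals(other.host())
                && port() == other.port()
                && Objects.equals(url.getFile(), other.url.getFile())
                && Objects.equals(url.getRef(), other.url.getRef());
    }

    @Override
    public int hashCode() {
        // Hashes the host *name* string, never its resolved IP address.
        return Objects.hash(url.getProtocol(), host(), port(), url.getFile(), url.getRef());
    }

    @Override
    public String toString() {
        return url.toString();
    }
}

public class UrlKeyDemo {
    // Converts the checked MalformedURLException into an unchecked one.
    static UrlKey key(String spec) {
        try {
            return new UrlKey(new URL(spec));
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(spec, e);
        }
    }

    // Same URL pattern as the report (a different host per entry), but bounded.
    static Map<UrlKey, String> fill(int n) {
        Map<UrlKey, String> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(key("http://www.a" + i + ".com/" + i), "Hello world");
        }
        return map;
    }

    public static void main(String[] args) {
        Map<UrlKey, String> map = fill(1000);
        System.out.println("entries: " + map.size());
    }
}
```

When the original URL is needed again, `url()` returns it unchanged, so the wrapper only changes how the map compares keys.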

I've already tried setting the load factor to 0.5 and the initial
capacity to about 10000, and I've also tried setting it small. It barely gets to
10 or 15 URLs before it has problems. The performance is very bursty: it
does 5, then hangs, then does 10 and hangs, then 2 and hangs. For
example, in 10 minutes I was only able to put 653 URLs into the map. VERY bad
performance.

I thought at first it was a hashing problem (too much hashing going on),
but if that were the case, the CPU should be stressed. CPU usage is virtually 0.
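An idle CPU points away from hashing cost and toward a blocking call inside `hashCode()`. The simulation below is a sketch: the `SlowKey` class and its 20 ms delay are hypothetical stand-ins for a name-service lookup, not the real `URL` code. It shows that `HashMap.put()` calls the key's `hashCode()` on every insert, so any wait inside `hashCode()` stalls the insert while the thread sleeps rather than computes.

```java
import java.util.HashMap;
import java.util.Map;

public class BlockingHashDemo {
    // Hypothetical key whose hashCode() blocks, standing in for a DNS lookup.
    static final class SlowKey {
        static int hashCalls = 0; // how many times hashCode() has run
        private final String host;
        private final long delayMillis;

        SlowKey(String host, long delayMillis) {
            this.host = host;
            this.delayMillis = delayMillis;
        }

        @Override
        public int hashCode() {
            hashCalls++;
            try {
                // The thread waits here without using CPU, like a network round trip.
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return host.hashCode();
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof SlowKey && ((SlowKey) o).host.equals(host);
        }
    }

    // Inserts n keys with distinct hosts; returns elapsed wall-clock time in ms.
    static long timeInserts(int n, long delayMillis) {
        SlowKey.hashCalls = 0;
        Map<SlowKey, String> map = new HashMap<>();
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            map.put(new SlowKey("www.a" + i + ".com", delayMillis), "Hello world");
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        long ms = timeInserts(10, 20);
        System.out.println("10 inserts took " + ms + " ms, hashCode() calls: " + SlowKey.hashCalls);
    }
}
```

The elapsed time is dominated by the sleeps, not by the map. In current JDKs `URL` caches its hash code per instance, so the lookup cost is paid once per new `URL` object, which is exactly what the reproduction loop creates on every iteration.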

I've tried increasing the thread's priority, with little effect.

I'm running on a dual 666 MHz PIII with 256 MB RAM on Linux. The same thing happens on Windows 2000.


Code that causes the problem:
import java.net.*;
import java.util.*;

public class h {
    public static void main(String[] args) {
        // This code runs very slowly, but the CPU is idle, so it is probably a locking problem.
        try {
            Map m_TrackedLinks = new HashMap();
            int count = 0;
            while (true) {
                String n = Integer.toString(count++);
                // Every URL has a different host: www.a0.com, www.a1.com, ...
                URL url = new URL("http://www.a" + n + ".com/" + n);
                m_TrackedLinks.put(url, "Hello world");
                System.out.println("Adding url: " + url);
            }
        } catch (Exception e) {
            System.out.println("Had a problem: " + e);
        }
    }
}

Same code, but this time with the same host. It runs really fast.
import java.net.*;
import java.util.*;

public class g {
    public static void main(String[] args) {
        try {
            Map m_TrackedLinks = new HashMap();
            int count = 0;
            while (true) {
                String n = Integer.toString(count++);
                // Every URL shares the host www.a.com; only the path differs.
                URL url = new URL("http://www.a.com/" + n);
                m_TrackedLinks.put(url, "Hello world");
                System.out.println("Adding url: " + url);
            }
        } catch (Exception e) {
            System.out.println("Had a problem: " + e);
        }
    }
}
(Review ID: 118970) 
======================================================================

Comments
EVALUATION

I think the way we generate hashCode for a URL is wrong. Among other things, we always do a DNS lookup for the hostname contained in the URL and call InetAddress.hashCode() on the result if we can resolve the hostname. I really don't think this is necessary; for this part, we should just take the hashCode of the hostname itself.
yingxian.wang@eng 2001-03-16

We can't change the behaviour now. It would introduce a serious backward compatibility problem. Also, some parts of our security mechanism depend on it. So use the workaround, or use URI, a class newly introduced in Merlin.
yingxian.wang@eng 2001-03-27
27-03-2001
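Following the suggestion to use URI, the sketch below keys the map with `java.net.URI` (added in J2SE 1.4, "Merlin"). `URI.equals()` and `URI.hashCode()` are defined on the parsed string components (scheme and host compared without regard to case), so inserting 1000 distinct hosts involves no name resolution. The class name `UriKeyDemo` is illustrative, `url.toURI()` requires Java 5 or later, and note that URI equality does not treat an explicit default port as equal to an omitted one.

```java
import java.net.URI;
import java.net.URL;
import java.util.HashMap;
import java.util.Map;

public class UriKeyDemo {
    // URI.equals()/hashCode() compare string components; the host is never resolved.
    static Map<URI, String> fill(int n) {
        Map<URI, String> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            // URI.create() throws an unchecked IllegalArgumentException on bad syntax.
            map.put(URI.create("http://www.a" + i + ".com/" + i), "Hello world");
        }
        return map;
    }

    public static void main(String[] args) throws Exception {
        Map<URI, String> map = fill(1000);
        // An existing URL can be converted with toURI() (and back with toURL()).
        URL url = new URL("http://www.a7.com/7");
        System.out.println(map.get(url.toURI())); // Hello world
        System.out.println(map.size());           // 1000
    }
}
```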

WORK AROUND

Name: boT120536 Date: 03/16/2001
None, class is final. Please fix!!!!
======================================================================

The workaround is to not use a URL object as the key into the hashtable; instead, use something like url.toString() as the key.
yingxian.wang@eng 2001-03-16
16-03-2001
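A minimal sketch of the suggested workaround, assuming the URL's string form is an acceptable identity for the application: key the map by `url.toString()` and keep the `URL` itself as the value. `String.hashCode()` is pure computation, and constructing a `URL` only parses the string, so no lookup happens. Because keys are compared as exact strings, `http://WWW.A.com/` and `http://www.a.com/` become different keys, although `URL.equals()` would treat them as equal. The class and method names are illustrative.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.HashMap;
import java.util.Map;

public class StringKeyDemo {
    // Parsing only: the URL constructor does not contact DNS.
    static URL toUrl(String spec) {
        try {
            return new URL(spec);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(spec, e);
        }
    }

    // Keys are url.toString(); the URL is kept as the value so it stays available.
    static Map<String, URL> fill(int n) {
        Map<String, URL> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            URL url = toUrl("http://www.a" + i + ".com/" + i);
            map.put(url.toString(), url);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, URL> map = fill(1000);
        URL found = map.get("http://www.a42.com/42");
        System.out.println(found.getHost()); // www.a42.com
    }
}
```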