JDK-4406592 : HttpURLConnection fails on some valid URLs with FileNotFoundException
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.net
  • Affected Version: 1.3.0
  • Priority: P4
  • Status: Closed
  • Resolution: Duplicate
  • OS: solaris_2.6
  • CPU: sparc
  • Submitted: 2001-01-21
  • Updated: 2001-01-24
  • Resolved: 2001-01-24
Related Reports
Duplicate :  
Description

Name: boT120536			Date: 01/21/2001


java version "1.3.0"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.0)
Java HotSpot(TM) Client VM (build 1.3.0, mixed mode)

The sun.net.www.protocol.http.HttpURLConnection class fails to correctly allow
access to some valid URLs.  The failure occurs under Solaris 2.6.  It occurs
both in JDK 1.3.0 and also JDK 1.2.2.  It also occurs under MacOS X beta (which
I realize is not supported by Sun).  It does NOT fail under Linux.

The URL in question is one that is not at our site, but I include a test
function that should illustrate the failure.  The test code below will
successfully access the first 3 URLs:
   http://www.isi.edu
   http://citeseer.nj.nec.com/
   http://citeseer.nj.nec.com/correct/

   http://citeseer.nj.nec.com/correct/163004
   http://citeseer.nj.nec.com/correct/163004/

 and then fail on the fourth and fifth.
All five URLs can be successfully read via the Netscape browser.
All five URLs can be successfully read, and return response code 200 when
a telnet connection is made to port 80 on the host and the request sent manually
in either HTTP/1.0 or HTTP/1.1 format.

Source code to demonstrate the problem:

import java.net.*;
import java.util.*;
import java.io.*;

public class HttpTest {

    public static void getPage(String urlString) throws IOException {
        // Retrieve the page at `urlString' and print the first 500 bytes.

        URL url = new URL(urlString);
        InputStream pageStream;
        int ch;
        int count = 0;
        HttpURLConnection connection = null;

        if (url.getProtocol().equalsIgnoreCase("http")) {
            try {
                System.out.println("====  RETRIEVING " + url + "  ====");
                System.out.println();
                connection = (HttpURLConnection) url.openConnection();
                System.err.println("HttpURLConnection opened: " + connection);
                pageStream = connection.getInputStream();
                System.err.println("HttpURLConnection input Stream: " + pageStream);
                System.err.print("Response: ");
                System.err.print(connection.getResponseCode());
                System.err.println(" " + connection.getResponseMessage());
                System.out.println();
                System.out.println("====  CONTENT  ====");
                System.out.println();
                for (ch = pageStream.read(); ch != -1; ch = pageStream.read()) {
                    if (++count < 500) {
                        System.out.write(ch);
                    } else if (count == 500) {
                        System.out.println();
                        System.out.println("<More...>");
                    }
                }
                System.out.println();
                System.out.println("====  DONE " + count + " bytes  ====");
                System.out.println();
            } catch (Exception e) {
                System.err.println();
                System.err.println("*** ERROR: " + e);
                e.printStackTrace();
                System.err.println();
            }
        }
    }

    public static void main(String args[]) {
        // Run a loop with test URLs via the Java http URL support.

        // All of these urls work from a browser.
        // All of them work from Linux
        // All of them work manually telnetting to port 80
        //   and issuing a "GET <url> HTTP/1.0" command.
        //
        // Two different "failures" occur in other systems:
        // The latter two fail on Solaris, jdk 1.3.0 and jdk 1.2.2
        // The latter two fail on MacOS X jdk 1.2.2
        String[] urls = new String[] {
            "http://www.isi.edu",
            "http://citeseer.nj.nec.com/",
            "http://citeseer.nj.nec.com/correct/",
            "http://citeseer.nj.nec.com/correct/163004",  // Fails with error
            "http://citeseer.nj.nec.com/correct/163004/"  // Fails with busy
        };

        for (int i = 0; i < urls.length; i++) {
            try {
                getPage(urls[i]);
            } catch (Exception e) {
                System.err.println();
                System.err.println("**** Error: " + e);
                e.printStackTrace();
                System.err.println();
            }
        }
    }
}



Sample Trace of the program running on our system:

====  RETRIEVING http://www.isi.edu  ====

HttpURLConnection opened: sun.net.www.protocol.http.HttpURLConnection:http://www.isi.edu
HttpURLConnection input Stream: sun.net.www.http.KeepAliveStream@4b222f
Response: 200 OK

====  CONTENT  ====

<HTML>

<HEAD><TITLE>USC Information Sciences
Institute</TITLE></HEAD>

<BODY BACKGROUND="images/bg-nologo.jpg"
TEXT="#000000" LINK="#AA0000" VLINK="#111111">


<MAP NAME="ISI">

	<AREA SHAPE=rect HREF="http://www.isi.edu/about.html"
COORDS="18,112,122,151">
	<AREA SHAPE=rect
HREF="http://www.isi.edu/publications.html" COORDS="16,154,121,192">

	<AREA SHAPE=rect HREF="http://www.isi.edu/servicelist.html"
COORDS="17,195,120,233">
	<AREA SHAPE=rect
HREF="http://www.isi.edu/divisions/main/index.
<More...>

====  DONE 3120 bytes  ====

====  RETRIEVING http://citeseer.nj.nec.com/  ====

HttpURLConnection opened: sun.net.www.protocol.http.HttpURLConnection:http://citeseer.nj.nec.com/
HttpURLConnection input Stream: sun.net.www.http.KeepAliveStream@2125f0
Response: 200 OK

====  CONTENT  ====

<html><head><TITLE>ResearchIndex: The NECI Scientific Literature Digital Library
[Steve Lawrence, Kurt Bollacker, Lee Giles, NEC Research Institute]</TITLE>
 <!70>
<META name="description" content="ResearchIndex (formerly CiteSeer): The NECI
Scientific Literature Digital Library. Autonomously creates citation indexes of
scientific literature. Advantages in terms of availability, coverage,
timeliness, and efficiency. Generates citation statistics and allows easy
browsing of the context of citati
<More...>

====  DONE 11077 bytes  ====

====  RETRIEVING http://citeseer.nj.nec.com/correct/  ====

HttpURLConnection opened: sun.net.www.protocol.http.HttpURLConnection:http://citeseer.nj.nec.com/correct/
HttpURLConnection input Stream: sun.net.www.http.KeepAliveStream@41cd1f
Response: 200 OK

====  CONTENT  ====

<html><head><TITLE>ResearchIndex: The NECI Scientific Literature Digital Library
[Steve Lawrence, Kurt Bollacker, Lee Giles, NEC Research Institute]</TITLE>
<!9>
<META name="description" content="ResearchIndex (formerly CiteSeer): The NECI
Scientific Literature Digital Library. Autonomously creates citation indexes of
scientific literature. Advantages in terms of availability, coverage,
timeliness, and efficiency. Generates citation statistics and allows easy
browsing of the context of citation
<More...>

====  DONE 11097 bytes  ====

====  RETRIEVING http://citeseer.nj.nec.com/correct/163004  ====

HttpURLConnection opened: sun.net.www.protocol.http.HttpURLConnection:http://citeseer.nj.nec.com/correct/163004

*** ERROR: java.io.FileNotFoundException: http://citeseer.nj.nec.com/correct/163004
java.io.FileNotFoundException: http://citeseer.nj.nec.com/correct/163004
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:545)
	at ir_tools.HttpTest2.getPage(HttpTest2.java:24)
	at ir_tools.HttpTest2.main(HttpTest2.java:62)

====  RETRIEVING http://citeseer.nj.nec.com/correct/163004/  ====

HttpURLConnection opened: sun.net.www.protocol.http.HttpURLConnection:http://citeseer.nj.nec.com/correct/163004/
HttpURLConnection input Stream: sun.net.www.MeteredStream@31f71a
Response: 503 System busy

====  CONTENT  ====

<!DOCTYPE HTML
	PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
	"http://www.w3.org/TR/html4/loose.dtd">
<HTML LANG="en-US"><HEAD><TITLE>ResearchIndex [NEC Research Institute; Steve
Lawrence, Kurt Bollacker, Lee Giles; Computer Science]</TITLE>
<LINK REV=MADE HREF="mailto:lawrence%40research.nj.nec.com">
<BASE HREF="http://citeseer.nj.nec.com/correct/163004/">
<META NAME="description" CONTENT="ResearchIndex (CiteSeer): Scientific
Literature Digital Library incorporating autonomous citation inde
<More...>

====  DONE 1530 bytes  ====
(Review ID: 115411) 
======================================================================

Comments
WORK AROUND

Name: boT120536			Date: 01/21/2001

Use Linux on Intel hardware instead of Solaris 2.6 on Sparc.  Our Linux test
used java version:

java version "1.3.0beta_refresh"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.0beta_refresh-b09)
Java HotSpot(TM) Client VM (build 1.3.0beta-b07, mixed mode)
======================================================================
11-06-2004
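
A less drastic workaround follows from the evaluation below, which traces the
503 to the User-Agent header: send an agent string in name/version form.  The
sketch below is only an illustration and was not part of the original report.
The class UserAgentSketch, its method names, and the local stand-in server
(which answers 503 unless the agent looks like name/version) are assumptions
made for the example; the actual behaviour of the citeseer.nj.nec.com CGI is
only inferred in the evaluation.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;

class UserAgentSketch {

    /** Sends a GET with an explicit User-Agent header and returns the HTTP status code. */
    static int statusWithAgent(String urlString, String userAgent) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(urlString).openConnection();
        try {
            // Must be set before the request is sent, i.e. before getResponseCode().
            connection.setRequestProperty("User-Agent", userAgent);
            return connection.getResponseCode();
        } finally {
            connection.disconnect();
        }
    }

    /** Hypothetical stand-in for the remote CGI: 200 for a name/version agent, otherwise 503. */
    static HttpServer startPickyServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            String agent = exchange.getRequestHeaders().getFirst("User-Agent");
            boolean nameSlashVersion = agent != null && agent.matches("[^/\\s]+/\\S+");
            exchange.sendResponseHeaders(nameSlashVersion ? 200 : 503, -1); // -1: no response body
            exchange.close();
        });
        server.start();
        return server;
    }

    /** Requests the same path twice, first with "Java1.4", then with "Java/1.4". */
    static int[] demo() throws IOException {
        HttpServer server = startPickyServer();
        try {
            String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/correct/163004";
            return new int[] { statusWithAgent(url, "Java1.4"), statusWithAgent(url, "Java/1.4") };
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws IOException {
        int[] codes = demo();
        System.out.println("Java1.4  -> " + codes[0]);
        System.out.println("Java/1.4 -> " + codes[1]);
    }
}
```

The same effect can be had without code changes by starting the JVM with
-Dhttp.agent=Java/1.4, which is what the evaluation used.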

EVALUATION

Two issues have come together to cause FileNotFoundException to be thrown for
this test case:

1. Prior to merlin, our http client throws a FileNotFoundException for all
   http response codes >= 400 if the filename doesn't end in one of the
   standard extensions (.html, .htm, .txt, ...).  This is a bug and has
   already been fixed, so that we get the following behaviour in merlin:

   (a) If the response code is 404 or 410 we throw a FileNotFoundException.
   (b) For other response codes >= 400 we throw a more general IOException.

   When a response code >= 400 is received, the application can read the
   entity body from the server via the error stream (i.e. use getErrorStream
   to read the error page from the server).  The changes to the http client
   error handling are being tracked in 4160499.

2. The second issue is that a CGI application on citeseer.nj.nec.com doesn't
   like the default user-agent which we present in the http request.  It
   appears that citeseer.nj.nec.com is seeking a user-agent of the format
   name/version.  If the format is not correct it returns a 503 error.  Here
   is the response from the server when we request /correct/163004 with a
   user-agent of Java1.4:

       HTTP/1.1 503 System busy
       Date: Wed, 24 Jan 2001 10:53:10 GMT
       Server: Apache/1.3.12 (Unix)
       Connection: close
       Content-type: text/html; charset=ISO-8859-1

Due to issue one we are incorrectly throwing a FileNotFoundException when we
see the 503 status.  If I run the test case with -Dhttp.agent=Java/1.4 the
server is happy and we get 200 OK.

Based on the above I am closing this bug as a duplicate of 4160499, which has
already been fixed in merlin.

alan.bateman@ireland 2001-01-24
24-01-2001
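
The evaluation's advice to read the entity body through getErrorStream can be
sketched as follows.  This is a minimal illustration, not the fix tracked in
4160499.  The class ErrorStreamSketch, its methods, and the local server that
always answers "503 System busy" are assumptions made so the example runs
without contacting citeseer.nj.nec.com.  The sketch checks the status code
first and only then chooses between getInputStream and getErrorStream, so a
response code >= 400 does not surface as an exception.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class ErrorStreamSketch {

    /** Returns "<status> <body>", reading the error stream when the status is >= 400. */
    static String fetch(String urlString) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(urlString).openConnection();
        try {
            int status = connection.getResponseCode();
            InputStream body = (status >= 400)
                    ? connection.getErrorStream()   // may be null if the server sent no body
                    : connection.getInputStream();
            return status + " " + (body == null ? "" : readAll(body));
        } finally {
            connection.disconnect();
        }
    }

    /** Drains and closes the stream, decoding it as ISO-8859-1. */
    static String readAll(InputStream in) throws IOException {
        try (InputStream stream = in) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = stream.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.ISO_8859_1);
        }
    }

    /** Starts a hypothetical local server that always answers 503, then fetches from it. */
    static String demo() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            byte[] page = "<html>System busy</html>".getBytes(StandardCharsets.ISO_8859_1);
            exchange.sendResponseHeaders(503, page.length);
            exchange.getResponseBody().write(page);
            exchange.close();
        });
        server.start();
        try {
            return fetch("http://127.0.0.1:" + server.getAddress().getPort() + "/correct/163004");
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```

Applied to the reporter's test program, the same idea would mean calling
getResponseCode before getInputStream and falling back to getErrorStream for
the fourth and fifth URLs.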