JDK-8145388 : URLConnection.guessContentTypeFromStream returns image/jpg for some JPEG images
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.net
  • Affected Version: 8u60,9
  • Priority: P4
  • Status: Closed
  • Resolution: Fixed
  • OS: windows_7
  • CPU: x86_64
  • Submitted: 2015-10-21
  • Updated: 2016-07-21
  • Resolved: 2015-12-25
  • Fixed in JDK 8: 8u102
  • Fixed in JDK 9: 9 b100
Description
FULL PRODUCT VERSION :
java version "1.8.0_60"
Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)

A DESCRIPTION OF THE PROBLEM :
For some JPEG images, the method java.net.URLConnection#guessContentTypeFromStream returns the MIME type image/jpg, which is not a valid, IANA-registered MIME type for JPEG images.

For other images, the correct MIME type (image/jpeg) is returned.

How invalid image/jpg truly is may be open to debate, but here is the IANA list:
http://www.iana.org/assignments/media-types/media-types.xhtml

Regardless of the validity of the MIME type, I could not find any information on *why* Java will return two different MIME types for JPEGs. 

Sadly I cannot provide the image that triggered the relevant detection code (it's from a customer), but after some googling, it seems like the relevant JPEG APP14 header is an application specific header, in this case from Adobe (see http://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/JPEG.html).


EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The function should return image/jpeg for all JPEG images, not just some.
ACTUAL -
The function returned image/jpg for some JPEG images encoded with Adobe software.

REPRODUCIBILITY :
This bug can be reproduced always.
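
Since the customer image cannot be shared, the behavior can probably be checked with synthetic header bytes instead. The following is a minimal sketch, assuming that only the first four bytes (the SOI marker followed by an APPn marker) decide the result; the class name GuessJpegType and the helper guess are hypothetical names introduced for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URLConnection;

// Hypothetical demo class, not part of the original report.
final class GuessJpegType {
    private GuessJpegType() {}

    // Wraps the header bytes in a ByteArrayInputStream, which supports
    // mark/reset as guessContentTypeFromStream requires.
    static String guess(byte[] header) {
        try (InputStream in = new ByteArrayInputStream(header)) {
            return URLConnection.guessContentTypeFromStream(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // SOI marker (FF D8) followed by an APP0 marker (FF E0).
        byte[] app0 = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};
        // SOI marker (FF D8) followed by an APP14 marker (FF EE),
        // the segment associated with Adobe software in this report.
        byte[] app14 = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xEE};

        System.out.println("APP0:  " + guess(app0));
        System.out.println("APP14: " + guess(app14));
    }
}
```

On an affected build, the APP14 line would be expected to show image/jpg as described above; on a build that contains the fix, it should show image/jpeg.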

CUSTOMER SUBMITTED WORKAROUND :
Manually replacing image/jpg in the result with image/jpeg.
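
The workaround could look roughly like the sketch below. It is only an illustration: the class ContentTypes and its methods normalize and guessNormalized are hypothetical names, and the sketch assumes an exact, lowercase "image/jpg" result.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URLConnection;

// Hypothetical helper class illustrating the workaround.
final class ContentTypes {
    private ContentTypes() {}

    // Maps the unregistered "image/jpg" to "image/jpeg" and returns every
    // other value (including null) unchanged.
    static String normalize(String contentType) {
        return "image/jpg".equals(contentType) ? "image/jpeg" : contentType;
    }

    // Guesses the content type from the stream and normalizes the result.
    // The stream must support mark/reset, otherwise the guess is null.
    static String guessNormalized(InputStream in) throws IOException {
        return normalize(URLConnection.guessContentTypeFromStream(in));
    }
}
```

Callers would use ContentTypes.guessNormalized(stream) wherever they previously called URLConnection.guessContentTypeFromStream(stream) directly.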


Comments
Attached image provided by the submitter.
07-01-2016

It has been like this since 1996, which means JDK 1.1. Here's the SCCS history:

D 1.24 96/09/16 21:29:32 brown 27 26 00028/00001/00524
MRs:
COMMENTS:
guessType public, recognize more types

sccsdiff -r1.23 -r1.24 SCCS/s.URLConnection.java
...
> if (c1 == 0xFF && c2 == 0xD8 && c3 == 0xFF && c4 == 0xE0)
>     return "image/jpeg";
> if (c1 == 0xFF && c2 == 0xD8 && c3 == 0xFF && c4 == 0xEE)
>     return "image/jpg";
...

No explanation for the difference and it is unclear if it was intentional or not. If intentional then I do not know why unless APP14 was not yet widely accepted and it was some kind of attempt to distinguish this case.
23-12-2015

I was able to reproduce this bug with an APP14 Adobe image. The bug is also reproducible in jdk9-ea. This is the forum link where you can find the example images: http://130.15.24.88/exiftool/forum/index.php?topic=6448.0
22-12-2015

I think the point made in the bug is a valid one: the content type is image/jpeg, as per the standard. The check is on the image format determined from the APPn segment, so there are E0, E1, and EE checks. The first two return image/jpeg, and as such EE should return image/jpeg as well. The file extension can be jpg, jpeg, or jpe, but the content type is image/jpeg. This looks like a typo, unless there is some very subtle reason.
21-12-2015
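
To illustrate the suggestion in the comment above, here is a simplified sketch of a check that treats the E0, E1, and EE markers uniformly. It is not the actual JDK patch: the class JpegSniffer and the method sniff are hypothetical, the sketch works on a byte array rather than a stream, and the real method may inspect further bytes for some markers.

```java
// Hypothetical illustration of a unified JPEG magic-number check.
final class JpegSniffer {
    private JpegSniffer() {}

    // Returns "image/jpeg" when the first four bytes are the SOI marker
    // (FF D8) followed by an APP0 (FF E0), APP1 (FF E1), or APP14 (FF EE)
    // marker; returns null otherwise.
    static String sniff(byte[] head) {
        if (head == null || head.length < 4) {
            return null;
        }
        // Mask to 0..255 because Java bytes are signed.
        int c1 = head[0] & 0xFF;
        int c2 = head[1] & 0xFF;
        int c3 = head[2] & 0xFF;
        int c4 = head[3] & 0xFF;

        if (c1 == 0xFF && c2 == 0xD8 && c3 == 0xFF
                && (c4 == 0xE0 || c4 == 0xE1 || c4 == 0xEE)) {
            return "image/jpeg";
        }
        return null;
    }
}
```

With this shape, every recognized APPn variant maps to the single registered type, so there is no separate branch in which a misspelled type string could hide.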

Tried reproducing with the attached test case on a few JPEG images, but without success.
15-12-2015