Bug ID: JDK-4212077 API: SimpleDateFormat.parse throws StringIndexOutOfBoundException w/ GMT+0100

Type: Bug
Component: core-libs
Sub-Component: java.text
Affected Version: 1.2.0,1.3.0,1.3.0_02

Priority: P4
Status: Closed
Resolution: Duplicate
OS: generic,windows_nt,windows_2000
CPU: generic,x86

Submitted: 1999-02-17
Updated: 2001-06-20
Resolved: 2001-06-20


Name: gsC80088			Date: 02/16/99


Maybe related to: 4106807, 4029994

To reproduce, compile and run this application:

/* SOURCE */
import java.text.*;
import java.util.*;

public class DateParser {

	public DateParser() {
		DateFormat formatter = new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss zzz");

		try {
			Date myDate = formatter.parse("Fri, 31 Dec 1999 00:00:00 GMT+0100");
			System.out.println("Date is: " + myDate);
		} catch(ParseException pe2) {
			System.out.println("ParseException");
		}
	}

	public static void main (String[] args) {
		new DateParser();
	}
}
/* END SOURCE */

/* OUTPUT */
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 34
        at java.lang.String.charAt(Compiled Code)
        at java.text.SimpleDateFormat.subParse(Compiled Code)
        at java.text.SimpleDateFormat.parse(Compiled Code)
        at java.text.DateFormat.parse(Compiled Code)
        at DateParser.<init>(Compiled Code)
        at DateParser.main(Compiled Code)
/* END OUTPUT */

I think that in the current implementation of SimpleDateFormat, a ParseException would have been thrown
if this bug had not existed.

However, it may be advisable to make the format of the timezone specification user-definable too.
(Review ID: 52227)
======================================================================

WORK AROUND

Name: gsC80088			Date: 02/16/99

For the application not to break, it is sufficient to catch the StringIndexOutOfBoundsException.
======================================================================
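The following is a minimal sketch of this workaround. It assumes the caller wants a single failure path. The hypothetical helper SafeDateParser.parse() calls DateFormat.parse() and rethrows the StringIndexOutOfBoundsException as a ParseException, keeping the original exception as the cause. The error offset of 0 is an arbitrary choice, because the unchecked exception does not say where parsing stopped.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.util.Date;

/** Hypothetical helper: funnels the StringIndexOutOfBoundsException from this bug into a ParseException. */
final class SafeDateParser {
    private SafeDateParser() {}

    static Date parse(DateFormat format, String text) throws ParseException {
        try {
            return format.parse(text);
        } catch (StringIndexOutOfBoundsException e) {
            // Affected releases index past the end of the input while parsing
            // a zone such as "GMT+0100"; report it like any other bad input.
            // The error offset 0 is arbitrary: the exception carries no parse position.
            ParseException pe = new ParseException("Unparseable date: \"" + text + "\"", 0);
            pe.initCause(e);
            throw pe;
        }
    }
}
```

In the test program, formatter.parse(...) would then become SafeDateParser.parse(formatter, ...), and the existing catch (ParseException) clause handles both failure modes.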

11-06-2004

SUGGESTED FIX

From alan.liu@eng 1999-03-10:

2c2
<  * @(#)SimpleDateFormat.java 1.44 99/03/02
---
>  * @(#)SimpleDateFormat.java 1.46 99/03/10
173c173,180
<  * GMT-hours:minutes.
---
>  * GMT-hours:minutes.  The following formats are accepted:
>  * <pre>
>  *     GMT[+-]hh:mm  GMT[+-]h:mm  GMT[+-]hhmm  GMT[+-]hmm  GMT[+-]hh  GMT[+-]h
>  *     [+-]hh:mm     [+-]h:mm     [+-]hhmm     [+-]hmm     [+-]hh     [+-]h
>  * </pre>
>  * where [+-] is a required "+" or "-" sign, and h and m are hour and minute
>  * digits, as defined by <code>Character.digit()</code>.  The offset must
>  * fall in the range -23:59 to +23:59, inclusive.  Minutes must not exceed 59.
1033d1039
<         int sign = 0;
1036,1040c1042,1046
<             // For time zones that have no known names, look for strings
<             // of the form:
<             //    GMT[+-]hours:minutes or
<             //    GMT[+-]hhmm or
<             //    GMT.
---
>             // For time zones that have no known names, look for strings of
>             // the form: "GMT" <offset specifier>, where the offset
>             // specifier is of the form [+-]hh:mm and several variants (see
>             // parseZoneOffset).
>
1044,1045d1049
<                 calendar.set(Calendar.DST_OFFSET, 0);
<
1048,1055c1052,1055
<                 if( text.charAt(pos.index) == '+' )
<                     sign = 1;
<                 else if( text.charAt(pos.index) == '-' )
<                     sign = -1;
<                 else {
<                     calendar.set(Calendar.ZONE_OFFSET, 0 );
<                     return pos.index;
<                 }
---
>                 // If the next character is a '+' or a '-' then we expect
>                 // a following offset expression, [+-]hh:mm or its variants.
>                 char c = pos.index < text.length() ? text.charAt(pos.index) : 0;
>                 boolean expectOffset = c == '+' || c == '-';
1057,1064c1057,1061
<                 // Look for hours:minutes or hhmm.
<                 pos.index++;
<                 // WORK AROUND BUG IN NUMBER FORMAT IN 1.2B3
<                 int parseStart = pos.getIndex();
<                 Number tzNumber = numberFormat.parse(text, pos);
<                 if( tzNumber == null
<                     // WORK AROUND BUG IN NUMBER FORMAT IN 1.2B3
<                     || pos.getIndex() == parseStart) {
---
>                 int initial = pos.index;
>                 offset = parseZoneOffset(text, pos);
>
>                 // Fail if we expected an offset and didn't get one.
>                 if (expectOffset && pos.index == initial) {
1067,1090c1064,1068
<                 if( text.charAt(pos.index) == ':' ) {
<                     // This is the hours:minutes case
<                     offset = tzNumber.intValue() * 60;
<                     pos.index++;
<                     // WORK AROUND BUG IN NUMBER FORMAT IN 1.2B3
<                     parseStart = pos.getIndex();
<                     tzNumber = numberFormat.parse(text, pos);
<                     if( tzNumber == null
<                         // WORK AROUND BUG IN NUMBER FORMAT IN 1.2B3
<                         || pos.getIndex() == parseStart) {
<                         return -start;
<                     }
<                     offset += tzNumber.intValue();
<                 }
<                 else {
<                     // This is the hhmm case.
<                     offset = tzNumber.intValue();
<                     if( offset < 24 )
<                         offset *= 60;
<                     else
<                         offset = offset % 100 + offset / 100 * 60;
<                 }
<
<                 // Fall through for final processing below of 'offset' and 'sign'.
---
>
>                 // Fall through for final processing below of 'offset'.  If
>                 // there is no offset specifier, that is, if the string is
>                 // just "GMT", then 'offset' will be zero, and the code
>                 // below will still set up the calendar properly.
1120,1142c1098
<                 // [+-]hhmm as specified by RFC 822.  This code is actually
<                 // a little more permissive than RFC 822.  It will try to do
<                 // its best with numbers that aren't strictly 4 digits long.
<                 DecimalFormat fmt = new DecimalFormat("+####;-####");
<                 fmt.setParseIntegerOnly(true);
<                 // WORK AROUND BUG IN NUMBER FORMAT IN 1.2B3
<                 int parseStart = pos.getIndex();
<                 Number tzNumber = fmt.parse( text, pos );
<                 if( tzNumber == null
<                     // WORK AROUND BUG IN NUMBER FORMAT IN 1.2B3
<                     || pos.getIndex() == parseStart) {
<                     return -start;   // Wasn't actually a number.
<                 }
<                 offset = tzNumber.intValue();
<                 sign = 1;
<                 if( offset < 0 ) {
<                     sign = -1;
<                     offset = -offset;
<                 }
<                 if( offset < 24 )
<                     offset = offset * 60;
<                 else
<                     offset = offset % 100 + offset / 100 * 60;
---
>                 // [+-]hh:mm and several variants (see parseZoneOffset).
1144c1100,1102
<                 // Fall through for final processing below of 'offset' and 'sign'.
---
>                 offset = parseZoneOffset(text, pos);
>
>                 // Fall through for final processing below of 'offset'
1147,1151c1105,1113
<             // Do the final processing for both of the above cases.  We only
<             // arrive here if the form GMT+/-... or an RFC 822 form was seen.
<             if (sign != 0)
<             {
<                 offset *= millisPerMinute * sign;
---
>             if (pos.index == start) {
>                 // All efforts to parse a zone failed.
>                 return -start;
>             } else {
>                 // Do the final processing of 'offset', as parsed from
>                 // GMT[+-]hh:mm or [+-]hh:mm and their several variants
>                 // (see parseZoneOffset).
>                 calendar.set(Calendar.DST_OFFSET, 0);
>                 calendar.set(Calendar.ZONE_OFFSET, offset * millisPerMinute);
1153,1159d1114
<                 if (calendar.getTimeZone().useDaylightTime())
<                 {
<                     calendar.set(Calendar.DST_OFFSET, millisPerHour);
<                     offset -= millisPerHour;
<                 }
<                 calendar.set(Calendar.ZONE_OFFSET, offset);
<
1164,1166d1118
<             // All efforts to parse a zone failed.
<             return -start;
<
1197a1150,1174
>     /**
>      * Parse a zone offset string in RFC 822 format, [+-]hhmm, and several
>      * common variants:
>      *     [+-]hh:mm  [+-]hhmm  [+-]hh
>      *     [+-]h:mm   [+-]hmm   [+-]h
>      * where 'h' and 'm' indicate digits base 10 as determined by calling
>      * Character.digit().  Only offsets from -23:59 to +23:59, inclusive, are
>      * accepted.  If the parse is successful, then the pos parameter will be
>      * advanced.  Return the number of minutes offset, a number in the range
>      * -1439 to +1439, inclusive.  1440 minutes is 24 hours.
>      *
>      * @param text the text to parse
>      * @param pos the position at which to begin parsing.  Upon exit,
>      * the position of the first unparsed character.  If the position has
>      * not changed upon exit, no characters were parsed.
>      * @return the minutes offset, from -1439 to +1439 inclusive.
>      */
>     private static final int parseZoneOffset(String text, ParsePosition pos) {
>         // Minimum string length is 2; check for this
>         if ((pos.index + 2) <= text.length()) {
>             char ch = text.charAt(pos.index);
>             if (ch == '-' || ch == '+') {
>                 int start = pos.index;
>                 int hours;
>                 int minutes = 0;
1198a1176,1223
>                 int len = ++pos.index;
>                 int value = parseInt(text, pos);
>                 len = pos.index - len;
>
>                 if (len >= 1 && len <= 2 && value < 24) {
>                     // Handle "h", "hh", "h:mm", and "hh:mm"
>                     hours = value;
>                     // Parse minutes if they are specified
>                     if (pos.index <text.length()
>                         && text.charAt(pos.index) == ':') {
>                         // Handle "h:mm" and "hh:mm"
>                         int backup = pos.index;
>                         len = ++pos.index;
>                         value = parseInt(text, pos);
>                         len = pos.index - len;
>                         if (len == 2) {
>                             minutes = value;
>                         } else {
>                             // At this point we have ":", ":m", ":mmm", etc.
>                             // we reject the whole expression, since "[+-]h:m",
>                             // "[+-]h:mmm", etc. is ill-formed.
>                             pos.index = start;
>                             return 0;
>                         }
>                     }
>                 } else if (len >= 3 && len <= 4 && value < 2400) {
>                     // Handle "hmm" and "hhmm"
>                     minutes = value;
>                     hours = minutes / 100;
>                     minutes = minutes % 100;
>                 } else {
>                     pos.index = start;
>                     return 0;
>                 }
>
>                 if (minutes >= 60) {
>                     // Reject minutes out of range.  E.g., "+1:90" is illegal.
>                     pos.index = start;
>                     return 0;
>                 }
>
>                 minutes += 60 * hours;
>                 return (ch == '-') ? -minutes : minutes;
>             }
>         }
>         return 0;
>     }
>
1199a1225,1248
>      * Parse an integer.  We use this rather than a NumberFormat because we only
>      * want to parse digits; we don't want grouping symbols, minus signs, etc.
>      * This method is simple and efficient and does exactly what we want.
>      *
>      * @param text the text to be parsed
>      * @param pos the position at which to begin parsing.  Upon exit,
>      * the position of the first unparsed character.  If the position has
>      * not changed upon exit, no characters were parsed.
>      * @return the parsed value
>      */
>     private static final int parseInt(String text, ParsePosition pos) {
>         int n = 0;
>         while (pos.index < text.length()) {
>             int d = Character.digit(text.charAt(pos.index), 10);
>             if (d < 0) {
>                 break;
>             }
>             n = 10*n + d;
>             ++pos.index;
>         }
>         return n;
>     }
>
>     /**
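The helpers in the suggested fix read and write ParsePosition.index directly, which is only possible inside the java.text package. The following standalone sketch implements the same offset grammar through getIndex()/setIndex() instead, so it can be compiled and tried outside the JDK. The class name ZoneOffsetParser and the method parseGmtZone are hypothetical. parseGmtZone only approximates the patched "GMT" branch of subParse(). As a simplification it matches "GMT" case-sensitively with String.startsWith(), which may differ from the real code. Like the patch, it uses Character.digit(), so non-ASCII decimal digits are accepted too.

```java
import java.text.ParsePosition;

/** Hypothetical standalone port of the offset grammar from the suggested fix (result in minutes). */
final class ZoneOffsetParser {
    private ZoneOffsetParser() {}

    /** Parses [+-]hh:mm and its variants at pos; on failure returns 0 and leaves pos unchanged. */
    static int parseZoneOffset(String text, ParsePosition pos) {
        int start = pos.getIndex();
        if (start + 2 > text.length()) {
            return 0; // need at least a sign and one digit
        }
        char sign = text.charAt(start);
        if (sign != '+' && sign != '-') {
            return 0;
        }
        int digitsStart = start + 1;
        pos.setIndex(digitsStart);
        int value = parseInt(text, pos);
        int len = pos.getIndex() - digitsStart;

        int hours;
        int minutes = 0;
        if (len >= 1 && len <= 2 && value < 24) {
            // "h", "hh", "h:mm", "hh:mm"
            hours = value;
            int colon = pos.getIndex();
            if (colon < text.length() && text.charAt(colon) == ':') {
                pos.setIndex(colon + 1);
                minutes = parseInt(text, pos);
                if (pos.getIndex() - (colon + 1) != 2) {
                    pos.setIndex(start); // ":", ":m", ":mmm", ... are ill-formed
                    return 0;
                }
            }
        } else if (len >= 3 && len <= 4 && value < 2400) {
            // "hmm", "hhmm"
            hours = value / 100;
            minutes = value % 100;
        } else {
            pos.setIndex(start);
            return 0;
        }
        if (minutes >= 60) {
            pos.setIndex(start); // e.g. "+1:90"
            return 0;
        }
        int total = 60 * hours + minutes;
        return sign == '-' ? -total : total;
    }

    /** Parses a run of decimal digits (per Character.digit) and advances pos past them. */
    static int parseInt(String text, ParsePosition pos) {
        int n = 0;
        int i = pos.getIndex();
        while (i < text.length()) {
            int d = Character.digit(text.charAt(i), 10);
            if (d < 0) {
                break;
            }
            n = 10 * n + d;
            i++;
        }
        pos.setIndex(i);
        return n;
    }

    /** "GMT", "GMT" + offset, or a bare offset; returns minutes, or null with pos unchanged. */
    static Integer parseGmtZone(String text, ParsePosition pos) {
        int start = pos.getIndex();
        if (!text.startsWith("GMT", start)) {
            int offset = parseZoneOffset(text, pos);
            return pos.getIndex() == start ? null : offset;
        }
        int afterGmt = start + 3;
        pos.setIndex(afterGmt);
        boolean expectOffset = afterGmt < text.length()
                && (text.charAt(afterGmt) == '+' || text.charAt(afterGmt) == '-');
        int offset = parseZoneOffset(text, pos);
        if (expectOffset && pos.getIndex() == afterGmt) {
            pos.setIndex(start); // a sign was present but no valid offset followed
            return null;
        }
        return offset;
    }
}
```

With this sketch, "GMT+0100" from the report yields +60 minutes, and a bare "GMT" yields 0. Both "GMT--01:00" and "GMT+1:90" are rejected, and the position is left where it started.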

11-06-2004

EVALUATION

This bug exists in both 1.1.8 and 1.2.1. The fix is a one-liner in SimpleDateFormat.java. Change the line in subParse() that handles zone parsing from:

    if( text.charAt(pos.index) == ':' ) {

to:

    if( pos.index < text.length() && text.charAt(pos.index) == ':' ) {

I believe this is a regression vs. 1.1.x.

alan.liu@eng 1999-03-04
--------------------------------------------------------------------------------
From alan.liu@eng 1999-03-04:

Bug 4212077 is a new regression in 1.1.8 and 1.2.1. It has to do with how time zone strings like "GMT-8:00" are parsed. The immediate problem is that strings without the colon, e.g., "GMT-0800", cause a string indexing exception. Such strings should instead parse successfully.

There is a quick one-line fix, but looking over the code, I realize that it may contain other problems. For example, I think the following string is accepted by the current code, but it shouldn't be:

    GMT--01:00   (that's two '-' signs in a row)

I propose that we fix this so that the following kinds of strings work, and nothing else:

    GMT+08:00  GMT-08:00  GMT+8:00  GMT+0800  GMT-800  GMT+12  GMT-4
    +0800  -0800  +12  -4

Offsets from -23:59 to +23:59 would be legal; others would be rejected. The last four notations are defined by RFC 822, and are already supported.
--------------------------------------------------------------------------------
From alan.liu@eng 1999-03-08:

It is indeed a regression; the bug does NOT occur in the following builds:

    1.1.2 through 1.1.5
    1.2beta1

Based on this, I believe this bug should be fixed in both 1.1.8 and 1.2.1. The fix should consist of an explicit tightening up of the specification, followed by an update to the code to follow that specification. I suggest we support the following notations. The following are examples; in all cases, "GMT-" may be substituted for "GMT+", and other offsets would of course be recognized.

    notation     meaning
    ---------    --------
    GMT+11:00    (+11:00)
    GMT+1100     (+11:00)
    GMT+800      (+08:00)
    GMT+11       (+11:00)
    +11:00       (+11:00)
    +1100        (+11:00)
    +800         (+08:00)
    +11          (+11:00)
--------------------------------------------------------------------------------
From norbert@eng 1999-03-12:

After reading the current specification of SimpleDateFormat, I believe this proposed "tightening up" of the spec is in fact a watering down. The current specification of the GMT-based notation for the timezone is the following sentence:

    "For time zones that have no names, use strings GMT+hours:minutes or GMT-hours:minutes."

This sentence may not be 100% tight (it doesn't specify how many digits are required for hours and minutes), but it's reasonably clear that the string "GMT", a sign, and two separate numbers for hours and minutes, separated by a colon, are required. Strings like "GMT+1100" or "+11" are clearly not acceptable. And I think that's good - adding leniency here means just opening another opportunity for bugs.

The correct result for the code shown in the bug report is a ParseException. If JDK 1.1.5 produced that result, the bug may just barely qualify as a regression. If it didn't, this bug is not a regression.
--------------------------------------------------------------------------------
From alan.liu@eng 1999-03-12:

Previous releases (1.1.5, 1.2beta2) did accept strings of the form "GMT+1100" in addition to "GMT+11:00", "+1100", and variations thereof. That's the basis for calling this a regression.

As you say, the javadoc does not match the code -- but I must point out, if you were to go through the javadoc and rewrite the code against it, you'd come up with entirely different code! And I'm not just talking about SimpleDateFormat -- this applies to most classes that I've come across in java.util and java.text.
In general, the javadoc is a very approximate description of the code, usually written after the fact -- calling the javadoc the spec doesn't really reflect how it was written and what its content is, even though ideally the javadoc would be the spec. If we had the time and resources, a good thing to do would be to take the javadoc, remove all the descriptive text, and rewrite it, making it completely explicit and accurate. Following this, we'd have to rewrite some (a lot?) of the code to match the new spec.
--------------------------------------------------------------------------------
From norbert@eng 1999-03-31:

I still believe the proposed change is inappropriate because it changes the specification, and it changes it in the wrong direction.

You say that the javadoc does not match the code. The correct way to look at it is that the code does not match the javadoc. If code doesn't match its specification, that's generally called a bug, and it needs to be fixed. That the code and specification came into being through an imperfect process doesn't matter.

However, if there's reason to believe that a substantial number of shipping applications rely on a bug, or if we believe that the actual behavior is more desirable than the specified behavior, then we may consider changing the specification in a feature release to turn the bug into a feature. So we should ask:

- Which formats for the zone are actually supported in all shipping releases, from JDK 1.1 to 1.1.7B and Java 2?
- Is there any evidence that shipping applications rely on any of them?
- Why would it be desirable to allow more formats than the current one? What are the disadvantages?

Totally independent of the specification discussion, 4212077 requires a fix that ensures that invalid input strings result in a ParseException, not a StringIndexOutOfBoundsException.
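Both Norbert's last point and the one-line fix in the evaluation come down to a missing bounds check. The report's input, "Fri, 31 Dec 1999 00:00:00 GMT+0100", is 34 characters long. The reported index of 34 is therefore one past the last character. This is consistent with the pre-fix code calling text.charAt(pos.index) after the number parse had already consumed "0100". The sketch below isolates the two versions of that check. ColonGuard, colonAtUnguarded and colonAt are hypothetical names used only for illustration.

```java
/** Hypothetical illustration of the zone-parsing colon check before and after the one-line fix. */
final class ColonGuard {
    private ColonGuard() {}

    /** Pre-fix form: throws StringIndexOutOfBoundsException when index == text.length(). */
    static boolean colonAtUnguarded(String text, int index) {
        return text.charAt(index) == ':';
    }

    /** Fixed form: checks the bound first, so running out of input simply means "no colon". */
    static boolean colonAt(String text, int index) {
        return index < text.length() && text.charAt(index) == ':';
    }
}
```

With the guard in place, the old code would fall through to its hhmm branch instead of crashing. Whether that branch should then succeed or end in a ParseException is the specification question debated in this thread.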
--------------------------------------------------------------------------------
Since the javadoc does not specify that GMT+/-hhmm is parsed, accepting it in JDK 1.1.5 and earlier was just a bug. The parse() method should support GMT+/-hh:mm as specified in the javadoc. Support for +/-hhmm should be documented, since the RFC 822-style time zone format is worth supporting.

masayoshi.okutsu@Eng 1999-05-24

The fix for this will change the API and thus needs closer study, as with the RFC 822 RFE. Given that the code freeze date for Kestrel is less than two weeks away, it cannot make it now.

koushi.takahashi@japan 1999-09-20

This problem will be clarified together with the TimeZone custom ID. Closing as a duplicate of 4322313.

masayoshi.okutsu@Eng 2001-06-20

20-09-1999

Duplicate :	JDK-4460757 - SimpleDateFormat.parse throws StringIndexOutOfBoundsException
Duplicate :	JDK-4322313 - API:Clarification on custom tz ID in TimeZone and tz formats in SimpleDateFormat
Duplicate :	JDK-4460765 - Timezone with 1/2 hour offset parsed incorrectly