JDK-4165985 : javadoc tool: Use BreakIterator to determine end of first sentence
  • Type: Enhancement
  • Component: tools
  • Sub-Component: javadoc(tool)
  • Affected Version: 1.2.0,1.2.2
  • Priority: P5
  • Status: Closed
  • Resolution: Fixed
  • OS: generic,solaris_2.5
  • CPU: generic,sparc
  • Submitted: 1998-08-12
  • Updated: 2014-05-05
  • Resolved: 2001-08-07
Other: 1.4.0 beta2 (Fixed)
Description
Just prior to 1.2 Beta4 we tried using BreakIterator for the first
sentence breaks, and it had too many serious bugs for us to use,
so javadoc now special cases English and uses our old 1.1 algorithm 
that looks for a period (.) followed by white space.
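
The 1.1 rule described above (a period followed by white space) can be sketched roughly as follows. This is only an illustration, not the javadoc source: the class and method names are hypothetical, and treating a period at the very end of the text as a terminator is an assumption.

```java
// Illustrative sketch of the "period followed by white space" rule.
// LegacyFirstSentence and firstSentence are hypothetical names.
class LegacyFirstSentence {

    /** Returns the text up to and including the first period followed by white space. */
    static String firstSentence(String comment) {
        for (int i = 0; i < comment.length(); i++) {
            if (comment.charAt(i) != '.') {
                continue;
            }
            // Assumption: a period that ends the text also ends the sentence.
            boolean atEnd = i + 1 == comment.length();
            if (atEnd || Character.isWhitespace(comment.charAt(i + 1))) {
                return comment.substring(0, i + 1);
            }
        }
        // No terminating period: the whole comment is the first sentence.
        return comment;
    }

    public static void main(String[] args) {
        // The case of the letter after the period does not matter to this rule.
        System.out.println(firstSentence("Process a mouse event. event is a MouseEvent."));
        // An abbreviation followed by a space cuts the sentence short.
        System.out.println(firstSentence("Parses input, e.g. a file. Returns a tree."));
    }
}
```

The second call shows the weakness of the rule: the period in "e.g." is followed by a space, so the summary stops there.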

Once the BreakIterator bugs are fixed, we should consider returning
to using BreakIterator for English.

The BreakIterator bugs are described in:

4140384 design bug: ambiguous "first sentence" rule
4158381 sentence BreakIterator stops too soon (submitted by Bill Shannon)
4113835 Some of BreakIterator's rules are not correct in JDK1.1.6G.

Comments
CONVERTED DATA
BugTraq+ Release Management Values
COMMIT TO FIX: merlin-beta2
FIXED IN: merlin-beta2
INTEGRATED IN: merlin-beta2
VERIFIED IN: merlin-rc1
14-06-2004

EVALUATION

The JLS first edition, section 18.3 says: The first sentence of each documentation comment should be a summary sentence. This sentence ends at the first period that is followed by a blank, tab, or line terminator. It doesn't care if the next letter is upper or lowercase. (We infer that this rule was intended to apply only to languages for which period is a sentence terminator.)

This is demonstrated in the following processMouseEvent method in javax.swing.MenuElement, where the above rule would interpret the first sentence to be "Process a mouse event":

    Process a mouse event. event is a MouseEvent with source being the receiving element's component. path is the path of the receiving element in the menu hierarchy including the receiving element itself. manager is the MenuSelectionManager for the menu hierarchy.

(In my opinion, starting the second and third sentences with lowercase words is poorly-constructed (but understandable) English. They should be rewritten so as not to begin with lowercase letters. But that aside...)

The engineer for BreakIterator is Rich Gillam. Atul and I tested Rich Gillam's fixes to the bugs in 4158381, and they are fixed. However, we discovered that BreakIterator follows this rule (which differs from the above rule): If a period is followed by white space and then a lowercase letter (or digit), it is not considered the end of a sentence. See "Comments" for the exact rules that BreakIterator uses.

This rule would treat the entire processMouseEvent paragraph shown above as one sentence. For this reason, we are not using BreakIterator for English, while we are for all other languages.

Does it make sense to keep it this way? Is there an upcoming change to BreakIterator to allow it to work with Javadoc in English?
doug.kramer@Eng 1998-08-12

Neal, I'm just passing this bug on to you, for you to be aware of. You can close it out if you feel nothing should be done, or we could talk to the java.text people if we want to do more research into it.
doug.kramer@Eng 2001-03-01

This should, indeed, be fixed. To help people migrate their doc comments to the new definition, I would have javadoc emit a warning when the new interpretation of the first sentence differs from the old interpretation of the first sentence.
neal.gafter@Eng 2001-03-06

This RFE has been implemented. Location of implementation:
    src/share/javac/com/sun/tools/javadoc/DocEnv.java
    src/share/javac/com/sun/tools/javadoc/DocLocale.java
    src/share/javac/com/sun/tools/javadoc/JavadocTool.java
    src/share/javac/com/sun/tools/javadoc/Start.java
    src/share/javac/com/sun/tools/javadoc/resources/javadoc.properties
jamie.ho@Eng 2001-07-17
17-07-2001
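
For comparison with the rule quoted from the JLS, a minimal sketch of taking the first sentence with java.text.BreakIterator might look like this. The class and method names are hypothetical, and the exact boundaries returned depend on the sentence rules shipped with the JDK in use, so no particular output is claimed for the processMouseEvent text.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Illustrative sketch only; BreakIteratorFirstSentence is a hypothetical name.
class BreakIteratorFirstSentence {

    /** Returns the text between the first two sentence boundaries, trimmed. */
    static String firstSentence(String comment, Locale locale) {
        BreakIterator sentences = BreakIterator.getSentenceInstance(locale);
        sentences.setText(comment);
        int start = sentences.first();
        int end = sentences.next();
        if (end == BreakIterator.DONE) {
            // Empty text has no second boundary.
            return comment.trim();
        }
        // A sentence boundary falls after trailing white space, so trim it off.
        return comment.substring(start, end).trim();
    }

    public static void main(String[] args) {
        String doc = "Process a mouse event. event is a MouseEvent with source "
                   + "being the receiving element's component.";
        // Under the rule described in the evaluation, the lowercase "event"
        // would keep the iterator from breaking after the first period.
        System.out.println(firstSentence(doc, Locale.ENGLISH));
    }
}
```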

PUBLIC COMMENTS

This RFE has been implemented. Javadoc now has two modes for computing the end of the first English sentence. The default is the old behavior, but it generates a new warning when the new behavior would be different. The new behavior uses BreakIterator and is enabled by a new command-line flag called -breakiterator.

The main differences are (1) we would now accept a sentence ending in a question mark (which some people find useful in the synopsis of a boolean-returning method), and (2) we would NOT accept a period followed by a lowercase letter as ending a sentence, which would allow you to use abbreviations in the first sentence. It's not major, but people have complained.

Fixing this is ultimately a code simplification because English would be treated the same as other languages. We would hope to make the new mode the default in the next (Tiger) version of javadoc.
jamie.ho@Eng 2001-07-17
17-07-2001
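
The two modes described in the public comments could be sketched along these lines. This is not the code in DocEnv.java or DocLocale.java: every name here is hypothetical, the boolean parameter merely stands in for the -breakiterator flag, and the warning text is invented for illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative sketch only; all names and the warning text are hypothetical.
class FirstSentenceModes {

    /** Old rule: stop at the first period followed by white space (or end of text). */
    static String legacy(String comment) {
        for (int i = 0; i < comment.length(); i++) {
            boolean last = i + 1 == comment.length();
            if (comment.charAt(i) == '.'
                    && (last || Character.isWhitespace(comment.charAt(i + 1)))) {
                return comment.substring(0, i + 1).trim();
            }
        }
        return comment.trim();
    }

    /** New rule: let the English sentence BreakIterator find the boundary. */
    static String withBreakIterator(String comment) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(comment);
        int start = it.first();
        int end = it.next();
        return end == BreakIterator.DONE
                ? comment.trim()
                : comment.substring(start, end).trim();
    }

    /** Picks the summary for the selected mode; default mode warns on disagreement. */
    static String summary(String comment, boolean useBreakIterator, List<String> warnings) {
        String oldResult = legacy(comment);
        String newResult = withBreakIterator(comment);
        if (useBreakIterator) {
            return newResult;
        }
        if (!oldResult.equals(newResult)) {
            warnings.add("warning: first sentence would differ with BreakIterator: \""
                    + newResult + "\"");
        }
        return oldResult;
    }

    public static void main(String[] args) {
        List<String> warnings = new ArrayList<>();
        String doc = "Is the list empty? Returns true if it has no elements.";
        System.out.println(summary(doc, false, warnings));
        warnings.forEach(System.out::println);
    }
}
```

Computing both results in the default mode is what makes the migration warning possible; once the new mode becomes the default, the legacy branch could be dropped.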