JDK-4165985 : javadoc tool: Use BreakIterator to determine end of first sentence

Details
Type:
Enhancement
Submit Date:
1998-08-12
Status:
Closed
Updated Date:
2014-05-05
Project Name:
JDK
Resolved Date:
2001-08-07
Component:
tools
OS:
solaris_2.5,generic
Sub-Component:
javadoc(tool)
CPU:
sparc,generic
Priority:
P5
Resolution:
Fixed
Affected Versions:
1.2.0,1.2.2
Fixed Versions:
1.4.0 (beta2)

Description
Just prior to 1.2 Beta4 we tried using BreakIterator for the first
sentence breaks, and it had too many serious bugs for us to use,
so javadoc now special-cases English and uses our old 1.1 algorithm
that looks for a period (.) followed by white space.

Once the BreakIterator bugs are fixed, we should consider returning
to using BreakIterator for English.

The BreakIterator bugs are described in:

4140384 design bug: ambiguous "first sentence" rule
4158381 sentence BreakIterator stops too soon (submitted by Bill Shannon)
4113835 Some of BreakIterator's rules are not correct in JDK1.1.6G.

                                    

Comments
CONVERTED DATA

BugTraq+ Release Management Values

COMMIT TO FIX:
merlin-beta2

FIXED IN:
merlin-beta2

INTEGRATED IN:
merlin-beta2

VERIFIED IN:
merlin-rc1


                                     
2004-06-14
EVALUATION

The JLS first edition, section 18.3 says:

  The first sentence of each documentation comment should be 
  a summary sentence.  This sentence ends at the first period 
  that is followed by a blank, tab, or line terminator.

It doesn't care if the next letter is upper or lowercase.

(We infer that this rule was intended to apply only to languages 
 for which period is a sentence terminator.)

This is demonstrated by the following doc comment for the
processMouseEvent method in javax.swing.MenuElement, where the above
rule would interpret the first sentence to be "Process a mouse event.":

   Process a mouse event. event is a MouseEvent with source being the 
   receiving element's component. path is the path of the receiving 
   element in the menu hierarchy including the receiving element itself. 
   manager is the MenuSelectionManager for the menu hierarchy. 
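
Applying the JLS rule quoted above to this comment can be sketched as
follows.  This is only an illustration, not the javadoc source: the
class and method names are hypothetical, and treating a period at the
very end of the text as a terminator is an assumption the JLS wording
does not spell out.

```java
// Hypothetical sketch of the JLS first-edition rule; not the javadoc source.
final class PeriodWhitespaceRule {

    /** Returns the text up to and including the first qualifying period. */
    static String firstSentence(String comment) {
        for (int i = 0; i < comment.length(); i++) {
            if (comment.charAt(i) != '.') {
                continue;
            }
            // Assumption: a period that ends the text also ends the sentence.
            boolean atEnd = i + 1 == comment.length();
            if (atEnd || isTerminator(comment.charAt(i + 1))) {
                return comment.substring(0, i + 1);
            }
        }
        return comment; // no qualifying period: the whole comment is the summary
    }

    /** Blank, tab, or line terminator, as listed in the quoted rule. */
    private static boolean isTerminator(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }

    public static void main(String[] args) {
        String text = "Process a mouse event. event is a MouseEvent with source "
                + "being the receiving element's component.";
        System.out.println(firstSentence(text)); // prints "Process a mouse event."
    }
}
```

Because the rule ignores the case of the next letter, it would also cut
"Uses e.g. a list." down to "Uses e.g.", which is the abbreviation
problem discussed in this report.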

(In my opinion, starting the second and third sentences with lowercase
words is poorly-constructed (but understandable) English.  
They should be rewritten so as not to begin with lowercase letters.  
But that aside...)

The engineer for BreakIterator is Rich Gillam (###@###.###).
Atul and I tested Rich Gillam's fixes to bugs in 4158381, and
they are fixed.  However, we discovered the BreakIterator follows 
this rule (which differs from the above rule):

   If a period is followed by white space and then a lowercase letter
   (or digit), it is not considered the end of a sentence.

See "Comments" for the exact rules that BreakIterator uses.

This rule would treat the entire processMouseEvent paragraph
shown above as one sentence.
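
For comparison, a minimal sketch of extracting the first sentence with
java.text.BreakIterator is given here.  The class name is hypothetical
and this is not the javadoc implementation; the result depends on the
sentence rules shipped with the runtime, so no output is shown.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical sketch; not the javadoc implementation.
final class BreakIteratorFirstSentence {

    /** Returns the first sentence as delimited by the locale's sentence rules. */
    static String firstSentence(String comment, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(comment);
        int start = it.first();
        int end = it.next();
        if (end == BreakIterator.DONE) {
            return comment; // empty text: no boundary after the start
        }
        // The segment includes trailing white space, so trim it.
        return comment.substring(start, end).trim();
    }

    public static void main(String[] args) {
        String text = "Process a mouse event. event is a MouseEvent with source "
                + "being the receiving element's component.";
        System.out.println(firstSentence(text, Locale.ENGLISH));
    }
}
```

If the runtime follows the lowercase rule described above, the printed
summary runs past "Process a mouse event." instead of stopping there.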

For this reason, we are not using BreakIterator for English,
although we do use it for all other languages.

Does it make sense to keep it this way?  Is there an upcoming change
to BreakIterator to allow it to work with Javadoc in English?

doug.kramer@Eng 1998-08-12

Neal, I'm just passing this bug on to you, for you to be aware of.
You can close it out if you feel nothing should be done, or
we could talk to the java.text people if we want to do more
research into it.

doug.kramer@Eng 2001-03-01

This should, indeed, be fixed. To help people migrate their doc comments
to the new definition, I would have javadoc emit a warning when the new
interpretation of the first sentence differs from the old interpretation
of the first sentence.

neal.gafter@Eng 2001-03-06

This RFE has been implemented.  Location of implementation:

src/share/javac/com/sun/tools/javadoc/DocEnv.java
src/share/javac/com/sun/tools/javadoc/DocLocale.java
src/share/javac/com/sun/tools/javadoc/JavadocTool.java
src/share/javac/com/sun/tools/javadoc/Start.java
src/share/javac/com/sun/tools/javadoc/resources/javadoc.properties

jamie.ho@Eng 2001-07-17
                                     
2001-07-17
PUBLIC COMMENTS

This RFE has been implemented.  Javadoc now has two modes for computing the end of the first English sentence.  The default is the old behavior but it generates a new warning when the new behavior would be different.  The new behavior uses BreakIterator and is enabled by a new command-line flag called -breakiterator.

The main differences are (1) we would now accept a sentence ending in a question mark (which some people find useful in the synopsis of a boolean-returning method), and (2) we would NOT accept a period followed by white space and a lowercase letter as ending a sentence, which would allow you to use abbreviations in the first sentence.

It's not major, but people have complained. Fixing this is ultimately a code simplification because English would be treated the same as other languages.

We would hope to make the new mode the default in the next (Tiger) version of javadoc.
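
The two modes and the warning described above can be sketched roughly as
follows.  This is a hypothetical illustration, not the code in
com.sun.tools.javadoc: the class, the method names, and the warning text
are all assumptions, and the boolean parameter merely stands in for the
-breakiterator flag.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the two modes; not the javadoc implementation.
final class FirstSentenceModes {

    /** Old rule: first period followed by a blank, tab, or line terminator. */
    static String classic(String comment) {
        for (int i = 0; i < comment.length(); i++) {
            if (comment.charAt(i) != '.') {
                continue;
            }
            // Assumption: a period at the very end also closes the sentence.
            if (i + 1 == comment.length()
                    || " \t\r\n".indexOf(comment.charAt(i + 1)) >= 0) {
                return comment.substring(0, i + 1);
            }
        }
        return comment;
    }

    /** New rule: first segment reported by the English sentence BreakIterator. */
    static String withBreakIterator(String comment) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(comment);
        int start = it.first();
        int end = it.next();
        return end == BreakIterator.DONE
                ? comment
                : comment.substring(start, end).trim();
    }

    /** Picks a mode; in the default (old) mode, records a warning on disagreement. */
    static String summarize(String comment, boolean useBreakIterator,
                            List<String> warnings) {
        String oldResult = classic(comment);
        String newResult = withBreakIterator(comment);
        if (useBreakIterator) {
            return newResult;
        }
        if (!oldResult.equals(newResult)) {
            // Warning text is invented for this sketch.
            warnings.add("first sentence is interpreted differently with -breakiterator");
        }
        return oldResult;
    }

    public static void main(String[] args) {
        List<String> warnings = new ArrayList<>();
        String comment = "Is this menu empty? Returns true if it has no items.";
        System.out.println(summarize(comment, false, warnings));
        warnings.forEach(w -> System.out.println("warning: " + w));
    }
}
```

Computing both results on every call keeps the sketch simple; a real
tool could skip the second computation when no warning is wanted.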

jamie.ho@Eng 2001-07-17
                                     
2001-07-17


