JDK-4158381 : sentence BreakIterator stops too soon
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.text
  • Affected Version: 1.1.6,1.2.0
  • Priority: P3
  • Status: Closed
  • Resolution: Fixed
  • OS: solaris_2.6
  • CPU: generic,sparc
  • Submitted: 1998-07-17
  • Updated: 1999-01-15
  • Resolved: 1999-01-15
Other
1.2.0 1.2fcs Fixed
Related Reports
Duplicate :  
Relates :  
Relates :  
Description
javadoc now uses a sentence BreakIterator to find the end of
the first sentence to use for the summary.  Some common
constructs cause it to break too soon.  For example:

CASE #1 --------------------------------------------------
import java.text.BreakIterator;

public class SentenceBug {
    public static void main(String[] argv) {
        BreakIterator bi = BreakIterator.getSentenceInstance();
        String test = "Test <code>Flags.Flag</code> class.  Another test.";
        bi.setText(test);
        System.out.println(test.substring(bi.first(), bi.next()));
        System.exit(0);
    }
}

This prints "Test <code>Flags."
A period followed immediately by a capital letter should not be treated as
a sentence boundary; a boundary requires whitespace between them.
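
To see exactly where the iterator stops, it can help to list every boundary it reports rather than only the first. The following is a minimal sketch for a current JDK, not part of the original report: the class name SentenceBoundaryDump and the helper boundaries() are hypothetical, and Locale.US is an assumption (the test case above uses the default locale).

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical diagnostic helper; not part of the original bug report.
public class SentenceBoundaryDump {
    // Collects every boundary the sentence iterator reports,
    // starting with first() and ending with the end of the text.
    static List<Integer> boundaries(String text) {
        // Locale.US is an assumption; the report uses the default locale.
        BreakIterator bi = BreakIterator.getSentenceInstance(Locale.US);
        bi.setText(text);
        List<Integer> result = new ArrayList<>();
        for (int pos = bi.first(); pos != BreakIterator.DONE; pos = bi.next()) {
            result.add(pos);
        }
        return result;
    }

    public static void main(String[] args) {
        String test = "Test <code>Flags.Flag</code> class.  Another test.";
        List<Integer> b = boundaries(test);
        // Print each segment with its offsets so a premature break is easy to spot.
        for (int i = 1; i < b.size(); i++) {
            System.out.println(b.get(i - 1) + ".." + b.get(i) + ": \""
                    + test.substring(b.get(i - 1), b.get(i)) + "\"");
        }
    }
}
```

On an affected release, a segment ending at "Flags." would show up as its own line in this listing.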

CASE #2 --------------------------------------------------
import java.text.BreakIterator;

public class SentenceBug2 {
    public static void main(String[] argv) {
        BreakIterator bi = BreakIterator.getSentenceInstance();
        String test = "<P>Provides a set of &quot;lightweight&quot; (all-Java<FONT SIZE=\"-2\"><SUP>TM</SUP></FONT> language) components that, to the maximum degree possible, work the same on all platforms. Another test.";
        bi.setText(test);
        System.out.println(test.substring(bi.first(), bi.next()));
        System.exit(0);
    }
}

This prints:

   <P>Provides a set of &quot;lightweight&quot; (all-Java<FONT SIZE="-2"

Notice that it stops between the double quote (") and the greater-than symbol (>),
even though there is no period, exclamation mark, or question mark anywhere nearby.
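
Both cases can be folded into one small check that compares the first reported sentence with the text the reporter evidently expected. This is a sketch under stated assumptions, not the unit test attached to the report: the class FirstSentenceCheck and its methods are hypothetical, Locale.US is an assumption, and the expected strings are inferred from the two cases above.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical check; not the unit test attached to the bug report.
public class FirstSentenceCheck {
    // Returns the text between the first two sentence boundaries.
    static String firstSentence(String text) {
        // Locale.US is an assumption; the report uses the default locale.
        BreakIterator bi = BreakIterator.getSentenceInstance(Locale.US);
        bi.setText(text);
        int start = bi.first();
        int end = bi.next();
        if (end == BreakIterator.DONE) {
            end = text.length(); // empty text has no second boundary
        }
        return text.substring(start, end);
    }

    // True if the first sentence, ignoring surrounding whitespace, equals the expected text.
    static boolean matches(String text, String expected) {
        return firstSentence(text).trim().equals(expected);
    }

    public static void main(String[] args) {
        // Case #1: the break should come after "class.", not after "Flags.".
        String case1 = "Test <code>Flags.Flag</code> class.  Another test.";
        System.out.println("case 1 ok: "
                + matches(case1, "Test <code>Flags.Flag</code> class."));

        // Case #2: the break should come after "platforms.", not inside the FONT tag.
        String expected2 = "<P>Provides a set of &quot;lightweight&quot; (all-Java"
                + "<FONT SIZE=\"-2\"><SUP>TM</SUP></FONT> language) components that, "
                + "to the maximum degree possible, work the same on all platforms.";
        String case2 = expected2 + " Another test.";
        System.out.println("case 2 ok: " + matches(case2, expected2));
    }
}
```

A release with the bug would be expected to print false for both cases, since the first segment ends early.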
                  
----
For sample files, see /java/web/docs/bugs/javadoc-bugs/bug4158381-breakiterator
doug.kramer@Eng 1998-09-17

Comments
CONVERTED DATA
BugTraq+ Release Management Values
COMMIT TO FIX: 1.2fcs
FIXED IN: 1.2fcs
INTEGRATED IN: 1.2fcs
14-06-2004

SUGGESTED FIX
A tar file with the diffs necessary to fix these problems (along with a unit test) is included as an attachment to this bug report.
richard.gillam@eng 1998-07-30
30-07-1998

EVALUATION
Added case #2.
doug.kramer@Eng 1998-07-29

Both problems are due to errors in the state tables, and both are relatively easy to fix.

This bug was fixed in a putback on 8/10/1998 by Rich Gillam.
laura.werner@eng 1998-09-15
29-07-1998