JDK-8195867 : Methods for comparing CharSequence, StringBuilder, and StringBuffer
  • Type: CSR
  • Component: core-libs
  • Sub-Component: java.lang
  • Priority: P3
  • Status: Closed
  • Resolution: Approved
  • Fix Versions: 11
  • Submitted: 2018-01-22
  • Updated: 2021-10-19
  • Resolved: 2018-02-16
Description
Summary
-------

Add a static method to CharSequence to allow comparison between 
two CharSequence implementations; Add support for the Comparable 
interface to StringBuilder and StringBuffer.

Problem
-------

A CharSequence is semantically comparable, so its implementations, 
specifically StringBuffer and StringBuilder, are expected to be comparable as well.
String is already comparable.

Solution
--------

1. Add a static method to CharSequence

    It would be desirable for CharSequence to implement Comparable. 
Unfortunately, since String, which implements CharSequence, already 
implements `Comparable<String>`, it is not feasible to change CharSequence 
to implement `Comparable<CharSequence>`: a class cannot implement the same 
generic interface with two different type arguments. For a more detailed 
discussion, please refer to the comment section of the original enhancement request.

    The alternative solution therefore is to introduce a static method to CharSequence:

        static int compare(CharSequence cs1, CharSequence cs2) 

    This compare method allows comparison between CharSequence 
implementations such as String, StringBuilder and StringBuffer.


2. Implement Comparable for StringBuilder and StringBuffer

    StringBuilder and StringBuffer shall implement `Comparable<StringBuilder>` 
and `Comparable<StringBuffer>` respectively, the same way as String implements 
`Comparable<String>`. The addition allows StringBuilder-to-StringBuilder 
and StringBuffer-to-StringBuffer comparison.
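
The two parts of the solution can be illustrated with a minimal sketch, assuming JDK 11 or later. The class name `CompareSolutionDemo` and the helpers `sign` and `sortedContents` are hypothetical names introduced only for this example; `CharSequence.compare` and the `Comparable` implementations are the additions proposed above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; not part of the JDK or of this CSR.
public class CompareSolutionDemo {

    // Only the sign of the result is specified, so normalize it to -1, 0, or 1.
    static int sign(CharSequence a, CharSequence b) {
        return Integer.signum(CharSequence.compare(a, b));
    }

    // Sorts a copy of the builders by their natural ordering
    // (Comparable<StringBuilder>) and returns the contents as strings.
    static List<String> sortedContents(List<StringBuilder> builders) {
        List<StringBuilder> copy = new ArrayList<>(builders);
        Collections.sort(copy); // compiles because StringBuilder is Comparable
        List<String> out = new ArrayList<>();
        for (StringBuilder sb : copy) {
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // Part 1: comparison across different CharSequence implementations.
        System.out.println(sign("apple", new StringBuilder("apple")));
        System.out.println(sign(new StringBuilder("apple"), new StringBuffer("apricot")));

        // Part 2: StringBuilder instances sorted by natural ordering.
        List<StringBuilder> builders = List.of(
                new StringBuilder("pear"),
                new StringBuilder("apple"),
                new StringBuilder("fig"));
        System.out.println(sortedContents(builders));
    }
}
```

The helper normalizes with `Integer.signum` because the proposed methods promise only a negative, zero, or positive result, not a particular magnitude.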



Specification
-------------

* CharSequence: change in general description (the wording changes are in bold)

>`- ` This interface does not refine the general contracts of the equals and hashCode methods. The result of __comparing__ two objects that implement CharSequence is therefore, in general, undefined. Each object may be implemented by a different class, and there is no guarantee that each class will be capable of testing its instances for equality with those of the other. It is therefore inappropriate to use arbitrary CharSequence instances as elements in a set or as keys in a map 

>`+ ` This interface does not refine the general contracts of the equals and hashCode methods. The result of **testing** two objects that implement CharSequence **for equality** is therefore, in general, undefined. Each object may be implemented by a different class, and there is no guarantee that each class will be capable of testing its instances for equality with those of the other. It is therefore inappropriate to use arbitrary CharSequence instances as elements in a set or as keys in a map 

* CharSequence: add a static method

        static int compare(CharSequence cs1, CharSequence cs2) 

>Compares two `CharSequence` instances lexicographically. Returns a negative value, zero, or a positive value if the first sequence is lexicographically less than, equal to, or greater than the second, respectively.

>The lexicographical ordering of `CharSequence` is defined as follows. Consider a `CharSequence` `cs` of length `len` to be a sequence of char values, `cs[0]` to `cs[len-1]`. Suppose `k` is the lowest index at which the corresponding char values from each sequence differ. The lexicographic ordering of the sequences is determined by a numeric comparison of the char values `cs1[k]` with `cs2[k]`. If there is no such index `k`, the shorter sequence is considered lexicographically less than the other. If the sequences have the same length, the sequences are considered lexicographically equal.

     
>Parameters:

>    cs1 - the first `CharSequence`.

>    cs2 - the second `CharSequence`.

>Returns:

>    the value 0 if the two `CharSequence` are equal; a negative integer if the first `CharSequence` is lexicographically less than the second; and a positive integer if the first `CharSequence` is lexicographically greater than the second. 

>Since:

>    `11 `
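
The ordering defined above can be transcribed almost word for word. The following is a sketch under stated assumptions, not the JDK implementation: `LexicographicSketch`, `referenceCompare`, and `agreesWithJdk` are hypothetical names, and the code requires JDK 11 or later only for the cross-check against `CharSequence.compare`.

```java
// Hypothetical reference transcription of the specified ordering.
public class LexicographicSketch {

    // Compares char values at the lowest differing index; if there is none,
    // the shorter sequence is less and equal lengths mean equal sequences.
    static int referenceCompare(CharSequence cs1, CharSequence cs2) {
        int shared = Math.min(cs1.length(), cs2.length());
        for (int k = 0; k < shared; k++) {
            char a = cs1.charAt(k);
            char b = cs2.charAt(k);
            if (a != b) {
                return a < b ? -1 : 1; // numeric comparison of char values
            }
        }
        return Integer.compare(cs1.length(), cs2.length());
    }

    // Checks that the sketch and CharSequence.compare agree in sign.
    static boolean agreesWithJdk(CharSequence a, CharSequence b) {
        return Integer.signum(referenceCompare(a, b))
                == Integer.signum(CharSequence.compare(a, b));
    }

    public static void main(String[] args) {
        System.out.println(referenceCompare("abc", "abd"));   // differs at index 2
        System.out.println(referenceCompare("abc", "abcde")); // prefix is shorter
        System.out.println(agreesWithJdk("abc", new StringBuilder("abc")));
        // char values, not code points: U+FF5E is a single char greater than
        // the high surrogate 0xD83D that starts U+1F600.
        System.out.println(referenceCompare("\uFF5E", "\uD83D\uDE00"));
    }
}
```

Because the comparison works on `char` values, a BMP character such as U+FF5E orders after a supplementary character encoded with surrogates, even though its code point is smaller.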

* StringBuilder

    `public final class StringBuilder
    extends AbstractStringBuilder
    implements java.io.Serializable, Comparable<StringBuilder>, CharSequence`

>API Note:

>   `StringBuilder` implements `Comparable` but does not override `equals`. Thus, the natural ordering of `StringBuilder` is inconsistent with equals. Care should be exercised if `StringBuilder` objects are used as keys in a `SortedMap` or elements in a `SortedSet`. See `Comparable`, `SortedMap`, or `SortedSet` for more information. 

        public int compareTo(StringBuilder another)

>Compares two `StringBuilder` instances lexicographically. This method follows the same rules for lexicographical comparison as defined in the `CharSequence.compare(this, another)` method.

> For finer-grained, locale-sensitive String comparison, refer to `Collator`. 

>Specified by:

>    `compareTo` in interface `Comparable<StringBuilder> `

>Parameters:

>    another - the `StringBuilder` to be compared with. 

>Returns:

>    the value 0 if this `StringBuilder` contains the same character sequence as that of the argument `StringBuilder`; a negative integer if this `StringBuilder` is lexicographically less than the `StringBuilder` argument; and a positive integer if this `StringBuilder` is lexicographically greater than the `StringBuilder` argument. 

>Since:

>    11 
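
The API note's warning can be made concrete with a small sketch, assuming JDK 11 or later. The class `InconsistentWithEqualsDemo` and its helper methods are hypothetical names used only for illustration.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo of an ordering that is inconsistent with equals.
public class InconsistentWithEqualsDemo {

    // TreeSet decides membership with compareTo, so equal contents collapse.
    static int treeSetSize(StringBuilder a, StringBuilder b) {
        Set<StringBuilder> set = new TreeSet<>();
        set.add(a);
        set.add(b);
        return set.size();
    }

    // HashSet decides membership with equals/hashCode, which StringBuilder
    // inherits from Object (identity), so distinct instances are both kept.
    static int hashSetSize(StringBuilder a, StringBuilder b) {
        Set<StringBuilder> set = new HashSet<>();
        set.add(a);
        set.add(b);
        return set.size();
    }

    public static void main(String[] args) {
        StringBuilder a = new StringBuilder("key");
        StringBuilder b = new StringBuilder("key");
        System.out.println(a.equals(b));       // false: identity comparison
        System.out.println(a.compareTo(b));    // 0: same character sequence
        System.out.println(treeSetSize(a, b)); // 1
        System.out.println(hashSetSize(a, b)); // 2
    }
}
```

The two sets disagree about whether `a` and `b` are duplicates, which is the situation the note asks callers to handle with care.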

* StringBuffer: implements `Comparable<StringBuffer>`

   `public final class StringBuffer
    extends AbstractStringBuilder
    implements java.io.Serializable, Comparable<StringBuffer>, CharSequence`

>API Note:

>   `StringBuffer` implements `Comparable` but does not override `equals`. Thus, the natural ordering of `StringBuffer` is inconsistent with equals. Care should be exercised if `StringBuffer` objects are used as keys in a `SortedMap` or elements in a `SortedSet`. See `Comparable`, `SortedMap`, or `SortedSet` for more information. 

        public int compareTo(StringBuffer another)

>Compares two `StringBuffer` instances lexicographically. This method follows the same rules for lexicographical comparison as defined in the `CharSequence.compare(this, another)` method.

> For finer-grained, locale-sensitive String comparison, refer to `Collator`. 
     
>Specified by:

>    `compareTo` in interface `Comparable<StringBuffer> `

>Implementation Note:

>    This method synchronizes on `this`, the current object, but not `StringBuffer another` with which `this StringBuffer` is compared. 

>Parameters:

>    another - the `StringBuffer` to be compared with. 

>Returns:

>    the value 0 if this `StringBuffer` contains the same character sequence as that of the argument `StringBuffer`; a negative integer if this `StringBuffer` is lexicographically less than the `StringBuffer` argument; and a positive integer if this `StringBuffer` is lexicographically greater than the `StringBuffer` argument. 

>Since:

>    11 
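
Both `compareTo` specifications point to `Collator` for locale-sensitive comparison. The following sketch, assuming JDK 11 or later, contrasts the two orderings; `CollatorContrastDemo`, `charOrder`, and `collatorOrder` are hypothetical names, and the choice of `Locale.ENGLISH` is an assumption made for the example.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical demo contrasting char-value ordering with Collator ordering.
public class CollatorContrastDemo {

    // Sign of the char-value comparison proposed for StringBuilder.
    static int charOrder(String a, String b) {
        return Integer.signum(new StringBuilder(a).compareTo(new StringBuilder(b)));
    }

    // Sign of a locale-sensitive comparison using English collation rules.
    static int collatorOrder(String a, String b) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        return Integer.signum(collator.compare(a, b));
    }

    public static void main(String[] args) {
        // 'a' (97) is numerically greater than 'B' (66), so "apple" sorts last.
        System.out.println(charOrder("apple", "Banana"));
        // English collation orders by letter first, so "apple" sorts first.
        System.out.println(collatorOrder("apple", "Banana"));
    }
}
```

`Collator` accepts `String` arguments only, so a `StringBuilder` or `StringBuffer` has to be converted with `toString()` before a locale-sensitive comparison.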

Specdiffs are attached. A convenient link:
http://cr.openjdk.java.net/~joehw/jdk11/8137326/specdiff/overview-summary.html

Comments
Moving to re-approve the request with apiNote's added.
16-02-2018

Refer to the core-libs-dev review thread [1], an @apiNote is added to the class documentation for both StringBuilder and StringBuffer. [1] http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-February/051467.html
16-02-2018

Moving amended request to Approved (on the assumption that the previous comment was implicitly re-finalizing the request).
08-02-2018

The spec is updated with the new wording and added reference to Collator similar to that of the String.compareTo method. New specdiff is also attached.
08-02-2018

I think the proposed wording is an improvement. If one has some knowledge of the rich complexity of sequences of characters and Unicode, there are possibilities about different ways little-s strings of characters can be compared. The lexicographical ordering proposed is well-defined, but does not admit the existence of other reasonable possibilities. I'd prefer to see an informative, brief, reference to the other ordering possibilities somewhere in the spec updates for this changeset.
07-02-2018

Indeed, that's the intended meaning. Referring to the definition of lexicographical ordering, instead of stating that the method behaves the same, does look clearer. How about changing the spec for the StringBuilder/Buffer.compareTo methods to the following: Compares two StringBuilder/Buffer instances lexicographically. This method follows the same rules for lexicographical comparison as defined in the {@link CharSequence.compare} method. String.compareTo refers to the Collator API for finer-grained comparison. We can't do that for StringBuffer/Builder yet. But it might as well become a request for a locale-sensitive comparison. Then if we decide to add support for a CharSequence comparison to the Collator API, we could revisit these methods and add such a reference.
07-02-2018

The new compare methods allow more implementation flexibility than String.compareTo, which specifies that it returns (this.length()-anotherString.length()) if the strings are equal for the entire shared length, etc. I'm just noting a difference between the new methods and String.compareTo; I'm not necessarily recommending that this (over) specified case be propagated to the new methods. What is intended by the statement "This method behaves as if {@linkplain java.lang.CharSequence#compare(java.lang.CharSequence, java.lang.CharSequence) CharSequence.compare(this, another)} had been called." is not entirely clear. I believe what is intended is that "This [compareTo method in StringBuffer or StringBuilder] uses the definition of lexicographical comparison from {@link CharSequence.compare}." This kind of phrasing would clearly *not* imply that operational aspects of directly calling CharSequence.compare were intended. In other words, the StringBuffer and StringBuilder methods could have different synchronization behavior and be sanctioned to use class-internal methods. If developers need more nuanced comparisons, String.compareTo contains references to APIs which handle that. Perhaps these compareTo methods should do that as well.
07-02-2018

When defining comparison for CharSequence, we were purposefully avoiding touching or changing the 'equality' because in the description of the CharSequence class it was stated that essentially "comparing two CharSequences is undefined". The sense of "comparing" here is not the same as the compare() method, but instead it is used to mean "testing for equality" as it relates to equals() and hashCode(). We thus changed the wording of that statement to be clear that "testing" two objects that implement CharSequence "for equality" is undefined. The equality could have been specified as a lexicographic comparison. Unfortunately, for compatibility, we couldn't change the above specification for equals() and hashCode(). All of the new methods are specified to compare Java char values so that they work fine with any char values, even if they are undefined in Unicode or form invalid sequences. This is detailed in the 2nd paragraph of the compare() method. The StringBuilder/Buffer's reference to CharSequence.compare is referring to the above definition, that is, how the lexicographical ordering is calculated (by Java char values). The purpose is to have one single place for such a definition. It did not mean that these methods are required to be implemented by calling CharSequence.compare (and therefore forgo using StringBuilder fields). Refer to the webrev (currently in open review): http://cr.openjdk.java.net/~joehw/jdk11/8137326/webrev/ The StringBuilder/Buffer's implementation is more efficient than CharSequence.compare by avoiding charAt calls for each char. For the concurrency situation of StringBuffer, as a result of the open review, we've added an implNote, similar to statements in methods such as StringBuffer::append(CharSequence). Considering StringBuffer's ill-fated synchronization, the implNote shall be sufficient. In practice, there's scarcely any case where StringBuffer would be better than StringBuilder. In a sense, we could even consider deprecating StringBuffer.
05-02-2018

A few comments. For API additions around comparing the equality of CharSequence, I was expecting some more explicit discussion (or disavowing) of code points, logically equivalent sequences of characters, invalid sequences of characters, etc. The given behavior is well-defined, but it may or may not be what one expects. Are StringBuffer.compareTo and StringBuilder.compareTo consistent with equals? The note "This method behaves as if CharSequence.compare(this, another) had been called." in StringBuilder may be undesirable since calling CharSequence.compare forgoes directly using StringBuilder fields, etc. Does there need to be a broader discussion of concurrent modification complications? Pending the request until these concerns are addressed.
02-02-2018