JDK-8200437 : String::isBlank
  • Type: CSR
  • Component: core-libs
  • Sub-Component: java.lang
  • Priority: P3
  • Status: Closed
  • Resolution: Approved
  • Fix Versions: 11
  • Submitted: 2018-03-29
  • Updated: 2018-05-09
  • Resolved: 2018-05-02
Description
Summary
-------

Add a new instance method to `java.lang.String` that returns true
if the string is empty or contains only white space, where white space
is defined as any code point that returns true when passed to
`Character#isWhitespace(int)`.

Problem
-------

The traditional solution constructs a new string by stripping
leading and trailing white space from the instance string.

```
Ex. 1
    // true iff the string is empty or contains only white space
    boolean blank = string.trim().isEmpty();
```

In Java 8, the in-place solution is awkwardly complex.

```
Ex. 2
    // true iff the string is empty or contains only white space
    boolean blank = string.codePoints().allMatch(Character::isWhitespace);
```
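
Note that the two idioms above do not share one definition of white space: `trim()` removes leading and trailing characters at or below U+0020, whereas Ex. 2 tests every code point with `Character.isWhitespace(int)`. The following is a minimal sketch of that difference; the class and method names are hypothetical and chosen only for illustration.

```java
// Illustrative comparison only; class and method names are hypothetical.
final class BlankCheckComparison {
    // Ex. 1 approach: trim() strips leading and trailing chars <= U+0020,
    // allocating a new string whenever something is stripped.
    static boolean trimBased(String s) {
        return s.trim().isEmpty();
    }

    // Ex. 2 approach: every code point must satisfy Character.isWhitespace(int).
    static boolean codePointBased(String s) {
        return s.codePoints().allMatch(Character::isWhitespace);
    }

    public static void main(String[] args) {
        // empty, ASCII white space, EM SPACE, a control character, non-blank text
        String[] samples = { "", " \t\n", "\u2003", "\u0001", " a " };
        for (String s : samples) {
            System.out.println(s.length() + " char(s): trimBased=" + trimBased(s)
                    + ", codePointBased=" + codePointBased(s));
        }
    }
}
```

For example, an em space (U+2003) counts as blank under the code point check but not under the `trim()` check, while the control character U+0001 behaves the other way around.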

Solution
--------

Introduce a new method that avoids any object construction
and is simpler to use.

```
Ex. 3
    // true iff the string is empty or contains only white space
    boolean blank = string.isBlank();
```

Specification
-------------

```
    /**
     * Returns {@code true} if the string is empty or contains only
     * {@link Character#isWhitespace(int) white space} codepoints,
     * otherwise {@code false}.
     *
     * @return {@code true} if the string is empty or contains only
     *         {@link Character#isWhitespace(int) white space} codepoints,
     *         otherwise {@code false}
     *
     * @see Character#isWhitespace(int)
     *
     * @since 11
     */
    public boolean isBlank() {
```
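
As a rough illustration of the specified behavior, a check along these lines can walk the string by code point without constructing any new object. This is a sketch only: `isBlankSketch` is a hypothetical helper, not the JDK implementation, and it handles supplementary characters in the general case rather than taking a BMP-only shortcut.

```java
// Sketch only; BlankSketch and isBlankSketch are hypothetical names,
// not the JDK implementation of String::isBlank.
final class BlankSketch {
    static boolean isBlankSketch(String s) {
        int i = 0;
        while (i < s.length()) {
            // codePointAt combines a surrogate pair into one code point
            int cp = s.codePointAt(i);
            if (!Character.isWhitespace(cp)) {
                return false;              // found a non-white-space code point
            }
            i += Character.charCount(cp);  // advance 1 or 2 chars
        }
        return true;                       // empty, or white space only
    }
}
```

Under this sketch, `isBlankSketch("")` and `isBlankSketch(" \t")` return true, while a string containing a non-breaking space (U+00A0) is not blank, because `Character.isWhitespace(int)` excludes non-breaking spaces.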

Comments
Moving amended request to Approved.
02-05-2018

Modified accordingly.
02-05-2018

I think this should be specified in terms of Character.isWhitespace(int), that is, by looking at code points. It may be that, in the current version of Unicode, there are no supplementary characters that are whitespace. The implementation might choose to take advantage of this. However, I don't think this merits a mention, not even in an implNote.

Future versions of Unicode might introduce a whitespace supplementary character. It's a maintenance choice as to whether this code would need to be updated at that point, or whether it should be written to accommodate the general case in order to handle potential future additions to Unicode. It seems to me that if a shortcut is taken now, it's likely to introduce a bug in the future if a supplementary whitespace character were to be introduced.

Example 1 isn't particularly compelling. It's possible in Java 8 to determine whether a string is all whitespace using the following snippet: str.codePoints().allMatch(Character::isWhitespace); (Note, this uses the code point overload of isWhitespace.) That said, it's still useful to have an isBlank() method, as the above is a bit non-obvious and a dedicated isBlank() method is more concise and can probably be written more efficiently.

Aside on terminology: the character range that fits into 16 bits is referred to as the Basic Multilingual Plane (BMP), and characters that are not in the BMP are referred to as "supplementary characters."
02-05-2018

There are at least two basic ways of interpreting a java.lang.String value:
1) A sequence of 16-bit char values
2) A sequence of Unicode code points
(#include long story of "16 bits is all we'll ever need" for characters, etc.)

Some methods on String have variants working on both interpretations. The second interpretation seems more conceptually correct, but there are use cases for both. I'm not familiar enough with Unicode to guess whether or not white space characters will be defined outside of the 16-bit range. Even if white space characters are not defined outside of the 16-bit range, the isBlank method could still be defined as operating over code points as opposed to 16-bit values.
02-05-2018

Should there be some explicit discussion in the spec of whether blank-ness is determined over char values or code points? The spec links to a code-point-capable Character method, but that is arguably too subtle. Marking the request as pended until this point is sorted out.
02-05-2018

Reviewed. -Sundar
27-04-2018

Reviewed -Sundar
29-03-2018