JDK-8260738 : (fs) Path should have a method to obtain the filename extension
  • Type: CSR
  • Component: core-libs
  • Sub-Component: java.nio
  • Priority: P3
  • Status: Closed
  • Resolution: Approved
  • Fix Versions: 20
  • Submitted: 2021-02-01
  • Updated: 2022-12-07
  • Resolved: 2022-11-01
Description
Summary
-------

Add a new method `getExtension()` to `java.nio.file.Path`.

Problem
-------

No notion of file name extension is codified in the `java.nio` specification. The extension is usually taken to be the portion of the last path element after the last period character ('.', U+002E FULL STOP), optionally including the period itself. Obtaining it from a `Path` currently requires a "manual" search of the `String` form of the file name.

Solution
--------

To `java.nio.file.Path` add the method
```
default String getExtension()
```
to return the extension of the file name, that is, of the last element of the `Path`.

Note that this method deals with `String`, whereas `Path` objects maintain data in an internal platform-specific format. `Path` creation (via `Path.of`) and conversion to `String` (via `toString`) may involve codeset conversion or Unicode normalization. These conversions are potentially lossy, and they might produce surprising or unexpected results. (An example of Unicode normalization involves accented characters, which have multiple valid representations consisting of different sequences of code points.) For this reason, many `Path` operations, such as `getName(int)` or `resolve(Path)`, operate on the platform-specific representation and avoid converting to and from `String`. 

The lossiness problems can occur in the current `Path` API, independent of the API addition proposed here. They occur when the application operates on the path data as a `String`. For example, the following expression returns true on most platforms:
```
Path.of("avant.apres").toString().endsWith("apres")
```
However, if an accent is added by inserting U+0300 COMBINING GRAVE ACCENT in the right place,
```
Path.of("avant.apre\u0300s").toString().endsWith("apre\u0300s")
```
this exposes the application to platform-specific behavior. In particular, this expression returns `true` on Linux and `false` on macOS.
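The String-level effect can be sketched without touching the file system. The example below is only an illustration: it uses `java.text.Normalizer` to stand in for whatever normalization a platform might apply, and the class and method names are hypothetical. It shows that a composed name (U+00E8) does not end with a decomposed suffix (`e` followed by U+0300) unless both sides are first brought to the same normalization form.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of the proposed API.
final class NormalizationSketch {

    // Compares after normalizing both strings to NFC (composed form).
    // Assumption: the caller wants canonical-equivalence semantics.
    static boolean endsWithNormalized(String name, String suffix) {
        String n = Normalizer.normalize(name, Normalizer.Form.NFC);
        String s = Normalizer.normalize(suffix, Normalizer.Form.NFC);
        return n.endsWith(s);
    }

    public static void main(String[] args) {
        String composed = "avant.apr\u00e8s";   // 'e with grave' as one code point
        String decomposed = "apre\u0300s";      // 'e' followed by a combining accent

        // Plain String comparison sees different code point sequences.
        System.out.println(composed.endsWith(decomposed));
        // Normalizing both sides first makes the comparison succeed.
        System.out.println(endsWithNormalized(composed, decomposed));
    }
}
```

Normalizing before comparing is one possible mitigation at the application level; it is not something the proposed method does.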

Despite the potential lossiness, the API proposed here converts `Path` data to a Java `String` and enables further operations in the String domain. The reason is that there is no `Path` abstraction for operating on a *fragment* of a `Path` element in the platform-specific domain. Creating an abstraction that operates on fragments in the platform-specific domain is unwarranted. It would be quite complex (essentially re-creating a string-like API) in order to support most operations. In addition, there is no cross-platform notion of a filename extension. On some platforms, the file extension is purely a naming convention and is thus fundamentally a string operation.

Existing code that deals with filename extensions most likely already performs these operations in the `String` domain and so is already exposed to such lossiness. For example, to find the filename extension, such code would convert the last `Path` element to a `String`, call `lastIndexOf(".")`, and use `substring()` and string concatenation to manipulate portions of the name. The API added here does not attempt to solve the lossiness problem, nor is it feasible for it to do so. This API is intended only to make `String`-based `Path` manipulation more convenient.
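As an illustration of the kind of existing code described above, the following sketch replaces a file name extension using only `String` operations. The class and method names are hypothetical, and the handling of names without an extension (the new extension is simply appended) is an assumption made for this example.

```java
import java.nio.file.Path;

// Hypothetical helper showing the "manual" String-based approach.
final class ManualExtensionSketch {

    // Returns a sibling path whose file name has newExt as its extension.
    static Path replaceExtension(Path path, String newExt) {
        Path last = path.getFileName();
        if (last == null) {
            return path;                         // e.g. a root, which has no file name
        }
        String name = last.toString();           // conversion into the String domain
        int dot = name.lastIndexOf(".");
        // A leading period (dot == 0) is treated as part of the name, not an extension.
        String stem = (dot > 0) ? name.substring(0, dot) : name;
        return path.resolveSibling(stem + "." + newExt);
    }

    public static void main(String[] args) {
        System.out.println(replaceExtension(Path.of("docs", "report.txt"), "md"));
        System.out.println(replaceExtension(Path.of("README"), "md"));
    }
}
```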


Specification
-------------

    --- a/src/java.base/share/classes/java/nio/file/Path.java
    +++ b/src/java.base/share/classes/java/nio/file/Path.java
    @@ -49,7 +50,7 @@
      * file system. {@code Path} defines the {@link #getFileName() getFileName},
      * {@link #getParent getParent}, {@link #getRoot getRoot}, and {@link #subpath
      * subpath} methods to access the path components or a subsequence of its name
    - * elements.
    + * elements, and {@link #getExtension() getExtension} to obtain its extension.
      *
      * <p> In addition to accessing the components of a path, a {@code Path} also
      * defines the {@link #resolve(Path) resolve} and {@link #resolveSibling(Path)
    @@ -248,6 +249,63 @@ public static Path of(URI uri) {
          */
         Path getFileName();
     
    +    /**
    +     * Returns the file extension of this path's file name as a {@code String}.
    +     * The extension is derived from this {@code Path} by obtaining the
    +     * {@linkplain #getFileName file name element}, deriving its {@linkplain
    +     * #toString string representation}, and then extracting a substring
    +     * determined by the position of a period character ('.', U+002E FULL STOP)
    +     * within the file name string. If the file name element is {@code null},
    +     * or if the file name string does not contain a period character, or if
    +     * the only period in the file name string is its first character, then
    +     * the extension is {@code null}. Otherwise, the extension is the substring
    +     * after the last period in the file name string. If this last period is
    +     * also the last character in the file name string, then the extension is
    +     * {@linkplain String#isEmpty empty}.
    +     *
    +     * @implSpec
    +     * The default implementation is equivalent for this path to:
    +     * <pre>{@code
    +     * int lastPeriod = fileName.lastIndexOf('.');
    +     * if (lastPeriod <= 0)
    +     *     return null;
    +     * return (lastPeriod == fileName.length() - 1)
    +     *     ? ""
    +     *     : fileName.substring(lastPeriod + 1);
    +     * }</pre>
    +     *
    +     * @return  the file name extension of this path, which might be the
    +     *          empty string, or {@code null} if no extension is found
    +     *
    +     * @since 20
    +     */
    +    default String getExtension() {}
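The rules in the specification above can be exercised with a standalone helper. This is a minimal sketch that follows the `@implSpec` snippet; it is not the actual JDK implementation. The class name `ExtensionSketch` and the static method `extensionOf` are hypothetical, and `fileName` is assumed to be the string form of `getFileName()`.

```java
import java.nio.file.Path;

// Hypothetical standalone helper mirroring the specified behavior.
final class ExtensionSketch {

    static String extensionOf(Path path) {
        Path last = path.getFileName();
        if (last == null) {
            return null;                          // no file name element
        }
        String fileName = last.toString();
        int lastPeriod = fileName.lastIndexOf('.');
        if (lastPeriod <= 0) {
            return null;                          // no period, or only a leading period
        }
        return (lastPeriod == fileName.length() - 1)
            ? ""                                  // name ends with a period
            : fileName.substring(lastPeriod + 1);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"archive.tar.gz", "README", ".bashrc"}) {
            System.out.println(s + " -> " + extensionOf(Path.of(s)));
        }
    }
}
```

Under these rules, `archive.tar.gz` yields `gz`, while `README` and `.bashrc` both yield `null`; a name that ends with a period yields the empty string.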
Comments
Moving to Approved.
01-11-2022

This CSR will not be finalized prior to reaching consensus. If need be, the `Optional<String>` return values could be replaced with `String`s: `Optional.empty()` -> `null`; `Optional.of("")` -> `""`; `Optional.of("someString")` -> `"someString"`.
06-05-2022

I agree with Alan. This CSR should not be approved in its present form.
06-05-2022

The updated CSR introduces new methods that are very inconsistent with the existing API. I think this feature will require more discussion and agreement before it can be proposed.
06-05-2022

Please hold off submitting this CSR as there are still API options to explore. There are several inconsistencies with the existing APIs that need further analysis too. Also as currently specified, it is impossible to override in a useful way so we need to explore that too.
02-02-2021

The "extension of the file name" -> "file name extension" suggestion could also be applied to the `@return`.
01-02-2021

All above changes committed.
01-02-2021

@since: thanks, that was lame!
01-02-2021

If this is a new method, it needs @since tag.
01-02-2021

The first sentence is too long when used in the context of the method summary. Split at the first ",".
01-02-2021

IMO, something like "Returns the file[name] extension of this path..." would read more concisely than, "Returns the extension of the file name of this path...".
01-02-2021