JDK-8304246 : Compiler Implementation for Unnamed patterns and variables (Preview)
  • Type: CSR
  • Component: tools
  • Sub-Component: javac
  • Priority: P4
  • Status: Closed
  • Resolution: Approved
  • Fix Versions: 21
  • Submitted: 2023-03-15
  • Updated: 2023-09-27
  • Resolved: 2023-05-09
Description
Summary
-------

Enhance the Java language with _unnamed patterns_, which match a record
component without stating the component's name or type, and with _unnamed
variables_, which can be initialized but not used. Both are denoted with an
underscore: `_`. 

Problem
-------

Java developers use [record patterns](https://openjdk.org/jeps/432) to
disaggregate a record instance into its components. In the following code, one
part of a program creates a `ColoredPoint` instance, while another part of the
program uses pattern matching with `instanceof` to test whether a variable is a
`ColoredPoint`, and extract its two components if so:

```
record Point(int x, int y) {}
enum Color { RED, GREEN, BLUE }
record ColoredPoint(Point p, Color c) {}

... new ColoredPoint(new Point(3,4), Color.GREEN) ...

if (r instanceof ColoredPoint(Point p, Color c)) {
    ... p.x() ... p.y() ...
}
```

The code above needs only `p` in the `if` block, not `c`. However, developers
currently have to spell out all the components of a record class every time
they perform pattern matching. Furthermore, it is not visually clear that the
`Color` component is irrelevant. This is especially evident when record patterns
are _nested_ to extract data within components, such as:

```
if (r instanceof ColoredPoint(Point(int x, int y), Color c)) {
    ... x ... y ...
}
```

Omitting unnecessary components such as `Color c` in both of the previous
examples would therefore make the code clearer.
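With the unnamed pattern described in the Solution section, the nested example can drop the `Color` component entirely. The sketch below assumes JDK 22 or later (or JDK 21 with `--enable-preview --release 21`); the helper `sumOfCoordinates` and its `-1` fallback are hypothetical, added only to make the snippet runnable.

```java
// Requires JDK 22+, or JDK 21 with --enable-preview --release 21.
record Point(int x, int y) {}
enum Color { RED, GREEN, BLUE }
record ColoredPoint(Point p, Color c) {}

class UnnamedPatternDemo {
    // Hypothetical helper: adds the coordinates of a ColoredPoint, ignoring its color.
    static int sumOfCoordinates(Object r) {
        // `_` matches the Color component without stating its name or type.
        if (r instanceof ColoredPoint(Point(int x, int y), _)) {
            return x + y;
        }
        return -1; // not a ColoredPoint
    }

    public static void main(String[] args) {
        System.out.println(sumOfCoordinates(new ColoredPoint(new Point(3, 4), Color.GREEN))); // prints 7
    }
}
```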

On other occasions, developers may not need to initialize any pattern variables
during pattern matching, yet they still need to examine the shape of the
structure at runtime. As a highly simplified example, consider the following
`Box` and `Ball` classes, and a `switch` that explores the content of a `Box`:

```
record Box<T extends Ball>(T content) {}

sealed abstract class Ball permits RedBall, BlueBall, GreenBall {}
final  class RedBall   extends Ball {}
final  class BlueBall  extends Ball {}
final  class GreenBall extends Ball {}

Box<? extends Ball> b = ...
switch (b) {
    case Box(RedBall   red)   -> processBox(b);
    case Box(BlueBall  blue)  -> processBox(b);
    case Box(GreenBall green) -> stopProcessing();
}
```

Since the variables are unused, it would be ideal if developers could elide
their names while keeping the explicit types that distinguish the cases.

Furthermore, if the `switch` were hypothetically refactored to group the first
two patterns in one `case` (something that Pattern Matching for Switch does not allow):

```
case Box(RedBall red), Box(BlueBall blue) -> processBox(b);
```

then it would be erroneous to name the components: neither of the names is
usable on the right-hand side because either of the patterns on the left-hand
side could have matched. Since the names are unusable it would be ideal to elide
them.
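Once the pattern variables are unnamed, the grouped `case` becomes legal, because neither pattern introduces a binding. This sketch assumes JDK 22 or later (or JDK 21 with `--enable-preview --release 21`); it returns strings instead of calling the hypothetical `processBox` and `stopProcessing` methods so that it runs on its own.

```java
// Requires JDK 22+, or JDK 21 with --enable-preview --release 21.
record Box<T extends Ball>(T content) {}

sealed abstract class Ball permits RedBall, BlueBall, GreenBall {}
final class RedBall   extends Ball {}
final class BlueBall  extends Ball {}
final class GreenBall extends Ball {}

class BoxDemo {
    static String process(Box<? extends Ball> b) {
        return switch (b) {
            // Each type pattern keeps its type, so the switch stays exhaustive over the
            // sealed Ball hierarchy, while `_` drops the unused name. Both patterns can
            // share one case label because neither declares a binding variable.
            case Box(RedBall _), Box(BlueBall _) -> "processBox";
            case Box(GreenBall _)                -> "stopProcessing";
        };
    }

    public static void main(String[] args) {
        System.out.println(process(new Box<>(new BlueBall())));  // prints processBox
        System.out.println(process(new Box<>(new GreenBall()))); // prints stopProcessing
    }
}
```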

Turning to traditional imperative code, most developers will have encountered
the situation of having to declare a variable that they did not intend to use.
This typically occurs when the side effect of a statement is more important than
its result. For example, the following code uses an enhanced-`for` statement to
step through a collection, calculating `total` as a side effect, without using
the loop variable `order`:

```
int total = 0;
for (Order order : orders) {
    if (total < LIMIT) { 
        ... total++ ...
    }
}
```

The prominence of `order`'s declaration is unfortunate given that `order` is not
used. Here is another example where the side effect of an expression is more
important than its result, leading to an unused variable. The following code
dequeues data but only needs two out of every three elements:

```
Queue<Integer> q = ... // x1, y1, z1, x2, y2, z2, ...
while (q.size() >= 3) {
    int x = q.remove();
    int y = q.remove();
    int z = q.remove(); // z is unused
    ... new Point(x, y) ...
}
```

The third call to `remove()` has the desired side effect (dequeuing an element)
regardless of whether its result is assigned to a variable, so the name `z`
could be elided while still showing that `remove()` returns a value.
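With an unnamed variable, the third element can be dequeued without inventing a name for it, and `var _` still signals that `remove()` produces a value. The sketch assumes JDK 22 or later (or JDK 21 with `--enable-preview --release 21`); `takePoints` is a hypothetical helper that collects the points in place of the elided `...`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Requires JDK 22+, or JDK 21 with --enable-preview --release 21.
record Point(int x, int y) {}

class QueueDemo {
    // Hypothetical helper: builds a Point from every (x, y, z) triple, discarding z.
    static List<Point> takePoints(Queue<Integer> q) {
        List<Point> points = new ArrayList<>();
        while (q.size() >= 3) {
            int x = q.remove();
            int y = q.remove();
            var _ = q.remove(); // dequeued for its side effect only
            points.add(new Point(x, y));
        }
        return points;
    }

    public static void main(String[] args) {
        Queue<Integer> q = new ArrayDeque<>(List.of(1, 2, 3, 4, 5, 6, 7));
        System.out.println(takePoints(q)); // prints [Point[x=1, y=2], Point[x=4, y=5]]
        System.out.println(q);             // prints [7]
    }
}
```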

Unused variables occur frequently in two other kinds of statement that focus on
side effects:

- The `try`-with-resources statement is always used for its side effect: the
  automatic closing of resources. For example the following code acquires and
  (automatically) releases a context; the name `acquiredContext` is merely
  clutter:

```
try (var acquiredContext = ScopedContext.acquire()) {
    ... acquiredContext not used ...
}
```

- Exceptions are the ultimate side effect, and handling one often gives rise to
  an unused variable. For example, most Java developers will have written
  `catch` blocks as shown below, where the name of the exception parameter is
  irrelevant:

```
String s = ...;
try { 
    int i = Integer.parseInt(s);
    ... i ...
} catch (NumberFormatException ex) { 
    System.out.println("Bad number: " + s);
}
```
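Both statements can declare `_` instead of a name. In this sketch, which assumes JDK 22 or later (or JDK 21 with `--enable-preview --release 21`), `ScopedContext` is a hypothetical stand-in for the resource in the try-with-resources example above: it only counts how many contexts are currently acquired.

```java
// Requires JDK 22+, or JDK 21 with --enable-preview --release 21.

// Hypothetical resource that tracks how many contexts are currently acquired.
class ScopedContext implements AutoCloseable {
    static int open = 0;

    static ScopedContext acquire() {
        open++;
        return new ScopedContext();
    }

    @Override
    public void close() {
        open--;
    }
}

class SideEffectDemo {
    static int runInContext() {
        // The resource is closed automatically; naming it would be clutter.
        try (var _ = ScopedContext.acquire()) {
            return ScopedContext.open; // evaluated before close() runs
        }
    }

    static String parse(String s) {
        try {
            int i = Integer.parseInt(s);
            return "Number: " + i;
        } catch (NumberFormatException _) { // the exception object itself is not needed
            return "Bad number: " + s;
        }
    }

    public static void main(String[] args) {
        System.out.println(runInContext());     // prints 1
        System.out.println(ScopedContext.open); // prints 0
        System.out.println(parse("42"));        // prints Number: 42
        System.out.println(parse("4x2"));       // prints Bad number: 4x2
    }
}
```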

Even code without side effects is sometimes forced to declare unused variables.
For example, the following code generates a map where each key is mapped to the
same placeholder value; since the lambda parameter `v` is not used, its name is
irrelevant:

```
...stream.collect(Collectors.toMap(String::toUpperCase, v -> "NODATA"));
```
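With an unnamed lambda parameter, the value mapper states directly that its argument is ignored. This is a minimal sketch assuming JDK 22 or later (or JDK 21 with `--enable-preview --release 21`); `placeholders` is a hypothetical wrapper around the `collect` call above.

```java
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Requires JDK 22+, or JDK 21 with --enable-preview --release 21.
class LambdaDemo {
    // Hypothetical wrapper: maps each upper-cased string to the same placeholder value.
    static Map<String, String> placeholders(Stream<String> stream) {
        return stream.collect(Collectors.toMap(String::toUpperCase, _ -> "NODATA"));
    }

    public static void main(String[] args) {
        Map<String, String> m = placeholders(Stream.of("a", "b"));
        System.out.println(m.get("A") + " " + m.get("B")); // prints NODATA NODATA
    }
}
```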

In all these scenarios where variables are unused and their names are
irrelevant, it would be ideal if developers could declare variables with no
name. 


Solution
--------

The Java language is enhanced as follows:

- Allow the underscore `_` to denote an _unnamed pattern_ in place of a whole type pattern or record pattern.
- Allow the underscore `_` to denote an _unnamed pattern variable_ in a type pattern.
- Allow the underscore `_` to denote an _unnamed variable_ when a local
  variable in a local variable declaration statement, an exception parameter
  in a catch clause, or a lambda parameter in a lambda expression is unused.
  The following kinds of declaration can introduce either a named variable
  (denoted by an identifier) or an unnamed variable (denoted by an underscore):

    - a local variable declaration statement in a block (JLS 14.4.2)
    - a resource specification of a try-with-resources statement (JLS 14.20.3)
    - the header of a basic for statement (JLS 14.14.1)
    - the header of an enhanced for loop (JLS 14.14.2)
    - an exception parameter of a catch block (JLS 14.20)
    - a formal parameter of a lambda expression (JLS 15.27.1)
- Allow unnamed pattern variables to be used in a switch that needs to execute the same action for multiple patterns.
  The grammar of switch labels is enhanced to allow multiple patterns in one `case` label.
  Such a label is semantically correct only when none of its patterns introduces a binding variable, that is, when every pattern variable in it is unnamed.
- Neither the unnamed pattern nor `var _` may be used at the top level of a
  pattern: both `... instanceof _` and `... instanceof var _` are prohibited, as are
  `case _` and `case var _`.
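- javac's `try` lint warning, which reports a try-with-resources (TWR) resource that is never referenced in the body of the `try` statement, must not be issued for a resource declared as `_`: an unnamed variable is by design never referenced, so the warning does not apply to it.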
- Update the javax.lang.model for unnamed variables. Tracked in a separate CSR: [8307577: Implementation for javax.lang.model for unnamed variables (Preview)](https://bugs.openjdk.org/browse/JDK-8307577).
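The two `for` forms in the list above might look as follows. This sketch assumes JDK 22 or later (or JDK 21 with `--enable-preview --release 21`); `Order`, `LIMIT`, `sideEffect`, `countUpToLimit`, and `sumBelow` are hypothetical names, the first two borrowed from the enhanced-`for` example in the Problem section.

```java
import java.util.List;

// Requires JDK 22+, or JDK 21 with --enable-preview --release 21.
record Order(String id) {} // hypothetical element type

class LoopDemo {
    static final int LIMIT = 2; // hypothetical limit
    static int calls = 0;

    // Hypothetical method called only for its side effect.
    static int sideEffect() {
        calls++;
        return 0;
    }

    // Enhanced for: the loop variable is unnamed because it is never read.
    static int countUpToLimit(List<Order> orders) {
        int total = 0;
        for (Order _ : orders) {
            if (total < LIMIT) {
                total++;
            }
        }
        return total;
    }

    // Basic for: the second header variable exists only to call sideEffect() once.
    static int sumBelow(int n) {
        int sum = 0;
        for (int i = 0, _ = sideEffect(); i < n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(new Order("a"), new Order("b"), new Order("c"));
        System.out.println(countUpToLimit(orders)); // prints 2
        System.out.println(sumBelow(4));            // prints 6
        System.out.println(calls);                  // prints 1
    }
}
```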

Specification
-------------

The updated JLS draft for unnamed patterns and variables is attached as jep443-20230322.zip.
It is also available at https://cr.openjdk.org/~abimpoudis/unnamed/jep443-20230322/specs/unnamed-jls.html

The proposed API enhancements are attached as specdiff.preliminary.00.zip. They mostly reflect the introduction of a new tree kind, represented by the new `AnyPatternTree` interface, to support unnamed patterns. Changes in javax.lang.model are included in [8307577: Implementation for javax.lang.model for unnamed variables (Preview)](https://bugs.openjdk.org/browse/JDK-8307577).
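To illustrate how an unnamed pattern surfaces in the Compiler Tree API, the sketch below parses a source string (without attributing it) and counts nodes of kind `Tree.Kind.ANY_PATTERN`, whose instances implement `AnyPatternTree`. It assumes a JDK 22+ runtime that includes the `jdk.compiler` module; on JDK 21 these APIs were preview APIs. A type pattern with an unnamed variable, such as `RedBall _`, parses as a binding pattern rather than an `AnyPatternTree`, so it is not counted. `AnyPatternCounter` is a hypothetical name.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

// Assumes a JDK 22+ runtime that includes the jdk.compiler module.
class AnyPatternCounter {
    // Parses `source` and counts unnamed patterns, i.e. ANY_PATTERN nodes.
    static int count(String source) {
        // In-memory compilation unit holding the source text.
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///Demo.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavacTask task = (JavacTask) compiler.getTask(null, null, null, null, null, List.of(file));
        int[] count = {0};
        try {
            for (CompilationUnitTree unit : task.parse()) {
                new TreeScanner<Void, Void>() {
                    @Override
                    public Void scan(Tree tree, Void p) {
                        if (tree != null && tree.getKind() == Tree.Kind.ANY_PATTERN) {
                            count[0]++;
                        }
                        return super.scan(tree, p); // keep walking the children
                    }
                }.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        // One unnamed pattern; `RedBall _` is a binding pattern with an unnamed variable.
        System.out.println(count("""
                class Demo {
                    boolean f(Object o) {
                        return o instanceof ColoredPoint(Point(int x, int y), _)
                            || o instanceof Box(RedBall _);
                    }
                }
                """)); // prints 1
    }
}
```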

The specification and API drafts are subject to change until the CSR is finalized.


Comments
Moving to Approved.
09-05-2023

All comments addressed. Thank you!
08-05-2023

To complete the CSR work here please:
* Create a CSR for JDK-8307007: "javax.lang.model for unnamed variables" to cover those changes.
* Update this CSR with any spec updates made during code review.
* Have one or more engineers review the above CSRs before they are Finalized for the second phase of review.
05-05-2023

[~darcy] what value would be best to return for the name of unnamed elements? The symbol `_` (current behavior) or `null`? If we change the behavior for local variables, thus defining the behavior of `getSimpleName()`, we would need to adjust the other places too, such as binding pattern variable names, etc. If we keep `_` (current behavior), we may not need additional changes in the model. Correct?
19-04-2023

Moving to Provisional, not Approved. The interface javax.lang.model.element.VariableElement may also need updates to describe the behavior on unnamed variables.
06-04-2023