Bug ID: JDK-8220715 JEP 358: Helpful NullPointerExceptions


Type: JEP
Component: hotspot
Sub-Component: runtime

Priority: P4
Status: Closed
Resolution: Delivered
Fix Versions: 14

Submitted: 2019-03-15
Updated: 2021-12-22
Resolved: 2019-12-12

Related Reports

Relates :	JDK-8233014 - Enable ShowCodeDetailsInExceptionMessages by default
Relates :	JDK-8234223 - Release Note: Detailed Message in NullPointerExceptions
Relates :	JDK-8233268 - Improve integration of Objects.requireNonNull and JEP 358 Helpful NPE
Relates :	JDK-8218628 - Add detailed message to NullPointerException describing what is null.

Description

Summary
-------

Improve the usability of `NullPointerException`s generated by the JVM by describing precisely which variable was `null`.


Goals
-----

- Offer helpful information to developers and support staff about the premature termination of a program.

- Improve program understanding by more clearly associating a dynamic exception with static program code.

- Reduce the [confusion and concern](https://stackoverflow.com/questions/218384/what-is-a-nullpointerexception-and-how-do-i-fix-it) that new developers often have about `NullPointerException`s.


Non-Goals
---------

- It is not a goal to track down the ultimate producer of a `null` reference, only the unlucky consumer.

- It is not a goal to throw more `NullPointerException`s, or to throw them at a different point in time.


Motivation
----------

Every Java developer has encountered `NullPointerException`s (NPEs). Since NPEs can occur almost anywhere in a program, it is generally impractical to attempt to catch and recover from them. As a result, developers rely on the JVM to pinpoint the source of an NPE when it actually occurs. For example, suppose an NPE occurs in this code:

    a.i = 99;

The JVM will print out the method, filename, and line number that caused the NPE:

    Exception in thread "main" java.lang.NullPointerException
        at Prog.main(Prog.java:5)

Using the message, which is typically included in a bug report, the developer can locate `a.i = 99;` and infer that `a` must have been null. However, for more complex code, it is impossible to decide which variable was null without using a debugger. Suppose an NPE occurs in this code:

    a.b.c.i = 99;

The filename and line number do not pinpoint exactly which variable was null. Was it `a` or `b` or `c`?

A similar problem occurs with array access and assignment. Suppose an NPE occurs in this code:

    a[i][j][k] = 99;

The filename and line number do not pinpoint exactly which array component was null. Was it `a` or `a[i]` or `a[i][j]`?

A single line of code may contain several access paths, each one potentially the source of an NPE. Suppose an NPE occurs in this code:

    a.i = b.j;

The filename and line number do not pinpoint the offending access path. Was `a` null, or `b`?

Finally, an NPE could stem from a method call. Suppose an NPE occurs in this code:

    x().y().i = 99;

The filename and line number do not pinpoint which method call returned null. Was it `x()` or `y()`?

[Various strategies](https://stackoverflow.com/questions/410890/how-to-trace-a-nullpointerexception-in-a-chain-of-getters?rq=1) can mitigate the lack of accurate pinpointing by the JVM. For example, a developer faced with an NPE can break up the access paths by assigning to intermediate local variables. (The `var` keyword may be [helpful](https://openjdk.java.net/jeps/286) here.) The result will be a more accurate report of the `null` variable in the JVM's message, but reformatting code to track down an exception is undesirable. In any case, most NPEs occur in production environments, where the support engineer who observes the NPE is many steps removed from the developer whose code caused it.

The entire Java ecosystem would benefit if the JVM could give the information needed to pinpoint the source of an NPE and then identify its root cause, without using extra tooling or shuffling code around. SAP's commercial JVM has done this since 2006, to great acclaim from developers and support engineers.


Description
-----------

The JVM throws a `NullPointerException` (NPE) at the point in a program where code tries to dereference a `null` reference. By analyzing the program's bytecode instructions, the JVM will determine precisely which variable was `null`, and describe the variable (in terms of source code) with a _null-detail message_ in the NPE. The null-detail message will then be shown in the JVM's message, alongside the method, filename, and line number.

> _Note: The JVM displays an exception message on the same line as the exception type, which can result in long lines. For readability in a web browser, this JEP shows the null-detail message on a second line, after the exception type._

For example, an NPE from the assignment statement `a.i = 99;` would generate this message:

    Exception in thread "main" java.lang.NullPointerException: 
            Cannot assign field "i" because "a" is null
        at Prog.main(Prog.java:5)

If the more complex statement `a.b.c.i = 99;` throws an NPE, the message would dissect the statement and pinpoint the cause by showing the full access path which led up to the `null`:

    Exception in thread "main" java.lang.NullPointerException: 
            Cannot read field "c" because "a.b" is null
        at Prog.main(Prog.java:5)

Giving the full access path is more helpful than giving just the name of the `null` field because it helps the developer to navigate a line of complex source code, especially if the line of code uses the same name multiple times.

Similarly if the array access and assignment statement `a[i][j][k] = 99;` throws an NPE:

    Exception in thread "main" java.lang.NullPointerException:
            Cannot load from object array because "a[i][j]" is null
        at Prog.main(Prog.java:5)

Similarly if `a.i = b.j;` throws an NPE:

    Exception in thread "main" java.lang.NullPointerException:
            Cannot read field "j" because "b" is null
        at Prog.main(Prog.java:5)

In every example, the null-detail message in conjunction with the line number is sufficient to spot the expression that is `null` in the source code. Ideally, the null-detail message would show the actual source code, but this is difficult to do given the nature of the correspondence between source code and bytecode instructions (see below). In addition, when the expression involves an array access, the null-detail message is unable to show the actual array indices which led to a `null` element, such as the run-time values of `i` and `j` when `a[i][j]` is `null`. This is because the array indices were stored on the method's operand stack, which was lost when the NPE was thrown.
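The statements discussed above can be tried out with a small program. The following is a minimal sketch, not part of the JEP: the class names (`NpeDemo`, `A`, `B`, `C`) and helper methods are hypothetical, and the text actually printed depends on the JDK version, on whether `-XX:+ShowCodeDetailsInExceptionMessages` is in effect, and on whether the class was compiled with `javac -g`.

```java
public class NpeDemo {
    // Hypothetical types used only for this illustration.
    static class A { B b; int i; }
    static class B { C c; int j; }
    static class C { int i; }

    /** Executes a.b.c.i = 99 and returns the NPE raised by the JVM, or null if none. */
    static NullPointerException fieldChain(A a) {
        try {
            a.b.c.i = 99; // a, a.b, or a.b.c may be null
            return null;
        } catch (NullPointerException e) {
            return e;
        }
    }

    /** Executes arr[i][j][k] = 99 and returns the NPE raised by the JVM, or null if none. */
    static NullPointerException arrayChain(int[][][] arr, int i, int j, int k) {
        try {
            arr[i][j][k] = 99; // arr, arr[i], or arr[i][j] may be null
            return null;
        } catch (NullPointerException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        A a = new A(); // a.b is left null
        System.out.println(fieldChain(a).getMessage());

        int[][][] arr = new int[1][1][]; // arr[0][0] is left null
        System.out.println(arrayChain(arr, 0, 0, 0).getMessage());
    }
}
```

Running it once with the option enabled and once with it disabled shows the difference between a bare NPE and one that carries a null-detail message.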

Only NPEs that are created and thrown directly by the JVM will include the null-detail message. NPEs that are explicitly created and/or explicitly thrown by programs running on the JVM are not subject to the bytecode analysis and null-detail message creation described below. In addition, the null-detail message is not reported for NPEs caused by code in _hidden methods_, which are special-purpose low-level methods generated and called by the JVM to, e.g., optimize string concatenation. A hidden method has no filename or line number that could help to pinpoint the source of an NPE, so printing a null-detail message would be futile.
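The distinction between NPEs raised by the JVM and NPEs created by program code can be illustrated as follows. This is a sketch with hypothetical names (`ExplicitNpeDemo`, `thrownByJvm`, `thrownExplicitly`); only the first exception is a candidate for a null-detail message, and whether it actually carries one depends on the JDK version and the command-line option.

```java
public class ExplicitNpeDemo {
    /** NPE raised by the JVM itself when it dereferences a null reference. */
    static NullPointerException thrownByJvm(String s) {
        try {
            s.length();
            return null;
        } catch (NullPointerException e) {
            return e;
        }
    }

    /** NPE created and thrown explicitly by program code, without a message. */
    static NullPointerException thrownExplicitly() {
        try {
            throw new NullPointerException();
        } catch (NullPointerException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        // May carry a null-detail message, depending on JDK version and flags.
        System.out.println(thrownByJvm(null).getMessage());
        // Created by program code, so no null-detail message is computed for it.
        System.out.println(thrownExplicitly().getMessage());
    }
}
```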

### Computing the null-detail message

Source code such as `a.b.c.i = 99;` is compiled to several bytecode instructions. When an NPE is thrown, the JVM knows exactly which bytecode instruction in which method is responsible, and uses this information to compute the null-detail message. The message has two parts:

- The first part -- `Cannot read field "c"` -- is the _consequence_ of the NPE. It says which action could not be performed because a bytecode instruction popped a `null` reference from the operand stack.

- The second part -- `because "a.b" is null` -- is the _reason_ for the NPE. It recreates the part of the source code that pushed the `null` reference on to the operand stack.

The first part of the null-detail message is computed from the bytecode instruction that popped `null`, as detailed here in Table 1:

<table>
  <tr>
    <th>bytecode</th>  <th>1st part</th>
  </tr>	
  <tr>
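    <td><code>aaload</code>, <code>baload</code>, <code>caload</code>, <code>daload</code>, <code>faload</code>, <code>iaload</code>, <code>laload</code>, <code>saload</code></td>  <td>"Cannot load from &lt;element type&gt; array"</td>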
  </tr>	
  <tr>
    <td><code>arraylength</code></td>  <td>"Cannot read the array length"</td>
  </tr>	
  <tr>
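    <td><code>aastore</code>, <code>bastore</code>, <code>castore</code>, <code>dastore</code>, <code>fastore</code>, <code>iastore</code>, <code>lastore</code>, <code>sastore</code></td>  <td>"Cannot store to &lt;element type&gt; array"</td>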
  </tr>	
  <tr>
    <td><code>athrow</code></td>  <td>"Cannot throw exception"</td>
  </tr>	
  <tr>
    <td><code>getfield</code></td>  <td>"Cannot read field "&lt;field name&gt;""</td>
  </tr>	
  <tr>
    <td><code>invokeinterface</code>, <code>invokespecial</code>, <code>invokevirtual</code></td>  <td>"Cannot invoke "&lt;method&gt;""</td>
  </tr>	
  <tr>
    <td><code>monitorenter</code></td>  <td>"Cannot enter synchronized block"</td>
  </tr>	
  <tr>
    <td><code>monitorexit</code></td>  <td>"Cannot exit synchronized block"</td>
  </tr>	
  <tr>
    <td><code>putfield</code></td>  <td>"Cannot assign field "&lt;field name&gt;""</td>
  </tr>	
  <tr>
    <td>Any other bytecode</td>  <td>No NPE possible, no message</td>
  </tr>	
</table>

&lt;method&gt; breaks down to &lt;class name&gt;.&lt;method name&gt;(&lt;parameter types&gt;)
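Table 1 amounts to a lookup from the failing instruction to a message prefix. The sketch below models that lookup in Java purely for illustration; it is not the JVM's implementation, which lives in native code. The class and method names are hypothetical, instructions are represented by their mnemonic strings, and the sketch assumes that any typed array load or store mnemonic (for example `aaload` or `iastore`) belongs to the array rows of the table.

```java
public class NullDetailFirstPart {
    /**
     * Returns the first part of the null-detail message for the given mnemonic,
     * or null when the instruction cannot raise an NPE.
     * The operand is the element type, field name, or method description.
     */
    static String firstPart(String mnemonic, String operand) {
        // Typed array loads (aaload, iaload, ...) are six characters long;
        // the local variable load "aload" is shorter and cannot raise an NPE.
        if (mnemonic.length() == 6 && mnemonic.endsWith("aload")) {
            return "Cannot load from " + operand + " array";
        }
        // Typed array stores (aastore, iastore, ...) are seven characters long.
        if (mnemonic.length() == 7 && mnemonic.endsWith("astore")) {
            return "Cannot store to " + operand + " array";
        }
        switch (mnemonic) {
            case "arraylength":
                return "Cannot read the array length";
            case "athrow":
                return "Cannot throw exception";
            case "getfield":
                return "Cannot read field \"" + operand + "\"";
            case "putfield":
                return "Cannot assign field \"" + operand + "\"";
            case "invokeinterface":
            case "invokespecial":
            case "invokevirtual":
                return "Cannot invoke \"" + operand + "\"";
            case "monitorenter":
                return "Cannot enter synchronized block";
            case "monitorexit":
                return "Cannot exit synchronized block";
            default:
                return null; // no NPE possible, no message
        }
    }

    public static void main(String[] args) {
        System.out.println(firstPart("getfield", "c"));
        System.out.println(firstPart("iastore", "int"));
    }
}
```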

The second part of the null-detail message is more complex. It identifies the access path that led to a `null` reference on the operand stack, but complex access paths involve several bytecode instructions. Given a sequence of instructions in a method, it is not obvious which previous instruction pushed the `null` reference. Accordingly, a simple data flow analysis is performed on all the method's instructions. It computes which instruction pushes to which operand stack slot, and propagates this information to the instruction which pops the slot. (The analysis is linear in the number of instructions.) Given the analysis, it is possible to step back through the instructions which make up an access path in source code. The second part of the message is assembled step-by-step, given the bytecode instruction at each step as detailed here in Table 2:

<table>
  <tr>
    <th>bytecode</th><th>2nd part</th>
  </tr>	
  <tr>
    <td><code>aconst_null</code></td><td>"null"</td>
  </tr>	
  <tr>
    <td><code>aaload</code></td><td>compute the 2nd part for the instruction which pushed the array reference, then append "[", then compute the 2nd part for the instruction that pushed the index, then append "]"</td>
  </tr>	
  <tr>
    <td><code>iconst_*</code>, <code>bipush</code>, <code>sipush</code></td><td>the constant value</td>
  </tr>	
  <tr>
    <td><code>getfield</code></td><td>compute the 2nd part for the instruction which pushed the reference that is accessed by this getfield, then append ".&lt;field name&gt;"</td>
  </tr>	
  <tr>
    <td><code>getstatic</code></td><td>"&lt;class name&gt;.&lt;field name&gt;"</td>
  </tr>	
  <tr>
    <td><code>invokeinterface</code>, <code>invokevirtual</code>, <code>invokespecial</code>, <code>invokestatic</code></td>
	<td>If in the first step, "the return value of &lt;method&gt;", else "&lt;method&gt;"</td>
  </tr>	
  <tr>
    <td><code>iload*</code>, <code>aload*</code></td><td>For local variable 0, "this". For other local variables and parameters, the variable name if a local variable table is available, otherwise "&lt;parameter <em>i</em> &gt;" or "&lt;local <em>i</em> &gt;".</td>
  </tr>
  <tr>
    <td>Any other bytecode</td><td>Not applicable to the second part.</td>
  </tr>
</table>

Access paths can be made up of an arbitrary number of bytecode instructions. The null-detail message does not necessarily cover all of these. The algorithm takes only a limited number of steps back through the instructions in order to limit the complexity of the output. If the maximum number of steps is reached, placeholders such as "..." are emitted. In rare cases, stepping back over instructions is not possible, and then the null-detail message will contain only the first part ("Cannot ...", with no "because ..." explanation).

The null-detail message -- `Cannot read field "c" because "a.b" is null` -- is computed on demand, when the JVM calls `Throwable::getMessage` as part of its message. Usually, a message carried by an exception must be supplied
[when the exception object is created](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Throwable.html#%3Cinit%3E(java.lang.String)), but the computation is expensive and may not always be needed, since many NPEs are caught and discarded by programs. The computation requires the bytecode instructions of the method which caused the NPE, and the index of the instruction which popped `null`; fortunately, the implementation of `Throwable` includes this information about the origin of the exception.
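The idea of deferring an expensive message computation until `getMessage` is called can be sketched in plain Java. The class below is hypothetical and only illustrates the pattern: a `Supplier` stands in for the bytecode analysis, and caching the result after the first call is a choice made for this sketch rather than something this JEP specifies.

```java
import java.util.function.Supplier;

public class LazyMessageException extends RuntimeException {
    // Hypothetical stand-in for the expensive analysis that produces the message.
    private final transient Supplier<String> detail;
    private String cached;
    private boolean computed;

    public LazyMessageException(Supplier<String> detail) {
        this.detail = detail;
    }

    @Override
    public synchronized String getMessage() {
        if (!computed) {
            // Computed only on first demand; never computed if the
            // exception is caught and discarded without being inspected.
            cached = (detail == null) ? null : detail.get();
            computed = true;
        }
        return cached;
    }

    public static void main(String[] args) {
        LazyMessageException e = new LazyMessageException(() -> {
            System.out.println("(computing message)");
            return "Cannot read field \"c\" because \"a.b\" is null";
        });
        System.out.println("exception created");
        System.out.println(e.getMessage());
    }
}
```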

The feature can be toggled with the new boolean command-line option 
`-XX:{+|-}ShowCodeDetailsInExceptionMessages`. Initially the option 
defaults to `false`, so the null-detail message is not printed unless it is 
explicitly enabled. The intent is to enable code details in exception 
messages by default in a later release.
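On the command line the option would be passed as, for example, `java -XX:+ShowCodeDetailsInExceptionMessages Prog`. To check from inside a running program whether it is in effect, one possibility is to query the JVM's diagnostic MXBean. The sketch below assumes a HotSpot-based JVM that provides `com.sun.management.HotSpotDiagnosticMXBean`; the class name `ShowCodeDetailsFlag` and the `"unknown"` fallback are hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class ShowCodeDetailsFlag {
    static final String FLAG = "ShowCodeDetailsInExceptionMessages";

    /** Returns the flag's current value as text, or "unknown" if this JVM does not expose it. */
    static String currentValue() {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean.getVMOption(FLAG).getValue();
        } catch (RuntimeException e) {
            // For example, the option does not exist in this JVM version.
            return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println(FLAG + " = " + currentValue());
    }
}
```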

### Example of computing the null-detail message

Here is an example based on the following snippet of source code:

    a().b[i][j] = 99;

The source code has the following representation in bytecode:

       5: invokestatic  #7    // Method a:()LA;
       8: getfield      #13   // Field A.b, an array
      11: iload_1             // Load local variable i, an array index
      12: aaload              // Load b[i], another array
      13: iload_2             // Load local variable j, another array index
      14: bipush        99
      16: iastore             // Store to b[i][j]

Suppose `a().b[i]` is `null`. This will cause an NPE to be thrown when storing to `b[i][j]`. The JVM will execute bytecode `16: iastore` and throw an NPE because bytecode `12: aaload` pushed `null` on to the operand stack. The null-detail message will be computed as follows:

    Cannot store to int array because "Test.a().b[i]" is null

The computation starts with the method containing the bytecode instructions, and the bytecode index 16. Since the instruction at index 16 is `iastore`, the first part of the message is "Cannot store to int array", per Table 1.

For the second part of the message, the algorithm steps back to the instruction that pushed the `null` which `iastore` was unfortunate enough to pop. Data flow analysis reveals this is `12: aaload`, an array load. Per Table 2, when an array load is responsible for a `null` array reference, we step back to the instruction which pushed the array reference (rather than the array index) on to the operand stack, `8: getfield`. Then again per Table 2, when a `getfield` is part of the access path, we step back to the instruction that pushed the reference used by `getfield`, `5: invokestatic`. We can now assemble the second part of the message:

- For `5: invokestatic`, emit "Test.a()"
- For `8: getfield`, emit ".b"
- For `12: aaload`, emit "[" and step back to the instruction that pushed the index, `11: iload_1`. Emit "i", the name of local variable #1, then "]".

The algorithm never steps to `13: iload_2` which pushes the index `j`, or to `14: bipush` which pushes `99`, because they are not related to the cause of the NPE.
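The walk-through above can be replayed with a toy model. The sketch below is illustrative only and is not the JVM's algorithm: it handles straight-line code without branches, covers just the instructions of this example, hard-codes the first part for `iastore`, and omits the "the return value of" wording for a method call in the first step. All class, field, and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

public class NullDetailSketch {
    /** One instruction of the toy model. */
    static final class Insn {
        final int bci;        // bytecode index
        final String op;      // mnemonic
        final String arg;     // symbolic operand (method, field, variable, or constant)
        final int pops;       // operand stack slots consumed
        final boolean pushes; // whether one slot is produced
        int[] producers;      // bci that pushed each consumed slot, deepest slot first

        Insn(int bci, String op, String arg, int pops, boolean pushes) {
            this.bci = bci;
            this.op = op;
            this.arg = arg;
            this.pops = pops;
            this.pushes = pushes;
        }
    }

    /** Data flow for straight-line code: record which instruction pushed each popped slot. */
    static Map<Integer, Insn> analyze(Insn... code) {
        Map<Integer, Insn> byBci = new LinkedHashMap<>();
        Deque<Integer> stack = new ArrayDeque<>();
        for (Insn insn : code) {
            insn.producers = new int[insn.pops];
            for (int k = insn.pops - 1; k >= 0; k--) {
                insn.producers[k] = stack.pop(); // top of stack is the last consumed slot
            }
            if (insn.pushes) {
                stack.push(insn.bci);
            }
            byBci.put(insn.bci, insn);
        }
        return byBci;
    }

    /** Second part of the message, following Table 2 for the instructions used here. */
    static String reason(Map<Integer, Insn> code, int bci) {
        Insn insn = code.get(bci);
        switch (insn.op) {
            case "invokestatic":
                return insn.arg;
            case "getfield":
                return reason(code, insn.producers[0]) + "." + insn.arg;
            case "aaload":
                return reason(code, insn.producers[0])
                        + "[" + reason(code, insn.producers[1]) + "]";
            case "iload":
            case "bipush":
                return insn.arg;
            default:
                return "...";
        }
    }

    /** Assembles the message for an NPE raised by the iastore at the given index. */
    static String message(Map<Integer, Insn> code, int failingBci) {
        // iastore pops array reference, index, and value; the array reference is deepest.
        int nullProducer = code.get(failingBci).producers[0];
        return "Cannot store to int array because \"" + reason(code, nullProducer) + "\" is null";
    }

    static String example() {
        Map<Integer, Insn> code = analyze(
                new Insn(5, "invokestatic", "Test.a()", 0, true),
                new Insn(8, "getfield", "b", 1, true),
                new Insn(11, "iload", "i", 0, true),
                new Insn(12, "aaload", null, 2, true),
                new Insn(13, "iload", "j", 0, true),
                new Insn(14, "bipush", "99", 0, true),
                new Insn(16, "iastore", null, 3, false));
        return message(code, 16);
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```

In this model, `reason` is never invoked for indices 13 and 14, mirroring the observation that the instructions pushing `j` and `99` play no part in the message.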

Files with many examples of null-detail messages are attached to this JEP: output_with_debug_info.txt lists messages when class files contain a local variable table, and output_no_debug_info.txt lists messages when class files do not contain a local variable table.


Alternatives
------------

### The presence of the null-detail message

The JVM could use other means to supply null-detail information, such as writing to stdout or using a tracing or logging facility. However, exceptions are the standard way to report problems on the JVM, and NPE already gives information about where the exception was raised by including the stack trace with line number information. As this information is insufficient to locate the cause, it is natural to enhance NPE by adding the missing information.

The null-detail message is switched off by default and can be enabled with the command-line option `-XX:+ShowCodeDetailsInExceptionMessages`. There is no way to specify that only some NPE-raising bytecodes are of interest. The null-detail message might not be wanted in all circumstances, for the following reasons:

1. Performance. The algorithm adds some overhead to the production of a stack trace. However, this is comparable to the stack walking done when raising the exception. If an application throws exceptions and prints their messages so frequently that the printing affects performance, then throwing the exceptions already imposes an overhead that should be avoided.

2. Security. The null-detail message gives insight into source code that is otherwise not easy to obtain. The message could be switched off to avoid this, but exception messages are supposed to carry information about the cause of an exception so that a problem can be fixed. If exposing this information is not acceptable, the message should not be printed by an application, but caught and discarded. This should not be handled by configuration of the JVM.

3. Compatibility. The JVM has not traditionally included a message for an NPE, and including a message now might cause problems for tools that parse stack traces in overly sensitive ways. However, Java programs have always been able to throw NPEs with messages, so tools are expected to adapt to messages on NPEs from the JVM. A related risk is that tools might depend on the precise format of the null-detail message.

We intend to enable the null-detail message by default in a future release.

### The computation of the null-detail message

Computing the null-detail message on demand has consequences for the message's availability in advanced scenarios:

1. When executing remote code via RMI, any exception thrown by the remote code is delivered to the caller via serialization. Serializing an exception object does not preserve its internal data structures, so if remote code throws and thus serializes an NPE, the eventual deserialization will produce an NPE for which no null-detail message can be computed on demand.

2. If the bytecode instructions of a method change while a program is running, such as due to redefinition of the method by a Java agent using JVMTI, then the original instructions are preserved for a while but can be discarded during a GC cycle. As the original instructions are required to compute the null-detail message, the null-detail message will not be computed on demand if this happens.

The choice not to support serialization was made in order to minimize changes in the `NullPointerException` class itself. If persisting the null-detail message for serialization became desirable, then `writeReplace` could be implemented in that class. Alternatively, the null-detail message could be computed when the exception object is created, and this would persist the null-detail message across both serialization and method redefinition.
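The serialization scenario can be explored with a round trip through Java serialization. This is a sketch with hypothetical names (`NpeSerializationDemo`, `provoke`, `roundTrip`); it prints the message before and after the round trip so the behaviour of a given JDK can be observed, and it makes no claim about what any particular release prints.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class NpeSerializationDemo {
    /** Lets the JVM raise an NPE by dereferencing the given reference; returns null if none. */
    static NullPointerException provoke(Object ref) {
        try {
            ref.hashCode();
            return null;
        } catch (NullPointerException e) {
            return e;
        }
    }

    /** Serializes the exception to a byte array and reads it back. */
    static NullPointerException roundTrip(NullPointerException npe) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(npe);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (NullPointerException) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        NullPointerException original = provoke(null);
        NullPointerException copy = roundTrip(original);
        System.out.println("before: " + original.getMessage());
        System.out.println("after:  " + copy.getMessage());
    }
}
```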

### The format of the null-detail message

The null-detail message is constructed of two parts: the first part describes an action that could not be performed (the _consequence_ of the NPE) while the second part describes the expression that earlier pushed a `null` reference on to the operand stack (the _reason_ for the NPE). In some cases, this results in verbose text where only a fraction of the message is really needed to pinpoint the `null` expression in source code. For example, it could be helpful to shorten the message in these two scenarios:

1. In a failed array access -- `Cannot load from object array because "a[i][j]" is null.` -- the second part `"a[i][j]" is null` suffices to pinpoint the `null` expression in source code `a[i][j][k] = 99;`.

2. In a failed method invocation -- `Cannot invoke "NullPointerExceptionTest.callWithTypes(String[][], int[][][], float, long, short, boolean, byte, double, char)" because...` -- the method's declaring type and parameter types are often bulky, and can be omitted without seriously harming the developer's ability to pinpoint the `null` expression.

Nevertheless, the null-detail message does not leave out this information. The algorithm computing the message deals with arbitrary sequences of bytecode instructions, so it does not always succeed in assembling a useful message. For example, for a failed array access, it might be unable to compute the second part altogether, so that no message would be printed at all if the first part was left out; in this case, the first part alone may be sufficient to pinpoint the `null` expression in source code. In general, due to assembling the message from individual building blocks for each instruction visited, it is not feasible to decide algorithmically whether enough information has been gathered at some point to leave out further parts without harming the usefulness of the message. Thus, the choice was made to print all the information to make the message helpful in as many situations as possible.


Risks and Assumptions
---------------------

In a helpful NPE, the null-detail message may contain variable names from the source code. Specifically, if debug information is included in the `class` file (via `javac -g`), then local variable names are printed. These names were not previously exposed by reflection APIs directly; a program would have had to obtain them via the indirect route of inspecting a `class` file via `ClassLoader::getResourceAsStream()`. Exposing these names in NPEs might be considered a security risk, but leaving them out would limit the benefit of the null-detail message.

It is assumed that computation of the null-detail message will be extended if new bytecodes are added to the JVM Specification.


Testing
-------

A prototype of this feature is implemented by [JDK-8218628](https://bugs.openjdk.java.net/browse/JDK-8218628). The prototype contains a unit test that exercises every message part. A predecessor implementation has been in SAP's commercial JVM since 2006 and has proven to be stable. 

To avoid regressions, the feature should be exercised by running larger bodies of code. The jtreg tests should be run to detect tests that depend on the NPE message and need to be adapted.

Comments

Updating the comments: the name of the flag is -XX:+ShowCodeDetailsInExceptionMessages, which by default will be off. This is reflected in the latest version of the description.
22-08-2019
Hi Mark, I am fine with your edits, thanks for looking at the issue. Distinguishing "printout" (the text the VM prints if it terminates with an exception) and "message" (The content of the String object returned from Throwable.getMessage()) was intentional, but using "message" for both is fine, too. I added '-XX:+/-' to the option and removed mentioning "experimental", I don't think protecting this with -XX:+UnlockExperimentalVMOptions is necessary. I chose the negative expression 'Suppress' as the feature is to be enabled per default in the long run if no bad experiences are made with it. So in the long run, there would not be a double negation. And the meaning of the flag better fits what people might want to do once the feature is on per default. A similar discussion is ongoing with the CSR for the flag: JDK-8227717, and, unfortunately, in a private communication. As JEP and CSR should be in sync, I might have to change the name of the flag at some point. I will not change anything else any more (at least on my behalf) than the name of the flag.
26-07-2019
I’ve done a light copy-editing pass on the text. Please let me know if this looks okay to you and I’ll move it to Candidate. You mention “the new command-line option SuppressCodeDetailsInExceptionMessages”, by which I assume you mean something like `-XX:-SuppressCodeDetailsInExceptionMessages`. It’d be helpful to spell out the full option name, including the `-XX:-` part. Enabling the extended messages while they’re not enabled by default requires the use of an option that disables the suppression of those messages, i.e., a double negative, which is likely to confuse some users. You describe this as an “experimental” feature. Does that mean that it’s protected by the `-XX:+UnlockExperimentalVMOptions` option? If so, then please mention that. If not, then using the word “experimental” could be confusing; it might be better just to say “disabled by default.”
24-07-2019
I edited the text to mention that the feature is off per default and must be enabled by setting a command line flag. I am also saying that I would like to enable it per default in a later release.
15-07-2019
I exchanged mails with Goetz every day for over a month to improve the clarity of this JEP and the readability of the null-detail messages. Given that the messages will be seen by millions of people in the coming decades, I believe it is right for this JEP to fully describe when and how the messages are computed. I do not agree with all of the design decisions (notably, I think it is inappropriate that the feature cannot be turned off), but I am glad to have worked with Goetz to document his rationale for those decisions.
18-06-2019
I reread this JEP and fixed typos and some wording. I also fixed the section that if the redefined (old) version of the method is no longer available to the JVM, no message can be printed. I have also reviewed the implementation and made some suggestions to Goetz offline. I think this is a reasonable addition to the JVM for NPE error message printing. The enhanced message has been available to SAP customers since JDK 8 and is a useful feature that we should have in the open platform as well. I don't have concerns about maintainability with this code. There is already bytecode parsing and analysis in the JVM. Adding new bytecodes must be done to several places already, so this isn't an additional burden with this code. Lastly, the algorithm to data flow the bytecodes is clever and minimal. With some renaming, it should be self-describing. The new messages are great. I think it's generally agreed that they would be more helpful to a user than just that a NullPointerException has been thrown.
03-05-2019
I pushed the prototype to jdk/sandbox as branch JEP-8220715-NPE_messages http://hg.openjdk.java.net/jdk/sandbox/shortlog/acdec92db672
19-03-2019