JDK-8267650 : Better-defined JVM class file validation
  • Type: JEP
  • Component: specification
  • Sub-Component: vm
  • Priority: P4
  • Status: Closed
  • Resolution: Withdrawn
  • Submitted: 2021-05-24
  • Updated: 2023-12-22
  • Resolved: 2023-12-22
Description
Summary
-------

Update JVMS to more clearly define the requirements and timing of JVM `class`
file validation. Align HotSpot with these rules.

Goals
-----

Specification and implementation updates will impact format checking, which
occurs during class loading, and verification, which occurs between class
loading and class initialization.

Special attention will be given to the following areas:

- Distinguishing between validation rules and unenforced recommendations
- The treatment of method names and descriptors
- Eliminating the unused `ACC_SUPER` flag
- Selective validation of attributes
- The timing of `Code` and `StackMapTable` attribute checks
- The distinct roles of static and structural constraints
- Correcting and eliminating redundant verification rules

Most HotSpot changes will be subtle, limited to what is necessary to reconcile
differences between the specification and the implementation.

APIs that provide information about loaded classes (such as core reflection and
JDI) may also need to make subtle adjustments to their validation behavior.



Non-Goals
---------

The changes will not address any anomalies in constant pool resolution or
runtime execution—this effort is only concerned with `class` file validation.

The use of Prolog rules to specify verification since Java SE 6 can make some
parts of JVMS difficult to read, but this JEP will not alter that approach.

The specifications of APIs that operate on `class` files, like core reflection,
often elide many details about API-specific validation behavior. This JEP
will not attempt to fill in those details.



Motivation
----------

The [Valhalla Project](https://openjdk.java.net/projects/valhalla/) is pursuing
significant changes to the Java programming model and the Java Virtual Machine.
It anticipates extending the `class` file format with a number of new opcodes,
constant pool entries, descriptor forms, verification types, special methods,
and attributes.

In anticipation of these changes, it will be useful to get the rules for `class`
file validation on solid footing.

Broadly, the JVM processes `class` files in stages; at each stage, certain categories
of validation rules are enforced.

-   When a class is *loaded*, the bytes of the `class` file are parsed, and some
    basic structural rules are enforced. This is called *format checking* (JVMS 4.8, 5.3.5).
    If format checking fails, the class cannot be loaded.

-   Before the class can be *initialized*, the bytecode of every method is checked for
    both valid syntax and consistent use of types. This is called *verification* (JVMS 4.10, 5.4.1).
    If verification fails, no code in the class can be executed.

-   At some point before a specific instruction is executed, a search for any referenced
    class, field, or method must be performed. This is called *resolution* (JVMS 5.4.3, 6.5).
    If resolution fails, execution of the instruction throws an error.

-   Some `class` file attributes are interpreted by APIs or tools. (For example, the `Signature`
    attribute is interpreted by both `javac` and core reflection methods like
    `Class.getGenericSuperclass`.) These APIs and tools have their own validation rules,
    which may lead to errors or other exceptional behavior when the API or tool is invoked.

This JEP is focused on the validation rules enforced by format checking and verification.
It also has a subtle impact on the rules some APIs and tools are expected to enforce.
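
The load-time stage can be observed directly from Java code. The following is a minimal sketch, not part of the JEP: `FormatCheckDemo`, `RawLoader`, and `tryLoad` are hypothetical names, and the input is deliberately not a `class` file at all, so format checking is expected to reject it during loading, before verification or resolution can happen.

```java
class FormatCheckDemo {
    /** Hypothetical loader that exposes defineClass for raw bytes. */
    static final class RawLoader extends ClassLoader {
        Class<?> define(byte[] bytes) {
            // Format checking runs here, as part of loading the class.
            return defineClass(null, bytes, 0, bytes.length);
        }
    }

    /** Returns the simple name of the error thrown while loading, or "loaded". */
    static String tryLoad(byte[] bytes) {
        try {
            new RawLoader().define(bytes);
            return "loaded";
        } catch (LinkageError e) {
            // ClassFormatError is a LinkageError raised at load time.
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // Eight bytes that do not start with the 0xCAFEBABE magic number.
        byte[] notAClassFile = {0, 1, 2, 3, 4, 5, 6, 7};
        System.out.println(tryLoad(notAClassFile));
    }
}
```

A verification failure, by contrast, would only surface later, when the class is linked ahead of initialization.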

Historically, the lines between different validation stages were sometimes blurred,
and some anomalies persist in the JVM specification. Readers and implementers of the
specification may be left with questions such as:

-   Which rules about class, field, and method references actually lead to
    load-time `ClassFormatError`s? For example, how can the JVM
    know whether a named class is actually an interface, or whether a field
    exists?

-   What happens if an array type is used in place of a class name in contexts like
    a class's `this_class` or a field or method reference's `class_index`?

-   When are references to the special method names `<init>` or `<clinit>`
    allowed? Under what conditions is it a `ClassFormatError` to reference one
    of these names with an inappropriate descriptor?

-   Why are some attributes, like `InnerClasses` or `LocalVariableTable`,
    "optional" but still validated? Under what conditions is an inconsistency in
    such an attribute considered a load-time `ClassFormatError`?

-   Which rules about the `StackMapTable` attribute are enforced during format
    checking, and which rules are enforced during verification?

This JEP addresses these and similar questions by carefully reviewing both the
specification and longstanding HotSpot behavior, clarifying the specification
text where necessary, and reconciling any behavioral differences.



Description
-----------

This work can be organized into four different areas of focus, as outlined
below.

In addition to the specification and behavioral changes described here, this is
an opportunity to review the treatment of format checking and verification in
JVMS and the HotSpot implementation code, potentially identifying further
discrepancies or unnecessary complexity.



### Format checking

Chapter 4 of JVMS will be updated to distinguish between assertions that are
meant to be enforced as format checks ("The `constant_pool` entry at that index
must be a `CONSTANT_Class_info` structure") and assertions that are merely
informational ("the `class_index` item should name a class or an array type, not
an interface"). The conditions under which predefined attributes are recognized
and checked will also be clarified. The `ACC_SUPER` flag, which has had no effect
since Java 8, will no longer be specified.

Two changes to HotSpot behavior with respect to attribute checking will be made:

-   Rejecting class or interface declarations with `Module`, `ModulePackages`,
    or `ModuleMainClass` attributes, on the basis that if these attributes
    appear in the `attributes` table of a `ClassFile` structure in any
    appropriately-versioned class file, they should be recognized as predefined
    attributes, and thus checked.

-   Rejecting non-static field declarations with `ConstantValue` attributes,
    similarly on the basis that if the attribute appears in the `attributes`
    table of a `field_info` structure, it should be recognized and checked.

`javac` will be updated to no longer set `ACC_SUPER`.
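
To see whether a given `class` file still carries the flag, the `access_flags` item can be read by skipping over the constant pool. This is a rough sketch under stated assumptions: `AccSuperProbe` and its methods are hypothetical names, the constant pool entry sizes follow the tag layout in JVMS 4.4 as I understand it, and no validation is attempted beyond what is needed to reach `access_flags`.

```java
import java.nio.ByteBuffer;

class AccSuperProbe {
    static final int ACC_SUPER = 0x0020;

    /** Reads the class-level access_flags item from raw class file bytes. */
    static int readAccessFlags(byte[] classFile) {
        ByteBuffer in = ByteBuffer.wrap(classFile); // big-endian by default
        if (in.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("bad magic number");
        }
        in.getShort(); // minor_version
        in.getShort(); // major_version
        int count = in.getShort() & 0xFFFF; // constant_pool_count
        for (int i = 1; i < count; i++) {
            int tag = in.get() & 0xFF;
            int skip;
            switch (tag) {
                case 1: skip = in.getShort() & 0xFFFF; break;           // Utf8
                case 3: case 4: skip = 4; break;                        // Integer, Float
                case 5: case 6: skip = 8; i++; break;                   // Long, Double use two slots
                case 7: case 8: case 16: case 19: case 20: skip = 2; break;
                case 15: skip = 3; break;                               // MethodHandle
                case 9: case 10: case 11: case 12: case 17: case 18: skip = 4; break;
                default:
                    throw new IllegalArgumentException("unknown constant pool tag " + tag);
            }
            in.position(in.position() + skip);
        }
        return in.getShort() & 0xFFFF; // access_flags
    }

    static boolean hasAccSuper(byte[] classFile) {
        return (readAccessFlags(classFile) & ACC_SUPER) != 0;
    }
}
```

A corpus search of the kind suggested in the comments on this JEP could apply a probe like this to each `class` file in a collection of JARs.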


### Special methods

To improve consistency of JVMS and align with longstanding HotSpot behavior, the
definitions of special methods will be revised to include any methods with the
names `<init>` or `<clinit>`; a number of special restrictions apply to these
method declarations and references to them. The constraints on names and
descriptors in references to methods (like `Methodref` and `InvokeDynamic`) will
be clarified.

In HotSpot, the following validation behaviors will change:

-   Unspecified checks on `NameAndType` constants will no longer be performed.
    For example, the `NameAndType` `<init>:()D` is legal, per the specification,
    even though it cannot be used in a `Fieldref` or `Methodref`.

-   The check that an `invokedynamic` does not use the name `<init>` with a
    `void`-returning descriptor will move from verification to format checking of
    the `InvokeDynamic` constant, for consistency with other similar checks.
    (For example, other return types are already rejected during format
    checking.)

-   The requirement that a `<clinit>` method declaration must have no parameters
    will be enforced for all class file version numbers. (This check
    is not currently specified or enforced in version 50 and older class files.)
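
The distinction between a legal `NameAndType` and a legal method reference can be illustrated with a small predicate. This is only a sketch, assuming the JVMS 4.4.2 rule that a `Methodref` name beginning with `<` must be `<init>` with a `void` return; `SpecialMethodNames` and `isLegalMethodrefNameAndType` are hypothetical names, and the full grammar of names and descriptors is not checked.

```java
class SpecialMethodNames {
    /**
     * Approximates the name/descriptor constraint on a Methodref.
     * Assumption: only the special-name rule is modeled here.
     */
    static boolean isLegalMethodrefNameAndType(String name, String descriptor) {
        if (!descriptor.startsWith("(")) {
            return false; // not a method descriptor at all
        }
        if (name.startsWith("<")) {
            // The only special name a Methodref may use is <init>, returning void.
            return name.equals("<init>") && descriptor.endsWith(")V");
        }
        return true;
    }
}
```

Under this sketch, `<init>` with `()D` is rejected as a method reference, even though the same pair is acceptable as a standalone `NameAndType` constant.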



### Optional attributes

Eleven attributes—most of which are for use by the Java programming language or
debuggers—are considered "optional" and have no impact on JVM behavior, but are
subject to certain restrictions during format checking. In some cases, the
specification makes assertions about these attributes that implementations cannot
enforce, leaving the implementations to approximate the desired behavior with ad hoc
checks.

Specifically, the contents of the following optional attributes are currently
subject to some format checks:

- `Exceptions`
- `InnerClasses`
- `EnclosingMethod`
- `Synthetic`
- `Signature`
- `SourceFile`
- `LineNumberTable`
- `LocalVariableTable`
- `LocalVariableTypeTable`
- `Deprecated`
- `Record`

Meanwhile, JVMS requires that a number of other optional attributes be ignored
during format checking. The rationale for distinguishing between the two
categories is not clear, and in practice, some checks do end up being performed
on these "ignored" attributes.

For simplicity and improved performance, format checking will be changed to
uniformly parse the names and lengths of all optional attributes, but otherwise
completely ignore their contents. (Rules related to the existence of the
attributes—e.g., that at most one `Exceptions` attribute is allowed per
method—will continue to be enforced.)
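
The proposed treatment amounts to reading each attribute's header and stepping over its body. The following is a minimal sketch of that idea, assuming the `attribute_info` layout of a two-byte name index followed by a four-byte length; `AttributeSkipper`, `Header`, and `scan` are hypothetical names, and this is not HotSpot's parser.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

class AttributeSkipper {
    /** Name index and byte length of one attribute; its contents are never inspected. */
    static final class Header {
        final int nameIndex;
        final long length;
        Header(int nameIndex, long length) {
            this.nameIndex = nameIndex;
            this.length = length;
        }
    }

    /** Reads attributes_count, then each attribute's header, skipping the info bytes. */
    static List<Header> scan(ByteBuffer buf) {
        int count = buf.getShort() & 0xFFFF; // attributes_count
        List<Header> headers = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            int nameIndex = buf.getShort() & 0xFFFF;  // attribute_name_index
            long length = buf.getInt() & 0xFFFFFFFFL; // attribute_length (unsigned)
            if (length > buf.remaining()) {
                throw new IllegalArgumentException("attribute overruns class file");
            }
            buf.position(buf.position() + (int) length); // skip info[] unread
            headers.add(new Header(nameIndex, length));
        }
        return headers;
    }
}
```

An existence rule, such as allowing at most one `Exceptions` attribute per method, could still be enforced from the collected name indices without looking at any attribute body.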

Where HotSpot provides an interface for accessing these attributes (such as
via JDI or core reflection), validation errors can be thrown by the API, as
necessary, when the API is invoked—sometime *after* the class is loaded.



### Verification

The `StackMapTable` attribute and the `exception_table` of the `Code` attribute
must be interpreted with respect to the bytecode of the corresponding `Code`
attribute. But because bytecode is not parsed until verification, many specified
format checks on `StackMapTable` and `exception_table` are, in HotSpot,
verification-time checks.

To resolve this inconsistency, the specification of verification will be updated
to formally include all validation of `StackMapTable` and `exception_table`
contents, and the corresponding format checking assertions will be expressed as
recommendations, not rules.

In addition, the specification will be updated to clarify the relationship
between the static and structural constraints on bytecode (JVMS 4.9) and the
verification algorithms (JVMS 4.10). Various bugs in the rules for verification
by type checking will be fixed, and a number of redundant assertions will be
removed (such as the check, already enforced at class loading, that a class's
superclass is not `final`).

In HotSpot, the following behavioral changes will be made:

-   Moving a few simple checks on the `exception_table` contents (such as the
    requirement that each `start_pc < end_pc`) from format checking to
    verification time.

-   Changing the error type of verification-time `StackMapTable` and
    `exception_table` errors from `ClassFormatError` to `VerifyError`.

-   Treating an invalid `Uninitialized_variable_info` in a `StackMapTable` as
    an unrecoverable static constraint violation, preventing fallback to
    verification by type inference in version 50 class files. (This aligns its
    treatment with that of the similar `Object_variable_info`.)

-   Consistently performing the same verification checks on an `invokespecial`
    whether the instruction references a `Methodref` or an `InterfaceMethodref`.
    (Currently, there's an assumption that an interface name will only appear in
    an `InterfaceMethodref`.)
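
As an illustration of the first two changes, a range check on a single `exception_table` entry might look as follows once it is treated as a verification-time check. This is a sketch only: `ExceptionTableCheck` and `check` are hypothetical names, the inputs are assumed to be non-negative values already read from the `Code` attribute, and a real verifier would also require each index to fall on an instruction boundary.

```java
class ExceptionTableCheck {
    /**
     * Checks the numeric ranges of one exception_table entry against code_length.
     * Throws VerifyError, matching the proposed error type for these checks.
     */
    static void check(int codeLength, int startPc, int endPc, int handlerPc) {
        if (startPc >= endPc) {
            throw new VerifyError("start_pc must be less than end_pc");
        }
        if (endPc > codeLength) {
            // end_pc is exclusive, so it may equal code_length but not exceed it.
            throw new VerifyError("end_pc is beyond the end of the code array");
        }
        if (handlerPc >= codeLength) {
            throw new VerifyError("handler_pc is not an index into the code array");
        }
    }
}
```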



Risks and Assumptions
---------------------

Changing JVM validation behavior is often a risk, because it may cause legacy
`class` files to fail with new errors, or, more subtly, new class files with old
version numbers to be accepted, but then fail on older JVMs.

In general, the HotSpot changes proposed in this JEP are narrow in scope, often
in corner cases that real-world code is unlikely to probe. And many of the
changes only modify the type of error being thrown or the timing of an error
check. That said, the most likely areas of concern are:

-   New errors caused by improper appearances of the `Module`, `ModulePackages`,
    `ModuleMainClass`, and `ConstantValue` attributes. 

-   New errors caused by pre-51 class files that declare a useless method
    named `<clinit>` with one or more parameters.

-   Accepting class files with malformed optional attributes, even though those
    class files could fail to load on an older JVM.

Besides the risk to JVM users, there is some risk that, by relaxing the
constraints on optional attributes, downstream tools will be surprised by
unvalidated attribute contents in `class` files that can be successfully loaded.

These risks need to be balanced against the cost of the extra complexity
required to fully specify and maintain longstanding, often ad hoc HotSpot
behavior.



Dependencies
------------

These changes are a soft prerequisite to new JVM feature work in Valhalla,
including [JEP 401][jep401].

[jep401]: https://openjdk.java.net/jeps/401

Comments
Withdrawn: there is no longer a strong dependency between Valhalla and many of these areas. And anyway, a more effective approach seems to be breaking proposed changes up into bite-sized pieces and processing those through separate JBS bugs. That allows each piece to be considered and reviewed separately, as necessary.
22-12-2023

I was also considering doing something about ACC_SUPER. I've updated the JEP description to include removing it. Tooling to check unused access flags and other lint-style warnings seems like it might be a good idea, but I'll leave that to a separate effort, if anyone wants to pick it up. (I wonder, though, if rather than working on a product feature, it would be more effective for us to do some checks on these things on our own with a corpus search, and then reach out to the offending tools.)
03-06-2021