JDK-8315737 : JEP 483: Ahead-of-Time Class Loading & Linking
  • Type: JEP
  • Component: hotspot
  • Sub-Component: runtime
  • Priority: P3
  • Status: Integrated
  • Resolution: Unresolved
  • Fix Versions: 24
  • Submitted: 2023-09-06
  • Updated: 2024-11-20
Related Reports
Sub Tasks
JDK-8331497 :  
Description
Summary
-------

Improve startup time by making the classes of an application instantly available, in a loaded and linked state, when the HotSpot Java Virtual Machine starts. Achieve this by monitoring the application during one run and storing the loaded and linked forms of all classes in a cache for use in subsequent runs. Lay a foundation for future improvements to both startup and warmup time.


Goals
-----

  - Improve startup time by exploiting the fact that most applications start up in roughly the same way every time they run.

  - Do not require any change to the code of applications, libraries, or frameworks.

  - Do not require any change to how applications are started from the command line with the `java` launcher, beyond the command-line options related directly to this feature.

  - Do not require the use of the `jlink` or `jpackage` tools.

  - Lay a foundation for continued improvements to startup time and also to warmup time, i.e., the time required for the HotSpot JVM to optimize an application’s code for peak performance.


## Non-Goals

  - It is not a goal to cache classes that are loaded by user-defined class loaders. Only classes loaded from the class path, the module path, and the JDK itself, by the JDK’s [built-in class loaders], can be cached. We may address this limitation in future work.

[built-in class loaders]: https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/lang/ClassLoader.html#builtinLoaders


## Motivation

The Java Platform is highly dynamic. This is a source of great strength.

Features such as dynamic class loading, dynamic linkage, dynamic dispatch, and dynamic reflection give vast expressive power to developers. They can create frameworks which use reflection to determine an application’s configuration by inspecting application code for annotations. They can write libraries which dynamically load and then link to plug-in components discovered at run time. They can, finally, assemble applications by composing libraries which dynamically link to other libraries, leveraging the rich Java ecosystem.

Features such as dynamic compilation, dynamic deoptimization, and dynamic storage reclamation give broad flexibility to the JVM. It can compile a method from bytecode to native code when it detects, by observing an application’s behavior, that doing so will be worthwhile. It can speculatively optimize native code, assuming a particular frequent path of execution, and revert to interpreting bytecode when it observes that the assumption no longer holds. It can reclaim storage when it observes that doing so will be profitable. By these and related techniques, the JVM can achieve higher peak performance than is possible with traditional static approaches.

All this dynamism comes at a price, however, which must be paid every time an application starts.

The JVM does a lot of work during the startup of a typical server application, interleaving several kinds of activities:

  - It scans hundreds of JAR files on disk and reads and parses thousands of class files into memory;

  - It [loads the parsed class data into class objects][load] and [links them together][link] so that classes can use each other’s APIs, which involves [verifying bytecodes][verify] and [resolving symbolic references][resolve], which in turn may involve [instantiating lambda objects][lambda]; and

  - It executes the static initializers of classes — their `static` field initializers and `static { ... }` blocks — which can create many objects and even perform I/O operations such as opening log files.

If, additionally, the application uses a framework, e.g., the Spring Framework, then the framework’s startup-time discovery of `@Bean`, `@Configuration`, and related annotations will trigger yet more work.

All this work is done on demand, lazily, just in time. It is heavily optimized, however, so many Java programs start up in milliseconds. Even so, a large server application which uses a web application framework plus libraries for XML processing, database persistence, etc., may require seconds or even minutes to start up.

Yet applications tend to repeat themselves, often doing essentially the same thing every time they start: Scanning the same JAR files, reading and parsing and loading and linking the same classes, executing the same static initializers, and using reflection to configure the same application objects. The key to improving startup time is to try to do at least some of this work eagerly, ahead of time, rather than just in time. To put it another way, in the terms of [Project Leyden], we aim to [shift some of this work earlier in time][shift].

[load]: https://docs.oracle.com/javase/specs/jvms/se22/html/jvms-5.html#jvms-5.3
[link]: https://docs.oracle.com/javase/specs/jvms/se22/html/jvms-5.html#jvms-5.4
[verify]: https://docs.oracle.com/javase/specs/jvms/se22/html/jvms-5.html#jvms-5.4.1
[resolve]: https://docs.oracle.com/javase/specs/jvms/se22/html/jvms-5.html#jvms-5.4.3
[lambda]: https://cr.openjdk.org/~briangoetz/lambda/lambda-translation.html
[shift]: https://openjdk.org/projects/leyden/notes/02-shift-and-constrain
[Project Leyden]: https://openjdk.org/projects/leyden/


## Description

We extend the HotSpot JVM to support an _ahead-of-time cache_ which can store classes after reading, parsing, loading, and linking them. Once a cache is created for a specific application, it can be re-used in subsequent runs of that application to improve startup time.

To create a cache takes two steps. First, run the application once, in a _training run_, to record its AOT configuration, in this case into the file `app.aotconf`:

<pre><code>$ java <b>-XX:AOTMode=record -XX:AOTConfiguration=app.aotconf</b> \
       -cp app.jar com.example.App ...
</code></pre>

Second, use the configuration to create the cache, in the file `app.aot`:

<pre><code>$ java <b>-XX:AOTMode=create -XX:AOTConfiguration=app.aotconf</b> \
       <b>-XX:AOTCache=app.aot</b> -cp app.jar
</code></pre>

(This second step doesn’t run the application; it just creates the cache. We intend to streamline the process of cache creation in future work.)

Subsequently, in testing or production, run the application with the cache:

<pre><code>$ java <b>-XX:AOTCache=app.aot</b> -cp app.jar com.example.App ...
</code></pre>

(If the cache file is unusable or does not exist then the JVM issues a warning message and continues.)

With the AOT cache, the reading, parsing, loading, and linking work that the JVM would usually do just-in-time when the program runs in the third step is shifted ahead-of-time to the second step, which creates the cache. Subsequently, the program starts up faster in the third step because its classes are available instantly from the cache.

For example, here is a program which, though short, uses the [Stream API] and thus causes almost 600 JDK classes to be read, parsed, loaded, and linked:

```java
import java.util.*;
import java.util.stream.*;

public class HelloStream {

    public static void main(String ... args) {
        var words = List.of("hello", "fuzzy", "world");
        var greeting = words.stream()
            .filter(w -> !w.contains("z"))
            .collect(Collectors.joining(", "));
        System.out.println(greeting);  // hello, world
    }

}
```

[Stream API]: https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/util/stream/package-summary.html

This program runs in 0.031 seconds on JDK&nbsp;23. After doing the small amount of additional work required to create an AOT cache it runs in 0.018 seconds on JDK&nbsp;NN — an improvement of 42%. The AOT cache occupies 11.4 megabytes.

For a representative server application, consider [Spring PetClinic], version 3.2.0. It&nbsp;loads and links about 21,000 classes at startup. It starts in 4.486 seconds on JDK&nbsp;23 and in 2.604 seconds on JDK&nbsp;NN when using an AOT cache — also an improvement of 42%, by coincidence. The AOT cache occupies 130 megabytes.

[Spring PetClinic]: https://github.com/spring-projects/spring-petclinic


### How to train your JVM

A training run captures application configuration and execution history for use in subsequent testing and production runs. A good candidate for a training run is, therefore, a production run. Using a production run for training, however, is not always practical, especially for server applications which, e.g., create log files, open network connections, and access databases. For such cases we recommend creating a synthetic training run that resembles actual production runs as much as possible. It should, among other things, fully configure itself and exercise typical production code paths.

One way to achieve this is to add a second main class to your application specifically for training, e.g., `com.example.AppTrainer`. This class can invoke the production main class to exercise the common modes of the application using a temporary log-file directory, a local network configuration, and a mocked database if required. You might already have such a main class in the form of an integration test.
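For illustration, a trainer entry point might look like the following sketch. The class names, system properties, and command-line flag here are hypothetical, not part of this JEP; the point is simply that the trainer redirects side effects to disposable resources and then drives the ordinary main class through its usual startup:

```java
// Hypothetical trainer entry point for an AOT training run. It points the
// application at throwaway resources, then invokes the production main class
// so that the classes used at startup are loaded and linked during training.
public class AppTrainer {
    public static void main(String[] args) throws Exception {
        // Redirect side effects to disposable locations (property names are
        // illustrative; a real application would define its own).
        System.setProperty("app.log.dir",
                java.nio.file.Files.createTempDirectory("train-logs").toString());
        System.setProperty("app.db.url", "jdbc:h2:mem:training"); // mocked database

        // Exercise the common startup paths of the production main class.
        App.main(new String[] { "--exercise-common-paths" });
    }
}

// Stand-in for the real production main class, for this sketch only.
class App {
    public static void main(String[] args) {
        System.out.println("App started with " + args.length + " arg(s)");
    }
}
```

An existing integration test that configures and starts the application end-to-end can often serve as this trainer with little or no change.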

Some additional tips:

  - To optimize for startup time, structure the training run so that it loads the same classes that a production run loads when it starts. You can check which classes are loaded via the `-verbose:class` command-line option or the `jdk.ClassLoad` event of the [JDK Flight Recorder](https://dev.java/learn/jvm/jfr/).

  - To minimize the size of the AOT cache, avoid loading classes in the training run that are not used in production runs. Do not, e.g., use a test suite written with a rich test framework. We may provide a way to filter such classes from the cache in future work.

  - If, in production, your application interacts with other hosts on the network or accesses a database then, in training, you may want to mock those interactions to ensure that the necessary classes are loaded. Such mocking, if done in Java code, will cause additional classes to be cached which are not needed in production. Again, we may provide a way to filter such classes from the cache in future work. If, for some reason, you cannot mock these kinds of interactions, and therefore cannot include them in the training run, then the classes required in production to handle them will be loaded from the class path or from modules, just-in-time, as usual.

  - Focus on running a broad set of short verification scenarios, sometimes called “smoke tests” or “sanity tests.” This is often enough to load most of the classes you will need in production. Avoid large test suites that cover rare corner cases and seldom-used functionality. Also avoid stress and regression tests, which generally do not characterize typical startup activities.

  - Keep in mind that an AOT cache only helps insofar as the training run does similar things as production runs. If the training run stops short of that then the cache will be less useful.
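To compare the classes loaded by a training run against those loaded by a production-style run, the `-verbose:class` output mentioned above can be post-processed. A sketch, assuming a JDK with unified logging (JDK 9 or later) and the hypothetical `app.jar` and main classes from the earlier examples:

```shell
# List the classes a training run loads, one name per line, sorted
java -verbose:class -cp app.jar com.example.AppTrainer \
  | grep 'class,load' \
  | awk '{ print $2 }' | sort -u > training-classes.txt

# Produce the same list for a production-style run
java -verbose:class -cp app.jar com.example.App \
  | grep 'class,load' \
  | awk '{ print $2 }' | sort -u > production-classes.txt

# Classes loaded in production but not in training indicate gaps in the
# training workload; classes loaded only in training inflate the cache
diff training-classes.txt production-classes.txt
```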


### Consistency of training and subsequent runs

To enjoy the benefits of the AOT cache generated during a training run, the training run and all subsequent runs must be essentially similar.

  - All runs must use the same JDK release and be on the same hardware architecture (e.g., `x64` or `aarch64`) and operating system.

  - All runs must have consistent class paths. A subsequent run may specify extra class-path entries, appended to the training class path; otherwise, the class paths must be identical. Class paths must contain only JAR files; directories in class paths are not supported because the JVM cannot efficiently check them for consistency.

  - All runs must have consistent module options on the command line, and consistent module graphs. The arguments to the `-m` or `--module` options, if present, must be identical. The `--limit-modules`, `--patch-module`, and `--upgrade-module-path` options must not be used.
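Concretely, the class-path rule permits appending entries but not reordering or removing them. A sketch using the hypothetical `app.jar` from the earlier examples:

```shell
# Training run (as in the earlier example)
java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf \
     -cp app.jar com.example.App

# OK in a subsequent run: an extra entry appended to the training class path
java -XX:AOTCache=app.aot -cp app.jar:extra.jar com.example.App

# Not OK: entries reordered relative to the training class path; by default
# the JVM warns and ignores the cache (with -XX:AOTMode=on it exits instead)
java -XX:AOTCache=app.aot -cp extra.jar:app.jar com.example.App
```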

If any of these constraints are violated then the JVM, by default, issues a warning and ignores the cache. You can insist that the JVM use the cache by adding the option `-XX:AOTMode=on` to the command line:

<pre><code>$ java -XX:AOTCache=app.aot <b>-XX:AOTMode=on</b> \
       -cp app.jar com.example.App ...
</code></pre>

If this option is present then the JVM reports an error and exits if any of the above constraints are violated, or if the cache does not exist.

(If needed, you can disable the AOT cache entirely via `-XX:AOTMode=off`. You can also specify the default mode via `-XX:AOTMode=auto`, in which case the JVM tries to use the AOT cache specified via the `-XX:AOTCache` option; if the cache is unusable or does not exist then it issues a warning message and continues.)

A useful exception to the requirement for consistency is that training and subsequent runs may use different garbage collectors. Another useful exception is that training and subsequent runs may use different main classes; this gives flexibility in constructing training runs, as noted above.


### History

The ahead-of-time cache proposed here is a natural evolution of an old feature in the HotSpot JVM, [_class-data sharing_][cds] (CDS).

CDS was [first introduced][intro] in an update to JDK&nbsp;5, in 2004. It initially aimed to shrink the memory footprint of multiple Java applications running on the same machine. It achieved this by reading and parsing JDK class files, storing the resulting metadata in a read-only archive file that could later be mapped directly into memory by multiple JVM processes using the same virtual-memory pages. We later extended CDS so that it could also store metadata for application classes.

Nowadays the sharing benefit of CDS has been reduced by new security practices such as [address space layout randomization][aslr] (ASLR), which makes the address at which a file is mapped into memory unpredictable. CDS still, however, offers a significant startup-time improvement — so much so that builds of JDK&nbsp;12 and later include a [built-in CDS archive][default-cds] containing the metadata of over a thousand commonly-used JDK classes. CDS is, therefore, ubiquitous, even though many Java developers have never heard of it and few have used it directly.

[cds]: https://dev.java/learn/jvm/cds-appcds/
[intro]: https://web.archive.org/web/20040604034719/http://java.sun.com/j2se/1.5.0/docs/guide/vm/class-data-sharing.html
[aslr]: https://en.wikipedia.org/wiki/Address_space_layout_randomization
[default-cds]: https://openjdk.org/jeps/341

The AOT cache builds upon CDS by not only reading and parsing class files ahead-of-time but also loading and linking them. You can see the effect of the latter two optimizations by disabling them via the `-XX:-AOTClassLinking` option when creating a cache:

<pre><code>$ java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf \
       -XX:AOTCache=app.aot <b>-XX:-AOTClassLinking</b> -cp app.jar
</code></pre>

When we use this option, we can see that most of the improvement to the startup time of the `HelloStream` program is due to ahead-of-time loading and linking, while most of the improvement to the startup time of the PetClinic application is due to the ahead-of-time reading and parsing already done by CDS today (all times are in seconds, and percentages are cumulative):

<table>
  <tr>
    <td/>
    <td><code>HelloStream</code></td>
    <td>PetClinic</td>
  </tr>
  <tr>
    <td>JDK 23</td>
    <td>0.031</td>
    <td>4.486</td>
  </tr>
  <tr>
    <td>AOT cache, no loading or linking</td>
    <td>0.027 (+13%)</td>
    <td>3.008 (+33%)</td>
  </tr>
  <tr>
    <td>AOT cache, with loading and linking</td>
    <td>0.018 (+42%)</td>
    <td>2.604 (+42%)</td>
  </tr>
</table>

Users of [Spring Boot][sb-cds] and, more generally, the [Spring Framework][sf-cds], can therefore enjoy significant startup-time improvements, today, simply by using the CDS feature already available in previous JDK releases.

[sb-cds]: https://docs.spring.io/spring-boot/how-to/class-data-sharing.html
[sf-cds]: https://docs.spring.io/spring-framework/reference/integration/cds.html

The new `-XX:AOT*` command-line options are, for the most part at this time, macros for existing CDS options such as `-Xshare`, `-XX:DumpLoadedClassList`, and `-XX:SharedArchiveFile`. We are introducing the `-XX:AOT*` options in order to provide a uniform user experience for both this and future ahead-of-time features, and to drop the potentially confusing words “share” and “shared.”
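For comparison, the classic CDS workflow expressed with those older options looks roughly like the following. This is a sketch only; the exact mapping between the `-XX:AOT*` macros and the CDS options is not spelled out here, and `app.jar` is the hypothetical application from the earlier examples:

```shell
# 1. Record the list of classes loaded during a training run
java -XX:DumpLoadedClassList=app.classlist -cp app.jar com.example.App

# 2. Dump a shared archive from that class list
java -Xshare:dump -XX:SharedClassListFile=app.classlist \
     -XX:SharedArchiveFile=app.jsa -cp app.jar

# 3. Run with the archive
java -XX:SharedArchiveFile=app.jsa -cp app.jar com.example.App
```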


### Compatibility

Ahead-of-time class loading and linking works with every existing Java application, library, and framework. It requires no changes to source code and no changes to build configurations, aside from the additional step of creating the AOT cache. It fully supports the highly dynamic nature of the Java Platform, including run-time reflection.

This is so because the timing and ordering of class reading, parsing, loading, and linking is immaterial to Java code. The Java language and virtual-machine specifications give the JVM broad freedom in scheduling these operations. When we shift these operations from just-in-time to ahead-of-time, the application observes classes being loaded and linked as if the JVM did that work at the exact moment requested — though unaccountably fast.
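In particular, the laziness that Java code can observe is preserved: even when a class is loaded and linked from the AOT cache, its static initializer still runs at the moment the specification mandates, on first use. A small illustration (the class names are ours, not from the JEP); its output order is the same with or without an AOT cache:

```java
public class InitOrder {
    static class Lazy {
        static final String GREETING;
        static {
            // Runs at first use of Lazy, whether the class was loaded
            // just-in-time or ahead-of-time.
            System.out.println("initializing Lazy");
            GREETING = "hello";
        }
    }

    public static void main(String[] args) {
        System.out.println("before first use");
        System.out.println(Lazy.GREETING);  // triggers initialization here
        System.out.println("after first use");
    }
}
```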


## Future work

  - The two-step workflow proposed here is cumbersome. In the near future we expect to reduce this to one step which both performs the training run and creates the AOT cache.

  - At present, the only way to do a training run is to have the application run a representative workload, at least through startup, and then exit. In future work we may create new tools to help developers more flexibly define and evaluate such training runs and workloads, and perhaps also allow them to manually adjust what is stored in AOT caches. We may also enable training data to be gathered unobtrusively during production runs.

  - ZGC is not yet supported. We intend to address this limitation in [future work](https://bugs.openjdk.org/browse/JDK-8326035).

  - In some cases the JVM cannot load classes ahead of time, much less link them. These include classes loaded by user-defined class loaders, old classes which require an old version of the bytecode verifier, and signed classes. If a class cannot be AOT-loaded then other, AOT-loadable classes cannot be AOT-linked to it. In all such cases the JVM falls back to loading and linking just-in-time, as usual. We may address these limitations in future work, if and when they prove significant.

  - Loading and linking classes ahead-of-time enables future improvements to warmup time. In the future, during training runs we can record statistics about which code runs most frequently and cache any optimized code that is generated. This will enable applications to start immediately in an optimized state.


## Testing

  - We will create new unit-test cases to cover the new command-line options.

  - Ahead-of-time loading and linking is independent of existing CDS features. Most CDS tests should pass when run with the `-XX:+AOTClassLinking` option. A few tests are sensitive to the order in which classes are loaded; we will revise them as appropriate.


## Risks and Assumptions

  - We assume that the [consistency] required across training and subsequent runs is tolerable to developers who want to use this feature. They must, especially, ensure that class paths and module configurations are consistent in all runs.

  - We assume that the limited support for user-defined class loaders is tolerable. Conversations with some potential users suggest that they are willing to accept fixed class paths and module configurations, and thus a fixed set of built-in class loaders, and to use specialized class loaders only when that flexibility is required.

  - We assume that the low-level side effects of ahead-of-time loading and linking are immaterial in practice. These include the timing of filesystem accesses, log messages, JDK-internal bookkeeping activities, and changes in CPU and memory usage. Applications that observe and depend on such subtle effects may become unstable if classes are loaded and linked ahead-of-time. We assume that such applications are rare, and that they can be adjusted to compensate.

[consistency]: #Consistency-of-training-and-subsequent-runs

Comments
I'm sure you're all aware of Jiangli Zhou's great write-up on "Java Class Pre-resolution and Pre-initialization" [1]. I'm a little surprised to see that the corresponding JBS issues:

- JDK-8233887: Archived class pre-resolution and pre-initialization [2]
- JDK-8245858: Set state to 'linked' when an archived boot class is restored at runtime [3]
- JDK-8232222: Enhance Java heap object (subgraph) archiving for more general support of selective class/static field pre-initialization [4]

have recently (January 2024) all been closed as "Won't Fix". I think we should at least link them to this and the "CDS Implementation note JEP", or mark them as duplicates of these new JEPs or the more general Leyden tasks?

[1] https://cr.openjdk.org/~jiangli/Leyden/Java%20Class%20Pre-resolution%20and%20Pre-initialization%20(OpenJDK).pdf
[2] https://bugs.openjdk.org/browse/JDK-8233887
[3] https://bugs.openjdk.org/browse/JDK-8245858
[4] https://bugs.openjdk.org/browse/JDK-8232222
17-07-2024

Thank you again to all the reviewers. At the advice of architects I trust, and perhaps to the dismay of other peers (even my co-authors), I have split the content of this JEP. All of “implementation notes”, the small details that most fascinate me and some others, are now in a draft informational JEP https://openjdk.org/jeps/8336232 which will evolve over time as a general commentary on CDS, specifically the details of its implementation and the finer points of its usage. The present JEP will continue to provide a helpful but brief introduction to CDS, with the level of detail limited (approximately) to what can be usefully read in one sitting. Some of us might prefer to have all the details in one place, but surely it is not so bad to have them all in two places. Especially since the informational JEP applies to not only this particular JEP but to CDS as a whole, and to the JEPs that build on this one. I expect the informational JEP to stay in a draft state for some time, as we collect requests for clarification about CDS, and populate the JEP with answers to those requests.
11-07-2024

Thanks for the comments everyone. Ioi, you are right; I changed Cached to Cache. The title acronym is expanded at Mark’s request. In general, we shouldn’t lead with acronyms; we lead with their expansions, and later on use acronyms when they are unambiguous and helpful. Volker, I responded via email on leyden-dev; thanks for the detailed suggestions. Here is the “history bit” for Volker which I added to the section which teaches the reader about CDS:

> As a matter of history, CDS has been built into the HotSpot VM [since JDK 5] in 2004. At first its expanded name was “Class Data Sharing”, which is reflected in options like `-Xshare:…`. Over the years it has expanded its capabilities beyond the storage of shareable class data, so the preferred alternative expansion for CDS is now “Cache Data Storage”. Either expansion refers to the same technology.
>
> [since JDK 5]: <https://web.archive.org/web/20040604034719/http://java.sun.com/j2se/1.5.0/docs/guide/vm/class-data-sharing.html>

(Thanks for the link, Volker!) I updated the Hello World example with numbers provided by Ioi. The example uses a single Java 8 stream, in order to exercise the pre-linking feature to a measurable extent.
09-07-2024

I prefer the shorter form of "Cache Data Store" to "Cached Data Storage".
08-07-2024

Thank you, John, for addressing my comments. Now "JIT" usage here is less confusing and I am fine with it. I agree with Volker's point about adding a statement that CDS (Cached Data Storage) is an "evolution of the existing CDS ("Class Data Sharing") implementation".
08-07-2024

After reading the full JEP it became evident that "Cached Data Storage" is actually an extension of the existing and well-known "Class Data Sharing" feature. However, the current description introduces "CDS" (which has been the well-known acronym for "Class Data Sharing" since JDK 5 times [1]) as "Cached Data Storage". This is surprising and confusing for everybody already familiar with the existing "CDS" feature. I'd therefore suggest moving the very last sentence of the JEP to the very beginning. E.g. something like:

> This JEP is an evolution of the existing CDS ("Class Data Sharing") implementation. In the remainder of the document we will refer to CDS as "Cached Data Storage" to emphasize its extended functionality compared to the classic Class Data Sharing.

Another point I think the JEP should address in some more detail is the effect of "Cached Data Storage" on the overall memory footprint of the JVM process. The classic "Class Data Sharing" already slightly increased the memory footprint of the JVM and only amortized if more than one JVM used the same CDS archive at runtime. Please describe the effects of the new "Cached Data Storage" on memory footprint.

Finally, some minor text corrections:

> CDS is built into the HotSpot VM since JDK 6.

It's actually there since JDK 5, see [1] :)

> Using CDS

The way you describe the manual usage of CDS is "old" and quite cumbersome. Why not use the new -XX:+AutoCreateSharedArchive option [2], which automatically creates the archive if none is available? Or does -XX:+AOTLoadedClasses not work together with -XX:+AutoCreateSharedArchive, and if so, why not?

> This is can be viewed as..

Remove redundant "is" in the sentence above.

> from CDS assets already present VM memory,

Should probably read "..present in VM memory.."

> such as method profiles and compiled code, can stored as new assets

Should probably read "..can be stored.."

> The implementation CDS already does enforce

Should probably read "The CDS implementation.."

> Therefore, we can use run existing CDS test cases with this option explicitly enabled.

Should probably read "we can use and run existing.."

Regards, Volker

[1] https://web.archive.org/web/20040604034719/http://java.sun.com/j2se/1.5.0/docs/guide/vm/class-data-sharing.html
[2] https://docs.oracle.com/en/java/javase/22/docs/specs/man/java.html#creating-dynamic-cds-archive-file-with--xxautocreatesharedarchive
08-07-2024

In several places JDK 23 is mentioned. Why not JDK 24 (or JDK 21)?

"CDS is built into the VM in JDK 23." Did you mean "since JDK X"? [~iklam] When was CDS added to the JDK?

There are several (FIXME) in the text.

JIT compilers don't do loading and linking (unless you mean when code is deoptimized): "Thus, -XX:+AOTLoadedClasses shifts JIT loading and linking activity to a cache of AOT loaded classes, saving more startup work than before." The benefit from these changes for JIT compilation is fewer uncommon traps in code and fewer deoptimizations, which may improve startup a little.
04-07-2024