Bug ID: JDK-8326035 JEP 516: Ahead-of-Time Object Caching with Any GC

Type: JEP
Component: hotspot
Sub-Component: gc

Priority: P4
Status: Candidate
Resolution: Unresolved

Submitted: 2024-02-16
Updated: 2025-05-14

Summary
-------

Enhance the [ahead-of-time cache][JEP 483], which enables the HotSpot Java Virtual Machine to improve startup and warmup time, so that it can be used with any garbage collector, including the low-latency Z&nbsp;Garbage Collector (ZGC). Achieve this by making it possible to load cached Java objects sequentially into memory from a neutral, GC-agnostic format, rather than map them directly into memory in a GC-specific format.

[JEP 483]: https://openjdk.org/jeps/483

Goals
-----

- Allow all garbage collectors to work smoothly with the [AOT cache][JEP 483] introduced by [Project Leyden](https://openjdk.org/projects/leyden/).

- Separate the AOT cache from GC implementation details and policies.

- Ensure that use of the AOT cache does not materially impact startup time, relative to previous releases.

Motivation
----------

Most of the HotSpot JVM's garbage collectors pause application threads in order to reclaim memory. This causes applications to take significantly longer than usual to handle some requests, increasing their _[tail latency](https://brooker.co.za/blog/2021/04/19/latency.html)_. For example, 99% of all requests may be handled within 10ms, but 1% of the requests may take 100ms or more. You can minimize the tail latency caused by garbage collection by using the [Z Garbage Collector (ZGC)](https://wiki.openjdk.org/display/zgc/Main). ZGC reclaims memory concurrently, never pausing application threads for more than a millisecond.

Garbage collection is, however, not the only cause of high tail latency.

Java applications are often scaled by starting new JVM instances to handle more requests, but requests sent to a new instance take significantly longer than requests sent to a warmed-up instance. To address this source of tail latency, you can enable [ahead-of-time class loading and linking](https://openjdk.org/jeps/483), introduced in JDK&nbsp;24. This improves application startup by caching your application’s classes in a training run so that they are available immediately in production. For example, the [Spring PetClinic](https://github.com/spring-projects/spring-petclinic) demo application starts 41% more quickly in production because the cache enables some 21,000 classes to appear already loaded and linked when the application starts. Forthcoming features, such as ahead-of-time [method profiling](https://openjdk.org/jeps/515) and [code compilation](https://openjdk.org/jeps/8335368), will further leverage the ahead-of-time cache to further extend these gains.

Unfortunately, the way that classes and other Java objects are cached is incompatible with ZGC. This forces you to choose between suffering GC-induced tail latency or suffering startup-induced tail latency. If you use ZGC to reduce the former then you cannot enable ahead-of-time class loading and linking to reduce the latter, and vice versa.

We could avoid this painful choice if AOT caches could be used with any of the HotSpot JVM's garbage collectors, including ZGC.

Description
-----------

An [AOT cache](https://openjdk.org/jeps/483#Description) contains, among other things, representations of the Java `Class` objects for classes that were loaded and linked during the training run. It also contains Java objects referenced by those `Class` objects, such as strings and byte arrays.

Today, cached Java objects are stored in a GC-specific format which is bitwise-compatible with the format of heap objects as understood by the GC. This enables the JVM to map them directly into the heap memory managed by the GC. (The other data in AOT cache files is not GC-specific.)

We propose to, optionally, cache Java objects in a neutral, _GC-agnostic_ format that works with all garbage collectors, regardless of which GC is used in training or in production. As an additional benefit, this will allow the JDK to include a baseline AOT cache that works in all environments.

### Obstacles to GC-agnostic object caching

The main challenge of caching objects in a GC-agnostic manner is in how to handle object references. From the perspective of Java code, the value of a field that holds a reference to an object is opaque. From the perspective of the JVM, however, each GC has its own policies for laying out objects in memory and representing references from one object to another:

- *Heap size policies (Serial, Parallel, G1)* — For heaps larger than 32 GB, object references are represented as 64-bit addresses and stored directly in reference fields. For heaps smaller than 32 GB, object references are stored in reference fields as 32-bit values, using compression if necessary since addresses can have up to 35 bits. There are three compression schemes, selected heuristically at run time based on heap size and other factors.

- *Object size policies (G1, ZGC)* — Objects are placed within heap regions, according to their sizes. In G1, the high-order bits of a 64-bit address identify the region, the low-order bits encode an offset into the region, and objects never cross region boundaries. An object that G1 considers large gets its own exclusive heap region, and any reference to the object must have all zero low-order bits. ZGC, on the other hand, distinguishes between small, medium, and large objects, using a different reference format for each.

- *Metadata* — ZGC encodes metadata bits in object references. These bits are used to manage concurrent garbage collection. No other GC supports this reference format.

The multitude of reference formats makes it challenging to take objects managed by one GC, cache them, and reify them later for a different GC.

### Object caching today

The representation of a Java object in an AOT cache mirrors its representation in memory. For example, consider a `String` object, which has these fields:

```
public class String {
private final byte[] value;
private final byte coder;
private int hash;
private boolean hashIsZero;
}
```

In the cached form of a `String` object, the `value` field contains the 64-bit memory address of a byte array:

```
header: ... | value: 0x4002045278 | coder: ... | hash: ... | hashIsZero: ...
```

The address is in a lowest-common-denominator format that is valid across the Serial, Parallel, and G1 collectors. Objects are stored in AOT caches such that none crosses the boundaries of heap regions, using a predetermined region size. This allows you to run in production with G1 even if you trained with Serial or Parallel.

ZGC does not use 64-bit addresses as object references, however, and it does not support a global size for regions. Hence ZGC cannot be used with AOT caches.

### GC-agnostic object caching

We cache Java objects in a GC-agnostic manner by storing object references in a GC-agnostic format, namely as logical indices. In a `String` object cached in this format, the `value` field contains the logical index of the byte array:

```
header: ... | value: 5 | coder: ... | hash: ... | hashIsZero: ...
```

Using objects cached in this format requires converting the logical indices back into memory addresses. The JVM therefore reads objects from the cache sequentially, i.e., _streams_ them, into memory. When the cache is opened, a background thread eagerly starts materializing objects, one by one. Materializing an object involves allocating memory in the heap, initializing the object's fields according to the data in the cache, and building object references to other materialized objects via lookups in a side table. When the application uses a class for the first time, it synchronizes with the background thread to ensure that the `Class` object for the class, and any related objects, are materialized. (The other data in the cache continues to be mapped into memory, as it is today.)

### Choosing GC-specific vs. GC-agnostic object caching

GC-specific cached objects are mapped directly into memory, while GC-agnostic cached objects are streamed into memory. Both create the appearance of instantly-loaded objects, but in some scenarios mapping performs better than streaming — and vice versa.

A _cold start_ of an application is the first start of that application on a particular machine in a while. Cold starts can happen frequently when deploying applications in a cloud. The AOT cache is unlikely to be in the filesystem cache, and the larger the cache, the larger the cost of loading it from disk. Streaming GC-agnostic cached objects can hide some of the latency of reading data from the disk, at the cost of requiring an additional CPU core.

Conversely, a _warm start_ is when an application starts close in time to a previous start, such as when running over and over on the same machine. Because the AOT cache stays in the filesystem cache between runs, mapping GC-specific cached objects can be done instantly.

The least advantageous situation for streamable, GC-agnostic object caching is a warm start in a constrained environment that does not have a spare CPU core. The JVM tries to avoid this situation in production by applying a heuristic when creating an AOT cache after a training run:

- It caches objects in the streamable, GC-agnostic format if, in training, either ZGC was used, or the <code style="white-space: nowrap">-XX:-CompressedOops</code> option was used, or the heap was larger than 32 GB. Training with ZGC, compressed pointers, or a large heap implies that the training environment was large, with more than a single core available. We assume that the production environment is similarly unconstrained, meaning that streaming will be most effective.

- It caches objects in the mappable, GC-specific format if, in training, the <code style="white-space: nowrap">-XX:+UseCompressedOops</code> option was used. This option indicates that the training environment had a heap smaller than 32 GB and did not use ZGC. This implies that the training environment was a constrained system without a spare core. We assume that the production environment is similar, meaning that mapping will be most effective.

You can explicitly create a cache whose objects are in the streamable, GC-agnostic format by specifying <code style="white-space: nowrap">-XX:+AOTStreamableObjects</code>, even if you also specify <code style="white-space: nowrap">-XX:+UseCompressedOops</code>.

The JDK includes two baseline AOT caches, one with GC-agnostic cached objects and one with GC-specific cached objects, which the JVM uses when the application does not provide a cache. This ensures that the JVM can use streaming or mapping, as appropriate, to achieve the best startup performance.

Alternatives
------------

- Enabling ZGC to support AOT caches does not require a GC-agnostic solution. We could, instead, continue the GC-specific approach by creating ZGC-specific caches containing ZGC-specific object references. This would optimize startup performance. However, the GC-agnostic solution, with objects materialized in the background, does not materially affect startup performance as long as an extra core is available, so the only situation in which a ZGC-specific cache would perform better is when using ZGC on a single-core machine. This is an unusual environment for the highly-concurrent ZGC, and thus does not motivate creating ZGC-specific caches. We prefer to rely on the maxim that the best way to reduce tail latency is with a systems approach, where the design of discrete components is coordinated, which leads us to the GC-agnostic approach.

- We could modify ZGC so it can interpret both its own object references and the G1-influenced object references currently found in AOT caches. The Serial and Parallel GCs were modified in this way, but ZGC is significantly more complex. This approach would effectively couple the implementations of all the GCs to each other, which is undesirable. In contrast, the GC-agnostic approach decouples the implementations, allowing GC implementations to evolve while allowing you to choose from the full range of GCs in training and again in production. Furthermore, since the bitwise layout of GC-agnostic cached objects is not entangled with the memory layout of objects in the heap, we expect to be able to optimize the layout of the cache to shrink its static footprint without significantly affecting GC implementations.

Testing
-------

Many object-caching tests already exist. We will adapt them to test with ZGC and the new streaming, GC-agnostic approach.

[~eosterlund] "Training with ZGC, compressed pointers, or a large heap implies that the training environment was large, with more than a single core available. " Seems there is a typo here. It should read : "Training with ZGC, non-compressed pointers, or a large heap".
14-05-2025
Looks good.
03-04-2024
Thanks for the comments, [~kvn]. I edited the JEP as requested.
03-04-2024
Can the title be "CDS Object Streaming"? "CDS" assumes already that it is archive. Add labels: gc, cds, leyden I like that this is general (for all GCs) JEP. But I don't see clear statement that currently CDS does not support ZGC for objects archiving. May be in "Goals" mention it. My wording looks not good, may be you can come up with better statement: - Add support for CDS object archiving for the Z Garbage Collector (which is not supported currently) I assume "Secondary goals not visible to users are" both bullets apply to any GC and not only ZGC. Should "Success Metrics" also mention performance for ZGC with archived objects vs current state (no archived objects)? You did not mentioned `-XX:+UseCompressedClassPointers` which also affects object layout. "Loading of roots can therefore be done lazily, and the bulk of the work can be done in an extra CDS thread." It is not clear to me when this "extra" thread starts and how it coordinates with main thread. I assume current CDS code is executed in main thread during startup. But from your description this "extra" thread will run concurrently with main thread and any Java thread. What "lazy" means here? What requests trigger work in "extra" thread? You need Ioi and someone from GC group review it too (listed as reviewers in JEP).
02-04-2024

Duplicate :	JDK-8274789 - Support archived heap objects in ZGC
Duplicate :	JDK-8310823 - CDS archived object streaming
Duplicate :	JDK-8242315 - Execute patch_archived_heap_embedded_pointers in a GC thread
Relates :	JDK-8308854 - G1 archive region allocation may expand/shrink the heap above/below -Xms
Relates :	JDK-8328886 - Lilliput: Build COH archives