Bug ID: JDK-8336232 CDS Implementation Notes

<!-- v.4 2024-07-13 -->

## Description

This JEP presents technical information about the Java virtual machine’s Cache Data Store (_CDS_), its concepts, its internal operations, and its current limitations.  This material is not intended as a tutorial, but rather as a detailed reference, and as such it assumes familiarity with the basic concepts of CDS.

Tutorials on CDS may be found elsewhere, such as in these places:

 - [A brief introduction to CDS] as part of the motivation for the [Ahead-of-Time Class Linking] JEP.
 - _FIXME_ (More here, please.)

[A brief introduction to CDS]: <https://openjdk.org/jeps/8315737#cds-explained>
[Ahead-of-Time Class Linking]: <https://openjdk.org/jeps/8315737>

### Implementation notes table of contents

<!-- NOTE: short tag definitions are set manually next to each header -->

 - [How Java’s dynamism is supported by CDS]
 - [What’s in a CDS archive file]
 - [Kinds of AOT processing]
 - [Consistency between training and production]
 - [Additional limitations]
 - [Choosing a training run]
 - [Measuring startup and warmup]
 - [A brief history of CDS]
 - [CDS and sharing]
 - [Glossary]

### How Java’s dynamism is supported by CDS

[How Java’s dynamism is supported by CDS]: <#dynamism>
<a name="dynamism"></a>

Java applications can be reliably and easily composed from a huge menu of libraries and frameworks, and can be configured for testing and deployment easily, with little ceremony.  Programmers enjoy fast development cycles, easy observability, and a powerful set of tools.  The foundation for all this is a pair of Java’s “super powers”: separate compilation and dynamic linking.  Classes can be managed and inspected in isolation by inspecting their classfiles.  When they are composed by dynamic linking, their integrity is protected by the VM, and yet the VM also gives them high performance access to each others’ API points.  (Such API points include fields and methods, accessed both reflectively and directly via bytecode.)  Crucially, the configuration of an application arises naturally from the classes presented at run time, as they connect to each other; there is no “linking ceremony” required, at build time, to exhaustively define the application configuration.  Most of the mechanical steps of Java application configuration happen on the fly, invisibly to the programmer.

This works, in part, because Java, despite being statically typed, is a highly dynamic language: Loading, linking, machine code generation, and storage reclamation are some of the dynamic behaviors.  All of this dynamism, while it provides great flexibility to the programmer, comes at a low-level cost.  Each execution of the application must repeat the same work over and over, each time finding the right classfile bytes for a given class name, or the right addresses of methods or fields, or the right runtime support data, or the right machine code to optimize the application.  This repetition is necessary in today’s Java VMs, as long as they perform most of their operations lazily, just in time.  Dynamism allows computed decisions to be deferred until the last moment; dynamism allows loading and linking and optimization to be organized as just-in-time operations, maximizing flexibility.

When deploying an application, many of these dynamically computed decisions have stabilized and can be expected to have the same result as previous runs.  Such stability does not cancel dynamism.  If an application in production decides to perform a new behavior not previously expected, the VM can respond dynamically to the change, perhaps loading some new classes, perhaps discarding some previously optimized code and data, perhaps reoptimizing.  Only the smallest and simplest Java applications are immune to such unpredicted behavior, but just-in-time processing, allowed by dynamism, covers all the possibilities in every application.

The overall set of configuration and optimization decisions made by an application (with the VM that runs it) is thus predictable, in many cases.  The specification of the Java VM allows much freedom to schedule decisions, however dynamically they are requested.  An unpredicted decision must always be handled as a just-in-time service, but a predictable one can also be handled ahead of time.  In many cases, it is straightforward to provide AOT resources, serving them up without delay to the application, whenever it needs them.  The information required to make this shift from JIT processing to AOT processing is prediction, foreknowledge of the decisions made to configure or optimize the application.  The predictions do not need to be 100% accurate, as long as there is a way to recover from misprediction.  Often, the most direct way to make these predictions is to perform a training run of the application and observe the decisions made during that run.  Assuming similar future runs will make similar decisions, the VM can prepare, ahead of time, to execute them for the next run.  This is the basis for the CDS technology.

Optimizations which optimistically assume some prediction, but have a fallback in case of misprediction, are sometimes called _speculative optimizations_. They are very common in the Java VM, since many conditions in Java programs are dynamically determined but also amenable to prediction ahead of time.  The VM acts as though some fact is true, while also having fallback paths to compensate for speculation failure, that is, for the case where the supposed fact turns out to be false after all.  Outside of CDS, the VM might speculate that some method is never overridden (at least until a class is loaded that defines an override), or that some branch of code is never taken (at least until it is taken), or that some local variable has exactly one dynamic type (at least until an object not of that type shows up), or that some method deserves extra compilation effort because it is used often (and if the application stops using it, the method code can be removed).
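
The guard-and-fallback shape of a speculative optimization can be sketched in plain Java.  This is only an analogy, not HotSpot code: the class `SpeculativeCall` and its members are hypothetical names chosen for illustration.  The sketch speculates that every receiver has the same class as the first one observed, and takes a slower general path (while counting the failure) whenever that guess turns out to be wrong.

```java
/** Toy model of a speculative optimization with a fallback path (hypothetical, not HotSpot code). */
final class SpeculativeCall {
    private Class<?> predictedType;   // the "fact" that the fast path assumes
    private int fallbacks;            // how many times the speculation failed

    /** Describes an object, using the fast path while the predicted type still matches. */
    String describe(Object receiver) {
        if (predictedType == null) {
            predictedType = receiver.getClass();   // record the first observation as the prediction
        }
        if (receiver.getClass() == predictedType) {
            return "fast:" + receiver;             // guard passed: the speculation holds
        }
        fallbacks++;                               // guard failed: compensate on the general path
        return "slow:" + receiver;
    }

    int fallbackCount() {
        return fallbacks;
    }

    public static void main(String[] args) {
        SpeculativeCall call = new SpeculativeCall();
        System.out.println(call.describe("a"));   // first observation sets the prediction
        System.out.println(call.describe(42));    // different type: falls back
        System.out.println("fallbacks = " + call.fallbackCount());
    }
}
```

In both cases the caller gets a correct answer; only the path taken differs, which is the property that makes such speculation safe.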

When creating a CDS archive, the VM can speculate that previous decisions, recorded during a training run, will be made the same way again later.  If application code in production makes different decisions, the VM can easily detect the new requirements.  For example, if the production run turns out to need a different set of classes, the VM can simply process the new classes just in time, in the traditional fully dynamic way, as if CDS had never been involved.  The same is true if a class in the production run asks to link to some API (another class, method, or field) not touched on in the training run; the unpredicted linking decision can be satisfied just in time.  All of this is true no matter how the application initiates loading and linking of APIs, whether via bytecodes or via reflection.  In all cases, Java’s flexible dynamism fully coexists with stable predictions stored in the CDS archive.

### What’s in a CDS archive file

[What’s in a CDS archive file]: <#file-format>
<a name="file-format"></a>

The foundational ability of CDS is to speculate class loading decisions, based on an AOT training run.  In some workflows, the list of classes observed in the training run is exported as a class-list file, which is then assembled into a CDS archive.  CDS can also operate from a textual list of selected classes, although this is highly error-prone.

For each classfile it selects, it can save away a pre-parsed (or “pickled”) internal form, as an independently loadable asset within the CDS archive file.  The internal form is substantially the same as that of the VM’s internal class metadata.  It is accompanied by “pointer maps” that tell how to relocate pointers which are embedded in the metadata, so that the CDS archive can be loaded at unpredictable base addresses in the virtual memory of the production run.
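
The role of a pointer map can be illustrated with a small Java model.  Everything here is an assumption made for illustration (the class `PointerMapRelocation`, the representation of the image as an array of 64-bit words, and the use of `java.util.BitSet` as the map); it does not describe the actual archive layout.  The sketch shifts exactly those words that the map flags as pointers, by the difference between the address the image was built for and the address where it actually landed.

```java
import java.util.BitSet;

/** Toy model of relocating an archived image with a pointer map (hypothetical layout). */
final class PointerMapRelocation {
    /**
     * Returns a copy of the image in which every word flagged in the pointer map
     * is shifted by the difference between the actual and the requested base address.
     */
    static long[] relocate(long[] image, BitSet pointerMap, long requestedBase, long actualBase) {
        long delta = actualBase - requestedBase;
        long[] result = image.clone();
        if (delta == 0) {
            return result;   // mapped where it was built for: nothing to patch
        }
        for (int i = pointerMap.nextSetBit(0); i >= 0; i = pointerMap.nextSetBit(i + 1)) {
            result[i] += delta;   // only words marked as pointers are adjusted
        }
        return result;
    }

    public static void main(String[] args) {
        long[] image = {0x1000_0010L, 42L, 0x1000_0020L};   // words 0 and 2 are pointers, word 1 is data
        BitSet map = new BitSet();
        map.set(0);
        map.set(2);
        long[] moved = relocate(image, map, 0x1000_0000L, 0x2000_0000L);
        System.out.println(Long.toHexString(moved[0]) + " " + moved[1] + " " + Long.toHexString(moved[2]));
    }
}
```

The plain data word is left untouched, which is why the map is needed: without it, the loader could not tell a pointer from an integer that happens to look like one.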

But when the VM starts, although all CDS assets are immediately available in VM memory, they might not yet be usable as classes, nor can they be linked together, if they are only in the pre-parsed state.

When the Java application eventually gets around to requesting a CDS class for the first time, the VM permanently makes the pre-parsed form “live” and associates the class name to the live metadata.  Only at that point can it be linked to other loaded classes.  This can be viewed as a partially AOT-order, partially JIT-order implementation of class loading.

On the other hand, if the archive is built with `-XX:+AOTClassLinking`, the VM itself initiates AOT loading, placing the metadata images into the VM’s system dictionary. This happens in a very early period before the application’s `main` method starts to run, and is thus called the _premain phase_ of execution.  At this time, both loading and linking happen quickly, from CDS assets already present in VM memory, and pre-formatted for easy adoption as live metadata.
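
The difference between the two orders can be modeled with a toy dictionary in Java.  This is a sketch under stated assumptions, not VM code: `ToyArchive`, `premain`, and `load` are hypothetical names, and a class is represented only by its name.  The `load` method models JIT-order adoption on first request, while `premain` models AOT-order adoption of everything in the archive before the application starts.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Toy model of adopting pre-parsed classes on demand or ahead of time (hypothetical names). */
final class ToyArchive {
    private final Set<String> preParsed;                                  // class names stored in the archive
    private final Map<String, String> dictionary = new LinkedHashMap<>(); // class name -> how it became live

    ToyArchive(Set<String> preParsed) {
        this.preParsed = preParsed;
    }

    /** AOT order: make every archived class live before the application's main method runs. */
    void premain() {
        for (String name : preParsed) {
            dictionary.putIfAbsent(name, "adopted ahead of time");
        }
    }

    /** JIT order: make a class live the first time the application asks for it. */
    String load(String name) {
        return dictionary.computeIfAbsent(name, n ->
                preParsed.contains(n) ? "adopted on demand" : "parsed from class file");
    }

    public static void main(String[] args) {
        ToyArchive vm = new ToyArchive(Set.of("app.Main", "app.Util"));
        vm.premain();
        System.out.println(vm.load("app.Main"));     // already live
        System.out.println(vm.load("app.Plugin"));   // not in the archive: traditional path
    }
}
```

In either order, a class that is missing from the archive is still handled by the traditional path, so the application sees the same classes either way.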

Because of the way assets are brought into VM memory from the CDS archive, they have stable and predictable memory locations.  This stability in turn allows them to be pre-formatted in an already-linked state, with direct references to each other.  Very specifically, the enhanced pre-formatting affects the constant pool entries in each class asset; they can be populated with resolved locations and sizes of fields, methods, and other classes, as long as those entities are also present in AOT loaded classes.

Thus, these AOT loading and linking activities happen more quickly, compared to classes which are processed piecemeal by just-in-time loading and linking.  But by an appeal to an “as-if” optimization, the loading and linking may also be viewed as happening just in time, on demand by the application.  The only evidence of the shift from JIT order to AOT order is indirect, perhaps from a change in file system activity, or from log messages emitted by the VM.

When an “as-if” optimization is working, the application cannot distinguish “ahead of time” linking from “just in time” linking, except for speed.  Such as-if rules are routine in VM technology.  As another example, code compiled by the VM runs “as if” the VM’s interpreter were running it, only it runs faster.  Also, the GC allows the application unlimited allocations “as if” memory were infinite.

A benefit of the behavioral similarity of loading and linking, between JIT and AOT orders, is that CDS can still arrange to load or link some application classes the old way, to handle corner cases that would be awkward to load in the new ahead-of-time order.  Thus, although the bulk of classes are likely to be pre-formatted in the CDS archive for AOT loading, some may not be in the new form.  This allows CDS to be flexible when dealing with more open-ended features of the VM, such as user-defined class loaders.  Likewise, CDS may choose not to preset some individual linking decision, even in an AOT-loaded class, if CDS has some reason to believe that decision could vary in the production run, or if CDS believes it would be wasted effort.  All these choices are transparent to the application.

The presence in VM memory of many application classes, at predictable (“stabilized”) addresses, is likely to be a springboard for further enhancements to CDS.  Additional kinds of VM data, such as method profiles and compiled code, can be stored as new assets in the CDS archive, pre-formatted so as to directly link to whatever classes, methods, and fields they need.

### Kinds of AOT processing

[Kinds of AOT processing]: <#aot>
<a name="aot"></a>

Different versions of CDS perform different levels of ahead-of-time processing. The earliest versions of CDS simply pre-parse the class files, but do not attempt to install the classes until the usual “just in time” request is made for class loading, by application logic.  Later versions perform increasing amounts of AOT processing.

The various kinds of AOT processing are enabled by command line options given when the CDS archive file is created.  The selected options are recorded within the CDS archive file.  When the VM makes a production run and is instructed to use a particular archive file, it performs the AOT processing requested by that archive file.  No other command line option or configuration setting is required in the production run; it all comes from the CDS archive.

Some kinds of processing can be disabled, which may be useful for diagnosing problems.  For example, `-XX:-AOTClassLinking` (note the `-` minus sign) disables AOT class loading and linking.  It would also disable subsequent AOT optimizations, if any, such as AOT compilation.  If the production run is told to disable AOT loading, the VM attempts to fall back to treating the CDS assets as pre-parsed classes, to be loaded in the traditional “just in time” order.

The `-XX:+AOTClassLinking` option puts an attribute into the Cache Data Store that instructs the VM to bring cached classes into a loaded state, immediately on startup.  This ensures that classes used by the application (as discovered by the AOT training run) are immediately available.  However, cached classes which cannot be AOT-loaded (such as those with user-defined class loaders) are loaded only on demand (that is, just in time), from a pre-parsed state in CDS.

The `-XX:+AOTClassLinking` option also enables subsequent AOT processing, specifically AOT linking of classes which are AOT-loaded.  Only constants which refer to other AOT-loaded classes are linked.

Class constants which configure the building of lambdas and string concatenation logic are linked ahead of time.  This is done by running the relevant `invokedynamic` bootstrap methods and dumping CDS assets which encode the resulting chains of method handles and hidden classes.  In this way, `-XX:+AOTClassLinking` supports AOT loading and linking of classes which are dynamically generated, not just those which are on the class path or module graph.  This AOT processing of bootstrap methods is limited to methods in `java.base` which are known to be free of side effects; it cannot (at present) be extended to arbitrary methods from other language runtimes.
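
For readers less familiar with these constructs, here is a minimal Java example of source code that `javac` translates into `invokedynamic` instructions, assuming JDK 9 or later with default compiler settings.  The class name `IndyExamples` is made up for this illustration.  String concatenation of non-constant operands is bootstrapped by `java.lang.invoke.StringConcatFactory`, and lambda expressions are bootstrapped by `java.lang.invoke.LambdaMetafactory`; both live in `java.base`.

```java
import java.util.function.Supplier;

/** Two source constructs that javac compiles to invokedynamic (JDK 9+, default settings). */
final class IndyExamples {
    /** Concatenation with a non-constant operand: linked through StringConcatFactory. */
    static String greet(String name) {
        return "Hello, " + name + "!";
    }

    /** A capturing lambda expression: linked through LambdaMetafactory. */
    static Supplier<String> greeter(String name) {
        return () -> greet(name);
    }

    public static void main(String[] args) {
        Supplier<String> supplier = greeter("CDS");
        System.out.println(supplier.get());
        // The supplier's class is generated at run time, not read from the class path.
        System.out.println(supplier.getClass().getName());
    }
}
```

Without AOT linking, the first execution of each such call site runs its bootstrap method and spins the supporting classes; that first-use work is what the paragraph above describes moving ahead of time.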

Another kind of AOT processing (in the future) is the collection of profiles, under the (future) flag `-XX:+AOTMethodProfiling`.  This would capture selected method profile information from the training run and assemble it into the CDS archive, for use during the production run.  The production run contributes its own profiling information as well, and the VM compiler will use the “freshest” profile information available.

Another kind of AOT processing (in the future) is the saving of compiled code, under the (future) flag `-XX:+AOTMethodCompilation`.  This would compile methods observed to be hot during the training run, and assemble them into the CDS archive.  The VM loads them as needed to accelerate startup or warmup.  The production run contributes its own JIT-compiled methods as well, and the VM will execute the “freshest” methods available.

### Consistency between training and production

[Consistency between training and production]: <#consistency>
<a name="consistency"></a>

As a general principle, if a training run (and any subsequent dump command) generates a CDS archive, and if the VM chooses to use it in a production run, the production run will produce substantially the same results as if the VM had ignored the CDS archive.

Of course, the two runs might have differences in timing, footprint, and order of access to system resources like the file system.  And some aspects of Java execution are intrinsically non-reproducible, if they use the entropy generated by physical processor concurrency or a true random number generator.  But with or without the archive, the VM will run the application in a way that complies with the Java VM specification, which means that, either way, results will comply with programmer expectations.

In order to ensure that CDS archive contents are relevant, CDS enforces rules ensuring consistency between training runs and production runs.  In short, CDS ensures that, in a real sense, both runs are processing the same application.  Indeed, these rules embody what it means for two application runs to be “the same”.

Here are the consistency rules CDS enforces:

 - Both runs must use the same JDK release.

 - Both runs must use the same processor ISA family, such as x86 or ARM (at some particular standard level).  As a speculative optimization, CDS may assume exactly the same hardware (including specialized ISA extensions) but provide a fallback to a more general processor (with fewer ISA extensions).

 - Both runs must use the same native data formats, including address word size and byte order.  This is usually implied by the ISA family.

 - Both runs must use the same size and encoding for managed references.  (Whether the heap base address or scale factor of compressed oops can vary is platform-dependent.)  If this encoding is set automatically (via ergonomic logic), the VM will attempt to align the setting in the production run, and reject the CDS archive if this fails.

 - Both runs must use consistent class paths.  The production run may specify extra class path entries, appended to the end; otherwise, the class paths must be identical.

 - Both runs must have the same module graph and module system options set.  If present, the use of `-m` or `--module` options must be consistent.
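
As one concrete illustration, the class path rule in the list above amounts to a prefix test.  The following Java sketch models only that rule, as it is stated here; the class `ClassPathCheck` and its method are hypothetical, and the VM’s actual validation is not shown.

```java
import java.util.List;

/** Toy model of the class path consistency rule (hypothetical helper, not the VM's check). */
final class ClassPathCheck {
    /**
     * Returns true if the production class path is the training class path,
     * optionally followed by extra entries appended to the end.
     */
    static boolean isConsistent(List<String> training, List<String> production) {
        return production.size() >= training.size()
                && production.subList(0, training.size()).equals(training);
    }

    public static void main(String[] args) {
        List<String> training = List.of("app.jar", "lib.jar");
        System.out.println(isConsistent(training, List.of("app.jar", "lib.jar", "extra.jar")));
        System.out.println(isConsistent(training, List.of("lib.jar", "app.jar")));
    }
}
```

Appended entries are acceptable under this rule, while reordered or missing entries are not.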

In some cases, a training run will refuse to generate a CDS archive if there is no possibility of running “the same application” in production.  Here are the cases:

 - Some module system features are not supported by CDS.  These are `--limit-modules`, `--patch-module`, and `--upgrade-module-path`.

 - Only JAR-based class paths are supported by CDS.  Directory-based class paths cannot be checked for consistency, since directory contents may change concurrently with execution of the application.

 - ZGC is not supported in CDS production runs.  But see <https://bugs.openjdk.org/browse/JDK-8326035>.

Non-supported features may be supported in the future.  Consistency requirements may be relaxed in the future.

Each CDS archive records enough information to make necessary consistency checks.  Tools to inspect and manipulate such information may be created in the future.

If the VM determines it cannot use a CDS archive, it will run without it (if `-Xshare:auto` is set) or emit an error diagnostic (if `-Xshare:on` is set).

CDS accepts many differences between training and production runs:

 - Each run may use a different CPU implementation within the same processor family.

 - Each run may request a different GC, as long as it is supported by CDS.  If the production run requests an unsupported GC, the VM may refuse to use the archive, or else simply ignore object graphs stored in the archive, limiting certain optimizations.

 - The runs may specify different main classes on the command line, or otherwise spend their time in different parts of the code base.  CDS will provide benefit to the production run only so far as it reuses loading and linking decisions made in the training run, but the VM will still execute correctly if the production run goes totally “off script”.

 - The two runs may have different environmental settings, such as Unix environment variables or Java properties.  If some environmental setting is internally significant to the JDK, and it differs between training and production, it is up to the VM and JDK code to choose which setting to honor, or whether to discard the CDS archive altogether.

 - The property `java.lang.Integer.IntegerCache.high` is internally significant.  It configures the cache of object identities relevant to `int` autoboxing.  Currently, CDS respects the larger of the two settings, in the production run.
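
To see why the `IntegerCache` setting in the last item is observable by applications, consider this small Java example.  The class name `BoxingIdentity` is made up for illustration.  The Java language guarantees that boxing an `int` in the range -128 to 127 yields the same object each time; for larger values, whether two boxes are identical depends on the upper bound of the cache.

```java
/** Shows how the Integer cache affects the object identity of autoboxed values. */
final class BoxingIdentity {
    /** Returns true if boxing the same int value twice yields the same object. */
    static boolean sameBox(int value) {
        Integer a = value;   // autoboxing goes through Integer.valueOf
        Integer b = value;
        return a == b;       // identity comparison, deliberately not equals()
    }

    public static void main(String[] args) {
        System.out.println(sameBox(127));    // within the guaranteed cache range
        System.out.println(sameBox(1000));   // depends on the configured cache upper bound
    }
}
```

Because application code can observe these identities, a cache populated during training has to agree with what the production run is configured to expect.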

Some CDS optimizations, such as the provisioning of interned strings or the linking of invokedynamic bytecodes, are implemented using archived Java heap objects. Therefore, these optimizations will not be available for garbage collectors that do not support archived Java heap objects (e.g., ZGC). However, most CDS optimizations, such as AOT class loading and AOT linking of references to classes, fields, and methods, are available regardless of the choice of collector.

### Additional limitations

[Additional limitations]: <#limitations>
<a name="limitations"></a>
Here are some additional practical caveats and limitations on the use of CDS, beyond the basic requirement of consistency between training and production runs:

 - An AOT loaded class remains present in the VM, even if the application (as the result of its dynamic behavior) does not actually request loading of that particular class.  Such a class is not subject to class unloading.  Therefore it will use up memory footprint, where it would not if it were loaded just in time.

 - User-defined class loaders will not participate in AOT loading activities.  This is because at present there is no technique for tracking the identity of a user-defined class loader across both training and production runs.  The effect of this limitation is to load such classes just in time, giving them reduced performance.  Present work is thought to provide groundwork necessary to overcome this limitation, by first stabilizing those classes which define the user-defined class loaders.

 - There are a number of corner cases where classes cannot be loaded ahead of time.  (These may include classes with a user-defined class loader, signed classes, and classes which use a very old version of the verifier.)  A class which can itself be AOT-loaded might not be fully AOT-linkable to another class which cannot be AOT-loaded.  Sometimes CDS may choose to defer loading of a class simply because of a footprint limitation.  It is safe in all such cases to fall back to loading and linking on a just-in-time basis.  Such limitations may be addressed in the future, if and when they prove significant.

 - Defining a class using `MethodHandles.Lookup::defineClass()` is an irreversible decision _if the class is named_.  Such calls will result in a `LinkageError` with a message about attempted duplicate class definition, if the affected named class was also AOT-loaded in the CDS archive.  This is a standard response to an attempt to define the same class name twice.

 - The only way to make a training run at present is to have the application process some representative workload.  It should run at least through startup, and must then exit, to trigger creation of the CDS archive.  Possible future work on AOT workflows may add new tools to help the programmer more flexibly define and evaluate such training runs and workloads.

### Choosing a training run

[Choosing a training run]: <#training>
<a name="training"></a>

A training run captures application configuration decisions and execution history, in the expectation that this information will be relevant to the production runs that come later.  Therefore, to be as useful as possible, a training run should resemble the intended production runs, to the extent that it fully configures itself and exercises all code paths that will be required in the production runs.

Here are some specific tips:

 - At any given point during the production run, CDS only confers benefits from a training run that also “got this far”.  The production run will only “expect” events which actually happened during the training run.

 - Consider putting your training run driver logic in its own main class, `MyAppTraining`.  It could be a wrapper around the main class of the production run which takes care to exercise all common modes and sub-commands of the application.

 - For optimizing startup time, the set of classes loaded during training should mostly consist of the classes which must be loaded when production starts up.  To observe class loading, `-verbose:class` or another log option can be useful for checking that the training run is loading the right classes.

 - The training run should avoid loading many classes _not_ used by the production run, since those unused classes will add to CDS footprint.  This means very large test suites can add a footprint overhead to CDS.  Future work may examine ways to benefit from such classes but filter them out of the CDS archive file.

 - When training for startup (not full warmup), focus on running a broad set of short verification scenarios (also known as “smoke tests” or “sanity tests”).  This is often enough work to load all the classes you will need.  Avoid very large test suites that cover rare corner cases, or seldom-used functionality, or stress or regression testing, which do not help to characterize startup activities.

 - As a workaround to omit unused test classes, an advanced user might intervene in the assembly phase, which is the second part of the original CDS workflow and creates the archive using `-Xshare:dump`.  On that command line, the class path can be edited to omit classes (such as test drivers) used only by the training run.  This may place unresolved references in the CDS archive for the missing test classes, which the production run will never need to resolve.  But this workaround may have stability problems, since CDS does not expect to encounter “holes” in the class path.  Future work may provide a better way to manually suppress such classes.

 - If the application takes external inputs from clients on the network or users through a GUI, a mocked-up workload may be necessary to exercise the classes that handle such inputs.  If such classes are omitted from CDS, they will be loaded from the class path or module graph, in just-in-time order during the production run.  Future work may consider ways to manually add such classes, when they are known, although there is no complete substitute for just running the application.

 - When training for warmup (not just startup), the training run should run long enough for the VM to compile optimized code to store in CDS.  (This assumes an AOT method compilation feature, currently in prototype, not yet delivered.)  The longer the training run exercises the warm code paths, the more optimized code will be generated for use in production.

 - A short training run cannot do all the warmup work required by a long production run.  Short training runs benefit startup only, as they exercise all the one-time startup tasks needed by the production run, however long it will run.

 - Future work may add convenient “hooks” for delimiting and tuning a training run.  The assembly phase may be triggered after a specified amount of class loading, or JIT compilation, or execution of specified methods.
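
The tip above about a dedicated training driver might look like the following Java sketch.  Only the class name `MyAppTraining` comes from the tip; the method `runApp` is a stand-in for the real production entry point, and the sub-commands and file names are invented for illustration.

```java
import java.util.List;

/** Hypothetical training driver that exercises the common sub-commands of an application. */
final class MyAppTraining {
    /** Stand-in for the production main method (an assumption made for this sketch). */
    static int runApp(String... args) {
        return args.length == 0 ? 1 : 0;   // pretend exit status: failure if no sub-command is given
    }

    /** Runs each representative scenario once and returns how many completed successfully. */
    static int runScenarios(List<String[]> scenarios) {
        int completed = 0;
        for (String[] scenario : scenarios) {
            if (runApp(scenario) == 0) {
                completed++;
            }
        }
        return completed;
    }

    public static void main(String[] args) {
        List<String[]> scenarios = List.of(
                new String[] {"help"},
                new String[] {"import", "sample.csv"},
                new String[] {"report", "--format=text"});
        System.out.println(runScenarios(scenarios) + " scenarios completed");
        // Returning from main lets the VM exit, which ends the training run.
    }
}
```

Keeping the driver small and separate from large test suites is in line with the footprint advice in the same list.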

### Measuring startup and warmup

[Measuring startup and warmup]: <#measuring-time>
<a name="measuring-time"></a>
Although startup and warmup are similar concepts, to measure them properly, one must understand their distinction.  For practical purposes, they are defined in terms of some particular application performing a repeatable workload, such as a request server.  Startup time is how long the VM takes to load and execute enough code in the JDK, in libraries on the class path or module graph, and in the application, so that the application can start to serve requests.  Warmup time is how long the VM takes to optimize a running application so that it serves requests with peak performance. Warmup usually consumes more resources (time and memory) than startup.

In more detail, startup is a series of one-time setup tasks, while warmup is a continuing optimization.  During startup, the VM and application load, link, and initialize classes, and configure other resources such as Java objects.  An application warms up over time, first as the VM selectively compiles byte code from class files to machine code, and then as the VM tracks “hot spots” in application code and reoptimizes their machine code.  Besides code generation, the VM tunes certain ergonomic settings during warmup.

Warmup and startup overlap during the milliseconds after the application launches.  And both activities can trail off into an indefinite future:  An application can run for seconds or minutes and suddenly perform new startup activities because it accepts a new kind of request.  The VM can also work for a long time optimizing the application, eventually (after seconds or minutes) reaching a steady state with peak performance.  Even then, if a new kind of request suddenly arrives, the VM may have to re-enter warmup activities to accommodate new code paths.  Both startup and warmup tasks can be addressed by AOT or JIT techniques, whether speculative or not, and usually all of the above.  Thus, startup and warmup are distinct sets of activities, and each deserves its own attention when assessing and improving VM technology.

In the big picture, startup and warmup are not the only important measures of quality.  In carrying out its duties, an application should consume moderate amounts of time and space, delivering good throughput (workload units per unit of time) and footprint (working memory size).  Of course, it should also be correct (producing the right answers) and stable (predictable execution, without crashes or any other misbehavior).  Throughput, correctness, and stability have always been core values within the Java ecosystem.  Project Leyden brings a fresh focus to improving startup, warmup, and footprint, by shifting selected computations to new points in time, either earlier (ahead of time, AOT) or later (just in time, JIT).  Within that big picture, this work is about AOT optimizations to improve startup, and eventually warmup.

Each deployed application will need its own specific definition of what constitutes one repetition of its repeatable workload; this could be a service request, or an integration test, or a benchmark, or a stress test, or some other “omnibus test” of many parts of the application.  The first repetition loads and initializes all relevant classes and application data structures, while subsequent repetitions spur the VM to optimize the application, eventually reaching peak performance.  In the setting of such an application and its repeatable workload, warmup can be measured as the time to reach a given fraction (such as 95%) of the eventual peak throughput, while startup can be measured as the time to bring the first workload repetition up to some application-specific “ready point”, or else to the end of the first repetition of the workload.
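
As a rough illustration of the warmup measurement just described, the following Java sketch computes a warmup time from recorded per-repetition durations.  The class `WarmupMeter` is hypothetical, and the sketch makes two simplifying assumptions: durations are positive, and peak throughput is estimated from the fastest repetition in the sample rather than from a separately established steady state.

```java
/** Toy warmup calculation from per-repetition durations (hypothetical helper). */
final class WarmupMeter {
    /**
     * Returns the time elapsed before the first repetition whose throughput
     * (one repetition divided by its duration) reaches the given fraction of
     * the best throughput observed in the sample.
     */
    static double warmupMillis(double[] repetitionMillis, double fraction) {
        double best = Double.POSITIVE_INFINITY;
        for (double d : repetitionMillis) {
            best = Math.min(best, d);          // shortest duration corresponds to peak throughput
        }
        double threshold = best / fraction;    // longest duration that still counts as warm
        double elapsed = 0;
        for (double d : repetitionMillis) {
            if (d <= threshold) {
                return elapsed;                // first warm repetition starts here
            }
            elapsed += d;
        }
        return elapsed;                        // only reached when the sample is empty
    }

    public static void main(String[] args) {
        double[] durations = {100, 50, 20, 10, 10};   // milliseconds per repetition
        System.out.println(warmupMillis(durations, 0.95) + " ms to reach 95% of peak");
        System.out.println(durations[0] + " ms for the first repetition (a startup proxy)");
    }
}
```

A real measurement would also have to account for VM launch time and for noise between repetitions, which this sketch ignores.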

### A brief history of CDS

[A brief history of CDS]: <#cds-history>
<a name="cds-history"></a>
In one form or another, CDS has been built into the HotSpot VM [since JDK 5] in 2004.  At first its expanded name was “Class Data Sharing”, which is reflected in options like `-Xshare:…`.  Over the years it has expanded its capabilities beyond the storage of shareable class data, so the preferred alternative expansion for CDS is now “Cache Data Store”.  Either phrase refers to the same technology.

[since JDK 5]: <https://web.archive.org/web/20040604034719/http://java.sun.com/j2se/1.5.0/docs/guide/vm/class-data-sharing.html>

Since 2017, every Java runtime has included an AOT cache of over 1000 definitions for core JDK classes, created when the JDK is built from source code.  In this sense, CDS is ubiquitous, even for Java programmers who have never heard of it.  Even so, CDS has been very much a “power user” feature over most of its existence.

_FIXME_ (More here about major CDS feature introductions, such as the dynamic archive, or cached objects.)

### CDS and sharing

[CDS and sharing]: <#sharing>
<a name="sharing"></a>

CDS uses memory mapping to quickly populate VM memory with the whole content of the CDS archive file, allowing the VM to pick and choose assets within the file to adopt into its live metadata.  This mapping is relocatable, but is organized to prefer a certain base address, if that is available.  If the preference is met, the mapped file does not need to have its pages edited to relocate their embedded pointers (and thus “dirtied” by copy on write).  Clean pages allow sharing of mappings between VM processes, reducing footprint.  This behavior is the motivation for the (now obsolete) acronym expansion “Class Data Sharing”.

But it should be noticed that modern CDS deployments often lose much of their page sharing due to dynamic relocations, because mapping addresses are made unpredictable by current practices such as address space layout randomization (ASLR).

With any AOT technology like CDS, there is always a tension between either under-provisioning, which may force VM startup to consume more CPU as it repeats work, or else over-provisioning, which may cause unused resources to consume memory.

Future work is likely to improve footprint by some combination of “clawing back” sharing lost to ASLR, further tuning the tradeoff between over- and under-provisioning of assets, and compressing seldom-used assets offline (trading time for space).

### Glossary

[Glossary]: <#glossary>
<a name="glossary"></a>
Here is a list of terms which are useful when discussing CDS.

<!-- NOTE: please keep these in alphabetical order; use `sort -df` if needed -->

 - adoption - the decision by the VM, during a production run, to use a particular CDS asset (and not called “loading” or “linking” because that would be confusing)
 - ahead of time (or AOT) - an adjective describing the ordering of some optimizable activity, as somehow provisioned beforehand
 - assembly phase - a special operation of the VM which consolidates information gathered during a training run, assembles it into a memory mappable image, and writes it to a CDS archive file
 - Cache Data Store (or CDS) - a cache which stores VM data, specifically to improve application startup, warmup, or footprint
 - CDS archive - historical name for a CDS file
 - CDS asset - a block of data in a CDS file that encodes some particular decision or group of related decisions (e.g., a loaded class, a linked constant pool entry, a method profile, a compiled method, a cached reflective object, etc.)
 - CDS file - a file which contains the data comprising a Cache Data Store; the data is organized as a memory mappable collection of logically independent assets
 - compiled code (or “nmethod”) - an entity in the VM code cache which contains optimized native code for some method; can be a (hypothetical) CDS asset
 - just in time (or JIT) - an adjective describing the ordering of some optimizable activity, as executed lazily or on demand (distinct from the noun JIT, which is the JIT compiler)
 - linked constant - a slot in a tabular metadata entity (specifically, a constant pool) which allows one class (the constant pool holder) to use another class, a method, or a field; can be a CDS asset
 - loaded class - a metadata entity which embodies a loaded class or interface; can be a CDS asset (either pre-parsed or “live”)
 - method profile - a metadata entity which provides information about past (and probable future) executions of a method; can be a (hypothetical) CDS asset
 - relocatable segment - one of a few large regions of a CDS file which is mapped as a unit into VM memory; assets in the same segment are always at a fixed relative offset
 - relocation - how the image of a pointer in a mapped CDS file becomes a “live” pointer in the VM
 - relocation map - a bitmap in the CDS archive which locates pointers, thus guiding the relocation process
 - run, production - an application run which uses a CDS file
 - run, training - an application run which produces a CDS file
 - workflow, auto-training - a non-standard workflow where application runs are classified as training or production runs automatically and transparently
 - workflow, non-standard - a (hypothetical) interrelated series of application runs that produce and/or consume CDS data, but are not the standard workflow supported by CDS
 - workflow, standard - a training run whose archive is then used by a set of production runs