JDK-8294992 : JEP 450: Compact Object Headers (Experimental)
  • Type: JEP
  • Component: hotspot
  • Sub-Component: runtime
  • Priority: P4
  • Status: Integrated
  • Resolution: Unresolved
  • Fix Versions: 24
  • Submitted: 2022-10-07
  • Updated: 2024-11-12
Sub Tasks
JDK-8307617 :  
Description
## Summary

Reduce the size of object headers in the HotSpot JVM from between 96 and 128 bits down to 64 bits on 64-bit architectures. This will reduce heap size, improve deployment density, and increase data locality.


## Goals

When enabled, this feature

* Must reduce the object header size to 64 bits (8 bytes) on the target 64-bit platforms (x64 and AArch64),
* Should reduce object sizes and memory footprint on realistic workloads,
* Should not introduce throughput or latency overheads of more than 5% on the target 64-bit platforms, and even then only in infrequent cases, and
* Should not introduce measurable throughput or latency overheads on non-target 64-bit platforms.

When disabled, this feature

* Must retain the original object header layout and object sizes on all platforms, and
* Should not introduce measurable throughput or latency overheads on any platform.

This experimental feature will have a broad impact on real-world applications. The code might have inefficiencies, bugs, and unanticipated non-bug behaviors. This feature must therefore be disabled by default and enabled only by explicit user request. We intend to enable it by default in later releases and eventually remove the code for legacy object headers altogether.


## Non-Goals

It is not a goal to

* Reduce the object header size below 64 bits on 64-bit platforms,
* Reduce the object header size on non-target 64-bit platforms,
* Change the object header size on 32-bit platforms, where it is already 64 bits, or
* Change the encoding of object content (i.e., fields and array elements) or array metadata (i.e., array length).


## Motivation

An object stored in the heap has metadata, which the HotSpot JVM stores in the object's header. The size of the header is constant; it is independent of object type, array shape, and content. In the 64-bit HotSpot JVM, object headers occupy between 96 bits (12 bytes) and 128 bits (16 bytes), depending on how the JVM is configured.

Objects in Java programs tend to be small. [Experiments conducted as part of Project Lilliput](https://wiki.openjdk.org/display/lilliput/Lilliput+Experiment+Results) show that many workloads have average object sizes of 256 to 512 bits (32 to 64 bytes). This implies that more than 20% of live data can be taken up by object headers alone; for example, a 16-byte header on a 64-byte object is already 25% of the object's size. Thus even a small reduction in object header size could yield a significant reduction in footprint, better data locality, and lower GC pressure. Early adopters of Project Lilliput who have tried it with real-world applications confirm that live data is typically reduced by 10%–20%.


## Description

Compact object headers are an experimental feature and are therefore disabled by default. They can be enabled with `-XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders`.
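
For a quick way to observe the difference, the sketch below uses the third-party JOL (Java Object Layout) tool — an assumption made for illustration, not something this JEP provides — to print the layout of a small object. Running it once with default options and once with `-XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders` should show the header shrinking from 12 or 16 bytes to 8 bytes.

    // Illustrative sketch only; assumes the org.openjdk.jol:jol-core library is on the classpath.
    import org.openjdk.jol.info.ClassLayout;
    import org.openjdk.jol.vm.VM;

    public class HeaderSize {
        static final class Point { int x; int y; }   // a small, typical object

        public static void main(String[] args) {
            // JVM configuration: compressed oops/class pointers, object alignment, etc.
            System.out.println(VM.current().details());
            // Per-field layout of a Point instance, including the header bytes.
            System.out.println(ClassLayout.parseInstance(new Point()).toPrintable());
        }
    }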

### Current object headers

In the HotSpot JVM, object headers support many different features:

* *Garbage collection* — Storing forwarding pointers and tracking object ages;
* *Type system* — Identifying an object's class, which is used for method invocation, reflection, type checks, etc.;
* *Locking* — Storing information about associated light-weight and heavy-weight locks; and
* *Hash codes* — Storing an object's stable identity hash code, once computed.

The current object header layout is split into a *mark word* and a *class word*. The mark word comes first, has the size of a machine address, and contains:

    Mark Word (normal):
     64                     39                              8    3  0
      [.......................HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH.AAAA.TT]
             (Unused)                      (Hash Code)     (GC Age)(Tag)

In some situations, the mark word is overwritten with a tagged pointer to a separate data structure:

    Mark Word (overwritten):
     64                                                           2 0
      [ppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppTT]
                                (Native Pointer)                   (Tag)

When this is done, the tag bits describe the type of pointer stored in the header. If necessary, the original mark word is preserved (*displaced*) in the data structure to which this pointer refers, and the fields of the original header, i.e., the hash code and age bits, are accessed by dereferencing the pointer to get to the displaced header.

The class word comes after the mark word. It takes one of two shapes, depending on whether compressed class pointers are enabled:

    Class Word (uncompressed):
    64                                                               0
     [cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc]
                              (Class Pointer)

    Class Word (compressed):
    32                               0
     [CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC]
         (Compressed Class Pointer)

The class word is never overwritten, which means that an object's type information is always available, so no additional steps are required to check a type or invoke a method. Most importantly, the parts of the runtime that need that type information do not have to cooperate with the locking, hashing, and GC subsystems, which can change the mark word.

### Compact object headers

For compact object headers, we remove the division between the mark and class words by subsuming the class pointer, in compressed form, into the mark word:

    Header (compact):
    64                    42                             11   7   3  0
     [CCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHVVVVAAAASTT]
     (Compressed Class Pointer)       (Hash Code)         /(GC Age)^(Tag)
                                  (Valhalla-reserved bits) (Self Forwarded Tag)

Locking operations no longer overwrite the mark word with a tagged pointer, thus preserving the compressed class pointer. GC forwarding operations become more complex in order to preserve direct access to the compressed class pointer, requiring a new tag bit, as discussed below. The size of the hash code does not change. We reserve four bits for future use by [Project Valhalla](https://openjdk.org/projects/valhalla/).
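
For illustration only, the following Java sketch (not HotSpot code) decodes the fields of a compact header using the bit positions shown in the diagram above; the constants are taken directly from that diagram.

    // Illustrative decoding of the compact header layout diagrammed above; not HotSpot code.
    final class CompactHeaderSketch {
        static final int SELF_FWD_SHIFT = 2;   // 1 self-forwarded bit
        static final int AGE_SHIFT      = 3;   // 4 GC age bits
        static final int VALHALLA_SHIFT = 7;   // 4 bits reserved for Project Valhalla
        static final int HASH_SHIFT     = 11;  // 31 identity hash code bits
        static final int KLASS_SHIFT    = 42;  // 22 compressed class pointer bits

        static int tag(long header)               { return (int) (header & 0b11); }
        static boolean selfForwarded(long header) { return ((header >>> SELF_FWD_SHIFT) & 1) != 0; }
        static int age(long header)               { return (int) ((header >>> AGE_SHIFT) & 0xF); }
        static int identityHash(long header)      { return (int) ((header >>> HASH_SHIFT) & 0x7FFF_FFFF); }
        static int narrowKlass(long header)       { return (int) (header >>> KLASS_SHIFT); }  // top 22 bits
    }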

### Compressed class pointers

Today's compressed class pointers encode a 64-bit pointer into 32 bits. They are enabled by default, but can be disabled via `-XX:-UseCompressedClassPointers`. The only reason to disable them, however, would be for an application that loads more than about four million classes; we have yet to see such an application.

Compact object headers require compressed class pointers to be enabled and, moreover, reduce the size of compressed class pointers from 32 bits to 22 bits by changing the compressed class pointer encoding.
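
As a back-of-the-envelope sketch (an illustration, not the actual HotSpot encoding code): the bit width alone allows a 22-bit value to distinguish 2^22 = 4,194,304 class identifiers, assuming every encodable value can name a distinct class, and the full class pointer is reconstructed from a base address plus the shifted narrow value.

    // Conceptual sketch of narrow class pointer decoding; not HotSpot code.
    final class NarrowKlassSketch {
        static final int NARROW_KLASS_BITS = 22;

        // 2^22 = 4,194,304 distinct identifiers available to a 22-bit encoding.
        static final long MAX_ENCODABLE_CLASSES = 1L << NARROW_KLASS_BITS;

        // The full class pointer is the class-space base plus the narrow value scaled by a shift
        // (the shift reflects the slot granularity of the class space).
        static long decode(long classSpaceBase, int narrowKlass, int shift) {
            return classSpaceBase + ((long) narrowKlass << shift);
        }
    }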

### Locking

The HotSpot JVM's object-locking subsystem has two levels.

  - _Lightweight locking_ is used when the locked object's monitor is uncontended, no thread control methods (`wait()`, `notify()`, etc.) are called, and no JNI locking is used. In such cases, HotSpot atomically flips the tag bits in the object header from `01` (unlocked) to `00` (lightweight-locked). No additional data structures are required, and no other header bits are used.

  - _Monitor locking_ is used when the locked object's monitor is contended, thread control methods are used, or lightweight locking is otherwise inadequate. To indicate this state, HotSpot atomically flips the tag bits in the object header from `01` (unlocked) or `00` (lightweight-locked) to `10` (monitor-locked). Monitor locking creates a new data structure to represent the object's monitor but, as with lightweight locking, does not use any other header bits.

HotSpot also supports the legacy _stack-locking_ mechanism. This spiritual predecessor to lightweight locking associates the locked object with the locking thread by copying the object header to the thread's stack and overwriting the object header with a pointer to that copy. This is problematic for compact object headers because it overwrites the object header and thus loses crucial type information. Compact object headers are therefore not compatible with legacy stack locking: if the JVM is configured to use both, then compact object headers are disabled.
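
For illustration, here is a minimal Java model (not HotSpot code) of the lightweight-locking tag flip described above; the header is modeled as an `AtomicLong`, and the monitor-locked and legacy stack-locked states are omitted.

    import java.util.concurrent.atomic.AtomicLong;

    // Conceptual model of lightweight locking on the header tag bits; not HotSpot code.
    final class LightweightLockSketch {
        static final long TAG_MASK = 0b11;
        static final long UNLOCKED = 0b01;   // tag 01: unlocked
        static final long LOCKED   = 0b00;   // tag 00: lightweight-locked

        final AtomicLong header = new AtomicLong(UNLOCKED);   // the object's header word

        boolean tryLock() {
            long h = header.get();
            if ((h & TAG_MASK) != UNLOCKED) return false;   // already lightweight- or monitor-locked
            // Flip the tag from 01 to 00 with a single CAS; all other header bits,
            // including the compressed class pointer, are preserved.
            return header.compareAndSet(h, (h & ~TAG_MASK) | LOCKED);
        }

        void unlock() {
            long h = header.get();
            header.compareAndSet(h, (h & ~TAG_MASK) | UNLOCKED);   // flip tag 00 -> 01
        }
    }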

### GC forwarding

Garbage collectors that relocate objects do so in two steps: First they copy an object and record the mapping between its old and new copies (i.e., *forwarding*), then they use this mapping to update references to the old copy in either the entire heap or just a particular generation.

Of the current HotSpot GCs, only ZGC uses a separate forwarding table to record forwardings. All of the other GCs record forwarding information by overwriting the header of the old copy with the location of the new copy. There are two distinct scenarios that involve headers.

- *Copying phases* copy objects to an empty space. The forwarding pointer to each new copy is stored in the header of the old copy. The original object header is preserved in the new copy. Code that reads the object header from the old copy follows the forwarding pointer to the new copy.

  If copying an object to its new location fails, the GCs install a forwarding pointer to the object itself, thus making the object _self-forwarded_. With compact object headers, this would overwrite the type information. To address this, we indicate that an object is self-forwarded by setting the third bit of the object header rather than by overwriting the entire header.

- *Sliding phases* relocate objects by sliding them down to lower addresses within the same space. This is typically done when heap memory is exhausted and not enough space is left for copying objects. When that happens, a last-ditch effort is made to do a full collection using a sliding collection, which works in four phases:

  1. *Mark* — Determine the set of live objects.

  2. *Compute addresses* — Walk over all live objects and compute their new locations, i.e., where they would be placed one after another. Record those locations as forwardings in the object headers.

  3. *Update references* — Walk over all live objects and update all object references to point to the new locations.

  4. *Copy* — Actually copy all live objects to their new locations.

  Step 2 destroys the original headers. This is also a problem for the current implementation: If the header is *interesting*, that is, it has an installed identity hash code, locking information, etc., then we need to preserve it. The current GCs do that by storing these headers in a side table and restoring them after a GC. This works well because there are usually only a few objects with interesting headers. With compact object headers, every object comes with an interesting header because now that header contains the crucial class information. Storing a large number of preserved headers would consume a significant amount of native heap.

  To overcome this problem, we use a simple encoding of the forwarding pointer which can address up to 8TB of heap in the lower 42 bits of the object header. Compact object headers are currently not compatible with larger heaps when collectors other than ZGC are used. If the JVM is configured to use a heap larger than 8TB and does not use ZGC then compact object headers are disabled.
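
For illustration, here is a Java sketch (not HotSpot code) of the two header manipulations described above, using the bit positions from the compact-header diagram; `newLocationBits` stands for an already-encoded forwarding value, which is an assumption of this sketch.

    // Conceptual sketch of GC forwarding with compact headers; not HotSpot code.
    final class ForwardingSketch {
        static final long SELF_FORWARDED_BIT = 1L << 2;          // the 'S' bit in the compact header diagram
        static final long FORWARDEE_MASK     = (1L << 42) - 1;   // lower 42 bits hold the sliding-forwarding value

        // Copying phases, failed copy: set one bit instead of overwriting the header,
        // so the compressed class pointer stays readable.
        static long markSelfForwarded(long header) {
            return header | SELF_FORWARDED_BIT;
        }

        // Sliding phases: record the encoded new location in the lower 42 bits while keeping
        // the upper bits, which hold the compressed class pointer; this is why non-ZGC
        // collectors are limited to 8 TB heaps with compact headers.
        static long encodeSlidingForwarding(long header, long newLocationBits) {
            return (header & ~FORWARDEE_MASK) | (newLocationBits & FORWARDEE_MASK);
        }
    }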

### GC walking

Garbage collectors frequently walk the heap by scanning objects linearly. This requires determining the size of each object, which requires access to each object's class pointer.

When the class pointer is encoded in the header, some simple arithmetic is required to decode it. The cost of doing this is low compared to the cost of the memory accesses involved in a GC walk. No additional implementation work is needed here since the GCs already access class pointers via a common VM interface.


## Alternatives

* *Continue to maintain 32-bit platforms* — The mark and class words in object headers are sized as machine pointers, so headers on 32-bit platforms are already 64 bits. However, the difficulty of maintaining the 32-bit ports, coupled with the industry's move away from 32-bit environments, makes this alternative impractical in the long term.

* *Implement 32-bit object headers* — With more effort, we could implement 32-bit headers. This would likely involve implementing on-demand side storage for identity hash codes. That is our ultimate goal, but initial explorations show that it will require much more work. This proposal captures an important milestone that brings substantial improvements that we can deliver with low risk as we work further toward 32-bit headers.


## Testing

Changing the header layout of Java heap objects touches many HotSpot JVM subsystems: the runtime, all garbage collectors, all just-in-time compilers, the interpreters, the serviceability agent, and the architecture-specific code for all supported platforms. Such massive changes warrant massive testing.

Compact object headers will be tested by:

* Tier 1–4 tests, and possibly more testing tiers by vendors that have them;
* The SPECjvm, SPECjbb, DaCapo, and Renaissance benchmark suites to test both correctness and performance;
* JCStress, to test the new locking implementation; and
* Some real-world workloads.

All of these tests will be executed with the feature turned on and off, with multiple combinations of GCs and JIT compilers, and on several hardware targets.

We will also deliver a new set of tests that measure the size of various objects, e.g., plain objects, primitive type arrays, reference arrays, and their headers.
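
One possible shape for such a size-measurement test is sketched below, again assuming the third-party JOL tool rather than whatever harness the project actually uses.

    import org.openjdk.jol.info.ClassLayout;

    // Hypothetical size checks; illustrative only.
    public class ObjectSizes {
        public static void main(String[] args) {
            print("plain object",    new Object());
            print("primitive array", new int[10]);
            print("reference array", new Object[10]);
        }

        static void print(String label, Object o) {
            // instanceSize() includes the header, so it shrinks when compact headers are enabled.
            System.out.printf("%-16s %d bytes%n", label, ClassLayout.parseInstance(o).instanceSize());
        }
    }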

The ultimate test for performance and correctness will be real-world workloads once this experimental feature is delivered.


## Risks and Assumptions

* *Future runtime features need object header bits* — This proposal leaves no spare bits in the header for future features that might need such bits. We mitigate this risk organizationally by discussing object header needs with other major JDK projects, such as [Project Valhalla](https://openjdk.org/projects/valhalla/). We mitigate this risk technically by assuming that identity hash codes and compressed class pointers can be shrunk even further to make bits available should future runtime features need them.

* *Implementation bugs in feature code* — The usual risk for an intrusive feature such as this is bugs in the implementation. While issues in the header layout might be visible immediately with most tests, subtleties in the new locking and GC forwarding protocols may expose bugs only rarely. We mitigate this risk with careful reviews by component owners and by running many tests with the feature enabled. This risk does not affect the product so long as the feature remains experimental and off by default.

* *Implementation bugs in legacy code* — We try to avoid changing legacy code paths, but some refactorings necessarily touch shared code. This exposes the risk of bugs even when the feature is disabled. In addition to careful reviews and testing, we mitigate this risk by coding defensively and trying to avoid modifying shared code paths, even if it requires more work in feature code paths.

* *Performance issues in feature code* — The more complex protocols for compact object headers may introduce performance issues when the feature is enabled. We mitigate this risk by running major benchmarks and understanding the feature's impact on their performance. There are performance costs for indirectly accessing the class pointer, using the alternative stack locking scheme, and employing the alternative GC sliding forwarding machinery. This risk does not affect the product so long as the feature remains experimental and off by default.

* *Performance issues in legacy code* — There is a minor risk that refactoring the legacy code paths will affect performance in unexpected ways. We mitigate this risk by minimizing the changes to the legacy code paths and showing that the performance of major workloads is not substantially affected.

* *Compressed class pointers support* — Compressed class pointers are not supported by JVMCI on x64. We mitigate the immediate risk by disabling compact object headers when JVMCI is enabled. The long-term risk is that compact headers are never implemented in JVMCI, which would forever block removing the legacy header implementation. We assign only a minor probability to this risk since other JIT compilers support compact object headers without intrusive changes.

* *Compressed class pointers encoding* — As stated above, the current implementation of compressed class pointers is limited to about four million classes. Presently, users can work around this limitation by disabling compressed class pointers, but if we remove the legacy header implementation, then that will no longer be possible. We mitigate the immediate risk by providing compact object headers as an experimental feature; in the long term, we intend to work toward more efficient compressed class pointer encoding schemes.

* *Changing low-level interfaces* — Some components that manipulate object headers directly, notably the Graal compiler as the major user of JVMCI, will have to implement the new header layout. We mitigate the current risk by identifying these components and disabling the feature when those components are in use. Before the feature graduates from experimental status, those components will need to be upgraded.

* *Soft project failure* — There is a minor risk that the feature has irreconcilable functional regressions compared to the legacy implementation, e.g., limiting the number of representable classes. A related risk is that while the feature provides significant performance improvements on its own, it comes with significant functional limitations, which might lead to an argument for keeping both the new and legacy header implementations forever. Given that the goal of this work is to replace the legacy header implementation eventually, we consider this a soft project failure. We mitigate this risk by carefully examining current limitations, planning future work to eliminate them, and looking to early adopters' reports to identify other risks before we invest too much effort.

* *Hard project failure* — While very unlikely, it may turn out that compact object headers do not yield tangible real-world improvements or that the achievable improvements do not justify their additional complexity. We mitigate this minor risk by gating the new code paths as experimental, thus keeping a path open to removing the feature in a future release should the need arise.

Comments
I think we can proceed with targeting now.
31-10-2024

[~rkennke] Based on Sergey's latest comment in "Performance Testing Plan", I am removing my objection to targeting this JEP.
31-10-2024

[~rkennke] I looked at the discussion in "Performance Testing Plan" (JDK-8314440) about the significant regression in DaCapo:pmd when COH is off. It is a blocker for the JEP. Please move the JEP back to the Candidate state until the performance analysis is complete and all significant regressions when COH is off are addressed.
30-10-2024

Many of the JEPs for preview features have a short subsection that starts "XXX is a preview API, disabled by default" and follows with brief instructions for how to compile/run with preview features enabled. I don't think we've had enough experimental features to create a trend, but it seems like this JEP could improve the instructions for regular developers who might come across this JEP and want to try the feature. Right now it reads "Compact object headers are guarded by a new experimental runtime option, -XX:(+/-)UseCompactObjectHeaders. This option is disabled by default". I think you can spell this out a bit more and say that experimental features are disabled by default, so run with -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders to enable.
28-10-2024

[~rkennke] Yes that makes sense. Thanks.
19-05-2023

[~dholmes] Our plan for the future would be:
- Implement 32 bit object headers in one of the next releases. This would come with another JEP. The feature and flag would remain experimental.
- Once we are sufficiently confident that the feature is complete (e.g. currently, not all platforms are supported, and JVMCI/Graal is also not yet supported), we would make the feature non-experimental and on-by-default, but leave the legacy implementation in place. At this point we would probably also deprecate the flag and legacy impl. I believe this should come with a JEP, too (it has been done that way for ZGC and Shenandoah back in the day when they transitioned from experimental to non-experimental, and perhaps for other features, too). And here is where we should define the acceptance criteria, etc for that transition.
- Eventually, in one or two releases after deprecating the flag, it should be disabled and legacy code removed altogether, per the usual process for this.

Does this make sense?
09-05-2023

At what point should the path from experimental feature to full-fledged feature be discussed/proposed? Given this JEP and its dependent experimental features how do we map out a path by which these features become non-experimental and then enabled by default? Does this require a further JEP in the future, or just regular JBS issues and CSR request to perform the changes? Where do we define the acceptance criteria to make such a change?
08-05-2023

Endorsed.
04-05-2023

> Class space now resembles a table of x-byte slots, nKlass resembles a class id now, and our variable-sized Klass typically fills 1-2 slots.

Excellent. That relieves the pressure to inject additional klass ID bits into “disfavored” secondary klass layouts, and keeps branches out of the klass accessor. This is shaping up nicely. Thank you. There are two potential “splits” here: primary and secondary klasses with shorter and longer ID formats, and (independently) splitting any single klass into parts accessed by longer and shorter indirection chains. You are investing in the latter, which is the right decision. The former is more of a “break glass in case of emergency” tactic, I think. Or (maybe) a way to experiment with using oop bits (if oops are 64 bits wide) to carry type information. FWIW, in the past I have referred to the artifacts in the second split as “near klass” and “far klass”. But the “near” part could also be used as a “klass cache”, in the sense of the CP cache which provides rapid access to data also in the CP itself. Valhalla specialized classes might have a use for a “near klass cache”, to hold one cache per species instead of just one per klass. This is all firmly in the future, but it fits with the technologies being developed here.
19-04-2023

[~jrose] I have not forgotten your ideas (see also my answer above to Chris). This one is an interesting variation to "split set of classes into primary and secondary ones" since the additional information squirreled away into the object does not need a full slot. With 64-bit headers, we could already use your idea easily with long/obj arrays by hiding an additional 32-bit into the alignment shadow after the array length. The main effect of (https://github.com/openjdk/lilliput/pull/13) was not to shrink the nKlass to (still large) 22 bits. It was to make it possible to use (almost) every encoding point in a nKlass for a separate class while retaining the current variable-sized Klass layout. Class space now resembles a table of x-byte slots, nKlass resembles a class id now, and our variable-sized Klass typically fills 1-2 slots. The obvious disadvantage is alignment waste, and while the gains Lilliput brings are more than enough to yield us a net positive, it is still a thorn in my foot. I am working on a simple patch to mitigate the waste effect by re-using those gaps for other allocations. One interesting consideration is that a high number of classes is typically reached only by applications that generate classes, and these applications often seem to only allocate one object per class. So they won't benefit from Lilliput much since they already pay for one Klass, albeit a tiny one usually, and sometimes even for one CLD. But these applications may even run against any class limit we could think of. Not sure if that is a consideration, but we should eventually decide if we should introduce the notion of a class limit.
19-04-2023

[~jrose] Right, that is not a hard limit. In fact, in the Lilliput project we already raised that limit considerably, and that change may become part of the delivery of this JEP. Do you think the note about it in the JEP "In the long run, those restrictions will be resolved with more compiler and runtime work. For example, there is work in Project Lilliput to allow much larger addressable class-space with fewer class bits, see Smaller class pointers." points that out well enough, or do you think it needs clarification?
19-04-2023

Regarding the supposed limit of 3 million classes: I’m pretty sure that is not a hard limit, but a soft one which can be fixed by throwing more engineering at it. This insight (if true) improves the risk profile of this project, since if we overflow the class limit, we engineer an incremental solution, rather than throw away small headers. Supporting details: In the “Smaller class pointers” link (https://github.com/openjdk/lilliput/pull/13) there is a conversation where I point out some ways to relax that limit, by using extra bits to find the 100-millionth class (if there ever is one) in the object layout of that class, rather than in the header. Basically, you treat the first 22 bits (or however many) of class ID as the first installment of a var-int which might go to two parts, in extreme circumstances. It’s complexity, but it’s not a hard limit. And that same overall tactic could allow us to buy even smaller class ID fields in the header, even down to 8 bits (if that were useful for some reason in the future).
19-04-2023

Please have one or more Reviewers review this JEP, and add their names to the Reviewers field, before you submit it.
18-04-2023

[~rkennke] Looks good. Thanks for the changes.
11-04-2023

[~cjplummer] Ok, we've revised the Interaction with Class-Pointer-Compression and related sections under Risks, and added a risk section about getting stuck with legacy and new feature ("Soft Project Failure"). Does that make sense?
11-04-2023

[~rkennke] I think what would be best is a risks section that is dedicated to the risk of being stuck with both implementations indefinitely. It should cover the reasons that might happen, and maybe the likelihood of it happening (if it can be meaningfully expressed). On the other hand, if having to maintain both implementations is not a concern, it probably should not be considered a risk either. I guess that is for others to decide.
10-04-2023

[~cjplummer] Thanks for pointing this out! I added some sentences to the 'interaction with class pointer compression' and 'risks and assumptions' sections. Does this address your concerns, or do you have suggestions for how to clarify it better? As Thomas explained, there will be several hurdles to take before we can even think about removing the legacy implementation. But we are not there, yet. While we do have plans for all of the currently known hurdles (class pointer compression, i-hash width) and also have plans for reducing the header even further (to 32 bits), we do feel that it'll be useful to get the first milestone - 64 bit object headers - into the hands of users, not least to get a better understanding of potentially unknown hurdles and requirements from actual real-world workloads (e.g. does anybody ever need more than 3 million classes? Would 16 million suffice? Etc.) Let us know what you think!
10-04-2023

[~cjplummer] There are several nuts we need to crack before removing the old-style object headers for good. UseCompressedClassPointers is one, i-hash width is another. There are probably more. Today, the only reason to run without UseCompressedClassPointers would be if you need to load more classes than what the maximum allowed class space size (atm 3GB) would allow you to load. 3GB gives you space for roughly three million classes. If you wanted to go beyond that, you would have to switch off compressed class pointers, which would move all Klass structures to the standard metaspace, which can grow without limit. In Lilliput, we have been discussing alternate forms of narrow Klass pointers. The first step had been reduction of the nKP from 32 to 22 bits (https://github.com/openjdk/lilliput/pull/13). We plan to go beyond that, and at the same time, solve the infinite-classes problem inherent in the limited class space by adapting John Rose's proposal of first- and second-level classes (briefly, first-level classes would be present with a - hopefully short - id/nKP in the header; second-level classes move the id/nKP out of the header into the object body - accessing Klass* from these objects would require one more indirection). However, this step is unnecessary for the first iteration of Lilliput described in this JEP. It will be needed for the final removal of the old header format because otherwise, we will have a hard limit on how many classes we can load, which would be a regression.
04-04-2023

You say: "the feature is gated by an experimental -XX:(-|+)UseCompactObjectHeaders JVM option, which is off by default. Once the feature is deemed stable, we plan to turn it on by default, and remove legacy object header support in some future release." But then, "If uncompressed class pointers are required, then compact object headers cannot be used (see Risks)." So it seems that no matter how well the features works when enabled, if for some reason the platform cannot support compressed class pointers, then the feature cannot be enabled, meaning that legacy object header support cannot be removed. I think you should do a better job of calling this out, and also maybe comment on how realistic it is that legacy object header support can eventually be removed.
03-04-2023

I'm also wondering if this recent addition to this JEP means that there will also be a product flag for the "stack-lock" -> "fast-lock" changes. Please clarify. Update: Also did minor edits to the JEP text. I hope that's okay...
26-10-2022

Thanks Roman. I don't think it should be a diagnostic flag, but a full product flag, as users adversely affected by the change will want to switch back and they may want a full product flag to do that (local policy may prohibit running production versions of apps with non-product flags). That said, while I would generally seek that there be a migration path here, from previous discussions on the stack-lock changes I thought you had indicated that making the behaviour selectable was not feasible?
24-10-2022

Thanks, David. I hear your concerns. I have added two paragraphs under 'risks' to outline what I think is a real risk (workloads suffering from some techniques while not benefitting from the header reduction), and to propose a mitigation (introduction of a flag to get back old behavior, if needed). I hope that makes sense?
21-10-2022

I will re-voice my concerns here that this is a big change in one fell swoop with no migration/transition path and no way to switch back to the old behaviour. I'm very skeptical that sufficient performance testing can be done within the context of OpenJDK development.
18-10-2022