Bug ID: JDK-8292818 replace 96-bit representation for field metadata with variable-sized streams

Type: Enhancement
Component: hotspot
Sub-Component: runtime
Affected Version: 20

Priority: P3
Status: Resolved
Resolution: Fixed

Submitted: 2022-08-23
Updated: 2025-05-22
Resolved: 2023-03-17

Fixed in: JDK 21 (21 b15)

Replace 96-bit representation for field metadata with variable-sized streams.

The existing representation for field metadata uses 6 non-optional metadata elements of size 16 bits, for a total non-optional size per field of 96 bits.

Additional optional metadata elements are present for contended group and for generic signature; these are gated by access flag bits. The logic which locates such elements is very ad hoc and hard to read. It also does not scale to the next optional element.

Luckily, nearly all accesses of field metadata are streaming, so it will be easy to use a variable-sized data structure per field, where access flag bits gate the presence of optional elements.

An updated format should (as today) stream over field elements, and be tolerant of data-dependent sizes. For example, in the very common case where a field does *not* have a `ConstantValue` initializer attribute, the 16 bits present today (in the 96-bit layout) should be absent altogether. When the element is present, then a bit (a new bit) in the access flags should announce it, so that the streaming reader can read and/or skip that extra element.
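To make the flag-gated layout concrete, here is a minimal sketch of a streaming reader. It works on logical 32-bit values held in a plain vector, not on any real HotSpot array. The bit position `kHasInitializer`, the element order, and the `Field` and `FieldReader` names are all assumptions made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical flag bit announcing an optional initializer element.
constexpr uint32_t kHasInitializer = 1u << 16;

// Logical view of one field after reading it from the stream.
struct Field {
  uint32_t flags;
  uint32_t name;
  uint32_t signature;
  uint32_t offset;
  bool     has_initializer;
  uint32_t initializer;  // meaningful only if has_initializer is true
};

// Reads fields one after another; each record's size depends on its flags.
class FieldReader {
  const std::vector<uint32_t>& stream_;
  size_t pos_ = 0;

  uint32_t next() {
    assert(pos_ < stream_.size());
    return stream_[pos_++];
  }

 public:
  explicit FieldReader(const std::vector<uint32_t>& stream) : stream_(stream) {}

  bool done() const { return pos_ >= stream_.size(); }

  Field read() {
    Field f{};
    f.flags = next();
    f.name = next();
    f.signature = next();
    f.offset = next();
    // The optional element occupies space only when its flag bit is set.
    f.has_initializer = (f.flags & kHasInitializer) != 0;
    if (f.has_initializer) {
      f.initializer = next();
    }
    return f;
  }
};
```

A field without an initializer costs four elements here and a field with one costs five; a reader that only wants to skip a field can do the same flag test without keeping the values.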

Another problem with the current representation for fields is that the 16 access flag bits are completely used up. This is easy to fix by increasing the size of the access flags element to 32 bits (like methods and classes). The cost of this can be paid for by making the constant initializer element almost always absent.

Another problem with the current 16-bit representation is that offsets, which are naturally 32 bits, require two u2 elements that must be manually packed and unpacked. This is complicated to read and maintain. (Actually, the whole thing is complicated to read and maintain, isn't it?)
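For readers who have not seen the fixed layout, the following sketch shows the general shape of six u2 slots with an offset split across two of them. The slot names and the `FixedFieldInfo` type are illustrative assumptions, and any tag bits the real code may fold into the offset slots are deliberately left out.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative slot names for a fixed record of six 16-bit elements.
enum Slot {
  kAccessFlags,
  kNameIndex,
  kSignatureIndex,
  kInitvalIndex,
  kLowOffset,
  kHighOffset,
  kFieldSlots  // number of slots: 6
};

struct FixedFieldInfo {
  uint16_t slots[kFieldSlots];
};

// 6 slots * 16 bits = 96 bits per field, always.
static_assert(sizeof(FixedFieldInfo) * 8 == 96, "six u2 slots");

// A 32-bit offset has to be split by hand into two 16-bit halves...
inline void set_offset(FixedFieldInfo& f, uint32_t offset) {
  f.slots[kLowOffset]  = static_cast<uint16_t>(offset & 0xFFFFu);
  f.slots[kHighOffset] = static_cast<uint16_t>(offset >> 16);
}

// ...and glued back together on every read.
inline uint32_t get_offset(const FixedFieldInfo& f) {
  return (static_cast<uint32_t>(f.slots[kHighOffset]) << 16) |
         static_cast<uint32_t>(f.slots[kLowOffset]);
}
```

With 32-bit logical elements the two accessors collapse into a plain read and write of one value.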

By breaking field metadata out of its 96-bit limitations, we will make our VM simpler for existing applications and future-proof for new ones, such as special markings for fields in Leyden and/or Valhalla.

Fixed size metadata allows simple random access, and conversely we will have to give this up. For an occasional random access to fields, such as "get me the data for field #42", a linear search through the stream is probably OK. If not, there are incremental engineering investments in indexing that can reduce or limit the cost of linear search. These additional investments are not described here, but are easy to design. (You can associate an index table with the compressed field metadata bundle, so that the Nth field is associated with the place in the bundle where that field's metadata begins. And this need not be done for every field, but perhaps for every 8th field, trading speed for space until we are happy.) But, if you don't need it, don't design it.
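The parenthetical index-table idea can be sketched as follows, purely as an illustration of the speed-for-space trade and not as something this change needs. The record format is a toy (one flags byte, three one-byte elements, one optional byte), and `kStride`, `skip_record`, `build_index`, and `find_field` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy record format (an assumption for this sketch): one flags byte,
// three mandatory one-byte elements, plus one extra byte if bit 0 is set.
constexpr uint8_t kHasOptional = 0x01;
constexpr size_t  kStride = 8;  // remember the start of every 8th field

// Returns the position just past the record that starts at pos.
size_t skip_record(const std::vector<uint8_t>& s, size_t pos) {
  assert(pos < s.size());
  const size_t len = 4 + ((s[pos] & kHasOptional) ? 1 : 0);
  assert(pos + len <= s.size());
  return pos + len;
}

// One linear pass records where fields 0, 8, 16, ... begin.
std::vector<uint32_t> build_index(const std::vector<uint8_t>& s) {
  std::vector<uint32_t> index;
  size_t pos = 0;
  for (size_t n = 0; pos < s.size(); n++) {
    if (n % kStride == 0) {
      index.push_back(static_cast<uint32_t>(pos));
    }
    pos = skip_record(s, pos);
  }
  return index;
}

// Jump to the nearest indexed field, then skip at most kStride - 1 records.
size_t find_field(const std::vector<uint8_t>& s,
                  const std::vector<uint32_t>& index, size_t n) {
  assert(n / kStride < index.size());
  size_t pos = index[n / kStride];
  for (size_t i = 0; i < n % kStride; i++) {
    pos = skip_record(s, pos);
  }
  return pos;
}
```

A larger stride shrinks the table and lengthens the residual scan; a stride of 1 is ordinary random access again.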

But wait, there's more. Once we embrace variable-sized field metadata, it is almost a no-brainer to encode the individual elements using variable size as well, as long as that variable size encoding is really fast to parse. And we have this in our source base already, the very good (if I may say so) UNSIGNED5 format from Pack200, used in CompressedStream. This format is super-fast to unpack, so that GC maps use it, and it does not appear to be a burden to stack walking code. It will probably also not be a burden to field lookup or other metadata-walking code.
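As a reminder of how this kind of coding works, here is a minimal sketch following the Pack200 (5,64) definition: at most 5 bytes per value, where a byte below 192 ends the value and a byte of 192 or above means more bytes follow. It is not the HotSpot code; the function names are made up, and HotSpot's own constants may differ in detail (for example, if a byte value is reserved).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack200-style (5,64) coding parameters.
constexpr uint32_t kLgH = 6;           // log2 of the number of high codes
constexpr uint32_t kH = 1u << kLgH;    // 64 high codes: more bytes follow
constexpr uint32_t kL = 256 - kH;      // 192 low codes: this byte is the last
constexpr int kMaxLength = 5;          // a 32-bit value never needs more

// Appends the encoding of value to out, least significant digit first.
void write_uint(std::vector<uint8_t>& out, uint32_t value) {
  uint32_t sum = value;
  for (int i = 0; ; i++) {
    if (sum < kL || i == kMaxLength - 1) {
      // Either a terminating low code, or the unconditional 5th byte.
      out.push_back(static_cast<uint8_t>(sum));
      return;
    }
    sum -= kL;
    out.push_back(static_cast<uint8_t>(kL + (sum % kH)));  // high code
    sum >>= kLgH;                                          // consumed 6 bits
  }
}

// Decodes one value starting at pos and advances pos past it.
uint32_t read_uint(const std::vector<uint8_t>& in, size_t& pos) {
  uint32_t sum = 0;
  for (int i = 0; i < kMaxLength; i++) {
    assert(pos < in.size());
    const uint32_t b = in[pos++];
    sum += b << (kLgH * i);  // each byte is a digit weighted by 64^i
    if (b < kL) break;       // low code: value is complete
  }
  return sum;
}
```

Under these parameters values up to 191 take one byte and values up to 12479 take two, which is why small flags, small offsets, and most constant pool indexes stay short.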

Thus, the unit of field metadata should be a byte (u1), not a short (u2), and the metadata arrays should be of type Array<u1>, not Array<u2>.

So, to sum up, field metadata should be represented in Array<u1> arrays, as compressed bundles of 32-bit ints, representing field descriptions. The descriptions should have their current elements, of flags, name, signature, initializer (but optional), generic-signature (still optional), offset (now a single item), and contended group (still optional).

A typical field will have just the four elements of flags, name, signature, and offset. If they were uniformly represented as u4 words, then the format would be 128 bits, which is larger than the current 96 bits. But if we go a little further and use UNSIGNED5, then suddenly the typical field will require 1-2 bytes per item (very very few will require more), for a typical field metadata size of 32 bits to 64 bits. This is a win in footprint.
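The size estimate can be checked with a small calculation. The sketch below assumes Pack200 (5,64) parameters (192 terminating byte values, 64 continuing ones); the helper names and the sample flag, index, and offset values are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Number of bytes a value needs under a (5,64) coding:
// one byte if below 192, otherwise strip one 6-bit digit and repeat.
int unsigned5_length(uint32_t value) {
  int len = 1;
  while (value >= 192 && len < 5) {
    value = (value - 192) >> 6;
    len++;
  }
  return len;
}

// Encoded size in bits of a typical four-element field record.
int field_bits(uint32_t flags, uint32_t name, uint32_t signature,
               uint32_t offset) {
  return 8 * (unsigned5_length(flags) + unsigned5_length(name) +
              unsigned5_length(signature) + unsigned5_length(offset));
}
```

For example, `field_bits(0x0002, 37, 52, 16)` gives 32 bits, `field_bits(0x0012, 1500, 2100, 24)` gives 48 bits, and a field whose four elements all need two bytes gives 64 bits, against 96 bits for every field in the fixed layout.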

And it will also be a win in maintainability, since the logical size of each element of field metadata is 32 bits (even if physically it only requires a single byte, or if optional a single bit). Thus, after a very low-level decompression, algorithms that work on field metadata will be 32-bit clean. Once you have a foundation you can trust to store 32-bit logical values in a workable physical format (UNSIGNED5) you can forget about that physical format.

This proposal changes only the storage format of metadata. The layer on top, used by most clients, is the FieldStream types. These types will change in their internals (since they are coupled to the storage format) but their APIs will not change, so client code will be relatively untouched.

There's a remaining problem, and that is JVMTI, which uses random access and atomic 16-bit updates to peek and poke field metadata fields to register field watchpoints. I hope we can agree that using metadata this way is bad. Metadata should be cleanly read-only, not "sometimes written when JVMTI feels like it". Part of this proposed change should move the couple of bits that JVMTI mutates into a separate byte array (or possibly part of the byte array holding the compressed data, if we really don't care about mixing RO and RW data). This might seem like a show-stopper, but it's really just more of the same future-proofing. Other features (such as Leyden field-initialization tracking) also need a bit or two per field that is mutable. Such features are best implemented, probably, in the same way as the JVMTI bits, and they can share bit-positions in a common byte array, indexed by field ordinal number. So, part of this proposed change introduces a second Array<u1>, with mutable contents, indexed by field ordinal, used for JVMTI (2 bits), the rewriter's anomaly detector (1 bit), and future features (another bit or two). Lazy finals, if we do them, will need a mutable bit or two to track state.
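A minimal sketch of such a mutable side table, one byte per field ordinal with atomic bit updates, might look like this. The class name, the bit names, and their positions are hypothetical; the point is only that the compressed metadata itself is never written.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bit assignments within one status byte per field.
enum FieldStatusBit : uint8_t {
  kJvmtiAccessWatch       = 1u << 0,
  kJvmtiModificationWatch = 1u << 1,
  kRewriterAnomaly        = 1u << 2
  // remaining bits are free for future features
};

// Mutable per-field state, indexed by field ordinal and kept apart
// from the read-only compressed metadata.
class FieldStatusTable {
  std::vector<std::atomic<uint8_t>> bits_;

 public:
  explicit FieldStatusTable(size_t field_count) : bits_(field_count) {
    for (auto& b : bits_) {
      b.store(0, std::memory_order_relaxed);
    }
  }

  // Sets the given bits without disturbing the others in the same byte.
  void set(size_t ordinal, uint8_t mask) {
    bits_.at(ordinal).fetch_or(mask, std::memory_order_relaxed);
  }

  // Clears the given bits without disturbing the others.
  void clear(size_t ordinal, uint8_t mask) {
    bits_.at(ordinal).fetch_and(static_cast<uint8_t>(~mask),
                                std::memory_order_relaxed);
  }

  bool test(size_t ordinal, uint8_t mask) const {
    return (bits_.at(ordinal).load(std::memory_order_relaxed) & mask) != 0;
  }
};
```

Because lookups go by ordinal, this table keeps cheap random access even though the compressed stream does not.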

If we like how this turns out, there are other bits of metadata in HotSpot that can also be cleaned up (made simpler and faster and smaller) in the same way.

One cleanup we can consider connected to the separation of RO metadata from RW control data is to remove the atomic bit-twiddling logic from accessFlags.hpp. That's the wrong place to put that stuff, probably, since nearly all access flags are temporary register values, for which atomic updates are crazy overkill, and totally misleading to the casual reader.

Changeset: bfb812a8
Author: Frederic Parain <fparain@openjdk.org>
Date: 2023-03-17 20:18:36 +0000
URL: https://git.openjdk.org/jdk/commit/bfb812a8ff8bca70aed7695c73f019ae66ac6f33

17-03-2023

A pull request was submitted for review.
URL: https://git.openjdk.org/jdk/pull/12855
Date: 2023-03-03 14:50:34 +0000

06-03-2023

Blocks :	JDK-8292758 - put support for UNSIGNED5 format into its own header file
Blocks :	JDK-8304069 - ClassFileParser has ad-hoc hashtables
Causes :	JDK-8357576 - FieldInfo::_index is not initialized by the constructor
Causes :	JDK-8352075 - Perf regression accessing fields
Relates :	JDK-8317692 - jcmd GC.heap_dump performance regression after JDK-8292818
Relates :	JDK-8319650 - Improve heap dump performance with class metadata caching
Relates :	JDK-8316342 - CLHSDB "dumpclass" command produces invalid classes
Relates :	JDK-8305490 - CLHSDB "dumpclass" command produces classes with invalid field descriptors
Relates :	JDK-8293118 - AccessFlags should be just the classfile flags