JDK-8179302 : Pre-resolve constant pool string entries and cache resolved_reference arrays in CDS archive
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 10
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • Submitted: 2017-04-26
  • Updated: 2019-11-10
  • Resolved: 2017-08-14
JDK 10
10 b21: Fixed
Sub Tasks
JDK-8180260 :  
Description
At CDS dump time, the java.lang.String objects referenced in the string table are copied into the 'archive' java heap regions and cached as part of the CDS shared archive. The archived String objects are mapped into the runtime java heap and can be used directly.

When constant pool string entries are resolved, the resolved Strings are stored in the 'resolved_references' array for fast access. The 'resolved_references' array holds references to resolved constant pool entries, including Strings, mirrors, methodTypes, etc. At dump time, all constant pool string entries that refer to existing interned strings can be resolved. The 'resolved_references' arrays should also be cached in the CDS archive. Pre-resolving string constants and caching the 'resolved_references' arrays improves both startup time and runtime performance.

Design
----------
Platform requirements:
Non-Windows, 64-bit platforms. Only G1 GC is supported. Requires UseCompressedOops and UseCompressedClassPointers.

Goal:
Pre-resolve java.lang.Strings (existing interned Strings) referenced from the constant pool and cache the constant pool 'resolved_references' arrays. This is a general design that allows other types of java objects to be cached in the future.

==========================================================================
Types of Pinned G1 Heap Regions

The pinned region type is a super type of all archive region types, which include the open archive type and the closed archive type.

00100 0 [ 8] Pinned Mask 
01000 0 [16] Old Mask
10000 0 [32] Archive Mask 
11100 0 [56] Open Archive:   ArchiveMask | PinnedMask | OldMask
11100 1 [57] Closed Archive: ArchiveMask | PinnedMask | OldMask + 1
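
The tag values in the table above can be checked with a small sketch. This is an illustrative model only, not the HotSpot source; the class and method names are hypothetical, and only the numeric values come from the table.

```java
// Illustrative model of the region type tags listed above (not HotSpot code).
final class RegionTypeTags {
    static final int PINNED_MASK  = 8;   // 00100 0
    static final int OLD_MASK     = 16;  // 01000 0
    static final int ARCHIVE_MASK = 32;  // 10000 0

    // Both archive types carry the archive, pinned and old bits.
    static final int OPEN_ARCHIVE   = ARCHIVE_MASK | PINNED_MASK | OLD_MASK;        // 11100 0
    // The closed archive type sets the lowest bit on top of the open archive tag.
    static final int CLOSED_ARCHIVE = (ARCHIVE_MASK | PINNED_MASK | OLD_MASK) + 1;  // 11100 1

    static boolean isPinned(int tag)        { return (tag & PINNED_MASK) != 0; }
    static boolean isOld(int tag)           { return (tag & OLD_MASK) != 0; }
    static boolean isArchive(int tag)       { return (tag & ARCHIVE_MASK) != 0; }
    static boolean isOpenArchive(int tag)   { return tag == OPEN_ARCHIVE; }
    static boolean isClosedArchive(int tag) { return tag == CLOSED_ARCHIVE; }
}
```

Because both archive tags include the pinned and old bits, a mask test such as isPinned() or isOld() holds for either archive type, which matches the statement that 'pinned' is the super type.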

Pinned Regions

Objects within the region are 'pinned', which means GC does not move any live objects. GC scans and marks objects in the pinned region as normal, but skips forwarding live objects. Pointers in live objects are updated. Dead objects (unreachable) can be collected and freed.

Archive Regions

The archive types are sub-types of 'pinned'. There are two types of archive region currently, open archive and closed archive. Both can support caching java heap objects via the CDS archive.

An archive region is also an old region by design.

Open Archive (GC-RW) Regions

An open archive region is GC-writable. GC scans and marks objects within the region and adjusts (updates) pointers in live objects the same way as in a pinned region. Live (reachable) objects are pinned and not forwarded by GC.
An open archive region does not have 'dead' objects. Unreachable objects are 'dormant' objects, which are not collected or freed by GC.

Adjustable Outgoing Pointers

Because GC can adjust pointers within live objects in an open archive heap region, those objects can have outgoing pointers to other java heap regions, including closed archive regions, open archive regions, pinned (or humongous) regions, and normal generational regions. When a referenced object is moved by GC, the pointer within the open archive region is updated accordingly.

Closed Archive (GC-RO) Regions

A closed archive region is a GC read-only region: GC cannot write into it. Objects are not scanned or marked by GC. Objects are pinned and not forwarded, and pointers are not updated by GC either. Hence, objects within a closed archive region cannot have any outgoing pointers to other java heap regions. Objects can, however, still have pointers to other objects within the closed archive regions (we might allow pointers to open archive regions in the future). That restricts the types of java objects that closed archive regions can support.
In JDK 9 we support archived Strings with the archive regions.

The GC-readonly archive region makes java heap memory sharable among different JVM processes. NOTE: synchronization on the objects within the archive heap region can still cause writes to the memory page.
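
The GC behaviour described for the three region kinds can be summarized as a small lookup table. The sketch below is a hypothetical model whose names are invented for illustration. Two of its values are assumptions rather than statements from the text: that unreachable objects in closed archive regions are not freed, and that plain pinned objects may point to other regions (inferred from their pointers being updated).

```java
// Hypothetical summary of the per-region GC behaviour described above.
enum RegionKind {
    //              scanMark forward updatePtrs freeUnreachable outgoingPtrs
    PINNED         (true,    false,  true,      true,           true),   // outgoingPtrs inferred
    OPEN_ARCHIVE   (true,    false,  true,      false,          true),   // unreachable = dormant
    CLOSED_ARCHIVE (false,   false,  false,     false,          false);  // freeUnreachable assumed

    final boolean gcScansAndMarks;        // GC scans and marks objects in the region
    final boolean gcForwardsLiveObjects;  // GC moves (forwards) live objects
    final boolean gcUpdatesPointers;      // GC rewrites pointers held by objects in the region
    final boolean gcFreesUnreachable;     // unreachable objects are collected and freed
    final boolean mayPointToOtherRegions; // objects may reference other java heap regions

    RegionKind(boolean scanMark, boolean forward, boolean updatePtrs,
               boolean freeUnreachable, boolean outgoingPtrs) {
        this.gcScansAndMarks = scanMark;
        this.gcForwardsLiveObjects = forward;
        this.gcUpdatesPointers = updatePtrs;
        this.gcFreesUnreachable = freeUnreachable;
        this.mayPointToOtherRegions = outgoingPtrs;
    }
}
```

In this model, outgoing pointers are allowed exactly where GC updates pointers, which is the reasoning the text gives for the closed archive restriction.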

Dormant Objects

Dormant objects are unreachable java objects within the open archive heap region.
A java object in the open archive heap region is live if it can be reached during scanning. Some java objects in the region may not be reachable during scanning. Those objects are considered dormant, not dead. For example, a constant pool 'resolved_references' array is reachable via the klass root if its containing (shared) klass is already loaded when GC scanning takes place. If a shared klass is not yet loaded, the klass root is not scanned, and its constant pool 'resolved_references' array (A) in the open archive region is not reachable. A is then a dormant object.

Object State Transition

All java objects are initially dormant when open archive heap regions are mapped into the runtime java heap. A dormant object becomes a live object when the associated shared class is loaded at runtime. An explicit call to G1SATBCardTableModRefBS::enqueue() must be made when a dormant object becomes live. That applies to cached objects with strong roots as well, since strong roots are scanned only at the start of GC marking (the initial marking), not during remarking/final marking. If a cached object becomes live during the concurrent marking phase, G1 may not find it and mark it live unless G1SATBCardTableModRefBS::enqueue() is called for the object.

Currently, a live object in the open archive heap region cannot become dormant again. This restriction simplifies the GC requirements and guarantees that all outgoing pointers are updated by GC correctly. Only objects for shared classes from the builtin class loaders (boot, PlatformClassLoaders, and AppClassLoaders) are supported for caching.
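
The one-way dormant-to-live transition can be sketched as a tiny state machine. This is a simplified model, not JVM code: the list standing in for the SATB queue and the method onSharedClassLoaded() are hypothetical names, and the real notification is the G1SATBCardTableModRefBS::enqueue() call mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of an object cached in an open archive region.
final class ArchivedObjectModel {
    enum State { DORMANT, LIVE }

    private State state = State.DORMANT;           // every cached object starts out dormant
    private final List<ArchivedObjectModel> satb;  // hypothetical stand-in for the SATB queue

    ArchivedObjectModel(List<ArchivedObjectModel> satbQueue) {
        this.satb = satbQueue;
    }

    // Called when the associated shared class is loaded at runtime.
    void onSharedClassLoaded() {
        if (state == State.DORMANT) {
            state = State.LIVE;
            satb.add(this);  // tell the (modelled) GC that this object just became live
        }
        // There is deliberately no LIVE -> DORMANT transition.
    }

    State state() { return state; }

    public static void main(String[] args) {
        List<ArchivedObjectModel> queue = new ArrayList<>();
        ArchivedObjectModel resolvedReferences = new ArchivedObjectModel(queue);
        resolvedReferences.onSharedClassLoaded();
        System.out.println(resolvedReferences.state() + ", enqueued: " + queue.size());
    }
}
```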

Caching Java Objects at Archive Dump Time

The closed archive and open archive regions are allocated near the top of the dump-time java heap. Archived java objects are copied into the designated archive heap regions. For example, String objects and their underlying 'value' arrays are copied into the closed archive regions. All references to the archived objects (from shared class metadata, the string table, etc.) are set to the new heap locations. A hash table keeps track of all archived java objects during the copying process, so that a java object is not archived more than once if it is reached from different roots. The table also ensures that all references to the same archived object are updated with the same new address.
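
The role of the hash table can be illustrated with an identity-keyed map. This is a sketch under assumptions: the class name, the use of byte offsets as "new locations", and the caller-supplied object size are all hypothetical simplifications, not the dump-time implementation.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Sketch: archive each object at most once and hand out one stable new location.
final class ArchiveCopier {
    // Keyed by object identity, so two equal but distinct objects are archived separately.
    private final Map<Object, Long> archivedLocation = new IdentityHashMap<>();
    private long nextOffset = 0;  // next free offset in the modelled archive region
    private int copies = 0;       // number of copies actually performed

    // Returns the archive offset of 'original', copying it only on first sight.
    long archive(Object original, long sizeInBytes) {
        Long known = archivedLocation.get(original);
        if (known != null) {
            return known;         // already archived via another root: reuse the location
        }
        long location = nextOffset;
        nextOffset += sizeInBytes; // the "copy" is modelled by reserving space
        copies++;
        archivedLocation.put(original, location);
        return location;
    }

    int copyCount() { return copies; }
}
```

Looking up by identity rather than by equals() mirrors the requirement that every reference to the same object ends up pointing at the same new address.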

Caching Constant Pool resolved_references Array

The 'resolved_references' array holds references to resolved constant pool entries, including Strings, mirrors, methodTypes, etc. Each loaded class has one 'resolved_references' array (in ConstantPoolCache). The 'resolved_references' arrays are copied into the open archive regions during the dump process. Before copying the 'resolved_references' arrays, the JVM iterates through the constant pool entries and resolves all JVM_CONSTANT_String entries to existing interned Strings for all archived classes. When resolving, the JVM only looks up the string table to find existing interned Strings and does not insert new ones. If a string entry cannot be resolved to an existing interned String, the constant pool entry remains unresolved. That avoids wasting memory on a constant pool string entry that is never used at runtime.
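
The lookup-only resolution step can be sketched as follows. This is a simplified model: a HashMap stands in for the JVM string table, a null slot stands for an unresolved entry, and the one-to-one mapping between constant pool entries and array slots is an assumption made for brevity.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: resolve constant pool string entries against existing interned strings only.
final class StringPreResolver {
    // Returns a modelled 'resolved_references' array; null marks an unresolved entry.
    static String[] preResolve(String[] cpStringEntries, Map<String, String> stringTable) {
        String[] resolved = new String[cpStringEntries.length];
        for (int i = 0; i < cpStringEntries.length; i++) {
            // Lookup only: the table is never modified, so no new string is interned.
            resolved[i] = stringTable.get(cpStringEntries[i]);
        }
        return resolved;
    }

    public static void main(String[] args) {
        Map<String, String> table = new HashMap<>();
        table.put("hello", "hello");  // an already interned string
        String[] refs = preResolve(new String[] {"hello", "never-used"}, table);
        System.out.println(refs[0] + " / " + refs[1] + " / table size " + table.size());
    }
}
```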

All String objects referenced by the string table are copied first into the closed archive regions. The string table entry is updated with the new location when each String object is archived. The JVM updates the resolved constant pool string entries with the new object locations when copying the 'resolved_references' arrays to the open archive regions. References to the 'resolved_references' arrays in the ConstantPoolCache are also updated.
At runtime, as part of the ConstantPool::restore_unshareable_info() work, G1SATBCardTableModRefBS::enqueue() is called to let GC know that the 'resolved_references' array is becoming live. A handle is created for the cached object and added to the loader_data's handles.

Runtime Java Heap With Cached Java Objects

The closed archive regions (the string regions) and open archive regions are mapped into the runtime java heap at the same offsets from the runtime java heap base as they had from the java heap base at dump time.
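
The offset-preserving mapping amounts to simple address arithmetic, sketched below. The class, method and parameter names are hypothetical, and the sketch assumes that the dump-time offset is measured from the dump-time heap base.

```java
// Sketch: map a dump-time object address to its runtime address by keeping the heap offset.
final class HeapOffsetMapping {
    static long runtimeAddress(long dumpAddress, long dumpHeapBase, long runtimeHeapBase) {
        long offset = dumpAddress - dumpHeapBase;  // offset of the object from the heap base
        return runtimeHeapBase + offset;           // same offset, applied to the runtime base
    }

    public static void main(String[] args) {
        long mapped = runtimeAddress(0x10000200L, 0x10000000L, 0x20000000L);
        System.out.println(Long.toHexString(mapped));
    }
}
```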


Comments
The RFE was integrated on 8/14/17.
30-08-2017

I have created a new task, JDK-8184139, for the Test Plan, following the newly agreed-upon format.
11-07-2017