Bug ID: JDK-8296344 Remove dependency on G1 for writing the CDS archive heap

Type: Enhancement
Component: hotspot
Sub-Component: runtime
Affected Version: 20

Priority: P4
Status: Open
Resolution: Unresolved

Submitted: 2022-11-03
Updated: 2023-01-05

Versions (Unresolved/Resolved/Fixed)

JDK 21
21 : Unresolved

Related Reports

Relates :	JDK-8297914 - Remove java_lang_Class::process_archived_mirror()
Relates :	JDK-8298612 - Refactor archiving of java String objects
Relates :	JDK-8298601 - Refactor archiving of java.lang.Module objects
Relates :	JDK-8297313 - Refactor APIs for calculating address of CDS archive heap regions
Relates :	JDK-8296263 - Uniform APIs for using archived heap regions
Relates :	JDK-8298048 - Combine CDS archive heap into a single block
Relates :	JDK-8298600 - Prerequisites for JDK-8296344: Remove dependency on G1 for writing the CDS archive heap
Relates :	JDK-8251330 - Reorder CDS archived heap to avoid oopmaps for objects when possible

Description

Currently, dumping the CDS archive heap involves complex interactions with G1. Each time a Java object needs to be archived, we allocate a copy of it using G1CollectedHeap::archive_mem_allocate(). This causes two problems:

- The complex interface makes it difficult to implement heap archiving for other collectors.

- When the G1 heap is fragmented, we may not be able to allocate the archived objects in the desired address range, resulting in a suboptimal archive.

Proposal:

Since the archived objects are not used at dump time, they don't need to be stored inside the dump-time heap. Instead of asking G1 for real memory from the dump-time heap to store the archived objects, we manage our own buffer when copying the objects.
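A minimal sketch of the idea, assuming a plain std::vector stands in for HotSpot's GrowableArray. The class name ArchiveBuffer and its methods are hypothetical and not part of HotSpot. Each object is copied byte-for-byte into the buffer, and the copy is identified by its offset rather than by a raw pointer, since the buffer may be reallocated as it grows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical dump-time buffer (not a HotSpot class). Archived copies are
// stored here instead of in memory obtained from the G1 heap.
class ArchiveBuffer {
 public:
  // Appends a byte-for-byte copy of an object and returns the copy's offset.
  // An offset is returned instead of a pointer because the vector may
  // reallocate (and move its contents) when it grows.
  size_t copy_object(const void* src, size_t size_in_bytes) {
    size_t offset = bytes_.size();
    bytes_.resize(offset + size_in_bytes);
    if (size_in_bytes > 0) {
      std::memcpy(bytes_.data() + offset, src, size_in_bytes);
    }
    return offset;
  }

  // Returns a pointer to the copy at 'offset'. The pointer is only valid
  // until the next copy_object() call.
  const unsigned char* at(size_t offset) const {
    assert(offset < bytes_.size());
    return bytes_.data() + offset;
  }

  size_t used() const { return bytes_.size(); }

 private:
  std::vector<unsigned char> bytes_;
};
```

Working with offsets also suits the later relocation step: an offset can be turned into a requested address once the final placement of the buffer is known.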

The goal is to generate a CDS heap image in the same format as the one produced by the previous dumping algorithm, i.e.:

- The archive heap is divided into 'open' and 'closed' parts.
- At runtime, the archive heap can be mapped with the existing G1CollectedHeap::alloc_archive_regions() API

(Note: a follow-up RFE will simplify the runtime mapping code and consolidate all archived objects into a single block. There will be no more 'open' and 'closed' parts; see JDK-8298048.)

=================================================
Algorithm:

- Identify the Java heap objects that need to be archived, and remember these objects in a hashtable. Each object is associated with an 'open' or 'closed' attribute.

- Allocate a GrowableArray as a temporary buffer. The GrowableArray is divided into multiple 1MB blocks.

- Start at position 0 of the buffer: copy all the 'open' objects in the hashtable sequentially into the GrowableArray.

- Advance to the next position P in the buffer, where P is aligned with G1's region size (HeapRegion::GrainBytes)

- Copy all the 'closed' objects in the hashtable sequentially into the GrowableArray.

- When the objects are being copied, add appropriate fillers such that no objects cross 1MB boundaries. (**)

- After all the objects are copied, calculate the requested addresses of the copied objects. We do so by "moving" the GrowableArray so that its end is flush against the end of the current G1 heap.

- E.g., if all the copied objects can fit in 2 G1 regions, then the lowest copied object Foo would sit at the lowest address of the second G1 region from the top of the heap. This is called the "requested address" of Foo. By default, we want to map Foo at this address at runtime.

- Relocate all the oop fields in the copied objects according to their requested addresses.

- Write the contents of the GrowableArray into the CDS archive, separating the "open" portion from the "closed" portion.
=================================================
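The layout steps above can be modeled roughly as follows. This is an illustrative sketch under stated assumptions, not the prototype's code: it only computes buffer offsets (no bytes are copied), treats fillers as plain gaps, and assumes every object is at most 1MB and the heap top is region-aligned. The names ArchivedObj, Layout, place(), layout(), requested_bottom() and requested_address() are hypothetical. requested_bottom() rounds the used size up to whole regions before placing the buffer against the heap top, which matches the 2-region example above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of the layout steps; all sizes and offsets are in bytes.
constexpr size_t kBlockBytes = 1024 * 1024;  // 1MB: smallest G1 region size

struct ArchivedObj {
  size_t size;    // object size in bytes (assumed <= 1MB)
  bool is_open;   // 'open' or 'closed' attribute recorded in the hashtable
  size_t offset;  // buffer offset assigned by layout()
};

struct Layout {
  size_t closed_start;  // region-aligned offset where the 'closed' part begins
  size_t used;          // total bytes used in the buffer
};

static size_t align_up(size_t x, size_t alignment) {
  return (x + alignment - 1) / alignment * alignment;
}

// Places one object at 'top'. If it would cross a 1MB boundary, the gap up to
// that boundary is left for a filler. (A real filler must be a valid object of
// at least the minimum object size; this sketch ignores that detail.)
static size_t place(ArchivedObj& obj, size_t top) {
  assert(obj.size > 0 && obj.size <= kBlockBytes);
  size_t next_boundary = align_up(top + 1, kBlockBytes);
  if (top + obj.size > next_boundary) {
    top = next_boundary;  // skip the filler gap
  }
  obj.offset = top;
  return top + obj.size;
}

// 'open' objects start at offset 0; 'closed' objects start at the next
// multiple of the dump-time region size (HeapRegion::GrainBytes).
static Layout layout(std::vector<ArchivedObj>& objs, size_t region_bytes) {
  assert(region_bytes % kBlockBytes == 0);
  size_t top = 0;
  for (ArchivedObj& o : objs) {
    if (o.is_open) top = place(o, top);
  }
  size_t closed_start = align_up(top, region_bytes);
  top = closed_start;
  for (ArchivedObj& o : objs) {
    if (!o.is_open) top = place(o, top);
  }
  return {closed_start, top};
}

// "Moves" the buffer so that it is flush against heap_top. The used size is
// rounded up to whole regions, so offset 0 lands on a region boundary.
static uintptr_t requested_bottom(uintptr_t heap_top, size_t used, size_t region_bytes) {
  return heap_top - align_up(used, region_bytes);
}

// Relocation: an oop field that refers to the copy at buffer offset 'offset'
// is rewritten to hold this requested address.
static uintptr_t requested_address(uintptr_t bottom, size_t offset) {
  return bottom + offset;
}
```

For example, with 2MB regions, two 600KB 'open' objects land at offsets 0 and 1MB (the second would otherwise cross the 1MB boundary), and a small 'closed' object lands at 2MB. The buffer then spans two regions, so against a 64MB heap top its bottom is requested at 60MB.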

(**) The size of G1 regions depends on the max heap size, but is never smaller than 1MB. By ensuring that no objects in the archive heap cross 1MB boundaries, we can always map the archive regardless of the runtime G1 region size. (This 1MB value may need to be reconsidered when we support archive heap mapping in other region-based collectors.)
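The footnote's argument can be checked with a small sketch, assuming G1 region sizes are powers of two and the start of the archive is mapped at a region-aligned address. The helper names are hypothetical, and the 32MB upper bound in the example is only an illustrative limit. Because every multiple of a power-of-two region size of at least 1MB is also a multiple of 1MB, an object that crosses no 1MB boundary crosses no region boundary either.

```cpp
#include <cassert>
#include <cstddef>

constexpr size_t kMB = 1024 * 1024;

// True if an object occupying [start, start + size) crosses a multiple of
// 'unit'. Offsets are relative to a region-aligned archive start. An object
// that ends exactly on a boundary does not cross it.
static bool crosses_boundary(size_t start, size_t size, size_t unit) {
  assert(size > 0);
  return start / unit != (start + size - 1) / unit;
}

// Checks one object against every power-of-two region size from 1MB up to
// 'max_region'. If the object crosses no 1MB boundary, this always returns
// true, because each larger region boundary is also a 1MB boundary.
static bool fits_all_region_sizes(size_t start, size_t size, size_t max_region) {
  for (size_t region = kMB; region <= max_region; region *= 2) {
    if (crosses_boundary(start, size, region)) return false;
  }
  return true;
}
```

An object at [1MB - 16, 1MB + 16) does not cross a 2MB boundary, but it would straddle two regions if the runtime region size were 1MB, which is why the dumper must respect the smallest possible region size.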

The benefit is less CDS-specific code in G1. It also makes it possible to dump the archive heap with non-G1 collectors (see follow-up RFE JDK-8298614).

Comments

Hi [~asmehra], I've updated the Description with more details about the proposed design. I also implemented a prototype: https://github.com/openjdk/jdk/compare/master...iklam:jdk:8296344-remove-cds-heap-dump-dependency-on-G1 (Only the last commit, "8296344: Remove dependency on G1 for writing the CDS archive heap", is for this RFE. The other commits are prerequisites that I will push in separate RFEs.) The current prototype (as of Nov 30, 2022) makes an extra pass that copies the objects into a Java byte array before copying them into the GrowableArray described in the Description. This is suboptimal and confusing. The extra pass will be removed after some of the prerequisite RFEs are integrated; see JDK-8298600.
20-12-2022
Here's a second prototype where I removed the extra pass for copying the objects: https://github.com/openjdk/jdk/compare/master...iklam:jdk:NEW-8296344-remove-cds-heap-dump-dependency-on-G1--STEP2 (only the last 2 commits are relevant to this RFE).
20-12-2022
I am posting a few questions about this work to better understand its scope and to get more clarity for https://bugs.openjdk.org/browse/JDK-8296263:

1. Is it the goal of this RFE to allow an archived heap written by any collector to be used by any other collector? For instance, writing the archive heap with Serial and using it with G1.

2. It was mentioned elsewhere (https://github.com/openjdk/jdk/pull/10970#issuecomment-1306207433) that the region boundary for the archived objects needs to be recorded. I guess it would be recorded in one of the CDS headers (probably CDSFileMapRegion). If the archive heap is written using a non-region-based collector, would the objects still be region-aligned? I guess it depends on the answer to the previous question. If we want the archived heap to be usable by different collectors, then the objects need to be region-aligned. The next question would then be: what would the region boundary be for non-region-based collectors?
15-11-2022
This RFE makes it easier to implement JDK-8251330: "Reorder CDS archived heap to avoid oopmaps for objects when possible".
07-11-2022