JDK-8310823 : CDS archived object streaming
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 22
  • Priority: P4
  • Status: Closed
  • Resolution: Duplicate
  • Submitted: 2023-06-23
  • Updated: 2024-06-21
  • Resolved: 2024-06-21
Description
Based on suggestions from [~eosterlund] during the review of JDK-8310160

[1] Allocate an OopStorage with the same number of slots as the number of individual CDS archived objects.

[2] Allocate a fake "scratch" object for every CDS archived object. These can be byte arrays or java.lang.Object instances.

[3] Copy each archived object into its scratch space

[4] Fix up the pointers inside the scratch spaces
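Steps [1]-[3] can be illustrated with a small standalone simulation (plain C++, not HotSpot code; `ArchivedObject`, `ScratchSlot`, and `materialize` are hypothetical names, and the archive layout is an assumption for illustration):

```cpp
#include <cstdint>
#include <vector>

// Simulated CDS archive: objects laid out back to back, each identified by
// its offset from the start of the archive (hypothetical format).
struct ArchivedObject {
    uint32_t offset;  // position of this object within the archive
    uint32_t size;    // size in bytes
};

// [1]+[2]: one slot per archived object, holding a separately allocated
// scratch buffer (standing in for an OopStorage slot plus a scratch
// heap object).
struct ScratchSlot {
    uint32_t archive_offset;
    std::vector<uint8_t> scratch;  // scratch space for the copied object
};

// [3]: copy every archived object out of the archive into its scratch space.
// Pointer fixup ([4]) would run afterwards over the filled-in slots.
std::vector<ScratchSlot> materialize(const uint8_t* archive,
                                     const std::vector<ArchivedObject>& objs) {
    std::vector<ScratchSlot> slots;
    slots.reserve(objs.size());
    for (const ArchivedObject& o : objs) {
        ScratchSlot s;
        s.archive_offset = o.offset;
        s.scratch.assign(archive + o.offset, archive + o.offset + o.size);
        slots.push_back(std::move(s));
    }
    return slots;
}
```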

Theoretically, GC can happen during [2], and the scratch spaces may end up in arbitrary locations, which would make [4] slow. However, in most cases, the scratch spaces would form one or a small number of contiguous blocks. Within each block, the order of the objects is the same as their order in the CDS archive.

- If we have a single contiguous, ordered block, the pointer fixing can be done with a fast path (same performance as the current implementation)

- If we have a small number of contiguous, ordered blocks, we can speculate that the object being fixed and the object it points at are in the same block. If the pointer points into a different block, it can be fixed with a quick lookup. We had an implementation that's optimized for up to 4 such blocks. See

https://github.com/openjdk/jdk/blob/65442a2e26afa7c31b5949e7e20606e4066ced3b/src/hotspot/share/cds/archiveHeapLoader.cpp#L270-L324

- Otherwise, the pointer patching needs to use a hashtable lookup. This should be very rare.
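The three relocation paths above can be sketched as follows (again plain C++, not the linked HotSpot implementation; `Block`, `Relocator`, and the offset-based pointer encoding are illustrative assumptions). The fast path speculates that the target lives in the same block as the referring object, the small-block path scans the short block list, and the rare path falls back to a hashtable of per-object addresses:

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// A contiguous, ordered run of scratch objects: archive offsets in
// [archive_start, archive_end) map linearly onto runtime_base.
struct Block {
    uint32_t archive_start;
    uint32_t archive_end;
    uintptr_t runtime_base;

    bool contains(uint32_t off) const {
        return off >= archive_start && off < archive_end;
    }
    uintptr_t translate(uint32_t off) const {
        return runtime_base + (off - archive_start);
    }
};

class Relocator {
public:
    Relocator(std::vector<Block> blocks, const std::vector<uint32_t>& obj_offsets)
        : blocks_(std::move(blocks)) {
        // Rare-path table: every object's archive offset -> runtime address.
        for (uint32_t off : obj_offsets) {
            table_[off] = find_linear(off)->translate(off);
        }
    }

    // Translate an archive-relative pointer found inside an object that
    // lives in holder_block.
    uintptr_t relocate(const Block& holder_block, uint32_t target_off) const {
        // Fast path: speculate that holder and target share a block.
        if (holder_block.contains(target_off)) {
            return holder_block.translate(target_off);
        }
        // Small number of blocks: a short linear scan is cheap.
        if (blocks_.size() <= 4) {
            if (const Block* b = find_linear(target_off)) {
                return b->translate(target_off);
            }
        }
        // Otherwise: hashtable lookup (should be rare).
        return table_.at(target_off);
    }

private:
    const Block* find_linear(uint32_t off) const {
        for (const Block& b : blocks_) {
            if (b.contains(off)) return &b;
        }
        return nullptr;
    }

    std::vector<Block> blocks_;
    std::unordered_map<uint32_t, uintptr_t> table_;
};
```

In the real implementation the holder's block is known while iterating its fields, so the speculative check is a single range comparison.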

======================================
The advantage of this proposal is that we don't need to have CDS-specific code in the GCs anymore. 

The disadvantage is that performance may be worse. This could become more problematic in the future for Project Leyden, as we expect more objects to be archived. (The current archived heap is about 1MB, so perhaps not a big deal.)

We might be able to achieve the same performance with minor tuning of the scratch object allocation:

- CDS tells the GC that it's allocating scratch objects, passing the requested base address and total size
- Some GCs may be able to honor the request so that the allocated scratch objects are exactly where CDS wants them to be. In this case, the archive heap can be mmaped with no relocation
- Or, the GC could ensure that the scratch objects are in a single contiguous, ordered block. This allows optimized relocation.
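The negotiation described above could look something like the following interface (entirely hypothetical; `ScratchRequest`, `ScratchPlacement`, and `reserve_scratch` are not real HotSpot APIs, just a sketch of the proposed CDS/GC handshake):

```cpp
#include <cstddef>
#include <cstdint>

// Outcome of the (hypothetical) scratch-allocation request.
enum class ScratchPlacement {
    kExact,       // objects land exactly at the requested base: mmap, no relocation
    kContiguous,  // one contiguous, ordered block elsewhere: fast-path relocation
    kArbitrary    // arbitrary placement: per-object pointer fixup needed
};

// What CDS asks for before materializing the archived heap.
struct ScratchRequest {
    uintptr_t requested_base;  // where CDS would like the scratch objects
    size_t total_size;         // total size of all scratch objects
};

// A collector supporting the proposal would implement this (hypothetical)
// policy and report how much of the request it could honor.
class ScratchAllocatorPolicy {
public:
    virtual ~ScratchAllocatorPolicy() = default;
    virtual ScratchPlacement reserve_scratch(const ScratchRequest& req) = 0;
};
```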

Comments
This issue represents my failed attempt to implement object streaming. A new attempt is being made in JDK-8326035.
21-06-2024

I have done more testing with my prototype for JDK-8310823 (Materialize CDS archived objects with regular GC allocation) [1] and found a few issues:

- The minimum heap requirement has increased significantly. We allocate many objects during VM start-up. This may fill up the young generation for some collectors, causing the VM to exit, as a GC is not yet possible at this stage. Since the generation sizing policy is different in each GC (and not really controllable via -XX:NewSize), the failure mode with small heap sizes becomes unpredictable. I think this is a functional regression, which is more serious than the performance regression in start-up time.

- Although it's possible to enable archive heap objects for ZGC with my JDK-8310823 patch, there's only a very marginal improvement in start-up time (probably because ZGC is doing many other tasks concurrently during start-up).

As we expect more heap objects to be archived in Project Leyden, we need a solution that scales well without blowing up the minimum heap size. For example, with the mainline, an 18MB archived heap can be mapped with -Xmx20M, but with my patch, I needed to use -Xmx40M.

To reduce the minimum heap requirement, [~eosterlund] has developed a prototype that materializes the archived objects incrementally. However, it's not clear whether future Leyden developments would support such incremental loading.

As we don't see any immediate benefits from JDK-8310823, I would suggest putting it on hold for now, until we get a better understanding of the requirements from Leyden.

[1] https://github.com/openjdk/jdk/pull/15730
05-10-2023

WIP: https://github.com/openjdk/jdk/compare/master...iklam:jdk:8310823-materialize-cds-heap-with-regular-alloc?expand=1

Discussion for the WIP: https://github.com/openjdk/jdk/pull/14520
06-07-2023