JDK-8042668 : Provide GC support for shared heap ranges in Class Data Sharing
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 9
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2014-05-07
  • Updated: 2018-01-08
  • Resolved: 2015-06-13
  • Fix Version: JDK 9 b72 (Fixed)
Description
Summary
-------
JDK-8059092 describes CDS support for allowing interned strings to be among the objects shared by applications.  This requires the strings and their corresponding arrays to be in a part of the heap that can be archived and later mapped and shared.  This entry describes the required GC/Allocation support, which is targeted for G1 only.

There are two aspects to this support: dump-time and restore-time.

The dump-time support must give CDS a way to allocate space in the heap for the strings and char arrays which are to be shared.
CDS will fill this space with objects and then dump it to an archive file.   The space should be at or near the top of the (maximum) heap, so that the memory range is likely to be valid even if the restore-time heap size differs.   It should be contiguous.

The restore-time support must give CDS a way to allocate the space in the heap corresponding to the dump-time memory range, so that the shared archive can be mapped at that location.  The heap corresponding to that range must be "pinned," that is, never perturbed by GC, so that it can be effectively shared.  No other "pinned" allocation support is needed, such as adding pinned objects at runtime.  

Goals (mostly inherited from JDK-8059092)
-------
- Only provide this support in the G1 collector.
- Only the 64-bit platform with compressed oops and compressed klass pointers need be supported.
- No significant  degradation (<2-3%) of startup time or GC pause time. 

Non-Goals
-------
- Support for specialized allocation or pinned regions in collectors outside of G1.
- Generalized "pinned allocation" support, such as for run-time allocation of pinned objects.
- Support for pinning at granularity finer than G1 heap regions.
- "Humongous" strings need not be supported in the shared archive.
- 32-bit platform support.

Description
-------
At dump time, CDS will perform the dump operation after class loading, from the VM thread.   The part which is relevant here consists of a loop which walks the string table, and for each entry, allocates a string object and corresponding array object and initializes them.  The containing memory range will then be dumped to an archive file, and after creating the archive, the VM will exit.  The aspects of this allocation support that are new are:

- CDS must be able to inquire what the containing memory range is, once the allocation loop is complete.
- The base of the range should be G1-region aligned, to make best use of restore-time memory.
- The allocations should occur in contiguous regions, to minimize the number of archive segments and map operations that are needed.
- The allocation should be at or near the top of maximum heap, so that the memory ranges are likely to be valid even if the restore-time heap is smaller than the dump-time heap.
- Also to support different restore-time heap sizes, individual object allocations must not cross MIN_REGION_SIZE boundaries. 

Note that these objects need not be pinned at dump time.

At restore time, CDS will verify that the narrow oop encoding in use is the same as what was used at dump-time.  At JVM init time, it will request allocation of the specific address range(s) that were initialized/archived at dump-time.  The ranges must be marked pinned, not to be modified by GC.  Because the restore-time heap size may be different than the dump-time heap size, the range base might not be region-aligned (but will still be MIN_REGION_SIZE aligned).  In that case, dummy object(s) should fill the unused portion of the starting region.   Because no humongous strings are allowed at dump-time, there is no need to mark any regions within the range as 'humongous' in addition to pinned.
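
To see why the encoding must match, consider the decode step for compressed oops.  The following is a minimal standalone sketch (not the HotSpot implementation; the names and values are illustrative) of how an archived String's compressed reference to its char array is turned back into an address:

  #include <cstdint>

  // Hypothetical base/shift values; in HotSpot they are derived from the heap
  // base address and heap size (unscaled, zero-based, or heap-based modes).
  static uintptr_t narrow_oop_base  = 0;
  static int       narrow_oop_shift = 3;

  inline uintptr_t decode_narrow_oop(uint32_t v) {
    // The decoded address is only correct if the base and shift match the
    // values that were in effect when the archive was dumped.
    return narrow_oop_base + ((uintptr_t)v << narrow_oop_shift);
  }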

This allocation routine will return a failure status if the range is not entirely within the maximum heap, or if any of the contained G1 regions are not free.

The CDS code will subsequently mmap the archive file in the range initialized at dump time.  (Since these are dump-time-region-aligned, they are also h/w-page-aligned.)   Objects in the mapped file contain (compressed) pointers to other objects in the same pinned memory range, but not outside it. 
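
As a rough illustration of that mapping step, here is a standalone POSIX sketch (not the actual CDS code; the function name and parameters are assumptions).  The address passed in is the dump-time range base recorded in the archive:

  #include <sys/types.h>
  #include <sys/mman.h>

  static void* map_string_segment(int fd, off_t offset, size_t len, void* addr) {
    // MAP_FIXED places the data at the exact dump-time address; MAP_PRIVATE
    // keeps the pages copy-on-write, so unmodified pages can be shared among
    // JVM instances mapping the same archive file.
    return mmap(addr, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_FIXED, fd, offset);
  }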

There may be wasted space in the pinned range, though typically only in a single G1 region.  There is no requirement to make this space usable at the present time.   When and if generalized pinned-object allocation is implemented, this space could be used as the first free pinned heap location for allocations.


Implementation Notes
-------

The implementation is summarized here for the benefit of code reviewers.  "Archive Region" support is added to G1 to support the above requirements.

"Archive" regions are G1 regions that are not modifiable by GC, being neither scavenged nor compacted, or even marked in the object header.  They can contain no pointers to non-archive heap regions, and object headers point to shared CDS metaspace (though this last point is not enforced by G1).  Thus, they allow the underlying hardware pages to be shared among multiple JVM instances.

In short, a dump-time run (using -Xshare:dump) will allocate space in the Java heap for the strings which are to be shared, copy the string objects and arrays to that space, and then archive the entire address range in the CDS archive.  At restore-time (using -Xshare:on), that same heap range will be allocated at JVM init time, and the archived data will be mmap'ed into it.  GC must treat the range as 'pinned,' never moving or writing to any objects within it, so that cross-JVM sharing will work.

CDS only requires this support in G1, and in the 64-bit JVM with compressed pointers.  However, there is nothing specific to the 64-bit version or compressed oops in the G1 support.

* Dump-time Support

At dump-time, the CDS code performing the dump will make calls to the allocator for the individual objects which will constitute the shared string archive.  Because it is a goal to allow the dump-time and restore-time heaps to have different sizes, it is desirable to allocate this space at the top of the reserved heap, which may be outside the currently committed heap.  The code must also take into account that the restore-time JVM might have a different G1 heap region size and region boundaries, depending on the heap size.  Therefore, no allocation can cross a min_region_size boundary, because at restore time, that boundary might be an actual G1 region boundary, even if it is not one at dump-time.

To accomplish this, a G1ArchiveAllocator class has been added.  This uses a new G1CollectedHeap::alloc_highest_available_region() routine, which will return the topmost region of the heap which is free (committing uncommitted regions as necessary).  Allocating within the region, the G1ArchiveAllocator operates on min_region_size sub-regions, to avoid allocating any objects that cross that boundary.  It also disallows any allocations that would be considered 'humongous' for min_region_size.
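
The boundary rule can be pictured with a small standalone bump-allocator sketch.  This is not the G1ArchiveAllocator code; the struct, field names, and use of byte sizes (G1 allocates in HeapWords) are simplifications for illustration:

  #include <cstddef>
  #include <cstdint>

  struct ArchiveBumpAllocator {
    uintptr_t _top;             // next free address in the current region
    uintptr_t _end;             // end of the current archive region
    size_t    _min_region_size; // minimum possible G1 region size, in bytes

    bool is_too_large(size_t byte_size) const {
      // Anything at least half of min_region_size would be humongous there.
      return byte_size >= _min_region_size / 2;
    }

    void* allocate(size_t byte_size) {
      if (is_too_large(byte_size)) return NULL;
      uintptr_t next_boundary = (_top / _min_region_size + 1) * _min_region_size;
      uintptr_t obj = _top;
      if (obj + byte_size > next_boundary) {
        // Skip ahead to the boundary (the gap is later filled with a dummy
        // object) so no allocation straddles a possible restore-time region edge.
        obj = next_boundary;
      }
      if (obj + byte_size > _end) return NULL;  // caller must obtain a new region
      _top = obj + byte_size;
      return (void*)obj;
    }
  };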

Ultimately, one or more G1 regions may contain the shared string data.  An entry point is provided which returns the used ranges, combining adjacent regions into single ranges when possible.  This routine will also align the ending address up to a requested multiple and fill the resulting gap.  In the case of CDS, this space will later be the target of an mmap, so it should be aligned to the hardware page size.  The regions used for this allocation are marked as "Archive" regions.  Under -Xshare:dump, the JVM exits after capturing the archive region contents, but execution could continue, with those regions marked as "Archive" remaining unmodified by GC.

The G1ArchiveAllocator should be called only from the VM thread at safepoint.  It is invoked via these G1CollectedHeap entry points for initiating archive allocation, allocating some number of objects, and ending the archive range:

  // Facility for allocating in 'archive' regions in high heap memory from
  // the VM thread, and recording the allocated ranges.  The end_ call
  // optionally aligns the end address and returns the allocated ranges as
  // an ascending array of MemRegions.  This can be used to create and
  // archive a heap region which can be mapped at the same fixed addresses
  // in a future JVM instance.

  void begin_archive_alloc_range();
  void end_archive_alloc_range(GrowableArray<MemRegion>* ranges,
                               size_t end_alignment = 0);
  bool is_archive_alloc_too_large(size_t word_size);
  HeapWord* archive_mem_allocate(size_t word_size);
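
A rough sketch of how a dump-time caller might drive these entry points from the VM thread at a safepoint is shown below.  The shared_strings list and copy_object helper are hypothetical stand-ins for the CDS dump code; only the entry points above are the real interface:

  G1CollectedHeap* g1h = G1CollectedHeap::heap();
  GrowableArray<MemRegion> archive_ranges(2);

  g1h->begin_archive_alloc_range();
  for (int i = 0; i < shared_strings->length(); i++) {
    oop s = shared_strings->at(i);
    size_t word_size = (size_t)s->size();
    if (g1h->is_archive_alloc_too_large(word_size)) {
      continue;  // would be humongous for min_region_size; not archived
    }
    HeapWord* p = g1h->archive_mem_allocate(word_size);
    copy_object(s, p, word_size);  // hypothetical copy of the object contents
  }
  // Align the end of the used space up to the hardware page size (it will be
  // the target of an mmap later) and collect the used MemRegions.
  g1h->end_archive_alloc_range(&archive_ranges, (size_t)os::vm_page_size());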


* Restore-time Support

At restore-time, the CDS code mapping the archived strings into the heap must call the GC/allocator code to allocate the specific address range(s) from which the archive was taken, and to mark the containing G1 regions as Archive.  This must be done as a two-step process, because the call to allocate the space must be done before any class loading has occurred, and ultimately, the allocator may need to insert fill objects, which requires that classes have been loaded.

The CDS code verifies that the restore-time compressed heap encoding is the same as it was at dump-time, but the allocator verifies that the requested ranges are actually within the heap and not already in use.  A check_archive_addresses call is made to allow the range to be checked independently.  A subsequent alloc_archive_regions call allocates and marks the G1 regions that constitute the space as archive regions, again verifying that they are not already in use.

The CDS code also performs the actual mmap of the archive file into the required range(s).  The final step of calling fill_archive_regions creates any needed fill objects to make the G1 regions containing the archived MemRegions parseable.  If the dump-time and restore-time heaps are the same size, no fill objects will be required.

These G1 routines accept an array of MemRegions to be allocated/pinned, but there will normally be only 1 or 2 in the CDS usage.  If all the strings fit within a single G1 region at dump-time, there will be 1.  If multiple G1 regions were used, there will be 2, with the last (partially-used) G1 region in a separate MemRegion.   This allows the CDS code to avoid archiving a (potentially) large chunk of unused memory.

  // Facility for allocating a fixed range within the heap and marking               
  // the containing regions as 'archive'.  For use at JVM init time, when the caller
  // may mmap archived heap data at the specified range(s). The check_ call          
  // verifies that the regions are within the reserved heap.  The alloc_ call        
  // commits the appropriate regions and marks them as 'archive', after which
  // the caller can perform the mmap.  The fill_ call (which must occur after class
  // loading) inserts any required filler objects around the specified ranges        
  // to make the regions parseable.                                                  

  bool check_archive_addresses(MemRegion* range, size_t count);
  bool alloc_archive_regions(MemRegion* range, size_t count);
  void fill_archive_regions(MemRegion* range, size_t count);
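
A rough sketch of the restore-time sequence follows.  The ranges array and count come from the archive header, and map_string_archive() is a hypothetical CDS helper; only the entry points above are the real interface:

  G1CollectedHeap* g1h = G1CollectedHeap::heap();

  // 1. At JVM init time, verify the ranges lie within the reserved heap.
  if (!g1h->check_archive_addresses(ranges, count)) {
    return false;  // incompatible heap; run without shared strings
  }
  // 2. Commit the containing G1 regions and mark them as 'archive'.
  if (!g1h->alloc_archive_regions(ranges, count)) {
    return false;  // some region is already in use
  }
  // 3. mmap the archived string data into the now-allocated range(s).
  map_string_archive(ranges, count);
  // 4. After class loading, add any fill objects needed for parseability.
  g1h->fill_archive_regions(ranges, count);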


* Archive/Pinned Object Support

Archive object support is at the granularity of a G1 region, with "Archive" being a new heap region type.  A "Pinned" attribute has been added, which is shared by Humongous and Archive regions.  Pinned regions are never added to the collection set or considered for compaction.   Archive regions have the "Pinned" and "Old" tags set.

The majority of the new support is in the allocation routines already described, with small changes to other code that checks region types.  Many instances of tracing and assertion code needed to be made aware of Archive regions.  Region dumping code marks Archive regions as such.  Region verification code checks that Archive regions contain no heap pointers to non-archive regions, and also that there are no 'pinned' regions that aren't also old ('archive') or humongous.   In the future, there may be a use for such regions.
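
As a standalone sketch (not the HotSpot HeapRegionType code; the bit layout is an assumption chosen for illustration), the relationship between the tags might look like this:

  #include <cstdint>

  enum : uint32_t {
    PinnedBit    = 1u << 0,                         // never collected or moved
    OldBit       = 1u << 1,
    HumongousTag = (1u << 2) | PinnedBit,           // humongous regions are pinned
    ArchiveTag   = (1u << 3) | PinnedBit | OldBit   // archive regions are pinned + old
  };

  inline bool is_pinned(uint32_t tag)  { return (tag & PinnedBit) != 0; }
  inline bool is_archive(uint32_t tag) { return (tag & ArchiveTag) == ArchiveTag; }
  // Regions whose tag reports is_pinned() are skipped when building the
  // collection set or choosing compaction targets.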

No changes are needed for the concurrent mark code, other than not allowing archive regions into the collection set. For full GC support, however, additional work was needed in mark/sweep.


* Shared Mark/Sweep Support

G1 uses the markSweep code shared with SerialGC and CMS to perform full GC operations.  This code would, at a minimum, need to be corrected to leave pinned objects in place.  But since the main goal of archive region support is to allow the memory to be shared, the code also has to avoid modifying live objects' mark words.

To avoid adding overhead for the other GCs that use markSweep, most of the support is in g1MarkSweep, with only a test in the places necessary to determine whether the more heavyweight code needs to be invoked.  This also avoids most of the work for G1 if there are no Archive regions in use.  When an archive region is created, checking code is enabled so that G1MarkSweep::archive_check_enabled() returns true.  A bitmap is also created, and G1MarkSweep::mark_range_archive sets the bits corresponding to the G1 regions in that range, to indicate their "archive" status.  G1MarkSweep::in_archive_range can then be used to determine whether an object is in an archive region.
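
A standalone sketch of that region-granularity check follows (not the G1MarkSweep code; the class name, types, and use of std::vector are assumptions made for illustration):

  #include <vector>
  #include <cstdint>
  #include <cstddef>

  class ArchiveRegionCheck {
    bool              _enabled = false;
    uintptr_t         _heap_base;
    size_t            _region_size;     // G1 heap region size in bytes
    std::vector<bool> _is_archive;      // one flag per G1 region
   public:
    ArchiveRegionCheck(uintptr_t heap_base, size_t region_size, size_t num_regions)
      : _heap_base(heap_base), _region_size(region_size), _is_archive(num_regions) {}

    bool archive_check_enabled() const { return _enabled; }

    void mark_range_archive(uintptr_t start, uintptr_t end) {
      _enabled = true;  // the cheap enabled test guards the lookup below
      for (uintptr_t a = start; a < end; a += _region_size) {
        _is_archive[(a - _heap_base) / _region_size] = true;
      }
    }

    bool in_archive_range(uintptr_t obj_addr) const {
      return _is_archive[(obj_addr - _heap_base) / _region_size];
    }
  };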

The inlined is_archive_object() test in the shared markSweep code is shown below.  For example, mark_and_push uses it to avoid marking/pushing Archive objects.

inline bool MarkSweep::is_archive_object(oop object) {
#if INCLUDE_ALL_GCS
  return (G1MarkSweep::archive_check_enabled() &&
          G1MarkSweep::in_archive_range(object));
#else
  return false;
#endif
}


* Performance Impact

The main performance concern is full GCs, because of the change to the mark/sweep code, which affects every live object and is shared by SerialGC and CMS.  When archive regions are not in use, the additional cost consists of a test and branch.  This results in a degradation of about 1% in full GC times for SPECjbb2005 using G1 (measuring full GCs of a 12G heap).  If archive regions are in use, so the additional in_archive_range check must be invoked, the degradation increases to 4%.

A test which fills the heap with a tree of arrays of object pointers, and thus should especially stress the mark code, showed a degradation of about 5.5% when archived regions were in use (measuring full GCs with 8G of live data).  However, the change was still only about 1% when archives were not in use, or when SerialGC was used.

Running server refworkload benchmarks showed no difference with this change added when archive regions were not in use.  Most also show no impact when an archive region is in use (which enables the extra check in mark/sweep).  One exception is the jetstream benchmark, which forces several full GCs in the small measurement window.  Measuring young GC pause times across many JBB runs showed no impact.



Comments
A new requirement has been added, which affects the interface: the shared strings must be mapped before classes have been loaded.  This means that the pinning code cannot "fill_with_object" to fill any unused gaps in the pinned regions, because there is no IntArrayObject yet.  This will probably mean a 2-step process will be needed at restore time, if we want to continue to support different heap sizes at restore-time vs. dump-time.
09-04-2015

The shared string space will be at the high end of the heap, because of the way the heap range is chosen. The high end tends to stay put even if a somewhat larger or smaller heap size is chosen, while the low end may move. So with zero-based compressed oops, this means the space should still be addressable at the same offset, and likely still at the very top of the heap, even if the heap is larger or smaller (as long as it is still within the same encoding).
24-02-2015

Here is a proposal for supporting shared strings with G1: During dump time, a designated "String Space" is allocated within the Java heap, probably at the end of the heap.  This space is a "pinned" region, which means oops residing within the space are scanned by GC normally, but not moved or collected by GC.  String objects from the string table and the underlying char array objects are copied to the string space during dumping, before writing out the archived data (it might be possible to allocate the strings and the char arrays from the string space directly).

On 64-bit platforms with compressed oops, the narrow oops are encoded as offsets (with or without scale) from the narrow oop base.  Currently there are four different encoding modes: 32-bit unscaled, zero-based, disjoint-heap-based, and heap-based.  Depending on the heap size and the heap minimum base, a different encoding mode is used for narrow oops.  The heap size and minimum base should be the same at dump time and runtime to ensure the same narrow oop base and encoding are used, so the oop pointers within the shared string space remain valid after dump.  The string space is not required to be at a fixed address, but should be located at the same offset from the narrow oop base at dump time and runtime.  The offset of the string space, the heap size, and the minimum base should also be stored in the archive for string data validation.

If the heap minimum base or size changes, it will invalidate the encoding of the oop pointer to the char array from the shared string.  In that case, the shared string data is ignored while the rest of the shared data can still be used by the VM.  A warning indicating shared strings are not used due to incompatible GC configuration will be reported by the VM.

At runtime, the string space is mapped as part of the Java heap at the same offset from the heap minimum base as at dump time.  The mapped string space contains the shared string and char array objects.  No patching is required for the oop pointers within the string space.  The shared string space is mapped RW.  GC should avoid writing to the oops in the shared string space.
13-02-2015

Before answering Jon's question, I have the following two questions: What would be a better choice for the pinned region in G1?  Would it be easier to implement it at the lower end of the heap or the higher end?
11-02-2015

In the first paragraph of the above (latest) comment, does the phrase "probably at the end of the heap" refer to the part of the heap at the lowest address?  If yes, "start of the heap" might be more correct.
11-02-2015

Thanks, Coleen. I've added above information to the tracking wiki: https://wiki.se.oracle.com/display/JPG/CDS-JDK9-008+Store+interned+strings+in+archives+-+prototype.
10-01-2015

The compressed class pointer in java.lang.String instances will have to use the same encoding with -Xshare:on as with -Xshare:dump.  The Klasses are allocated into the CDS archive for -Xshare:dump, and in the compressed class space for -Xshare:on, so that the encoding is the same when we initialize _klass for new objects.  If the encoding is to be the same for dumped j.l.String objects, the narrow_klass_base() has to be the same as at dump time.  Currently this isn't a restriction.  See the code in metaspace.cpp set_narrow_klass_base_and_shift().
09-01-2015

If archived string objects are mapped into the heap, for example into the old gen, is there any special handling needed during object scanning?  When an archived object (for now, just the strings) is reached from a regular live object, we need to make sure GC also scans the archived object, so that any objects reached from it can also be scanned, but we also need GC not to move the object.
09-01-2015

GC requirements:
- The archived string objects should always be mapped at the same address.  The string objects point to their underlying char array objects.  Using a fixed address avoids the need to fix up the pointer to the char array at runtime.
- The archived StringTable uses an offset from a base address to point to archived string objects.  The archived strings should not be moved by GC.
- The archived StringTable will not be processed by GC as roots (because offsets are used in the table).  Some archived string objects can be reached from other Java objects; some cannot.  We need to make sure GC does not collect any of the archived string objects.
08-01-2015

The interned strings do not point to any other objects.  In JDK7, the CDS archive was mapped as part of the heap (in Perm Gen) because it contained heap objects.  In JDK8, the CDS archive includes only metadata and symbols, without any heap objects.  Therefore, it doesn't need to be mapped contiguously with the heap.  Today, on 64 bit, it's hard-coded to start at 0x8:00000000, which is disjoint from the heap (the heap usually ends at around 0x7:c0000000 with the default 16GB max heap).

I am thinking of storing the interned strings at the bottom of the heap (so it will be separated from the "main" portion of the archive).  Something like this for 64-bit:

  ->heap bottom
  0x3:00000000 ~ 0x3:00100000  [interned strings]
  0x3:00100000 ~ 0x7:00000000  [rest of the heap (total = 16GB)]
  ->heap top
  [unused]
  0x8:00000000 ~ 0x8:12345678  [main CDS archive]

This way, hopefully it will work for all GCs:
+ For G1, the [interned strings] are regions that are not collected (but may still need to be scanned, as interned strings can be locked).
+ For the other GC types, the [interned strings] are at the bottom of the old gen.

The [main CDS archive] actually does not have any direct pointers into [interned strings], so if we cannot map the bottom of the heap at the selected address, we simply disable the CDS interned strings.  But we can still map the [main CDS archive].  Any interned strings used by the [main CDS archive] will be created dynamically by StringTable::lookup().  The 0x3:00000000 address is good for the default max heap size (16GB), but it can be configured via -XX flags.  That way, the user can pick a lower address to enable more efficient pointer compression (e.g., if his max heap size is always smaller than 1GB, etc).
07-05-2014

The priority is taken from the CDS priority. See https://bugs.openjdk.java.net/browse/JDK-6590051 https://bugs.openjdk.java.net/browse/JDK-6603108
07-05-2014