Summary
-------

JDK-8059092 describes CDS support for allowing interned strings to be among the objects shared by applications. This requires the strings and their corresponding arrays to be in a part of the heap that can be archived and later mapped and shared. This entry describes the required GC/allocation support, which targets G1 only.

There are two aspects of the support: dump-time and restore-time. The dump-time support must give CDS a way to allocate space in the heap for the strings and char arrays which are to be shared. CDS will fill this space with objects and then dump it to an archive file. The space should be contiguous and at or near the top of the (maximum) heap, so that the memory range is likely to be valid even if the restore-time heap size differs.

The restore-time support must give CDS a way to allocate the space in the heap corresponding to the dump-time memory range, so that the shared archive can be mapped at that location. The heap corresponding to that range must be "pinned," that is, never perturbed by GC, so that it can be effectively shared. No other "pinned" allocation support is needed, such as adding pinned objects at run time.

Goals (mostly inherited from JDK-8059092)
-------

- Only provide this support in the G1 collector.
- Only the 64-bit platform with compressed oops and compressed klass pointers need be supported.
- No significant degradation (< 2-3%) of startup time or GC pause time.

Non-Goals
-------

- Support for specialized allocation or pinned regions in collectors other than G1.
- Generalized "pinned allocation" support, such as for run-time allocation of pinned objects.
- Support for pinning at a granularity finer than G1 heap regions.
- Support for "humongous" strings in the shared archive.
- 32-bit platform support.

Description
-------

At dump time, CDS will perform the dump operation after class loading, from the VM thread. The part that is relevant here is a loop which walks the string table and, for each entry, allocates a string object and its corresponding array object and initializes them. The containing memory range will then be dumped to an archive file, and after creating the archive, the VM will exit.

The aspects of this allocation support that are new are:

- CDS must be able to inquire what the containing memory range is, once the allocation loop is complete.
- The base of the range should be G1-region-aligned, to make best use of restore-time memory.
- The allocations should occur in contiguous regions, to minimize the number of archive segments and map operations that are needed.
- The allocation should be at or near the top of the maximum heap, so that the memory ranges are likely to be valid even if the restore-time heap is smaller than the dump-time heap.
- Also to support different restore-time heap sizes, individual object allocations must not cross MIN_REGION_SIZE boundaries.

Note that these objects need not be pinned at dump time.

At restore time, CDS will verify that the narrow oop encoding in use is the same as the one used at dump time. At JVM init time, it will request allocation of the specific address range(s) that were initialized and archived at dump time. The ranges must be marked pinned, not to be modified by GC. Because the restore-time heap size may differ from the dump-time heap size, the range base might not be region-aligned (but will still be MIN_REGION_SIZE-aligned). In that case, dummy object(s) should fill the unused portion of the starting region.
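To make that last point concrete, the following is a minimal, self-contained sketch (plain C++, not HotSpot code) of the fill computation implied above. The constant MIN_REGION_SIZE_BYTES, the function name, and the parameter names are illustrative assumptions.

    #include <cassert>
    #include <cstddef>
    #include <cstdint>

    // Illustrative arithmetic only: how much of the first restore-time region
    // precedes the archived range and must be covered by dummy (filler)
    // objects so that the region remains parseable.
    const size_t MIN_REGION_SIZE_BYTES = 1 * 1024 * 1024;  // assumed 1M minimum

    size_t leading_fill_bytes(uintptr_t archived_range_base,
                              size_t restore_region_size_bytes) {
      // The archived base is MIN_REGION_SIZE-aligned by construction ...
      assert(archived_range_base % MIN_REGION_SIZE_BYTES == 0);
      // ... but the restore-time region size may be larger, so the base may
      // fall inside a region rather than at its start.
      uintptr_t region_start =
          archived_range_base - (archived_range_base % restore_region_size_bytes);
      return static_cast<size_t>(archived_range_base - region_start);
    }

For example, with a 4M restore-time region size and an archived base sitting 1M above a 4M boundary, 1M of the starting region would need to be covered by filler.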
Because no humongous strings are allowed at dump time, there is no need to mark any regions within the range as 'humongous' in addition to pinned. The allocation routine will return a failure status if the range is not entirely within the maximum heap, or if any of the contained G1 regions are not free. The CDS code will subsequently mmap the archive file into the range initialized at dump time. (Since these ranges are dump-time-region-aligned, they are also hardware-page-aligned.) Objects in the mapped file contain (compressed) pointers to other objects in the same pinned memory range, but not outside it.

There may be wasted space in the pinned range, though typically only in a single G1 region. There is no requirement to make this space usable at the present time. When and if generalized pinned-object allocation is implemented, this space could be used as the first free pinned heap location for allocations.

Implementation Notes
-------

The implementation is summarized here for the benefit of code reviewers.

"Archive region" support is added to G1 to meet the above requirements. Archive regions are G1 regions that are not modifiable by GC: they are neither scavenged nor compacted, and their objects are not even marked in the object header. They can contain no pointers to non-archive heap regions, and object headers point to shared CDS metaspace (though this last point is not enforced by G1). Thus, they allow the underlying hardware pages to be shared among multiple JVM instances.

In short, a dump-time run (using -Xshare:dump) will allocate space in the Java heap for the strings which are to be shared, copy the string objects and arrays to that space, and then archive the entire address range in the CDS archive. At restore time (using -Xshare:on), that same heap range will be allocated at JVM init time, and the archived data will be mmap'ed into it. GC must treat the range as 'pinned,' never moving or writing to any objects within it, so that cross-JVM sharing will work.

CDS only requires this support in G1, and in the 64-bit JVM with compressed pointers. However, there is nothing specific to the 64-bit version or compressed oops in the G1 support.

* Dump-time Support

At dump time, the CDS code performing the dump will make calls to the allocator for the individual objects which will constitute the shared string archive. Because it is a goal to allow the dump-time and restore-time heaps to have different sizes, it is desirable to allocate this space at the top of the reserved heap, which may be outside the currently committed heap. The code must also take into account that the restore-time JVM might have a different G1 heap region size and different region boundaries, depending on the heap size. Therefore, no allocation can cross a min_region_size boundary, because at restore time that boundary might be an actual G1 region boundary, even if it is not one at dump time.

To accomplish this, a G1ArchiveAllocator class has been added. It uses a new G1CollectedHeap::alloc_highest_available_region() routine, which returns the topmost region of the heap that is free (committing uncommitted regions as necessary). When allocating within a region, the G1ArchiveAllocator operates on min_region_size sub-regions, to avoid allocating any object that crosses such a boundary. It also disallows any allocation that would be considered 'humongous' for min_region_size. Ultimately, one or more G1 regions may contain the shared string data.
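The sub-region rule can be illustrated with the following simplified sketch. This is not the actual G1ArchiveAllocator code: the struct and field names are hypothetical, the "larger than half a min-size region is humongous" threshold is an assumption, and the real allocator additionally plugs any skipped gap with a filler object and records the ranges it has used.

    #include <cstddef>
    #include <cstdint>

    // Illustrative sketch of the boundary rule only: an allocation that would
    // cross the next min_region_size boundary is placed at that boundary
    // instead, and requests considered 'humongous' relative to
    // min_region_size are rejected.
    struct ArchiveBumpAllocator {        // hypothetical name
      uintptr_t top;                     // current allocation pointer (bytes)
      uintptr_t region_end;              // end of the current G1 region (bytes)
      size_t    min_region_size;         // minimum possible G1 region size

      // Assumption: anything larger than half a min-size region is treated
      // as humongous for this purpose.
      bool is_too_large(size_t byte_size) const {
        return byte_size > min_region_size / 2;
      }

      // Returns the allocation address, or 0 if a fresh region is required.
      uintptr_t allocate(size_t byte_size) {
        if (is_too_large(byte_size)) {
          return 0;                      // humongous-for-min_region_size: disallowed
        }
        uintptr_t next_boundary = (top / min_region_size + 1) * min_region_size;
        if (top + byte_size > next_boundary) {
          top = next_boundary;           // skip (and, in reality, fill) to the boundary
        }
        if (top + byte_size > region_end) {
          return 0;                      // current region exhausted
        }
        uintptr_t result = top;
        top += byte_size;
        return result;
      }
    };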
An entry point is provided which returns the used ranges, combining adjacent regions into single ranges when possible. This routine will also align the ending address up to a requested multiple and fill the intervening space. In the case of CDS, this space will later be the target of an mmap, so it should be aligned to the hardware page size.

The regions used for this allocation are marked as "Archive" regions. Under -Xshare:dump the JVM exits after capturing the archive region contents, but execution could continue, with the regions marked as "Archive" remaining unmodified by GC.

The G1ArchiveAllocator should be called only from the VM thread at a safepoint. It is invoked via these G1CollectedHeap entry points for initiating archive allocation, allocating some number of objects, and ending the archive range:

    // Facility for allocating in 'archive' regions in high heap memory from
    // the VM thread, and recording the allocated ranges. The end_ call
    // optionally aligns the end address and returns the allocated ranges as
    // an ascending array of MemRegions. This can be used to create and
    // archive a heap region which can be mapped at the same fixed addresses
    // in a future JVM instance.
    void begin_archive_alloc_range();
    void end_archive_alloc_range(GrowableArray<MemRegion>* ranges,
                                 size_t end_alignment = 0);
    bool is_archive_alloc_too_large(size_t word_size);
    HeapWord* archive_mem_allocate(size_t word_size);
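For illustration, here is a hedged sketch of how the dump-time CDS code might drive these entry points. It is not the actual CDS dump code: the inputs (n, sizes[]) stand in for the String and char[] objects CDS actually copies, the ResourceMark/GrowableArray setup is an assumption about the calling environment, and error handling is elided.

    // Sketch only: assumes execution in the VM thread at a safepoint.
    G1CollectedHeap* g1h = G1CollectedHeap::heap();
    ResourceMark rm;                                   // assumed: ranges array is resource-allocated
    GrowableArray<MemRegion>* ranges = new GrowableArray<MemRegion>(2);

    g1h->begin_archive_alloc_range();
    for (int i = 0; i < n; i++) {                      // n, sizes[]: hypothetical inputs
      size_t word_size = sizes[i];
      if (g1h->is_archive_alloc_too_large(word_size)) {
        continue;                                      // humongous-for-min_region_size: not supported
      }
      HeapWord* addr = g1h->archive_mem_allocate(word_size);
      // ... initialize the shared String or char[] object at 'addr' (omitted) ...
    }
    // Align the end of the allocated space to the hardware page size so the
    // dumped range(s) can later be the target of an mmap.
    g1h->end_archive_alloc_range(ranges, (size_t)os::vm_page_size());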
* Restore-time Support

At restore time, the CDS code mapping the archived strings into the heap must call the GC/allocator code to allocate the specific address range(s) from which the archive was taken, and to mark the containing G1 regions as Archive. This must be done as a two-step process: the call to allocate the space must be made before any class loading has occurred, while the fill objects the allocator may ultimately need to insert can only be created after classes have been loaded.

The CDS code verifies that the restore-time compressed heap encoding is the same as it was at dump time, but the allocator verifies that the requested ranges are actually within the heap and not already in use. A check_archive_addresses call allows the range to be checked independently. A subsequent alloc_archive_regions call allocates the G1 regions that constitute the space and marks them as archive regions, again verifying that they are not already in use. The CDS code then performs the actual mmap of the archive file into the required range(s). The final step, calling fill_archive_regions, creates any fill objects needed to make the G1 regions containing the archived MemRegions parseable. If the dump-time and restore-time heaps are the same size, no fill objects will be required.

These G1 routines accept an array of MemRegions to be allocated/pinned, but in the CDS usage there will normally be only one or two: one if all the strings fit within a single G1 region at dump time, and two if multiple G1 regions were used, with the last (partially used) G1 region in a separate MemRegion. This allows the CDS code to avoid archiving a (potentially) large chunk of unused memory. The relevant G1CollectedHeap entry points are:

    // Facility for allocating a fixed range within the heap and marking
    // the containing regions as 'archive'. For use at JVM init time, when the
    // caller may mmap archived heap data at the specified range(s). The check_
    // call verifies that the regions are within the reserved heap. The alloc_
    // call commits the appropriate regions and marks them as 'archive', after
    // which the caller can perform the mmap. The fill_ call (which must occur
    // after class loading) inserts any required filler objects around the
    // specified ranges to make the regions parseable.
    bool check_archive_addresses(MemRegion* range, size_t count);
    bool alloc_archive_regions(MemRegion* range, size_t count);
    void fill_archive_regions(MemRegion* range, size_t count);
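A hedged sketch of the restore-time call order follows. The function name, the way the MemRegion array is obtained, and the map_archive_at() helper are hypothetical; in reality the two steps are separated in time (fill_archive_regions must run after class loading) and are shown together here only to make the ordering clear.

    // Sketch only: 'ranges' would be recorded in the CDS archive at dump time.
    static void restore_shared_string_regions(MemRegion* ranges, size_t count) {
      G1CollectedHeap* g1h = G1CollectedHeap::heap();

      // Step 1 (JVM init time, before any class loading): check that the
      // dump-time ranges lie within the reserved heap, then allocate them and
      // mark the containing G1 regions as 'archive'.
      if (!g1h->check_archive_addresses(ranges, count)) {
        return;   // range outside the reserved heap: disable string sharing
      }
      if (!g1h->alloc_archive_regions(ranges, count)) {
        return;   // a containing region is already in use: disable string sharing
      }

      // CDS then mmaps the archived string data into the allocated range(s).
      map_archive_at(ranges, count);   // hypothetical stand-in for the mmap step

      // Step 2 (after class loading): insert any filler objects needed to make
      // the containing regions parseable. With identical dump-time and
      // restore-time heap sizes, no fillers are required.
      g1h->fill_archive_regions(ranges, count);
    }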
* Archive/Pinned Object Support

Archive object support is at the granularity of a G1 region, with "Archive" being a new heap region type. A "Pinned" attribute has been added, which is shared by Humongous and Archive regions. Pinned regions are never added to the collection set or considered for compaction. Archive regions have both the "Pinned" and "Old" tags set.

The majority of the new support is in the allocation routines already described, with small changes to other code that checks region types. Many instances of tracing and assertion code needed to be made aware of Archive regions. Region dumping code marks Archive regions as such. Region verification code checks that Archive regions contain no heap pointers to non-archive regions, and also that there are no 'pinned' regions that are not also old ('archive') or humongous. In the future, there may be a use for such regions.

No changes are needed in the concurrent mark code, other than not allowing archive regions into the collection set. For full GC support, however, additional work was needed in mark/sweep.

* Shared Mark/Sweep Support

G1 uses the markSweep code shared with SerialGC and CMS to perform full GC operations. This code would, at a minimum, need to be corrected to leave pinned objects in place. But since the main goal of archive region support is to allow the memory to be shared, the code also has to avoid modifying live objects' mark words. To avoid adding overhead for the other GCs that use markSweep, most of the support is in g1MarkSweep, with only a test in the places necessary to determine whether the more heavyweight code needs to be invoked. This also avoids most of the work for G1 if there are no Archive regions in use.

When an archive region is created, checking code is enabled so that G1MarkSweep::archive_check_enabled() returns true. A bitmap is also created, and G1MarkSweep::mark_range_archive() sets the bits corresponding to the G1 regions in that range, to indicate their "archive" status. G1MarkSweep::in_archive_range() can then be used to determine whether an object is in an archive region. The inlined is_archive_object() test in the shared markSweep code is shown below; for example, mark_and_push uses it to avoid marking/pushing Archive objects.

    inline bool MarkSweep::is_archive_object(oop object) {
    #if INCLUDE_ALL_GCS
      return (G1MarkSweep::archive_check_enabled() &&
              G1MarkSweep::in_archive_range(object));
    #else
      return false;
    #endif
    }

* Performance Impact

The main performance concern is with full GCs, because of the change to the mark/sweep code, which affects every live object and is shared by SerialGC and CMS. When archive regions are not in use, the additional cost consists of a test and a branch. This results in a degradation of about 1% in full GC times for SPECjbb2005 using G1 (measuring full GCs of a 12G heap). If archive regions are in use, so that the additional in_archive_range check must be invoked, the degradation increases to 4%. A test which fills the heap with a tree of arrays of object pointers, and thus should especially stress the mark code, showed a degradation of about 5.5% when archive regions were in use (measuring full GCs with 8G of live data). However, the change was still only about 1% when archive regions were not in use, or when SerialGC was used.

Running server refworkload benchmarks showed no difference with this change added when archive regions were not in use. Most benchmarks also show no impact when an archive region is in use (which causes the extra check in mark/sweep); one exception is the jetstream benchmark, which forces several full GCs in its small measurement window. Measuring young GC pause times across many JBB runs showed no impact.