JDK-8072061 : Automatically determine optimal sizes for the CDS regions
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 9
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • Submitted: 2015-02-02
  • Updated: 2024-11-06
  • Resolved: 2017-08-03
Fix Version: JDK 10 b21 (Fixed)
Description
Justification
=========

To improve the usability of CDS, we should eliminate the need to manually configure SharedReadWriteSize, SharedReadOnlySize, SharedMiscDataSize and SharedMiscCodeSize. Currently, these 4 values specify the sizes of the 4 regions of the CDS archive. The user must set them "big enough" to contain all of the class metadata.

In JDK-8048150, we added some code to guess the needed sizes based on the number of classes in the class list. However, the guess is overly pessimistic yet still imprecise: it usually wastes a lot of virtual address space, and in rare cases it still fails to reserve enough space.
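
For illustration only, such a class-count-based heuristic amounts to something like the sketch below; the function name and constants are invented for this sketch and are not the actual JDK-8048150 code. Any fixed per-class average is necessarily a compromise, which is why the estimate over-reserves for typical workloads and can still under-reserve for unusual ones.

    // Hypothetical sketch of a class-count-based size estimate.
    // Not the actual HotSpot code; the constants are assumptions for illustration.
    #include <cstddef>

    static size_t estimated_region_size(size_t num_classes) {
      const size_t base_overhead       = 6 * 1024 * 1024; // assumed fixed cost for VM-internal metadata
      const size_t avg_bytes_per_class = 8 * 1024;        // assumed pessimistic per-class average
      return base_overhead + num_classes * avg_bytes_per_class;
    }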

The virtual address space waste is problematic on 64-bit platforms with class pointer compression. The 4 CDS regions sit at the beginning of the class space, so if they are excessively large, less space is left for allocating InstanceKlasses at run time, which can lead to class loading failures (as we have seen in corner cases with JavaScript engines).

Currently, arriving at the "correct" region sizes requires trial and error: you dump the archive once to find out how much space is needed, and then dump it again with the correct sizes. Such "curation" of archives is inherently incompatible with cloud deployments, where arbitrary Java applications are deployed and any kind of application-specific (manual) configuration is undesirable.

The usability problem becomes even worse as we plan to support 2-level CDS archives in JDK 10. Therefore, the JVM needs to automatically choose optimal sizes for the archive regions without user involvement.

Design Overview
=============

During CDS archive creation:

1. Load all the classes in the class list. Allocate their class metadata using the "regular" metaspace allocators.

2. Iterate over all reachable class metadata objects so we know what objects should be archived. Note that some metadata objects created in step 1 are subsequently freed (due to rewriting, etc.), so these should not be included in the CDS archive.

3. Copy the RO objects into the RO region. The size of the RO region becomes fixed at the end of this step.

4. Copy the RW objects into the RW region. The RW region immediately follows the RO region.
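
A minimal sketch of steps 2-4 is shown below, assuming a walk that yields only the reachable metadata objects and simple bump-style region buffers; the names and types are illustrative, not the actual HotSpot interfaces.

    // Illustrative sketch only -- not the actual HotSpot dumping code.
    #include <cstddef>
    #include <unordered_map>
    #include <vector>

    struct MetaObj { void* addr; size_t size; bool read_only; };

    struct Region {
      std::vector<char> buf;                      // grows as objects are copied in
      size_t used() const { return buf.size(); }
      void* copy_in(const MetaObj& o) {
        size_t offset = buf.size();
        const char* src = static_cast<const char*>(o.addr);
        buf.insert(buf.end(), src, src + o.size);
        return buf.data() + offset;
      }
    };

    // 'reachable' is assumed to contain only the objects found by step 2, so
    // metadata freed after class loading never reaches the archive.
    void layout_archive(const std::vector<MetaObj>& reachable,
                        Region& ro, Region& rw,
                        std::unordered_map<void*, void*>& new_loc) {
      size_t ro_bytes = 0, rw_bytes = 0;
      for (const MetaObj& o : reachable) (o.read_only ? ro_bytes : rw_bytes) += o.size;
      ro.buf.reserve(ro_bytes);                   // capacities fixed up front, so the
      rw.buf.reserve(rw_bytes);                   // pointers returned by copy_in() stay valid
      for (const MetaObj& o : reachable)          // step 3: RO objects first
        if (o.read_only) new_loc[o.addr] = ro.copy_in(o);
      for (const MetaObj& o : reachable)          // step 4: RW region follows RO
        if (!o.read_only) new_loc[o.addr] = rw.copy_in(o);
      // The exact region sizes are now ro.used() and rw.used() -- nothing is guessed.
    }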

Benefits
======    

+ Simplify the usage of AppCDS (no need to use preset RO/RW region sizes), especially with the planned 2-level archives in JDK 10.
+ As a next step, during the copying phase (#4 above), we can segregate the frequently and rarely used methods. Since only about 30% of loaded methods are actually used, doing this will allow us to reduce the runtime memory usage.
+ As a side effect, the added ability to iterate over class metadata objects will enable various new tools (see the sketch after this list), such as:
    + Detailed size statistics (improvement over the existing ClassStatsDCmd)
    + Detection of memory leaks in the Metaspace.
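
For example, a detailed-statistics tool could be a thin pass over the same object walk. The sketch below is illustrative only (the input record and the walk are assumptions, not the actual HotSpot code); it produces per-type counts and byte totals in the spirit of the tables quoted in the comments below.

    // Illustrative sketch only -- per-type size statistics over archived metadata.
    #include <cstddef>
    #include <cstdio>
    #include <map>
    #include <string>
    #include <vector>

    struct MetaObjInfo { std::string type; size_t bytes; };  // assumed input record

    void print_type_stats(const std::vector<MetaObjInfo>& objs) {
      struct Stat { size_t count = 0; size_t bytes = 0; };
      std::map<std::string, Stat> by_type;
      size_t total = 0;
      for (const MetaObjInfo& o : objs) {
        by_type[o.type].count += 1;
        by_type[o.type].bytes += o.bytes;
        total += o.bytes;
      }
      for (const auto& e : by_type) {
        std::printf("%-20s : %8zu objs %10zu bytes (%5.1f%%)\n",
                    e.first.c_str(), e.second.count, e.second.bytes,
                    total ? 100.0 * e.second.bytes / total : 0.0);
      }
    }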

Alternatives
=========
As discussed in the comments on this bug report, the JVM could be programmed to "dump twice": first determine the size of the regions, and then do the final dumping. This would be an automation of the manual steps described in the "Justification" section above.

However, this has several issues:
1. We need to spawn a child JVM process to do the "trial" dumping, and then pass various sizing information back to the main process.
2. The main process still doesn't know how to eliminate freed/unreachable objects from the archive. These currently account for about 1.5% of the archive size, and the percentage will grow (up to about 5%) once we run the Java-based class loaders to dump the archive.

Comments
URL: http://hg.openjdk.java.net/jdk10/jdk10/hotspot/rev/731370f39fcd User: jwilhelm Date: 2017-08-18 18:01:35 +0000
18-08-2017

URL: http://hg.openjdk.java.net/jdk10/jdk10/rev/d2b64cb3dc6e User: jwilhelm Date: 2017-08-18 17:58:43 +0000
18-08-2017

URL: http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/731370f39fcd User: iklam Date: 2017-08-03 03:03:15 +0000
03-08-2017

URL: http://hg.openjdk.java.net/jdk10/hs/rev/d2b64cb3dc6e User: iklam Date: 2017-08-03 03:03:13 +0000
03-08-2017

JDK-8180325 adds this code:

    void InstanceKlass::remove_unshareable_info() {
      Klass::remove_unshareable_info();

      if (is_in_error_state()) {
        // Classes are attempted to link during dumping and may fail,
        // but these classes are still in the dictionary and class list in CLD.
        // Check in_error state first because in_error is > linked state, so
        // is_linked() is true.
        // If there's a linking error, there is nothing else to remove.
        return;
      }

This should be changed to an assert after JDK-8072061 is fixed -- we should remove all classes that are in error state from the CDS image.
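For illustration, the assert form might look roughly like the fragment below (a sketch under the assumption that error-state classes are filtered out of the archive earlier during dumping; this is not the actual fix):

    void InstanceKlass::remove_unshareable_info() {
      // Sketch: with error-state classes excluded from the CDS image up front,
      // reaching this point with such a class would indicate a dump-time bug.
      assert(!is_in_error_state(), "classes in error state should not be archived");
      Klass::remove_unshareable_info();
      // ... rest unchanged ...
    }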
15-05-2017

I have created a new prototype with a simpler design (no more big switch statements for all possible metaspace object types). It's mostly working (only failing 7 out of 100+ CDS tests). Here are the before/after statistics. All gaps are eliminated, and there is about a 1.6% size reduction due to the elimination of unreferenced objects.

============================ BEFORE

ro space:  5290320 [ 30.4% of total] out of 10485760 bytes [ 50.5% used] at 0x0000000800000000
rw space:  5614616 [ 32.3% of total] out of 10485760 bytes [ 53.5% used] at 0x0000000800a00000
md space:   137216 [  0.8% of total] out of  4194304 bytes [  3.3% used] at 0x0000000801400000
mc space:    34053 [  0.2% of total] out of   122880 bytes [ 27.7% used] at 0x0000000801800000
st space:    16384 [  0.1% of total] out of    16384 bytes [100.0% used] at 0x00000007bfc00000
od space:  6305712 [ 36.2% of total] out of 20971520 bytes [ 30.1% used] at 0x000000080181e000
total   : 17398301 [100.0% of total] out of 46276608 bytes [ 37.6% used]

                      ro_cnt   ro_bytes     % |   rw_cnt   rw_bytes     % |  all_cnt  all_bytes     %
--------------------+---------------------------+---------------------------+--------------------------
Unknown             :      1         40   0.0 |        1         40   0.0 |        2         80   0.0
Class               :      0          0   0.0 |     1202     757864  13.1 |     1202     757864   6.8
Symbol              :  35418    1469464  27.8 |        0          0   0.0 |    35418    1469464  13.3
TypeArrayU1         :   8694     293328   5.5 |     1220     236024   4.1 |     9914     529352   4.8
TypeArrayU2         :   2761     158456   3.0 |        0          0   0.0 |     2761     158456   1.4
TypeArrayU4         :   1364      88936   1.7 |        0          0   0.0 |     1364      88936   0.8
TypeArrayU8         :   1968     183672   3.5 |        0          0   0.0 |     1968     183672   1.7
TypeArrayOther      :      0          0   0.0 |        0          0   0.0 |        0          0   0.0
Method              :      0          0   0.0 |    18458    1631008  28.2 |    18458    1631008  14.7
ConstMethod         :  18458    2771096  52.4 |        0          0   0.0 |    18458    2771096  25.0
MethodData          :      0          0   0.0 |        0          0   0.0 |        0          0   0.0
ConstantPool        :      0          0   0.0 |     1220    1873072  32.4 |     1220    1873072  16.9
ConstantPoolCache   :      0          0   0.0 |     1188    1116608  19.3 |     1188    1116608  10.1
Annotation          :     55       1760   0.0 |        0          0   0.0 |       55       1760   0.0
MethodCounters      :      0          0   0.0 |        0          0   0.0 |        0          0   0.0
Deallocated         :     62       5632   0.1 |        0          0   0.0 |       62       5632   0.1
SymbolHashentry     :  35418     280912   5.3 |        0          0   0.0 |    35418     280912   2.5
SymbolBucket        :   8854      35420   0.7 |        0          0   0.0 |     8854      35420   0.3
StringHashentry     :    178       1424   0.0 |        0          0   0.0 |      178       1424   0.0
StringBucket        :     44        180   0.0 |        0          0   0.0 |       44        180   0.0
Other               :      0          0   0.0 |        0     171269   3.0 |        0     171269   1.5
--------------------+---------------------------+---------------------------+--------------------------
Total               : 113275    5290320 100.0 |    23289    5785885 100.0 |   136564   11076205 100.0

========================== AFTER

mc space:    34053 [  0.2% of total] out of   131072 bytes [ 26.0% used] at 0x0000000800000000
md space:     7912 [  0.0% of total] out of     8192 bytes [ 96.6% used] at 0x0000000800020000
ro space:  4833568 [ 27.9% of total] out of  4837376 bytes [ 99.9% used] at 0x0000000800022000
rw space:  6023544 [ 34.8% of total] out of  6025216 bytes [100.0% used] at 0x00000008004bf000
st space:    16384 [  0.1% of total] out of    16384 bytes [100.0% used] at 0x00000007bfc00000
od space:  6301527 [ 36.4% of total] out of  6303744 bytes [100.0% used] at 0x0000000800a7e000
total   : 17216988 [100.0% of total] out of 17321984 bytes [ 99.4% used]

                      ro_cnt   ro_bytes     % |   rw_cnt   rw_bytes     % |  all_cnt  all_bytes     %
--------------------+---------------------------+---------------------------+--------------------------
Unknown             :      0          0   0.0 |        0          0   0.0 |        0          0   0.0
Class               :      0          0   0.0 |     1202     757864  13.2 |     1202     757864   7.0
Symbol              :  35418    1389392  27.0 |        0          0   0.0 |    35418    1389392  12.7
TypeArrayU1         :   8695     257776   5.0 |     1188     231712   4.0 |     9883     489488   4.5
TypeArrayU2         :   2753     148896   2.9 |        0          0   0.0 |     2753     148896   1.4
TypeArrayU4         :   1359      84368   1.6 |        0          0   0.0 |     1359      84368   0.8
TypeArrayU8         :   1968     180280   3.5 |        0          0   0.0 |     1968     180280   1.7
TypeArrayOther      :      0          0   0.0 |        0          0   0.0 |        0          0   0.0
Method              :      0          0   0.0 |    18458    1631008  28.4 |    18458    1631008  15.0
ConstMethod         :  18458    2771096  53.8 |        0          0   0.0 |    18458    2771096  25.4
MethodData          :      0          0   0.0 |        0          0   0.0 |        0          0   0.0
ConstantPool        :      0          0   0.0 |     1188    1839832  32.0 |     1188    1839832  16.9
ConstantPoolCache   :      0          0   0.0 |     1188    1115872  19.4 |     1188    1115872  10.2
Annotations         :     55       1760   0.0 |        0          0   0.0 |       55       1760   0.0
MethodCounters      :      0          0   0.0 |        0          0   0.0 |        0          0   0.0
SymbolHashentry     :  35418     280908   5.5 |        0          0   0.0 |    35418     280908   2.6
SymbolBucket        :   8854      35420   0.7 |        0          0   0.0 |     8854      35420   0.3
StringHashentry     :    178       1424   0.0 |        0          0   0.0 |      178       1424   0.0
StringBucket        :     44        180   0.0 |        0          0   0.0 |       44        180   0.0
Other               :      0          0   0.0 |        0     171289   3.0 |        0     171289   1.6
--------------------+---------------------------+---------------------------+--------------------------
Total               : 113200    5151500 100.0 |    23224    5747577 100.0 |   136424   10899077 100.0
10-02-2017

We had some numbers when the compaction investigation was originally looked at back in mid 2015. Looking at my emails I have data like this:

Space Compaction data
ro space: used  8456120, compact  8450048, diff  -6072
rw space: used 11449008, compact 11423744, diff -25264

taken from dump below. So savings are/were not large. I can't locate the project summary (from when it was abandoned for 9).

---
Summary of MSO objects by type:
FindDestination recorded 164066 MSO's

                      Alloc'd   Unref    Live    Diff
Unknown             :       0       2       2       0       0
Class               :    2435    2435       0    2435       0
Symbol              :   60912   61047     135   60912       0
TypeArrayU1         :   16419   16434      15   16419       0
TypeArrayU2         :    5083    5084       1    5083       0
TypeArrayU4         :    2529    2530       1    2529       0
TypeArrayU8         :       0    3892       1    3891   -3891  (== PointerArrays)
TypeArrayOther      :       0       0       0       0       0
PointerArray        :    3869       0       0       0    3869
ArrayPointerArray   :      22       0       0       0      22
Method              :   33961   33961       0       0       0
ConstMethod         :   33961   33961       0       0       0
MethodData          :       0       0       0       0       0
ConstantPool        :    2421    2436      15    2421       0
ConstantPoolCache   :    2421    2421       0    2421       0
Annotation          :      33      33       0      33       0
MethodCounters      :       0       0       0       0       0
Deallocated         :       0      28      28       0       0

Here's the raw data:

Loading classes to share: done.
Shared spaces: preloaded 2419 classes
Rewriting and linking classes ...
Rewriting and linking classes: done
Number of classes 2435
  instance classes = 2421
  obj array classes = 6
  type array classes = 8
Updating ConstMethods ... done.
Removing unshareable information ... done.
FindDestination recorded 164066 MSO's
Unknown             : 0
Class               : 2435
Symbol              : 60912
TypeArrayU1         : 16419
TypeArrayU2         : 5083
TypeArrayU4         : 2529
TypeArrayU8         : 0
TypeArrayOther      : 0
PointerArray        : 3869
ArrayPointerArray   : 22
Method              : 33961
ConstMethod         : 33961
MethodData          : 0
ConstantPool        : 2421
ConstantPoolCache   : 2421
Annotation          : 33
MethodCounters      : 0
Deallocated         : 0
RO: total 167 unreferenced blocks = 9776 bytes
RW: total 31 unreferenced blocks = 25440 bytes
Old SharedBaseAddress = 0x0000000800000000
New SharedBaseAddress = 0x0000000a00000000
New RO base: 0x0000000a00000000
New RW base: 0x0000000a0080f000
New MD base: 0x0000000a012f4000
New MC base: 0x0000000a013a6000
Shared symbol table stats -------- base: 0x0000000800000000
Number of entries       : 61047
Total bytes used        : 545088
Average bytes per entry : 9.000
Average bucket size     : 4.000
Variance of bucket size : 3.983
Std. dev. of bucket size: 1.996
Maximum bucket size     : 13
ro space:  8456120 [ 40.9% of total] out of 16777216 bytes [50.4% used] at 0x0000000800000000
rw space: 11449008 [ 55.4% of total] out of 16777216 bytes [68.2% used] at 0x0000000801000000
md space:   719152 [  3.5% of total] out of  4194304 bytes [17.1% used] at 0x0000000802000000
mc space:    34053 [  0.2% of total] out of   122880 bytes [27.7% used] at 0x0000000802400000
total   : 20658333 [100.0% of total] out of 37871616 bytes [54.5% used]
Space Compaction data
ro space: used  8456120, compact  8450048, diff  -6072
rw space: used 11449008, compact 11423744, diff -25264
SD table:  actual 116216, calculated 116216
SD bucket: actual 16160, calculated 16160
PI table:  actual 7656, calculated 7656
PI bucket: actual 512, calculated 512
WC size 6176
Dumping shared data to file: /export/users/dh198349/jdk9-rt-cds/build/b01/se-linux-x64-internal-slowdebug/images/jdk/lib/amd64/server/classes.jsa
Shared file region 0: 0x8107b8 bytes, addr 0x0000000800000000 file offset 0x 1000
Shared file region 1: 0xaeb2b0 bytes, addr 0x0000000801000000 file offset 0x812000
Shared file region 2: 0x af930 bytes, addr 0x0000000802000000 file offset 0x12fe000
Shared file region 3: 0x 8505 bytes, addr 0x0000000802400000 file offset 0x13ae000

Detailed metadata info (rw includes md and mc):

                      ro_cnt   ro_bytes     % |   rw_cnt   rw_bytes     % |  all_cnt  all_bytes     %
--------------------+---------------------------+---------------------------+--------------------------
Unknown             :      1         48   0.0 |        1         48   0.0 |        2         96   0.0
Class               :      0          0   0.0 |     2435    1940096  15.9 |     2435    1940096   9.4
Symbol              :  61047    2231376  26.4 |        0          0   0.0 |    61047    2231376  10.8
TypeArrayU1         :  13998     510360   6.0 |     2436     448328   3.7 |    16434     958688   4.6
TypeArrayU2         :   5084     296352   3.5 |        0          0   0.0 |     5084     296352   1.4
TypeArrayU4         :   2530     162200   1.9 |        0          0   0.0 |     2530     162200   0.8
TypeArrayU8         :   3892     342296   4.0 |        0          0   0.0 |     3892     342296   1.7
TypeArrayOther      :      0          0   0.0 |        0          0   0.0 |        0          0   0.0
PointerArray        :      0          0   0.0 |        0          0   0.0 |        0          0   0.0
ArrayPointerArray   :      0          0   0.0 |        0          0   0.0 |        0          0   0.0
Method              :      0          0   0.0 |    33961    3274720  26.8 |    33961    3274720  15.9
ConstMethod         :  33961    4908832  58.1 |        0          0   0.0 |    33961    4908832  23.8
MethodData          :      0          0   0.0 |        0          0   0.0 |        0          0   0.0
ConstantPool        :      0          0   0.0 |     2436    3573120  29.3 |     2436    3573120  17.3
ConstantPoolCache   :      0          0   0.0 |     2421    2212696  18.1 |     2421    2212696  10.7
Annotation          :     33       1056   0.0 |        0          0   0.0 |       33       1056   0.0
MethodCounters      :      0          0   0.0 |        0          0   0.0 |        0          0   0.0
Deallocated         :     28       3600   0.0 |        0          0   0.0 |       28       3600   0.0
SymbolHashentry     :      0          0   0.0 |    61047     484020   4.0 |    61047     484020   2.3
SymbolBucket        :      0          0   0.0 |    15262      61052   0.5 |    15262      61052   0.3
Other               :      0          0   0.0 |        0     208133   1.7 |        0     208133   1.0
--------------------+---------------------------+---------------------------+--------------------------
Total               : 120574    8456120 100.0 |   119999   12202213 100.0 |   240573   20658333 100.0
25-01-2017

Here are a few factors that we can probably consider while trying to address this issue in JDK 10:
- How big is the gap in the address space between the different shared spaces today?
- If the gaps are only a few hundred KB, is it worth the effort to make the spaces exactly the right sizes? Probably not, if that's the case.
- If the gaps are big, is there a way to get a better estimate for the space sizes without loading the classes twice? Maybe this bug could be addressed by adjusting the space size computation used during the estimate.
- Is the solution scalable? The proposed solution that loads classes twice works with static archiving using a class list, but probably would not work with dynamic archiving during application runtime.
25-01-2017

Can you load the classes so that you know exactly how much space you need, throw that away, resize the Metaspaces, and redo the class loading? It's simple, and I don't think it would inhibit a better solution later. It would just waste time during the creation of the archives, right?
05-06-2015

We can unmap the unused reservations in the archives, so we will end up having a free "gap" between the RO and RW regions (let's ignore the MC and MD regions for now). What do we do with these gaps? We can try to fit another archive in there, and will end up having a smaller gap. The ability to use these "gaps" depends on our ability to guess how big the RO/RW regions "should be", but that's the whole point of this RFE -- it's impossible to determine the actual size of the RO/RW regions ahead of time. If we could, there wouldn't be such gaps to begin with.

The same can be said about allocating the infrequently accessed methods in a separate region -- you would be introducing yet one more region and one more unused gap. This would make the problem of wasting virtual address space worse, not better. And all of the proposals above do not address what we do with unused holes inside the archive. If we do constant pool merging, the original constant pools will become garbage, yet their size will be too small to fit a merged constant pool (which will be bigger than the original pools).

Before PermGen removal, the class metadata objects were heap objects and it was very easy to move them (that's what the GC does). After PermGen removal and the introduction of MetaspaceObj, we lost the ability to iterate over and relocate the class metadata objects. This RFE tries to reinstate that ability so we can perform more aggressive optimizations on the archive (see the sketch below).

Will the metaspace change related to fragmentation allow class metadata objects to be relocated? We certainly need to coordinate and avoid duplicated/incompatible work. We should probably file a JEP as well. The JEP should probably be titled "Add the ability to iterate/relocate MetaspaceObjs".
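As an illustration of what "iterate/relocate" means in practice, the sketch below patches embedded pointers using an old-to-new address map built while copying objects; the types and names are assumptions for illustration, not the actual HotSpot code.

    // Illustrative sketch only -- relocating pointers into the compacted archive.
    #include <unordered_map>
    #include <vector>

    // Map from an object's original metaspace address to its new archive address,
    // built while the objects are copied into the RO/RW regions.
    using RelocMap = std::unordered_map<void*, void*>;

    // 'slots' is assumed to be the list of pointer fields discovered by the
    // metadata walk (e.g. a Method's pointer to its ConstMethod).
    void relocate_pointers(const std::vector<void**>& slots, const RelocMap& map) {
      for (void** slot : slots) {
        auto it = map.find(*slot);
        if (it != map.end()) {
          *slot = it->second;   // redirect into the archive region
        }
      }
    }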
04-06-2015

Is there a JEP for this? It appears that alternatives haven't been considered. The metaspaces are currently mmap'ed regions and only commit up to the amount of memory used. The rest of the reservation (on some platforms) can be unmapped. For methods, you could allocate the frequently vs. infrequently accessed methods to different metaspace regions. Metaspace has some fragmentation issues which may change how it is implemented; this work needs to take into account that the design might change. This work is incompatible with the original design of metaspace.
04-06-2015

[~coleenp]

+ Simplify the usage of the current (8u40) AppCDS implementation. Currently, if you have a large number of classes, you need to make guesses about what (fixed) sizes to use for the RO/RW regions. This RFE will make it unnecessary to have any preset RO/RW sizes. Sorry, my language is a bit loose -- it makes the user's life simpler, not the implementation.

+ Conserve address space. If the user has to specify fixed RW/RO sizes for the archive(s), they are likely to specify too much (like "just use 100MB to be safe ...."). I am not sure what you mean by "add code to metaspace.cpp to trim the metaspaces created for the archives".

+ AOT will not completely solve the problem. Some methods are invoked exactly once and will be paged into memory, but they may not be worth compiling with AOT (AOT-compiled methods are several times bigger than the original Java methods, so this would hardly count as a footprint optimization). Also, AOT code falls back to the interpreter on uncommon traps, and the tier-2 compiler may kick in and recompile some of the AOT-compiled methods.
04-06-2015

+ Simplify the usage of the current (8u40) AppCDS implementation.

I don't see how this code simplifies anything! The simplest approach is to allocate metadata in place, leave them there and dump those spaces. Adding 1337 lines of code that knows the exact types of metadata that are in the archive is not a simplification. It is an invitation for bugs when adding new types of metadata.

+ Conserve address space. This would be useful when we support multiple CDS archives in the future.

How much address space is conserved? Another approach would be to add code to metaspace.cpp to trim the metaspaces created for the archives. What percentage of the total footprint of the application does this account for?

+ As a next step, during the copying phase (#4 above), we can segregate the frequently and rarely used methods. Since only about 30% of loaded methods are actually used, doing this will allow us to reduce the runtime memory usage.

Do we have any performance numbers that would reliably measure this? It seems that AOT would solve the problem of running methods that are used, since they can be located in the code space based on usage.
03-06-2015

Part of this fix should address the code clean up as listed in JDK-8068688.
28-04-2015

The tests referenced in 8067162 should be updated as part of this fix. Once we have exact sizing we can check the actual sizes to determine a "too small" size; and the utilization test can probably be dispensed with altogether.
19-02-2015