JDK-8214388 : CDS dumping fails with java heap fragmentation
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 12
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2018-11-27
  • Updated: 2020-05-30
  • Resolved: 2018-12-04
Fix Version: JDK 12 b23 (Fixed)
Description
"java -Xshare:dump" would intermittently fail with

   Unable to write archive heap ... due to fragmentation.

This usually happens when you try to dump a large number of classes (e.g., 10,000) with a relatively small heap (e.g., 1g) and a large number of GC threads (e.g., 24).

(Example use case -- the Eclipse IDE loads 15,000 classes with a 512MB heap.)

At CDS static archiving dump time, we allocate free GC regions at high addresses (near or at the top of the java heap) for archiving java objects (with G1 currently). If a GC happens during dump time, the java heap might become fragmented, with free and used regions interleaved.

During the object archiving process, we search for the highest free region in the java heap, then allocate downwards starting from the region that was found. If GC activity has fragmented the heap, the run of consecutive free regions that we find might not be large enough to archive all selected java objects. In that case, we currently report a fragmentation error and bail out.
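
A minimal standalone sketch of this failure mode (the region model, sizes and names below are illustrative only, not taken from the HotSpot sources):

    // Models the java heap as an array of fixed-size regions and shows why
    // interleaved free/used regions can make the top-down search fail even
    // though the total amount of free space would have been sufficient.
    #include <cstdio>
    #include <vector>

    enum class Region { Free, Used };

    static int free_run_from_highest(const std::vector<Region>& heap) {
      int i = (int)heap.size() - 1;
      while (i >= 0 && heap[i] == Region::Used) i--;            // highest free region
      int run = 0;
      while (i >= 0 && heap[i] == Region::Free) { run++; i--; } // allocate downwards from it
      return run;
    }

    int main() {
      // A GC during dumping left used regions interleaved near the top of the heap.
      std::vector<Region> heap = { Region::Used, Region::Free, Region::Free,
                                   Region::Used, Region::Free, Region::Used,
                                   Region::Free, Region::Free };
      const int needed = 3;  // regions required to hold all archived objects
      if (free_run_from_highest(heap) < needed) {
        std::printf("Unable to write archive heap ... due to fragmentation.\n");
      }
      return 0;
    }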
Comments
I removed the text in the description saying that this bug happens only in "extreme cases". I added a use case for Eclipse that should be well within our supported parameters.
30-11-2018

For JDK 12, I'm okay if you implement the temporary solution with a single-threaded full GC but without disabling humongous allocation. There is no need to disable humongous allocation if that's not an issue today.
29-11-2018

If case 2) in the above summary is not a requirement, then the proposed solution #2 can be taken. Given the nature of this issue (a large number of loaded classes and a smaller heap at dump time) and the existing workaround of increasing the java heap, I recommend fixing this after JDK 12. Solution #2 requires relocating references from archived objects, the string table and Klass data. Relocation is not uniform and might be multi-segmented when there are more than two separate segments. The implementation of the change is risky for JDK 12 given the time frame. In JDK 12 we are shipping the default CDS archive for the first time, so I think our first priority at the moment is to guarantee the quality of the default CDS archive and avoid bugs introduced by last-minute changes.
29-11-2018

There's no use case for humongous allocations today. However, it's possible for dumping to fail due to fragmentation today. So, unless you have a better fix that's ready for JDK 12, we should disallow humongous allocations for now. There's no reason to hold up a bug fix *just in case*.
29-11-2018

Disabling humongous allocation is not a good idea. In the future, we might archive large arrays. I'd like to keep the system extensible, which is why I want to think through all use cases and find a solution that meets all our requirements.
29-11-2018

Transcript from chat:
[~sjohanss]: The reason for the holes is that we use multiple threads, and the threads claim a set of regions which are then compacted. To avoid extensive synchronization, we allow the heap to have these holes after the full GC. And yes, it will strive to compact towards the bottom, so the working sets for the threads will be built up from the bottom of the heap. You could also force the full collection to only use one thread; this will be slower but give "perfect" compaction. The easiest way to do this would be to set ParallelGCThreads=1, but that will affect the young collections as well, and I'm not sure how much GC work is going on when you dump the archive. Also, keep in mind that humongous regions are not moved even during a full collection.
[~iklam]: There isn't much GC work needed to dump archives. Most of the objects allocated are mirrors and strings. It basically brings up the VM, initializes the module graph, loads a bunch of classes, and that's it. We don't run arbitrary code, so there's no chance to allocate arbitrary objects. The strings are from the class files, but these strings can't be humongous (they can be at most 64K chars long due to the classfile limit). Is there an API to say "do a full GC but with a single thread"?
[~sjohanss]: That's the problem and the reason I asked how much other GC work is going on. Currently the full GC uses a few different metrics to calculate the number of threads. We could of course add the constructor you are talking about, but another way would be to ergonomically set ParallelGCThreads to 1 when dumping. If that is known during startup and no real GC work is expected, it might be an OK workaround.
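
A minimal sketch of that ergonomic workaround, using made-up names rather than HotSpot's real flag machinery:

    // Illustration only -- not HotSpot code. If the VM is dumping a CDS
    // archive, force the full GC to use a single worker so compaction leaves
    // no holes in the heap.
    #include <cstdio>

    struct GcErgonomics {
      bool     dumping_shared_archive;  // true for "java -Xshare:dump"
      unsigned parallel_gc_threads;     // number of GC worker threads
    };

    static void apply_dump_time_ergonomics(GcErgonomics& ergo) {
      if (ergo.dumping_shared_archive && ergo.parallel_gc_threads > 1) {
        // A single-threaded full GC compacts everything toward the bottom of
        // the heap, so the free regions at the top stay contiguous. Slower,
        // but very little GC work is expected while dumping.
        ergo.parallel_gc_threads = 1;
      }
    }

    int main() {
      GcErgonomics ergo = { /* dumping_shared_archive */ true, /* parallel_gc_threads */ 24 };
      apply_dump_time_ergonomics(ergo);
      std::printf("GC worker threads at dump time: %u\n", ergo.parallel_gc_threads);  // prints 1
      return 0;
    }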
29-11-2018

[~sjohanss] has suggested doing a full GC with a single worker thread. With G1, this will compact all used regions to the lower end of the heap, leaving a single contiguous list of free regions at the top of the heap. That way, we will never run into the fragmentation issue. The only catch is that G1 cannot move humongous regions, so I will forbid humongous allocation during dump time. This is not an issue because:
+ the interned strings in classfiles cannot be humongous (due to the 64K-char limit of UTF8 strings in classfiles)
+ normal operations during dump time don't allocate humongous arrays
+ arbitrary user code cannot be executed during dump time.
I have a prototype working already.
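
A back-of-the-envelope check of the interned-string argument (the region size, string encoding and header allowance are assumptions for illustration, not constants taken from the G1 sources):

    // In G1, an allocation is treated as humongous when it is at least half a
    // region in size, and humongous regions are not moved even by a full GC --
    // which is why they must not show up at dump time.
    #include <cassert>
    #include <cstddef>

    static bool is_humongous(size_t object_bytes, size_t region_bytes) {
      return object_bytes >= region_bytes / 2;
    }

    int main() {
      const size_t region_bytes = 1 * 1024 * 1024;  // smallest G1 region size (1MB)
      // A classfile UTF8 constant is limited to 64K characters, so even a
      // worst-case interned string payload (~128KB as a char array, plus some
      // header slack) stays well below half a region.
      const size_t worst_case_string_bytes = 64 * 1024 * sizeof(char16_t) + 64;
      assert(!is_humongous(worst_case_string_bytes, region_bytes));
      return 0;
    }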
29-11-2018

Here is a summary of three different cases where we have limitations (all with an insufficient java heap size for object archiving):
1) The java heap can accommodate allocations during loading/linking work at dump time, but GC is triggered at dump time, which can cause fragmentation in the heap.
2) The java heap can accommodate allocations during loading/linking work at dump time, but there is not enough space left for archiving the objects.
3) The java heap is too small to accommodate allocations during loading/linking work at dump time.
The issue raised here is case #1. The object archiving system was not originally designed to accommodate GCs at dump time.
28-11-2018

I think this is an enhancement instead of a bug. We currently detect dump-time heap fragmentation when archiving cannot be accommodated and report the error properly. With the proposed solution (or other possible solutions), there are still edge cases that we cannot handle and must bail out on:
Case 1) If the dump-time heap is not large enough and OOM occurs during class loading/linking, we need to abort dumping.
Case 2) If the dump-time java heap has only enough space for the allocations needed for loading/linking, but not enough space left for archiving, then even with proposed solution #2, object archiving cannot be performed.
For the above cases, a larger java heap needs to be used at dump time. For users who want to do static archiving with a customized class list, using the runtime GC settings at dump time is recommended, since the system is most optimal in that case. When a different (smaller) java heap size is chosen at dump time, runtime relocation might be needed and the sharing benefit is lost for archived objects in the closed archive regions.
27-11-2018

This is not a minor enhancement. It's a bug affecting reliability, scalability and ease of use. Except for truly excessive cases, dumping the heap objects should always work without user intervention. Dumping 35,000 classes should be well within the range of supported use cases, so it should never fail.
27-11-2018

There are several approaches to consider:
Approach 1) Allocate outside the java heap during object archiving. This requires adjusting all references in the archived objects, and the target addresses for the adjusted references should be within the heap. We currently use GC APIs during object archiving for allocation, updating fields and verification; doing the work outside the heap would be a problem in those cases.
Approach 2) Keep the current scheme; if fragmentation is detected, do an extra iteration over the archived objects and adjust the references to those objects that are in the segregated regions, so the heap data that we write into the archive file can still appear to be in consecutive memory regions.
I'm currently in favor of solution #2.
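
A minimal sketch of the reference adjustment that approach #2 would need (the segment layout and names below are illustrative only, not the proposed implementation):

    // Each discontiguous run ("segment") of archived regions gets its own
    // relocation delta so the data written to the archive file appears
    // contiguous; any reference pointing into a segment is shifted by that
    // segment's delta.
    #include <cstdint>
    #include <vector>

    struct Segment {
      uintptr_t src_begin;  // start of a contiguous run of archived regions in the heap
      uintptr_t src_end;    // one past the end of the run
      uintptr_t dst_begin;  // where this run lands in the contiguous archive layout
    };

    // Compute the relocated value of a reference found inside an archived object.
    static uintptr_t relocate(uintptr_t ref, const std::vector<Segment>& segments) {
      for (const Segment& s : segments) {
        if (ref >= s.src_begin && ref < s.src_end) {
          return s.dst_begin + (ref - s.src_begin);
        }
      }
      return ref;  // not a reference into the archived heap; leave it unchanged
    }

    int main() {
      // Two archived runs separated by a used region, packed back to back in the archive.
      std::vector<Segment> segments = {
        { 0x7000, 0x9000, 0x0000 },  // first run: two 0x1000-byte regions
        { 0xa000, 0xb000, 0x2000 },  // second run follows immediately in the archive
      };
      uintptr_t relocated = relocate(0xa010, segments);  // becomes 0x2010 in the archive layout
      (void)relocated;
      return 0;
    }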
27-11-2018