Bug ID: JDK-8221173 JEP 387: Elastic Metaspace


Type: JEP
Component: hotspot
Sub-Component: runtime

Priority: P3
Status: Closed
Resolution: Delivered
Fix Versions: 16

Submitted: 2019-03-20
Updated: 2023-08-15
Resolved: 2020-12-17

Related Reports

Duplicate :	JDK-8076480 - Use all possible Metachunk sizes before expanding Metaspaces.
Duplicate :	JDK-8076476 - Coalesce Metachunks in the Metaspaces
Relates :	JDK-8300732 - Whitebox functions for Metaspace test should use byte size
Relates :	JDK-8302455 - VM.classloader_stats memory size values are wrong
Relates :	JDK-8306832 - Metaspace: deallocate should not adjust up the deallocated size
Relates :	JDK-8198423 - Improve metaspace chunk allocation
Relates :	JDK-8221925 - [metaspace] provide size histogram for jcmd VM.metaspace
Relates :	JDK-8245707 - Increase Metaspace reserve alignment
Relates :	JDK-8187338 - Per anonymous class class loader data is costly
Relates :	JDK-8302385 - Remove MetaspaceReclaimPolicy=none
Relates :	JDK-8243147 - Deprecate UseLargePagesInMetaspace
Relates :	JDK-8076476 - Coalesce Metachunks in the Metaspaces
Relates :	JDK-8245215 - Obsolete InitialBootClassLoaderMetaspaceSize and UseLargePagesInMetaspace
Relates :	JDK-8243392 - Remodel CDS/Metaspace storage reservation
Relates :	JDK-8251158 - Implementation of JEP 387: Elastic Metaspace

Sub Tasks

JDK-8242424 :	Deprecate InitialBootClassLoaderMetaspaceSize - Resolved
JDK-8242622 :	Introduce MetaspaceReclaimPolicy - Resolved
JDK-8243147 :	Deprecate UseLargePagesInMetaspace - Resolved

Description

Summary
-------

Return unused HotSpot class-metadata (i.e., _metaspace_) memory to the operating system more promptly, reduce metaspace footprint, and simplify the metaspace code in order to reduce maintenance costs.

Non-Goals
---------

  - It is not a goal to change the way that compressed class-pointer encoding works, or the fact that a compressed class space exists.

  - It is not a goal to extend the use of the metaspace allocator to other areas of HotSpot, though that may be a possible future enhancement.

Motivation
----------

Since its inception in [JEP 122][jep122], metaspace has been somewhat notorious for high off-heap memory usage. Most normal applications don't have problems, but it is easy to tickle the metaspace allocator in just the wrong way to cause excessive memory waste. Unfortunately, these types of pathological cases are not uncommon.

Metaspace memory is managed in per-class-loader [arenas][arena]. An arena contains one or more _chunks_, from which its loader allocates via inexpensive pointer bumps. Metaspace chunks are coarse-grained, in order to keep allocation operations efficient. This can, however, cause applications that use many small class loaders to suffer unreasonably high metaspace usage.
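
To make the chunk-and-bump model concrete, here is a minimal Python sketch of a per-loader arena. It is not HotSpot code. The `Chunk` and `Arena` classes, the 1024-word first chunk, and the double-on-overflow growth policy are hypothetical choices for illustration. The sketch shows why allocation is cheap (a bounds check and a pointer increment) and why a loader that allocates very little still pins a whole chunk.

```python
# Hypothetical sketch of a per-class-loader arena using pointer-bump allocation.
# Chunk sizes and the doubling growth policy are illustrative assumptions,
# not HotSpot's actual chunk-sizing rules.

class Chunk:
    def __init__(self, size_words):
        self.size = size_words
        self.top = 0  # bump pointer: offset of the next free word

    def allocate(self, words):
        """Return the offset of a new block, or None if the chunk is full."""
        if self.top + words > self.size:
            return None
        offset = self.top
        self.top += words
        return offset


class Arena:
    def __init__(self, first_chunk_words=1024):
        self.chunks = [Chunk(first_chunk_words)]

    def allocate(self, words):
        """Return (chunk index, offset) of the new block."""
        current = self.chunks[-1]
        offset = current.allocate(words)
        if offset is None:
            # Current chunk exhausted: take a larger chunk and bump from it.
            new_size = max(current.size * 2, words)
            self.chunks.append(Chunk(new_size))
            offset = self.chunks[-1].allocate(words)
        return len(self.chunks) - 1, offset

    def reserved_words(self):
        return sum(c.size for c in self.chunks)

    def used_words(self):
        return sum(c.top for c in self.chunks)

    def release(self):
        """Bulk-free: when the loader dies, all chunks go back at once.
        The arena must not be used for allocation afterwards."""
        chunks, self.chunks = self.chunks, []
        return chunks
```

For example, an arena that allocates only 100 words still holds 1024 reserved words, so roughly 90% of its chunk sits idle. Multiplied across many small loaders, this is the overhead described above.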

When a class loader is reclaimed, the chunks in its metaspace arena are placed on freelists for later reuse. That reuse may not happen for a long time, however, or it may never happen. Applications with heavy class loading and unloading activity can thus accrue a lot of unused space in the metaspace freelists. That space can be returned to the operating system to be used for other purposes if it is not fragmented, but that’s often not the case.

Description
-----------

We propose to replace the existing metaspace memory allocator with a [buddy-based allocation scheme][buddy]. This is an old and proven algorithm which has been used successfully in, e.g., the Linux kernel. This scheme will make it practical to allocate metaspace memory in smaller chunks, which will reduce class-loader overhead. It will also reduce fragmentation, which will allow us to improve elasticity by returning unused metaspace memory to the operating system.
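
The following Python sketch illustrates the split-and-merge mechanics of a buddy allocator over one power-of-two root chunk. Class and method names are hypothetical. Chunk levels follow the convention that level 0 is the root and each higher level halves the size. A chunk's buddy is found by XOR-ing its offset with its size. Freeing a chunk merges it with its buddy for as long as the buddy is also free, and this coalescing is what counters fragmentation.

```python
# Hypothetical buddy allocator over a single root chunk (sizes in abstract units).

class BuddyAllocator:
    def __init__(self, root_size, min_size):
        assert root_size & (root_size - 1) == 0, "root size must be a power of two"
        assert min_size & (min_size - 1) == 0, "minimum size must be a power of two"
        self.root_size = root_size
        self.max_level = (root_size // min_size).bit_length() - 1
        # free[level] holds offsets of free chunks of size root_size >> level.
        self.free = [set() for _ in range(self.max_level + 1)]
        self.free[0].add(0)

    def size_of(self, level):
        return self.root_size >> level

    def allocate(self, level):
        """Return the offset of a free chunk at `level`, splitting as needed."""
        # Find the smallest free chunk that is at least as large as requested.
        lvl = level
        while lvl >= 0 and not self.free[lvl]:
            lvl -= 1
        if lvl < 0:
            return None  # no free chunk large enough
        offset = self.free[lvl].pop()
        # Split down to the requested level; each upper half becomes a free buddy.
        while lvl < level:
            lvl += 1
            self.free[lvl].add(offset + self.size_of(lvl))
        return offset

    def release(self, offset, level):
        """Free a chunk, merging it with its buddy while the buddy is free."""
        while level > 0:
            buddy = offset ^ self.size_of(level)
            if buddy not in self.free[level]:
                break
            self.free[level].remove(buddy)
            offset = min(offset, buddy)
            level -= 1
        self.free[level].add(offset)
```

Allocating two minimum-size chunks from a fresh 4096-unit root splits it into pieces of 1024, 1024 and 2048. Releasing both merges everything back into a single free root chunk.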

We will also commit memory from the operating system to arenas lazily, on demand. This will reduce footprint for loaders that start out with large arenas but do not use them immediately or might never use them to their full extent, e.g., the boot class loader.
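
The next sketch shows lazy committing. It assumes a chunk whose address range is reserved in full but committed only in granule-sized steps as its bump pointer advances. `LazyChunk` and the `commit` callback are hypothetical stand-ins for the VM's chunk and the OS commit call. The 64 KB granule in the usage example is an illustrative value, not a stated default.

```python
# Hypothetical sketch: reserve a chunk up front, commit it granule by granule on demand.

class LazyChunk:
    def __init__(self, size, granule, commit):
        assert size % granule == 0
        self.size = size
        self.granule = granule
        self.commit = commit   # called as commit(start, length)
        self.top = 0           # bump pointer
        self.committed = 0     # committed prefix is [0, committed)

    def allocate(self, n):
        if self.top + n > self.size:
            return None
        new_top = self.top + n
        if new_top > self.committed:
            # Round up to the next granule boundary (ceiling division)
            # and commit only the newly needed part.
            limit = -(-new_top // self.granule) * self.granule
            self.commit(self.committed, limit - self.committed)
            self.committed = limit
        offset, self.top = self.top, new_top
        return offset
```

With a 1 MB chunk and 64 KB granules, a first 10 KB allocation commits only the first 64 KB. A loader that never grows further never pays for the rest of the reservation.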

Finally, to fully exploit the elasticity offered by buddy allocation we will arrange metaspace memory into uniformly sized _granules_ which can be committed and uncommitted independently of each other. The size of these granules can be controlled by a new command-line option, which provides a simple way to control virtual-memory fragmentation.
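
Granule-level uncommit can be illustrated with a small function. Given the free chunks in a region, it reports which granules lie entirely inside free memory and could therefore be uncommitted. This is a hypothetical helper, not a HotSpot interface, and it assumes the free ranges are disjoint.

```python
# Hypothetical helper: which commit granules are fully covered by free chunks?

def uncommittable_granules(region_size, granule, free_ranges):
    """Return indices of granules fully covered by free chunks.

    free_ranges is a list of disjoint (start, length) pairs describing
    free chunks inside the region.
    """
    result = []
    for g in range(region_size // granule):
        g_start, g_end = g * granule, (g + 1) * granule
        covered = 0
        for start, length in free_ranges:
            overlap = min(g_end, start + length) - max(g_start, start)
            if overlap > 0:
                covered += overlap
        if covered == granule:  # every byte of the granule is free
            result.append(g)
    return result
```

Consider a 256 KB region with free chunks covering [0, 96K) and [128K, 256K). With 64 KB granules, 192 KB can be uncommitted (granules 0, 2 and 3). With 16 KB granules, 224 KB can be uncommitted, because the partially free range [64K, 128K) no longer blocks the free space inside it.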

A document describing the new algorithm in detail can be found [here][review-guide]. A working prototype exists as [a branch in the JDK sandbox repository][prototype].

Alternatives
------------

Instead of modernizing metaspace, we could remove it and allocate class metadata directly from the C heap. The advantage of such a change would be reduced code complexity. Using the C-heap allocator would, however, have the following disadvantages:

- As an arena-based allocator, metaspace exploits the fact that class metadata objects are bulk-freed. The C-heap allocator does not have that luxury, so we would have to track and release each object individually. That would increase runtime overhead, and, depending on how the objects are tracked, code complexity and/or memory usage.

- Metaspace uses pointer-bump allocation, which achieves very tight memory packing. A C-heap allocator typically incurs more overhead per allocation.

- If we used the C-heap allocator, we could not implement the compressed class space as we do today, and would have to come up with a different solution for compressed class pointers.

- Relying too much upon the C allocator brings its own risk: C-heap allocators can have their own set of problems, e.g., high fragmentation and poor elasticity. Since these issues are not under our control, solving them requires cooperation with operating-system vendors, which can be time-intensive and can easily negate the advantage of reduced code complexity.

Nevertheless, we tested [a prototype that rewired metadata allocation to the C heap][malloc]. We compared this `malloc`-based prototype to the buddy-based prototype described above, running a micro-benchmark which involved heavy class loading and unloading. We switched off the compressed class space for this test since it would not work with C-heap allocation.

On a Debian system with glibc 2.23, we observed the following issues with the `malloc`-based prototype:

- Performance was reduced by 8-12%, depending on the number and size of loaded classes.
- Memory usage (process RSS) [increased by 15-18%][malloc-graph] for class load peaks before class unloading.
- Memory usage did not recover at all from usage spikes, i.e., metaspace was completely inelastic. This led to a difference in memory usage of [up to 153%][malloc-graph].

These observations do not account for the memory penalty caused by switching off the compressed class space; taking that into consideration would make the comparison even more unfavorable for the `malloc`-based variant.


Risks and Assumptions
---------------------

### Virtual-memory fragmentation

Every operating system manages its virtual memory ranges in some way; the Linux kernel, e.g., uses a red-black tree. Uncommitting memory may fragment these ranges and increase their number. This may affect the performance of certain memory operations. Depending on the OS, it also may cause the VM process to encounter system limits on the maximum number of memory mappings.

In practice, the defragmentation capabilities of the buddy allocator are quite good, so we have observed only a very modest increase in the number of memory mappings. Should the increased number of mappings become a problem, we would increase the granule size, which would lead to coarser uncommitting. That would reduce the number of virtual-memory mappings at the expense of some lost uncommit opportunities.
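
The granule-size trade-off can be modeled roughly by counting maximal runs of granules with the same commit state and treating each run as one mapping. This is a simplification, since real kernels may merge or split mappings for other reasons. `mapping_count` and `coarsen` are hypothetical helpers.

```python
# Rough model: one OS mapping per maximal run of granules with equal commit state.

def mapping_count(committed):
    """committed: list of booleans, one per granule (True = committed)."""
    if not committed:
        return 0
    runs = 1
    for prev, cur in zip(committed, committed[1:]):
        if cur != prev:
            runs += 1
    return runs


def coarsen(committed, factor):
    """Model a granule `factor` times larger: it must stay committed if any part is in use."""
    return [any(committed[i:i + factor]) for i in range(0, len(committed), factor)]
```

With `fine = [True, False, True, False, False, False, True, True]`, doubling the granule size reduces the mapping count from five to three. It also shrinks the uncommitted memory from four fine granules to two.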

### Uncommit speed

Uncommitting large ranges of memory can be slow, depending on how the OS implements page tables and how densely the range had been populated before. Metaspace reclamation can happen during a garbage-collection pause, so this could be a problem.

We haven’t observed this problem so far, but if uncommit times become an issue then we could offload the uncommitting work to a separate thread so that it could be done independently of GC pauses.
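
The offloading idea can be sketched as follows, assuming a single worker thread and a FIFO queue. The GC pause only enqueues ranges, and the worker performs the potentially slow uncommit afterwards. `BackgroundUncommitter` and the `uncommit` callback are hypothetical. A real implementation would also have to ensure that a range is not reused while its uncommit is still pending.

```python
import queue
import threading

# Hypothetical sketch: defer uncommit work from the GC pause to a background thread.

class BackgroundUncommitter:
    def __init__(self, uncommit):
        self._uncommit = uncommit          # stand-in for the real OS uncommit call
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def request(self, start, length):
        """Cheap enough to call inside a pause: just enqueue the range."""
        self._queue.put((start, length))

    def _run(self):
        while True:
            item = self._queue.get()
            if item is None:               # shutdown sentinel
                self._queue.task_done()
                return
            self._uncommit(*item)
            self._queue.task_done()

    def shutdown(self):
        """Process all pending requests, then stop the worker."""
        self._queue.put(None)
        self._worker.join()
```

Because there is one worker and the queue is FIFO, ranges are uncommitted in the order in which the pauses requested them.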

### Reclamation policy

To deal with potential problems involving virtual memory fragmentation or uncommit speed, we will add a new production command-line option to control metaspace reclamation behavior:

    -XX:MetaspaceReclaimPolicy=(balanced|aggressive|none)

- `balanced`: Most applications should see an improvement in metaspace memory footprint while the negative effects of memory reclamation should be marginal. This mode is the default, and aims for backward compatibility.
- `aggressive`: Offers increased memory-reclamation rates at the cost of increased virtual-memory fragmentation.
- `none`: Disables memory reclamation altogether.

### Maximum size of metadata

A single metaspace object cannot be larger than the _root chunk size_, which is the largest chunk size that the buddy allocator manages. The root chunk size is currently set to 4MB, which is comfortably larger than anything we would want to allocate in metaspace.
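
As a worked illustration of this limit, chunk sizes halve from the 4 MB root. The smallest chunk that can hold an allocation is found by descending levels until the next level would be too small. The 1 KB minimum chunk size is an assumption for this example, and so are the function names.

```python
ROOT_CHUNK_BYTES = 4 * 1024 * 1024   # root chunk size stated above
MIN_CHUNK_BYTES = 1024               # assumed smallest chunk size, for illustration only

def chunk_size(level):
    """Level 0 is the root chunk; each higher level halves the size."""
    return ROOT_CHUNK_BYTES >> level

def level_for(nbytes):
    """Highest level (smallest chunk) that can hold nbytes, or None if too large."""
    if nbytes > ROOT_CHUNK_BYTES:
        return None  # larger than the root chunk: cannot be allocated at all
    level = 0
    while chunk_size(level + 1) >= max(nbytes, MIN_CHUNK_BYTES):
        level += 1
    return level
```

With these values there are 13 chunk levels, from 4 MB down to 1 KB. A 3000-byte request lands in a 4 KB chunk (level 10), and any request larger than 4 MB has no level at all.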


[jep122]: https://openjdk.java.net/jeps/122
[arena]: https://en.wikipedia.org/wiki/Region-based_memory_management
[buddy]: https://en.wikipedia.org/wiki/Buddy_memory_allocation
[review-guide]: https://cr.openjdk.java.net/~stuefe/JEP-Improve-Metaspace-Allocator/review-guide/review-guide-1.0.html
[prototype]: http://hg.openjdk.java.net/jdk/sandbox/shortlog/38a706be96d4
[malloc]: https://cr.openjdk.java.net/~stuefe/JEP-Improve-Metaspace-Allocator/test/test-mallocwhynot/readme.md
[malloc-graph]: http://cr.openjdk.java.net/~stuefe/JEP-Improve-Metaspace-Allocator/test/test-mallocwhynot/malloc-only-vs-patched.svg

Comments

[~cwayne] See JDK-8260562. Does this look good for the release note?
28-01-2021
I thought [~stuefe] wrote a release note but I can't find it. We also need to document MetaspaceReclaimPolicy. I'll add a release note subtask to the main Jira CR.
27-01-2021
Thank you, Mark. This reads well: more concise and still correct.
04-07-2020
Thanks for reducing the text -- that made the essence of this JEP easier to see. I’ve edited it fairly heavily, mainly to tease out the problem statements in the Description section that really belong in the Motivation section, but also to simplify the overall flow. (Along the way I also read your blog entries, which were very helpful, so now I know much more about metaspace than I used to!). Please review this version, correct any inaccuracies or mistakes that I might’ve introduced, and then assign the issue to me and I’ll move it to Candidate.
26-06-2020
I reduced the text in this JEP by ~50%. We think this reads better. If this is still too detailed, I think we can slim down the Alternatives section.
18-06-2020
I've made some minor grammar and format edits. I think this looks good. Is this the appropriate level of detail for this JEP?
17-06-2020
I re-read the text and, together with Thomas, made some adaptations. Sounds good to me, much more compact now.
17-06-2020
The Summary won’t make much sense to a reader who doesn’t already know what “metaspace” means. Consider something like: “Return unused class-metadata (i.e., _metaspace_) memory to the operating system more promptly, and reduce its memory footprint.” The text currently in the Motivation section really belongs at the beginning of the Description section. Per the JEP template, the Motivation section should answer questions such as: Why should this work be done? What are its benefits? Who's asking for it? How does it compare to the competition, if any? If you don’t have anything to add beyond the Goals that you’ve already stated, then just omit the Motivation section. As I suggested before, please convert the footnotes into inline links, and please place auxiliary files on cr.openjdk.java.net rather than on GitHub. Finally, I don't think it's necessary to capitalize "Metaspace" everywhere. (I asked John Rose about this, and he agreed.) Moving this back to Draft for now.
15-05-2020
A few initial comments: - Please adjust this JEP so that it follows the JEP template (https://openjdk.java.net/jeps/2). The titles of subsections should start with “###” or “####” in order to distinguish them from section titles. - Your success metrics aren’t specific metrics (e.g., “pause times less than one microsecond”), so they’re more appropriate to the Goals section. - Please convert the footnotes into inline links, and please place auxiliary files on cr.openjdk.java.net rather than on GitHub. To see how your JEP will appear when published: https://openjdk.java.net/jeps/8221173 (updates every fifteen minutes).
20-04-2020
I upgraded the priority to P3 because Hidden Classes are going to be checked into JDK 15 and this work supports efficient allocation of metadata for hidden classes, assuming an increase in their use.
13-04-2020
Please ask one or more reviewers to add themselves to this JEP, and then re-submit.
24-03-2020
Very good point. The new bin list covers ranges, at the moment sizes [2-4), [4-6), [6-8) ... up to 16; then the other mechanism kicks in. This is all adjustable, of course, and I have not yet spent much time tuning it. Can you recommend a test for mass-testing redefinitions?
07-02-2020
We should probably move these comments to the matching RFE for this. A good stress test for redefines is test/hotspot/jtreg/vmTestbase/nsk/jvmti/RedefineClasses/StressRedefine*
07-02-2020
One thing to consider is that Method* is 11 or 13 words, which is outside the range of SmallBlocks (3-10 words). Making Method fit inside SmallBlocks might waste less memory for lots of redefinitions.
06-02-2020
I should think so. I have to update the text to describe that part in detail. I did not do so yet because I felt it was not a necessary or integral part of this improvement, but it makes sense to write it down and maybe discuss that regardless. In short: Today, prematurely deallocated blocks as well as remainder space of retired chunks are kept in a structure consisting of Dictionary (for larger blocks) and bin list (for smaller blocks). The new prototype changes this a bit: We still have a bin list (improved) for smaller blocks but the dictionary is not used anymore. Instead, we just keep a list of larger free blocks which we use up. This works better than the old structure and is simpler too. This is still a rather young addition to the new Metaspace. The tests show that we retain less memory, so block reuse is more efficient. Since CMS is gone, and the new Metaspace has no humongous chunks, I guess we would not need the TreeDictionary anymore.
06-02-2020
I'm looking at this code to figure out why we have this crash: JDK-8236746. This code has always seemed needlessly complicated, but hasn't crashed before. There's a lot of leftover code in binaryTreeDictionary.hpp from when CMS was removed. Now the binaryTreeDictionary code is used for metaspace only for free humongous chunks, but it's also used for freeing chunks of memory inside of metaspace chunks, for class redefinition, default methods, and class loading failure. With the new elastic metaspace, will the intra-metaspace memory be freed and reallocated efficiently?
06-02-2020
Yes, I replaced it with a simpler manager which seems to work better: a combination of a simple bin list for small blocks (like SmallBlocks today, with some improvements) and a simple unordered list of larger blocks. The latter would replace the current dictionary. The reasoning is that in Metaspace we do not have a 1:1 put/get scenario, i.e., one where the sizes of the blocks requested and of those put into the manager are generally the same. Instead, most of the blocks added to the manager come from the remaining space of retired chunks and are often much larger than what is requested from it. The new structure seems to retain less space, so I think it's a better solution.
06-02-2020
Would this make the binaryTreeDictionary go away?
06-02-2020
A working prototype exists, which shows significant reduction in retained memory after class unloading, in jdk-sandbox (branch "stuefe-new-metaspace-branch").
17-10-2019