Bug ID: JDK-8034852 Shrinking of Metaspace high-water-mark causes incorrect OutOfMemoryErrors or back-to-back GCs

Type: Bug
Component: hotspot
Sub-Component: gc
Affected Version: 8

Priority: P2
Status: Closed
Resolution: Fixed

Submitted: 2014-02-13
Updated: 2014-07-28
Resolved: 2014-06-19

JDK 8	JDK 9
8u20 b19 (Fixed)	9 (Fixed)

Before more memory is committed for the Metaspace we check that we don't go past MetaspaceGC::_capacity_until_GC, which acts as a high-water-mark for the Metaspace. If we do, we trigger a Metadata GC before allowing the Metadata allocation.

Pseudocode for the Metadata allocation failure path:
if (HWM reached) {
  do Metadata GC (1)
  expand Metaspace and allocate (2)
  if (allocation failed) {
    release all soft-references and GC
    allocate
    if (allocation failed) {
      throw OOME
    }
  }
}

It's expected that the expansion of the Metaspace and the allocation of Metadata in (2) should succeed unless we can't commit more memory from the OS or the MaxMetaspaceSize has been reached.

However, there's a possibility that Metadata GC in (1) could lower the HWM although we actually need to increase it. This will break some assumptions in (2) and we fail to allocate Metadata.

The effect is that we get an OutOfMemoryError or back-to-back GCs.
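
The failure mode can be illustrated with a small toy model. This is only a sketch under stated assumptions, not HotSpot code: the class and method names are hypothetical, sizes are in arbitrary units, and the assumption that step (2) raises the HWM by exactly the requested size is a simplification.

```java
// Toy model of steps (1) and (2) in the pseudocode above.
// All names and numbers are hypothetical; this is not HotSpot code.
final class MetadataAllocationModel {
    long committed;        // memory currently committed for the Metaspace
    long capacityUntilGC;  // the high-water mark (HWM)

    MetadataAllocationModel(long committed, long capacityUntilGC) {
        this.committed = committed;
        this.capacityUntilGC = capacityUntilGC;
    }

    // Commits more memory only if committed memory stays within the HWM.
    boolean tryCommit(long size) {
        if (committed + size > capacityUntilGC) {
            return false;
        }
        committed += size;
        return true;
    }

    // Step (1): hwmAfterGC stands in for the HWM the Metadata GC leaves behind.
    // Step (2): the HWM is raised by the request size (an assumption of this
    // model) and the commit is retried.
    boolean gcThenExpandAndAllocate(long size, long hwmAfterGC) {
        capacityUntilGC = hwmAfterGC;
        capacityUntilGC += size;
        return tryCommit(size);
    }

    public static void main(String[] args) {
        // Expected case: the GC leaves the HWM at the committed size.
        MetadataAllocationModel expected = new MetadataAllocationModel(100, 100);
        System.out.println("HWM kept:    " + expected.gcThenExpandAndAllocate(10, 100));

        // Buggy case: the GC lowers the HWM below the committed size.
        MetadataAllocationModel buggy = new MetadataAllocationModel(100, 100);
        System.out.println("HWM lowered: " + buggy.gcThenExpandAndAllocate(10, 60));
    }
}
```

In this model the first request succeeds and the second fails, even though neither the OS nor MaxMetaspaceSize is limiting anything.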

ILW = HML => P2

Impact: High
 The JVM will most likely shut down or hang doing back-to-back GCs.

Likelihood: Medium
 Not seen in our usual testing, but the bug can be reproduced. Applications with large, temporary spikes in Metaspace are more likely to hit this bug.

Workaround: Low
 Use -XX:MaxMetaspaceFreeRatio=100. This will turn off HWM shrinking.
 Or set a high -XX:MetaspaceSize, which sets the initial HWM value, so that Metadata GCs are triggered at a later point.
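
When trying either workaround, it can help to watch how much Metaspace is actually used and committed. The following is a minimal sketch using the standard java.lang.management API. It assumes a HotSpot JVM that registers a memory pool named "Metaspace"; the class name is hypothetical, and the flag value in the comment is only illustrative. The pool reports usage only, not the HWM.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.Optional;

// Example launch with a workaround applied (the size is illustrative only):
//   java -XX:MaxMetaspaceFreeRatio=100 MetaspaceUsageProbe
//   java -XX:MetaspaceSize=256m MetaspaceUsageProbe
final class MetaspaceUsageProbe {

    // Assumption: the JVM exposes a memory pool named "Metaspace".
    // Returns empty if no such pool exists or its usage is unavailable.
    static Optional<MemoryUsage> metaspaceUsage() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                return Optional.ofNullable(pool.getUsage());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Optional<MemoryUsage> usage = metaspaceUsage();
        if (usage.isPresent()) {
            MemoryUsage u = usage.get();
            System.out.println("Metaspace used=" + u.getUsed()
                    + " committed=" + u.getCommitted()
                    + " max=" + u.getMax());
        } else {
            System.out.println("No memory pool named \"Metaspace\" found");
        }
    }
}
```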

Verified by related tests: gc/metaspace/TestMetaspaceInitialization.java
28-07-2014
This bug was solved in 9, see JDK-8042821. But the fix committed in JDK-8042821 was not complete, as can be seen in JDK-8042933. The fix for JDK-8042933 corrected the fix from JDK-8042821, and it was pushed to 9 and backported to 8u20, see JDK-8046773. So this bug is fixed in both 9 and 8u20, but the fix had to use a different bug, since you can't push with the same bug id twice.
17-06-2014
The issue happens when the GC threshold for Metaspace (called "capacity_until_GC" in the code) becomes less than the committed memory for Metaspace. Any call to Metaspace::allocate that requires committing more memory will then fail in MetaspaceGC::allowed_expansion, because capacity_until_GC() < MetaspaceAux::committed_memory(). The effect will be a full GC, and after the GC we try to expand and allocate. After the expansion and before the allocation, one of two things can happen:

1. capacity_until_GC is larger than the committed memory after the expansion. The allocation will now succeed, but the next allocation requiring a new chunk will again trigger a full GC. This pattern will repeat itself for each new allocation request requiring a new chunk.
2. capacity_until_GC is still less than the committed memory even after the expansion. We throw a Java OOME (incorrectly).

How can the GC threshold for Metaspace be less than the committed memory? The problem is that MetaspaceGC::compute_new_size uses the field _allocated_capacity to describe the amount of memory in Metaspace that is "in use". _allocated_capacity does not consider the memory in the chunk free lists to be "in use", since memory in the chunk free lists is supposed to be available for new allocations. However, the chunk free lists can become fragmented, and then the memory is not available for all kinds of allocations.

This patch changes MetaspaceGC::compute_new_size to use MetaspaceAux::committed_memory to describe how much memory is "in use". The effect will be that memory in the chunk free lists will now be considered "in use" (but will of course be used for future allocations where possible). This will prevent capacity_until_GC from shrinking below the committed memory "by definition", since capacity_until_GC can't be lower than the memory that is "in use".
08-05-2014
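
The comment above on "in use" memory can be illustrated with a simplified model of the shrink decision. This is a sketch, not the code in MetaspaceGC::compute_new_size: the class and method names are hypothetical, sizes are in arbitrary units, and the model assumes the HWM shrinks in one step to in_use / (1 - MaxMetaspaceFreeRatio/100), ignoring any lower bound or gradual shrinking that the real code may apply.

```java
// Simplified, hypothetical model of the HWM shrink decision after a GC.
final class HwmShrinkModel {

    // Returns the HWM after the GC. The model never expands the HWM; it only
    // shrinks it so that at most maxFreeRatioPercent of it is free.
    static long newHwm(long hwm, long inUse, int maxFreeRatioPercent) {
        if (maxFreeRatioPercent >= 100) {
            return hwm; // shrinking is turned off, as in the workaround
        }
        long maxDesired = inUse * 100 / (100 - maxFreeRatioPercent);
        return Math.min(hwm, maxDesired);
    }

    public static void main(String[] args) {
        long committed = 100; // committed memory, including fragmented free chunks
        long allocated = 24;  // memory counted as allocated after a temporary spike
        long hwm = 120;

        // "in use" = allocated capacity: the HWM drops to 80, below committed.
        System.out.println(newHwm(hwm, allocated, 70));  // prints 80
        // "in use" = committed memory: the HWM stays at 120.
        System.out.println(newHwm(hwm, committed, 70));  // prints 120
        // MaxMetaspaceFreeRatio=100: no shrinking at all.
        System.out.println(newHwm(hwm, allocated, 100)); // prints 120
    }
}
```

Because the maximum desired capacity is never smaller than the "in use" value, measuring "in use" as committed memory keeps the shrunken HWM at or above the committed memory in this model.
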
Release team: Approved for deferral.
13-02-2014
8-defer-request justification: This is not a showstopper. The problem only occurs under certain conditions and there is a fairly good and easy workaround. We want to defer this to 8u20.
13-02-2014
Workaround patch:

diff -r 493930310461 src/share/vm/runtime/globals.hpp
--- a/src/share/vm/runtime/globals.hpp Wed Feb 12 10:06:51 2014 +0100
+++ b/src/share/vm/runtime/globals.hpp Thu Feb 13 14:42:44 2014 +0100
@@ -3158,7 +3158,7 @@
 "The minimum percentage of Metaspace free after GC to avoid " \
 "expansion") \
 \
- product(uintx, MaxMetaspaceFreeRatio, 70, \
+ product(uintx, MaxMetaspaceFreeRatio, 100, \
 "The maximum percentage of Metaspace free after GC to avoid " \
 "shrinking") \
 \
13-02-2014
For the implementation details see:
 MetaspaceGC::compute_new_size()
 MetaspaceGC::allowed_expansion()
 Metaspace::expand_and_allocate(...)
13-02-2014