Bug ID: JDK-8207200 Committed > max memory usage when getting MemoryUsage

Type: Bug
Component: hotspot
Sub-Component: gc
Affected Version: openjdk8u,9,10

Priority: P3
Status: Resolved
Resolution: Fixed

Submitted: 2018-07-12
Updated: 2024-01-03
Resolved: 2018-09-04

JDK 11	JDK 12
11.0.22-oracleFixed	12 b10Fixed

http://mail.openjdk.java.net/pipermail/serviceability-dev/2018-July/024409.html

java.lang.IllegalArgumentException: committed = 542113792 should be <
max = 536870912
at java.lang.management.MemoryUsage.<init>(MemoryUsage.java:166)
at sun.management.MemoryImpl.getMemoryUsage0(Native Method)
at sun.management.MemoryImpl.getHeapMemoryUsage(MemoryImpl.java:71)
at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.currentMemoryUsage(HierarchyCircuitBreakerService.java:246)
[...]

The values reported in the exception message are:

* "max": 536870912 = 512MB (exactly)
* "committed": 542113792 = 517MB (exactly), i.e. 5MB more than "max".

As the value of "max" is exactly what we have specified with -Xmx this indicates to me that the problem seems to be the calculation of "committed".

As the value of "max" is exactly what we have specified with -Xmx it seems to
indicate that the problem is the calculation of "committed". I do not
understand under which conditions this can happen thus I post this to the
mailing list in case anybody has ideas what might cause this.

A pull request was submitted for review. URL: https://git.openjdk.org/jdk11u-dev/pull/2281 Date: 2023-11-20 06:06:20 +0000
20-11-2023
Git URL: https://github.com/openjdk/jdk/commit/db7b4e20e1f75f6b78c8827aa5f8ccfd842f01ab
27-10-2023
Before approving this patch along with JDK-8209061 and JDK-8209062 to 11u, I would like to get an opinion of some experienced GC engineer, involved in OpenJDK 11 Updates, on how risky this backport of significant size (in LOC) is and whether it's worth taking this risk. For the time being I'm removing the jdk11u-fix-request label. Feel free to add back once you have some more technical backing.
23-12-2020
Fix Request (11u) This issue still affects jdk11u, so this fix needs to be backported. It applies cleanly after JDK-8209061 and JDK-8209062. Testing - tier1
15-09-2020
URL: http://hg.openjdk.java.net/jdk/jdk/rev/c265860d5d45 User: tschatzl Date: 2018-09-04 10:19:24 +0000
04-09-2018
Unfortunately, it only needs the concurrent readers (mutators) reading the used/committed values out of order. Consider an allocation in old gen (large object) where the heap will be expanded: the GC will update the current capacity, then update the current used (which are both plain variables). Even assuming that the writes are ordered (on TSO systems, by some additional locks/cmpxchg, whatever), the reader does not synchronize at all, so it (theoretically) can read values from these two variables in any order from any time - so used appears higher than committed. Unlikely, yes - but the same issue occurs with G1 which artificially lengthens the time between the writes of some (other) variables and a concurrent reader loading from them in no particular memory order (and it does other bad things like rewriting these variables a few times during the calculation which is probably the main issue in this case). (Initial and max values are written only once at startup long before the application starts, so it is very unlikely that they can get read out of order)
20-07-2018
It seems like the reporter on the mailing lists (Daniel) is running into the race condition: "I am quite happy that I could reproduce it after running the tests repeatedly for approximately a week after the first failure." Given how rarely the bug reproduces, the problem is most likely concurrent readers and writers racing. Me and Thomas discussed this code quite a bit, and I _think_ the current code works under single-threaded stop-the-world scenarios, but as Thomas writes above, does not work at all with concurrent readers and writers.
20-07-2018
After having looked and implemented a POC fix I think this issue is not particularly G1 specific: all collectors that may expand a space during mutator time (e.g. GCLocker, just committing new memory in the spaces, including for large object allocation directly in old gen) need to somehow synchronize the update of the used/committed values in the spaces which nobody does. The problem gets worse by the way this total memory usage is created: the method simply adds up all pool's initial/used/committed/maximum memory, which by definition is racy, as any internally consistent values of the pools may change while gathering them. It does help other collectors that they have strictly disjoint spaces (i.e. one space can't reuse memory from another). The reason G1 is particularly affected is because in addition the plainly returning these (already potentially inconsistent) values it does really weird additional calculations in an attempt to make them consistent and reasonable (which it fails on both accounts), widening the window of opportunity for overall inconsistencies to occur. I do think that at least the first problem also affects non-heap pool (code heap, metaspace) reporting.
19-07-2018
Analysis of G1MonitoringSupport::recalculate_*_sizes() showed that the code is not safe wrt to concurrent readers if written to at mutator time. Multiple variables that are queried by the MXBeans (that is used by GetMemoryUsage()) are read and written multiple times which can cause inconsistencies in the output, depending on what values the reads and writes observe. Since these statistics may be updated concurrently to the mutator, as G1 directly allocate humongous objects in old gen, this may lead to potentially inconsistent values, including the ones observed in this bug. Additionally there are other issues in the code that may cause impossible values to show up with the right command line flags.
17-07-2018
Removed Affects Version for 8 from the issue because the bug has been reported for JDK 10 and with G1. Since the code suspected to be responsible for this issue did not change for a long time, so keeping 9.
16-07-2018
Daniel (the original reporter) was able to reproduce this issue with logging. A test failed around 15:06:55 This time it happened on Linux (kernel 4.13.0-45-generic) with JDK 10 (build 10+46). The JVM arguments were: -Xms512M -Xmx512M -Xlog:gc=trace,heap=trace,tlab*=off:stdout:time,pid,tid,level,tags The logs are somewhat massive (~250MB uncompressed) and available at https://www.dropbox.com/s/wir9sv1dk5cf54y/JDK-8207200_test_log.txt.zip?dl=0 Added as attachment to this bug: JDK-8207200_test_log.txt.zip
13-07-2018
daniel.mitterdorfer@gmail.com from Elastic observed the issue on JDK 10 (build 10.0.1+10), he is currently trying to reproduce it so we can get some more logging.
13-07-2018
Me and Thomas did a very quick analysis. The code in G1MonitoringSupport::recalculate_sizes seem suspicious, it shouldn't assign used to committed. There are also comments and code hinting that this code might not be correct: // Somewhat defensive: be robust in case there are inaccuracies in // the calculations _eden_committed = MIN2(_eden_committed, committed); Since this code should only be run at safepoint (_not_ asserted in the function), there shouldn't be any possibility for miscalculations here.
13-07-2018
Please triage and update with your analysis
13-07-2018
https://netbeans.org/bugzilla/show_bug.cgi?id=194733 along with a few duplicate issues linked from there ran into it on OS X.
12-07-2018

Blocks :	JDK-8209062 - Clean up G1MonitoringSupport
Relates :	JDK-8219524 - Shenandoah misreports "committed" size in MemoryMXBean