JDK-8313083 : Print 'rss' and 'cache' as part of the container information
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 22
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2023-07-26
  • Updated: 2025-06-23
  • Resolved: 2024-01-10
The Version table provides details of the release in which this issue/RFE will be addressed.

Unresolved: Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed: Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

JDK 11: 11.0.29-oracle (Fixed)
JDK 17: 17.0.17-oracle (Fixed)
JDK 21: 21.0.9-oracle (Fixed)
JDK 23: 23 b05 (Fixed)
JDK 8: 8u471 (Fixed)
Related Reports
Relates :  
Description
Container information printed in hs_err files and jcmd VM.info does not contain the 'rss' and 'cache' usage. For a process running in a cgroup-based container, it is not just the 'rss' that is accounted toward the total memory usage; the 'cache' usage is counted as well. The OOM killer can terminate a process when its rss+cache usage reaches the container's memory limit.

Users often monitor the RSS of processes running in a container, and are confused when a process is terminated by the OOM killer even though its RSS is well below the container's memory limit.

Currently, we print the total memory usage (from /sys/fs/cgroup/memory/memory.usage_in_bytes) in hs_err and VM.info output.

Example: 
container (cgroup) information:
container_type: cgroupv1
cpu_cpuset_cpus: 0-1
cpu_memory_nodes: 0
active_processor_count: 2
cpu_quota: no quota
cpu_period: 100000
cpu_shares: no shares
memory_limit_in_bytes: 524288 k
memory_and_swap_limit_in_bytes: 1048576 k
memory_soft_limit_in_bytes: unlimited
memory_usage_in_bytes: 524164 k   <<------
memory_max_usage_in_bytes: 524288 k
kernel_memory_usage_in_bytes: 3124 k
kernel_memory_max_usage_in_bytes: unlimited
kernel_memory_limit_in_bytes: 4272 k
maximum number of tasks: unlimited
current number of tasks: 35

It would be very helpful to also print the 'rss' and 'cache' usage, which can be obtained from the /sys/fs/cgroup/memory/memory.stat file.
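
For cgroup v1, memory.stat is a flat key/value file (values in bytes), so the two fields can be read directly. As a rough stand-alone sketch of that read (an illustration only, not the actual HotSpot change; it assumes the cgroup v1 path quoted above):

#include <fstream>
#include <iostream>
#include <string>

int main() {
    // memory.stat consists of "key value" pairs, one per line, values in bytes.
    std::ifstream stat_file("/sys/fs/cgroup/memory/memory.stat");
    std::string key;
    long long value = 0, rss = -1, cache = -1;
    while (stat_file >> key >> value) {
        if (key == "rss")   rss = value;    // anonymous memory
        if (key == "cache") cache = value;  // page cache
    }
    std::cout << "rss_usage_in_bytes: " << rss << "\n"
              << "cache_usage_in_bytes: " << cache << "\n";
    return 0;
}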
Comments
Fix request [21u,17u]: I would like to backport this for parity with 21.0.9-oracle and 17.0.17-oracle. Low risk, just collecting some values. Edits were needed in 21; clean backport to 17. Test passes. SAP nightly testing passed.
17-06-2025

A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk17u-dev/pull/3639 Date: 2025-06-15 18:50:44 +0000
15-06-2025

A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk21u-dev/pull/1875 Date: 2025-06-15 18:48:11 +0000
15-06-2025

Changeset: c96cbe48 Author: Gerard Ziemski <gziemski@openjdk.org> Date: 2024-01-10 17:29:55 +0000 URL: https://git.openjdk.org/jdk/commit/c96cbe481c86800b76e220374b24b6671984adb7
10-01-2024

Yes, I meant to have a reference in the JBS system as to why that mapping was chosen. "Using what cadvisor does" seemed a bit thin as a rationale; it could have been wrong, so I did some digging. The kernel sources are a better reference. The stat values are named differently in cgroup v1 than in cgroup v2, but we can map 'rss' to 'anon' and 'cache' to 'file' for cg v2 because the kernel uses the same indices when it sets the size values in the memory.stat interface files:
cg v2: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/memcontrol.c?id=0dd3ee31125508cd67f7e7172247f05b7fd1753a#n1635
cg v1: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/memcontrol.c?id=0dd3ee31125508cd67f7e7172247f05b7fd1753a#n4266
That also seems to be why cadvisor chose this mapping. Since the kernel sources and cadvisor both use it, what the proposed patch does seems fine to me.
10-01-2024

To be more explicit, when I said "i.e. rss = memory.stats/anon" I meant that "rss" is the "anon" value in the memory.stat file provided by cgroup v2.
09-01-2024

Sorry, I am not sure I follow. According to https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/memcontrol.c we have this struct table:

struct memory_stat {
    const char *name;
    unsigned int idx;
};

static const struct memory_stat memory_stats[] = {
    { "anon", NR_ANON_MAPPED },
    { "file", NR_FILE_PAGES },
    ...

NR_ANON_MAPPED and NR_FILE_PAGES are just indexes used by the memory_stats table, which simply maps each memory stat to a user-readable name. They are not values that can be used by themselves in the calculation of 'anon' or 'file', so we have:

'rss' (cg v1) == 'anon' (cg v2)
'cache' (cg v1) == 'file' (cg v2)

just as I stated.
09-01-2024

Kernel source code references for cgroup v1:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/memcontrol.c?id=0dd3ee31125508cd67f7e7172247f05b7fd1753a#n4209
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/memcontrol.c?id=0dd3ee31125508cd67f7e7172247f05b7fd1753a#n4227
And the corresponding values for cgroup v2:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/memcontrol.c?id=0dd3ee31125508cd67f7e7172247f05b7fd1753a#n1521
So the mapping is:
'rss' (cg v1) => 'anon' (cg v2) - NR_ANON_MAPPED
'cache' (cg v1) => 'file' (cg v2) - NR_FILE_PAGES
Inspired by the cadvisor source code commit here: https://github.com/google/cadvisor/commit/691ca316134060db0508730aa7757bc5e6c1e280
08-01-2024
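
To make the mapping above concrete, a purely illustrative helper (hypothetical, not part of HotSpot's cgroup code) that picks the memory.stat key to look up depending on the cgroup version:

#include <iostream>
#include <string>

// Hypothetical helpers; the key names follow the v1/v2 mapping discussed above.
std::string rss_key(bool cgroup_v2)   { return cgroup_v2 ? "anon" : "rss"; }
std::string cache_key(bool cgroup_v2) { return cgroup_v2 ? "file" : "cache"; }

int main() {
    std::cout << "cgroup v1 keys: " << rss_key(false) << ", " << cache_key(false) << "\n"
              << "cgroup v2 keys: " << rss_key(true)  << ", " << cache_key(true)  << "\n";
    return 0;
}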

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/17161 Date: 2023-12-19 17:41:51 +0000
19-12-2023

Posted preliminary PR https://github.com/openjdk/jdk/pull/17161
19-12-2023

I looked at how cAdvisor does it (https://github.com/google/cadvisor). It displays RSS in the "Processes" section on its locally served http web page, but looking at the source code it looks like it simply uses "ps" output for that, not cgroup v2 metrics. Looking at the Go source code of cAdvisor, we have this:

if cgroups.IsCgroup2UnifiedMode() {
    ret.Memory.Cache = s.MemoryStats.Stats["file"]
    ret.Memory.RSS = s.MemoryStats.Stats["anon"]
    ret.Memory.Swap = s.MemoryStats.SwapUsage.Usage - s.MemoryStats.Usage.Usage
    ret.Memory.MappedFile = s.MemoryStats.Stats["file_mapped"]
} else if s.MemoryStats.UseHierarchy {
    ret.Memory.Cache = s.MemoryStats.Stats["total_cache"]
    ret.Memory.RSS = s.MemoryStats.Stats["total_rss"]
    ret.Memory.Swap = s.MemoryStats.Stats["total_swap"]
    ret.Memory.MappedFile = s.MemoryStats.Stats["total_mapped_file"]
} else {
    ret.Memory.Cache = s.MemoryStats.Stats["cache"]
    ret.Memory.RSS = s.MemoryStats.Stats["rss"]
    ret.Memory.Swap = s.MemoryStats.Stats["swap"]
    ret.Memory.MappedFile = s.MemoryStats.Stats["mapped_file"]
}

i.e. rss = memory.stats/anon. This rss value is not the same as the one reported by system tools such as "ps -aux" on my Ubuntu machine, but if that is how cAdvisor does it, then I'm OK with implementing it that way.
18-12-2023

Thanks Gerard for trying out so many experiments. I am out of any more ideas at this point.
15-12-2023

If I do "cat /proc/PID/status | grep -i rss" I get "VmRSS", which is equal to "RssAnon" + "RssFile" + "RssShmem", so suppose "rss" = "anon" + "file" + "shmem".

Running Stylepad.jar from a bash terminal, we get a cgroup consisting of 2 processes according to the "cgroup.procs" file. Running "smem | grep PID" on those 2 processes we get the total:

rss: 243,671,040

Another way to find out rss is to run "cat /proc/PID/status | grep -i rss", and the total for our 2 processes is:

rss: 243,515,392

Those two numbers are very close, so let's assume that the real rss is very close to them.

Looking at the cgroup v2 data, from "memory.current" we get:

current: 279,961,600

From the "memory.stat" file we get:

anon: 199,827,456
file: 78,110,720
kernel: 2,023,424

which add up to "current", so we know that the "anon", "file" and "kernel" values are independent of each other.

Let's use the other metrics from the "memory.stat" file and calculate "rss" as per your suggestion, 'anon' + 'swapcached':

rss = 'anon' + 'swapcached'
anon: 199,827,456
swapcached: 0
rss: 199,827,456
target: 243,515,392
err: -43,687,936

A difference of 43,687,936, so it does not look right. Let's try the formula from "/proc/PID/status", where "rss" = "anon" + "file" + "shmem":

rss = 'anon' + 'file' + 'shmem'
anon: 199,827,456
file: 78,110,720
shmem: 745,472
rss: 278,683,648
target: 243,515,392
err: 35,168,256

A difference of 35,168,256, so it does not look right either.
14-12-2023

Here is the data from one captured session:

anon: 124,874,752
inactive_anon: 125,440,000
active_anon: 180,224

so in this case "anon" < "inactive_anon" + "active_anon", not equal. Confusing and not confidence-building.
14-12-2023

Please note that the values in the memory.stat file are in bytes, while the numbers in pmap or smaps are in KB. As per my understanding, 'active_anon' and 'inactive_anon' are parts of 'anon'. So the closest representation of rss would be the 'anon' + 'swapcached' value, and 'file' should give us the page cache usage. From the documentation:

anon: Amount of memory used in anonymous mappings such as brk(), sbrk(), and mmap(MAP_ANONYMOUS)
file: Amount of memory used to cache filesystem data, including tmpfs and shared memory.
14-12-2023

In the meantime, I will look into providing some cgroup v2 based metrics that can be used to anticipate an OOM event, as discussed in https://faun.pub/how-much-is-too-much-the-linux-oomkiller-and-used-memory-d32186f29c9d
13-12-2023

At this point I do not know how to derive "rss" from the memory data provided by cgroup v2. I will ask the team to see if anyone knows.
12-12-2023

The doc says: "memory.current: A read-only single value file which exists on non-root cgroups. The total amount of memory currently being used by the cgroup and its descendants." No mention of "rss" or "cache" that I can see, and no real definition. Can you point me to where exactly "rss" is claimed to be part of "memory.current", and to the actual details? I mean, "rss" obviously makes up part of the total process memory, but saying just that doesn't help much. If you want just the "rss" part, then we need to be able to find out all the other parts that make up the process memory as well. "memory.stat" has a bunch of different values that I am hoping can be used to synthesize rss and cache, but where does it say that "anon" holds "rss" and "file" holds "cache"? It is OK if you don't know the details, I am happy to dig deeper, but what you have said so far does not help unfortunately, unless I can verify it.
05-12-2023

Gerard, from the documentation here: https://www.kernel.org/doc/Documentation/cgroup-v2.txt

For cgroup v2, the memory.current file contains the total amount of memory (rss + page cache) currently being used by the cgroup and its descendants. The memory.stat file contains the breakdown of the used memory; 'anon' and 'file' hold the 'rss' and 'page cache' values respectively.

anon: Amount of memory used in anonymous mappings such as brk(), sbrk(), and mmap(MAP_ANONYMOUS)
file: Amount of memory used to cache filesystem data, including tmpfs and shared memory.

It would be good to print these three values and confirm what the documentation says.
05-12-2023
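
One way to do that check outside the JVM is a small stand-alone program (a sketch only; the cgroup v2 paths /sys/fs/cgroup/memory.current and /sys/fs/cgroup/memory.stat are assumptions about where the unified hierarchy is mounted):

#include <fstream>
#include <iostream>
#include <string>

int main() {
    long long current = -1;
    std::ifstream current_file("/sys/fs/cgroup/memory.current");
    current_file >> current;

    std::ifstream stat_file("/sys/fs/cgroup/memory.stat");
    std::string key;
    long long value = 0, anon = -1, file = -1;
    while (stat_file >> key >> value) {
        if (key == "anon") anon = value;
        if (key == "file") file = value;
    }
    // Compare the documented breakdown ('anon' + 'file' + ...) against the total.
    std::cout << "memory.current: " << current << "\n"
              << "anon:           " << anon << "\n"
              << "file:           " << file << "\n";
    return 0;
}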

[~poonam] could you please verify that it is indeed the "rss" and "cache" values that you need? Do you have any insights on how to do this with cgroup v2?
05-12-2023

I don't think I can get this into jdk22. Having to implement it for cgroup v2 is much more involved than it was for cgroup v1. I will continue to work on this so it doesn't slip through the cracks again, and make sure it gets into jdk23.
05-12-2023

Since [~poonam] originally asked for the "rss" and "cache" values, that's what I'm going to continue working on. In the meantime, if I hear that it is "container_memory_working_set_bytes" that we are more interested in after all, then I will look into that instead. I need to point out that once we start reporting a synthetic "rss" value, we will be on the hook to keep reporting it. The still-missing detail here is how we should define "rss". I will look for any existing solutions in this area (containers).
05-12-2023

This issue asks for the "rss" and "cache" values to be printed when printing cgroup v1 info. After looking into it I said that we could do that easily, since we already had the mechanism querying other info; here we just had to extend it by reading 2 additional fields. I provided this patch and asked for feedback. The feedback was that these two values show up as "unsupported" under cgroup v2.

After looking into this deeper, here is my current understanding. cgroup v2 is quite different from cgroup v1 in this context: it does not provide either "rss" or "cache". I think it might be because those values can be derived from the many values it does show (?).

The original request for this feature states that we need "rss" and "cache" to anticipate when the OOM killer might kill a process. According to https://faun.pub/how-much-is-too-much-the-linux-oomkiller-and-used-memory-d32186f29c9d, "container_memory_working_set_bytes" best represents the limit used by the OOM killer. And according to https://stackoverflow.com/questions/74796436/rss-memory-equivalent-in-cgroup-v2, "container_memory_working_set_bytes" is defined as "container_memory_usage_bytes" - "total_inactive_file", where:

"container_memory_usage_bytes" is the value in the "/sys/fs/cgroup/memory/memory.usage_in_bytes" file
"total_inactive_file" can be found in the "/sys/fs/cgroup/memory/memory.stat" file

On my Linux machine (Ubuntu), which is using cgroup v2 ("stat -fc %T /sys/fs/cgroup/" prints "cgroup2fs", not "tmpfs"), I can't find "/sys/fs/cgroup/memory/memory.usage_in_bytes" (this is a cgroup v1 file?). Also, instead of "/sys/fs/cgroup/memory/memory.stat" I see "/sys/fs/cgroup/memory.stat", which has an "inactive_file" field. So things are not completely clear yet, and even though I marked "Understanding: Fix Understood", that was true for cgroup v1, as originally requested in this issue, but it is no longer true for the cgroup v2 implementation. Today I will change it to "Unknown".

In addition, since there are such drastic differences between cgroup v1 and cgroup v2, I'm not sure I like our current mechanism that attempts to print the same values for the cgroup regardless of whether it is v1 or v2. It prints "unsupported" under v2 for fields that only exist in v1, but that's a confusing way of doing this. I think a better way would be to print all the v1 values and all the v2 values and let the developer extract and parse the ones they need to compute the info they want. We could provide a shortcut for "rss" in v2 by calculating it ourselves, but if we do it that way, we should mark it to make clear that this is a synthetic value, calculated from other "native" (supported) values.
04-12-2023
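
As a rough illustration of the working-set calculation referenced above on a cgroup v2 host (a sketch under the assumption that the unified hierarchy is mounted at /sys/fs/cgroup; inside a container the files would live under that container's cgroup directory):

#include <fstream>
#include <iostream>
#include <string>

int main() {
    long long current = -1;
    std::ifstream current_file("/sys/fs/cgroup/memory.current");
    current_file >> current;

    long long inactive_file = 0;
    std::ifstream stat_file("/sys/fs/cgroup/memory.stat");
    std::string key;
    long long value = 0;
    while (stat_file >> key >> value) {
        if (key == "inactive_file") inactive_file = value;
    }
    // container_memory_working_set_bytes ~= total usage - inactive_file
    std::cout << "working_set_bytes (approx): " << (current - inactive_file) << "\n";
    return 0;
}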

Attaching a patch with the core of the changes I think are needed. I still have to compile it on Linux to see if it even builds; I'm developing on a Mac...
30-08-2023

https://stackoverflow.com/questions/50865763/memory-usage-discrepancy-cgroup-memory-usage-in-bytes-vs-rss-inside-docker-con
30-08-2023

In jdk/src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp we have code that uses /sys/fs/cgroup/memory/memory.usage_in_bytes, and we already use /sys/fs/cgroup/memory/memory.stat to find "Hierarchical Memory Limit" and "hierarchical_memsw_limit", so adding RSS should be simple if we are OK with pulling the value out of that system file.
30-08-2023

The attached test program can be used to see the OOM killer in action even when the RSS of the process is less than the memory limit. The tar bundle does not contain the largefile.txt used in the test; please create a very large file (~100MB) to use with this Java program.

Create the docker image:
$ docker build -t oom_killer .
$ docker run -it --memory="512m" oom_killer

Run the program from inside the container:
$ java -Djava.library.path=. -XX:+UnlockDiagnosticVMOptions -Xms400m -Xmx400m -XX:+AlwaysPreTouch ContainerOOMKillerTest &

After a while, the OOM killer terminates the process, and from the memory.stat output printed by this program we can see that RSS is below the memory limit when the OOM killer is invoked.

-------------------------------
Memory usage after allocating native memory is: 536850432
...Printing memory stat after allocating native memory...
cache 4825088
rss 529588224
rss_huge 0
shmem 0
mapped_file 12288
dirty 16384
writeback 0
swap 0
pgpgin 475730
pgpgout 345258
pgfault 400303
pgmajfault 9
inactive_anon 529559552
active_anon 8192
inactive_file 4812800
active_file 12288
unevictable 0
hierarchical_memory_limit 536870912
------------------------------------
26-07-2023