JDK-8146115 : Improve docker container detection and resource configuration usage
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 9
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: linux
  • CPU: x86
  • Submitted: 2015-12-23
  • Updated: 2019-05-22
  • Resolved: 2017-11-16
The Version table provides details related to the release in which this issue/RFE will be addressed.

Unresolved: Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed: Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

JDK 10: 10 b34 (Fixed)
JDK 8: 8u191 (Fixed)
Sub Tasks
JDK-8189773
JDK-8190942
JDK-8196595
Java startup normally queries the operating system in order to set up runtime defaults for things such as the number of GC threads and default memory limits. When running in a container, the operating system functions used provide information about the host and do not include the container configuration and limits. The VM and core libraries will be modified as part of this RFE to first determine whether the current process is running in a container. The runtime will then use the container values rather than the general operating system functions for configuring and managing the Java process. There have been a few attempts to correct some of these issues in the VM, but they are not complete. The CPU detection in the VM currently only handles a container that limits cpu usage via CPU sets. If the Docker --cpus option, or --cpu-period along with --cpu-quota, is specified, it currently has no effect on the VM's configuration.

The experimental memory detection that has been implemented only impacts the heap selection and does not apply to the os::physical_memory or os::available_memory low-level functions. This leaves other parts of the VM and core libraries believing there is more memory available than there actually is.

To correct these shortcomings and make this support more robust, here's a list of the current cgroup subsystems that will be examined in order to update the internal VM and core library configuration.

Number of CPUs
Use a combination of number_of_cpus() and cpu_sets() in order to determine how many processors are available to the process and adjust the JVM's os::active_processor_count appropriately. The number_of_cpus() will be calculated based on the cpu_quota() and cpu_period() using this formula: number_of_cpus() = cpu_quota() / cpu_period(). If cpu_shares has been set up for the container, the number_of_cpus() will be calculated based on cpu_shares()/1024. 1024 is the default and standard unit for calculating relative cpu usage in cloud-based container management software.
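The calculation described above can be sketched as follows. This is an illustrative model only, not the actual HotSpot implementation; the helper name activeProcessorCount, the round-up of partial CPUs, and the take-the-minimum handling when both quota and shares are configured are assumptions made for the sketch:

```java
// Illustrative sketch of the CPU-count calculation described above.
// Not the actual HotSpot code; the real logic lives in os::active_processor_count.
public class CpuCountSketch {

    static final long PER_CPU_SHARES = 1024; // standard unit for one CPU's worth of shares

    // quota/period/shares use -1 to mean "not configured", mirroring cgroup v1 values
    static int activeProcessorCount(long cpuQuota, long cpuPeriod,
                                    long cpuShares, int hostCpus) {
        int quotaCount = 0;
        if (cpuQuota > -1 && cpuPeriod > 0) {
            // number_of_cpus() = cpu_quota() / cpu_period(), rounded up for partial CPUs
            quotaCount = (int) Math.ceil((double) cpuQuota / cpuPeriod);
        }
        int shareCount = 0;
        if (cpuShares > -1) {
            // number_of_cpus() = cpu_shares() / 1024
            shareCount = (int) Math.ceil((double) cpuShares / PER_CPU_SHARES);
        }
        int limit;
        if (quotaCount != 0 && shareCount != 0) {
            limit = Math.min(quotaCount, shareCount); // both configured: take the smaller
        } else if (quotaCount != 0) {
            limit = quotaCount;
        } else {
            limit = shareCount; // may still be 0 if nothing is configured
        }
        return limit != 0 ? Math.min(limit, hostCpus) : hostCpus;
    }

    public static void main(String[] args) {
        // --cpu-quota=150000 --cpu-period=100000 -> 1.5 CPUs, rounded up
        System.out.println(activeProcessorCount(150000, 100000, -1, 8));
    }
}
```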

Also add a new VM flag (-XX:ActiveProcessorCount=xx) that allows the number of CPUs to be overridden. This flag will be honored even if UseContainerSupport is not enabled.

Total available memory
Use the memory_limit() value from the cgroup file system to initialize the os::physical_memory() value in the VM. This value will propagate to all other parts of the Java runtime.

Memory usage
Use memory_usage_in_bytes() for providing os::available_memory() by subtracting the usage from the total available memory allocated to the container.
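Taken together with the previous section, the memory handling can be sketched like this. The helper names (physicalMemory, availableMemory) and the treatment of the "unlimited" sentinel are invented for illustration; this is not the actual HotSpot code:

```java
// Illustrative sketch: derive the VM's memory values from cgroup v1 data.
// Helper names are invented; the real implementation lives in HotSpot.
public class ContainerMemorySketch {

    // cgroup v1 reports an unset limit as a very large value; treat anything at or
    // above this cutoff as "no container limit configured"
    static final long UNLIMITED = Long.MAX_VALUE / 2;

    // os::physical_memory(): the container's memory_limit() if one is set,
    // else the host value
    static long physicalMemory(long memoryLimitInBytes, long hostPhysical) {
        if (memoryLimitInBytes > 0 && memoryLimitInBytes < UNLIMITED) {
            return memoryLimitInBytes;
        }
        return hostPhysical;
    }

    // os::available_memory(): limit minus memory_usage_in_bytes(), as described above
    static long availableMemory(long memoryLimitInBytes, long memoryUsageInBytes,
                                long hostAvailable) {
        if (memoryLimitInBytes > 0 && memoryLimitInBytes < UNLIMITED) {
            return Math.max(0, memoryLimitInBytes - memoryUsageInBytes);
        }
        return hostAvailable;
    }

    public static void main(String[] args) {
        long limit = 512L * 1024 * 1024;   // e.g. docker run -m 512m
        long usage = 100L * 1024 * 1024;
        System.out.println(availableMemory(limit, usage, 0));
    }
}
```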

As a troubleshooting aid, we will dump any available container statistics to the hotspot error log and add container-specific information to the JVM logging system. Unified Logging will be added to help diagnose issues related to this support. Use -Xlog:os+container=trace for maximum logging of container information.

A new option, -XX:-UseContainerSupport, will be added to allow the container support to be disabled. The default for this flag will be true; container support will be enabled by default.

From SQE point of view, this change is ready for integration.

I do not believe I can determine the docker version from within a container. The only sign that a container process is running via docker is the fact that docker currently mounts the container's cgroup file system under /docker. /proc/self/cgroup contains paths such as these:

10:freezer:/docker/42cd8b1ebfeb1d6ce991ecb87916292cda8c41d364c6615c497b6fb3b4865977
9:devices:/docker/42cd8b1ebfeb1d6ce991ecb87916292cda8c41d364c6615c497b6fb3b4865977
8:memory:/docker/42cd8b1ebfeb1d6ce991ecb87916292cda8c41d364c6615c497b6fb3b4865977
7:hugetlb:/docker/42cd8b1ebfeb1d6ce991ecb87916292cda8c41d364c6615c497b6fb3b4865977
6:cpu,cpuacct:/docker/42cd8b1ebfeb1d6ce991ecb87916292cda8c41d364c6615c497b6fb3b4865977
5:blkio:/docker/42cd8b1ebfeb1d6ce991ecb87916292cda8c41d364c6615c497b6fb3b4865977
4:net_cls,net_prio:/docker/42cd8b1ebfeb1d6ce991ecb87916292cda8c41d364c6615c497b6fb3b4865977
3:cpuset:/docker/42cd8b1ebfeb1d6ce991ecb87916292cda8c41d364c6615c497b6fb3b4865977
2:perf_event:/docker/42cd8b1ebfeb1d6ce991ecb87916292cda8c41d364c6615c497b6fb3b4865977
1:name=systemd:/docker/42cd8b1ebfeb1d6ce991ecb87916292cda8c41d364c6615c497b6fb3b4865977

I don't think it's wise to assume that this convention will continue. In addition, there are a few competing libcontainer implementations that we will want to support. These will most likely not follow this convention.
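The heuristic described above (looking for a /docker/ prefix in /proc/self/cgroup, with the stated caveat that the convention is not guaranteed) could be sketched as:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Heuristic sketch: look for the "/docker/<id>" convention in /proc/self/cgroup.
// As noted above, this convention is not guaranteed to hold across runtimes.
public class DockerDetectSketch {

    // Each line has the form "hierarchy-id:subsystems:/path"
    static boolean looksLikeDocker(List<String> procSelfCgroupLines) {
        for (String line : procSelfCgroupLines) {
            String[] fields = line.split(":", 3);
            if (fields.length == 3 && fields[2].startsWith("/docker/")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        Path cgroupFile = Paths.get("/proc/self/cgroup");
        if (Files.exists(cgroupFile)) {
            List<String> lines = Files.readAllLines(cgroupFile);
            System.out.println(looksLikeDocker(lines) ? "docker (probably)" : "not docker");
        }
    }
}
```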

From a slack chat:

[1:30 PM] Fairoz Matte: How to identify the JVM crashes occurred on Cloud environment from hs_error file. Ex System configuration "running on docker 17.09 in Ubuntu 16.04 server" I don't find this sort of information in hs_error logs.
[1:50 PM] David Holmes: The vm doesn't know anything about docker versions, or even if it is running in docker. Even the upcoming container support is based on cgroups, not docker.
[4:06 PM] Ioi Lam: Google tells me this: https://forums.docker.com/t/get-a-containers-full-id-from-inside-of-itself/37237/2 (Docker Forums, "Get a container's full id from inside of itself"): "Well, it seems that I already found the solution! I simply have to run the following command from inside a container: cat /proc/self/cgroup | head -1 | tr --delete '10:memory:/docker/'"
[4:20 PM] Fairoz Matte: Thanks Ioi. How about adding this information under System Section of hs_error file? Something like "OS: CentOS Linux release 7.4.1708 (Core) // if docker, get docker info: docker 17.09 in Ubuntu 16.04 server uname: Linux 4.4.0-96-generic #119-Ubuntu SMP Tue Sep 12 14:59:54 UTC 2017 x86_64

Attached is a program that demonstrates a more robust way of extracting cgroup configuration data for memory and cpuset subsystems. It also shows how to determine if cgroup v1 versus v2 is operating.
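The attachment itself is not reproduced here, but one common way to make the v1-versus-v2 distinction is that the cgroup v2 unified hierarchy exposes a cgroup.controllers file at its mount root, which v1 does not. A minimal sketch, assuming the conventional /sys/fs/cgroup mount point (a robust implementation would parse /proc/self/mountinfo instead of hard-coding the path):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch: distinguish cgroup v1 from the v2 unified hierarchy.
// Assumes the conventional /sys/fs/cgroup mount point.
public class CgroupVersionSketch {

    // cgroup v2 exposes cgroup.controllers at the root of the unified hierarchy;
    // cgroup v1 mounts per-subsystem directories there instead
    static boolean isCgroupV2(Path cgroupRoot) {
        return Files.isRegularFile(cgroupRoot.resolve("cgroup.controllers"));
    }

    public static void main(String[] args) {
        Path root = Paths.get("/sys/fs/cgroup");
        System.out.println(isCgroupV2(root) ? "cgroup v2" : "cgroup v1 (or no cgroups)");
    }
}
```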

The problem is that there may not be a single cgroup filesystem to examine - the UseCGroupMemoryLimitForHeap flag already examines the "default" /sys/fs/cgroup location. What we want/need/desire is a simple way to find the cgroup information for the current process.

Here's one way I've found to accomplish this:

1. Find out where the cgroup file system is mounted:
   % mount | grep cgroup | grep memory
   cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)

2. Examine the memory.limit_in_bytes contents stored in the cgroup memory file system:
   % more /sys/fs/cgroup/memory/memory.limit_in_bytes
   5242880

Not sure if this actually gives us anything useful:

$ ls -l /proc/7600/ns
total 0
lrwxrwxrwx 1 gtee uucp 0 May 4 04:28 cgroup -> cgroup:[4026531835]
lrwxrwxrwx 1 gtee uucp 0 May 4 04:28 ipc -> ipc:[4026531839]
lrwxrwxrwx 1 gtee uucp 0 May 4 04:28 mnt -> mnt:[4026531840]
lrwxrwxrwx 1 gtee uucp 0 May 4 04:28 net -> net:[4026531957]
lrwxrwxrwx 1 gtee uucp 0 May 4 04:28 pid -> pid:[4026531836]
lrwxrwxrwx 1 gtee uucp 0 May 4 04:28 user -> user:[4026531837]
lrwxrwxrwx 1 gtee uucp 0 May 4 04:28 uts -> uts:[4026531838]
$ ls -l /proc/7600/ns/cgroup
lrwxrwxrwx 1 gtee uucp 0 May 4 04:28 /proc/7600/ns/cgroup -> cgroup:[4026531835]
$ ls -l /proc/7600/ns/cgroup/
ls: cannot access '/proc/7600/ns/cgroup/': Not a directory
$ cat /proc/7600/ns/cgroup
cat: /proc/7600/ns/cgroup: Invalid argument

I just came across this: /proc/[pid]/ns/cgroup (since Linux 4.6): "This file is a handle for the cgroup namespace of the process." This might be a simpler way to find the cgroup entries of the current process. But not if the rest of the path is arbitrarily constructed by the cgroup "administrator".

Good job we don't rely on detecting the "unlimited" value!

JDK-8170888 has now added experimental support for using the cgroup memory limit.

I think David's finding (2016-02-29 22:50) is that there's a way to get the memory size info, but it doesn't always work (perhaps it depends on how the admin sets up the docker, or the version of docker). So if we want to fix it in JDK 9, perhaps we can require a certain minimum docker version, plus the admin would have to configure the docker in a certain way. I think having such restrictions would be better than deferring to JDK 10 and not doing it at all for JDK 9.

I suspect the "quiet abort" occurs when we can't allocate anything to enable error reporting. Or maybe the OS is forcefully aborting the process?

Deferring to 10 at this stage.

Good summary of state-of-the-problem in 2014: http://fabiokung.com/2014/03/13/memory-inside-linux-containers/ Earlier discussion: https://blog.docker.com/2013/10/gathering-lxc-docker-containers-metrics/

This would have been useful: https://sourceforge.net/projects/libcg/ A library for interacting with cgroups. But it doesn't seem to have gotten to critical mass.

From: https://groups.google.com/forum/#!searchin/docker-user/available$20memory/docker-user/De0lmGu-Mbc/fmPa9HR7uSQJ

"Docker's memory usage is enforced using cgroups. Unfortunately, that information doesn't seem to be available inside the container by default as far as I can see, unless you manually mount /sys/fs/cgroup inside the container. If you do that, then you can get at the limit with something like this:

cat /sys/fs/cgroup/memory$(awk -F: '/4:memory/ {print $3}' /proc/self/cgroup)/memory.limit_in_bytes

Here's my test:

$ docker run -v /sys/fs/cgroup:/sys/fs/cgroup --rm -m 512m ubuntu bash -c 'cat /sys/fs/cgroup/memory$(awk -F: '"'"'/4:memory/ {print $3}'"'"' /proc/self/cgroup)/memory.limit_in_bytes'
536870912"

So not only do we have to parse the cgroup information, we have to have the end user ensure the cgroup filesystem is mounted in the container. This is not looking very promising.
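The path construction performed by the awk one-liner above can be sketched in Java as well; the subsystem name and mount point are parameters because, as noted, neither is guaranteed, and the helper name limitFile is invented for the sketch:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Sketch of the path construction done by the awk one-liner above:
// <mount-point><path-from-/proc/self/cgroup>/memory.limit_in_bytes
public class CgroupPathSketch {

    // Find the memory.limit_in_bytes path for the given subsystem from
    // /proc/self/cgroup lines of the form "hierarchy-id:subsystems:/path"
    static Optional<String> limitFile(List<String> procSelfCgroupLines,
                                      String subsystem, String mountPoint) {
        for (String line : procSelfCgroupLines) {
            String[] f = line.split(":", 3);
            if (f.length == 3 && f[1].contains(subsystem)) {
                return Optional.of(mountPoint + f[2] + "/memory.limit_in_bytes");
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("4:memory:/docker/abc123");
        System.out.println(
            limitFile(lines, "memory", "/sys/fs/cgroup/memory").orElse("not found"));
    }
}
```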

https://docs.docker.com/engine/reference/run/#runtime-constraints-on-resources

-m, --memory=""
    Memory limit (format: <number>[<unit>]). Number is a positive integer. Unit can be one of b, k, m, or g. Minimum is 4M.
--memory-swap=""
    Total memory limit (memory + swap, format: <number>[<unit>]). Number is a positive integer. Unit can be one of b, k, m, or g.
--memory-reservation=""
    Memory soft limit (format: <number>[<unit>]). Number is a positive integer. Unit can be one of b, k, m, or g.
--kernel-memory=""
    Kernel memory limit (format: <number>[<unit>]). Number is a positive integer. Unit can be one of b, k, m, or g. Minimum is 4M.

If the memory constraints are coming from cgroups then there does not appear to be a programmatic API that will provide the information regarding available memory etc. It seems the only approach is to read what may be under /proc/<pid>/cgroup. But even then it is unclear how the limits that can be defined there translate into the "memory" that the VM is interested in. Per https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-memory.html:

memory.limit_in_bytes sets the maximum amount of user memory (including file cache).
memory.memsw.limit_in_bytes sets the maximum amount for the sum of memory and swap usage.

Please advise the observed and expected behaviour under Docker.

Is this information reported from Runtime.*Memory functions, and/or available memory as seen inside the VM?

JDK-8140793 captures the CPU detection. This bug captures the physical memory detection.