Bug ID: JDK-8182070 Container Awareness


Type: JEP
Component: hotspot
Sub-Component: runtime

Priority: P3
Status: Closed
Resolution: Withdrawn

Submitted: 2017-06-13
Updated: 2024-06-04
Resolved: 2018-05-17

Related Reports

Blocks :	JDK-8184303 - JEP-JDK-8182070: SQE Test Plan for Container Awareness
Blocks :	JMC-5930 - JMC should report CPU usage based on the CPU quota rather then on number of CPUs the host machine has
Blocks :	JMC-5901 - Utilize information from the host/container
Blocks :	JDK-8199944 - Add Container MBean to JMX
Relates :	JMC-5522 - Rule for memory settings when running in a container
Relates :	JDK-8203357 - Container Metrics
Relates :	JDK-8146115 - Improve docker container detection and resource configuration usage

Description

Summary
-------

Container aware Java runtime

Goals
-----

Provide an internal API that can be used to extract container specific configuration and runtime statistics.  This JEP will only support Docker on Linux-x64 although the design should be flexible enough to allow support for other platforms and container technologies.  The initial focus will be on Linux cgroups technology so that we will be able to easily support other container technologies running on Linux in addition to Docker.

Non-Goals
---------

Although it is expected that this API will function properly on all cgroup enabled Linux platforms, it is not a goal of this JEP to perform validation of any configuration other than Docker containers running on Linux x64 and direct Linux host process execution.  Since cgroups are enabled for processes running directly on Linux hosts, this configuration will also be verified.

Motivation
----------

Container technology is becoming more and more prevalent in Cloud based applications.  The Cloud Serverless application programming model motivates developers to split large monolithic applications into hundreds of smaller pieces, each running in its own container.  This move increases the importance of the observability of each running container process.  Adding the proposed set of APIs will allow more details related to each container process to be made available to external tools, thereby improving observability.

Description
-----------

This enhancement will be made up of the following work items:

Detecting if Java is running in a container.

The Java runtime, as well as any tests that we might write for this feature, will need to be able to detect that the current Java process is running in a container.  A new API will be made available for this purpose.
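
As an illustration only, one heuristic for such a check on Linux is to look at the cgroup paths listed in /proc/self/cgroup. The sketch below is not the API proposed here: the class `ContainerProbe` and the method `looksLikeDocker` are hypothetical names, and it assumes cgroup v1 style lines of the form `hierarchy-ID:controller-list:cgroup-path` where Docker places the process under a path containing `docker`. It can miss containers whose cgroup paths are hidden or named differently.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

final class ContainerProbe {
    // Hypothetical helper: true if any cgroup path mentions Docker.
    static boolean looksLikeDocker(List<String> cgroupLines) {
        for (String line : cgroupLines) {
            // Assumed line shape: hierarchy-ID:controller-list:cgroup-path
            String[] parts = line.split(":", 3);
            if (parts.length == 3 && parts[2].contains("docker")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        Path cgroup = Paths.get("/proc/self/cgroup");
        if (Files.isReadable(cgroup)) {
            System.out.println("Docker-like cgroup paths: "
                    + looksLikeDocker(Files.readAllLines(cgroup)));
        } else {
            System.out.println("No readable /proc/self/cgroup on this platform");
        }
    }
}
```

Keeping the parsing separate from the file access lets the heuristic be exercised with canned input on any platform.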

Exposing container resource limits, configuration and runtime statistics.

There are several configuration options and limits that can be imposed upon a running container.  Not all of these are important to a running Java process.  We clearly want to be able to detect how many CPUs have been allocated to our process along with the maximum amount of memory that the process has been allocated, but there are other options that we might want to base runtime decisions on.
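
To make the CPU case concrete, here is a minimal sketch of how a CPU count could be derived from a cgroup v1 CPU quota and period (the values found in `cpu.cfs_quota_us` and `cpu.cfs_period_us`). The class `CpuLimit` and the method `effectiveCpus` are hypothetical, and rounding up and capping at the host CPU count are assumptions made for illustration. Whether a quota should drive JVM sizing at all is debated in the comments on this issue.

```java
final class CpuLimit {
    // Hypothetical helper, not part of the proposed API.
    // quotaMicros:  CPU time allowed per period, or -1 when no quota is set.
    // periodMicros: length of the accounting period.
    static int effectiveCpus(long quotaMicros, long periodMicros, int hostCpus) {
        if (quotaMicros <= 0 || periodMicros <= 0) {
            return hostCpus; // no quota configured, fall back to the host count
        }
        // Round up so that a fractional allowance still counts as one CPU.
        long byQuota = (quotaMicros + periodMicros - 1) / periodMicros;
        return (int) Math.max(1, Math.min(hostCpus, byQuota));
    }

    public static void main(String[] args) {
        int host = Runtime.getRuntime().availableProcessors();
        // Two CPUs' worth of quota on a 64 CPU host.
        System.out.println(effectiveCpus(200_000, 100_000, 64));
        // No quota: the host count is reported unchanged.
        System.out.println(effectiveCpus(-1, 100_000, host));
    }
}
```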

In addition, since containers typically impose limits on system resources, they also provide the ability to easily access the amount of consumption of these resources.  The goal is to provide this information in addition to the configuration data.

A new jdk.internal.platform.Metrics class will define the API to obtain the types of configuration and consumption metrics listed below.

    class Container {
         public static Metrics metrics();
    }

Here are some of the types of configuration and consumption statistics that will be made available:

    Total Memory Limit
    Soft Memory Limit
    Maximum Memory Usage
    Current Memory Usage 
    Maximum Kernel Memory Usage
    Current Kernel Memory Usage
    Kernel Memory Limit
    Swap Memory Limit
    Maximum Swap Usage
    Current Swap Usage 
    CPU Shares
    CPU Period
    CPU Quota
    Number of CPUs
    CPU Sets
    CPU Set Memory Nodes
    CPU Usage
    CPU Usage Per CPU
    Block I/O Requests Serviced
    Block I/O Total Bytes Transferred
    OOM Kill Enabled
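
A consumer of such metrics might look roughly like the sketch below. Everything in it is hypothetical: the text above does not give the method names of jdk.internal.platform.Metrics, so `SketchMetrics` stands in with four accessors picked from the list, and `-1` is assumed as the "invalid value" returned for a metric that a platform does not support.

```java
import java.util.StringJoiner;

// Hypothetical stand-in for a small subset of the metrics listed above.
interface SketchMetrics {
    long UNSUPPORTED = -1L; // assumed sentinel for unavailable metrics

    long memoryLimitBytes();
    long memoryUsageBytes();
    long cpuQuotaMicros();
    long cpuPeriodMicros();
}

final class MetricsReport {
    // Formats one metric, hiding the sentinel from the reader of the report.
    static String describe(String name, long value) {
        return name + ": "
                + (value == SketchMetrics.UNSUPPORTED ? "not available" : Long.toString(value));
    }

    static String report(SketchMetrics m) {
        StringJoiner out = new StringJoiner("\n");
        out.add(describe("Total Memory Limit", m.memoryLimitBytes()));
        out.add(describe("Current Memory Usage", m.memoryUsageBytes()));
        out.add(describe("CPU Quota", m.cpuQuotaMicros()));
        out.add(describe("CPU Period", m.cpuPeriodMicros()));
        return out.toString();
    }

    public static void main(String[] args) {
        // Canned values for illustration; a real provider would read them from the platform.
        SketchMetrics fixed = new SketchMetrics() {
            public long memoryLimitBytes() { return 536_870_912L; }
            public long memoryUsageBytes() { return 123_456_789L; }
            public long cpuQuotaMicros() { return UNSUPPORTED; }
            public long cpuPeriodMicros() { return 100_000L; }
        };
        System.out.println(report(fixed));
    }
}
```

Returning a sentinel rather than throwing keeps callers portable across platforms that expose only some of the metrics.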
    
In addition to the new internal API, a new java -XshowSettings:system option will be added to allow the reporting of the system metrics.

Once this API has been integrated in the Java sources, enhancements to [JMX][1] and [JFR][2] will be done separately to enable the use and reporting of these metrics.


Alternatives
------------

There are a few existing tools available to extract some of the same container statistics.  These tools could be used instead.  The benefit of providing a core Java internal API is that this information can be exposed by current Java serviceability tools such as JMX and JFR alongside other JVM specific information.

Testing
-------

Docker/container specific tests should be added in order to validate the functionality being provided with this JEP.

Risks and Assumptions
---------------------

Docker is currently based on cgroups v1. Cgroups v2 is also available but is incomplete and not yet supported by Docker. It's possible that v2 could replace v1 in an incompatible way rendering this work unusable until it is upgraded.

Alternative container technologies based on hypervisors are being developed that could replace the use of cgroups for container isolation.

Dependencies
------------

None at this time.


  [1]: https://bugs.openjdk.java.net/browse/JDK-8199944
  [2]: https://bugs.openjdk.java.net/browse/JMC-5901

Comments

This JEP has been closed and a new RFE has been created that will be used to integrate this work. https://bugs.openjdk.java.net/browse/JDK-8203357
17-05-2018
Is there a bug opened for the JFR events?
17-05-2018
The most important and user-relevant parts of "Container Awareness" were already merged into JDK 10 under JDK-8146115. Those changes should really have had their own JEP. The remaining work of the internal API described here is relatively straightforward and of little interest outside the JDK itself, so I suggest that you just handle it as an RFE.
26-04-2018
Markus and Erik were supportive of the idea of adding additional JFR events that can expose these metrics in order to gain better insight into the platform resource consumption occurring during recordings. I got the sense that they would prefer this logic to be in the hotspot VM sources but said they were ok with it being in Java given our direction of moving more logic to Java. Once this API is integrated they will look at adding these events in a future release.
06-04-2018
Hi Bob! How did the discussions around a refresh of the related JFR events (and additional related events) go?
06-04-2018
Implementation stages A, B, E and F were integrated via an RFE (https://bugs.openjdk.java.net/browse/JDK-8146115) that went into JDK 10. This JEP is now focused on the remaining implementation areas:
C. Add resource statistic extraction functions
D. Add jdk.internal.Platform class in the core libraries
25-01-2018
Thanks for the comments, Karen. Here are some responses to your points.
1. Planning for the future: I was planning on specifying an invalid value return for functions that are not supported on a specific platform type. This will allow portability across many different types of containers or virtualization products.
2. All information returned will be dynamic (the most up-to-date contents of the /proc file system). As stated in section F, however, the VM will most likely not alter its initial internal configuration of memory, CPUs, etc.
3. jdk.internal.Platform internal or public: I only intended to add these internal functions for use by the core libraries and possibly internal tools (JFR, Management, etc.). I suppose an argument could be made to expose them in the same package as java.lang.Runtime while avoiding duplicate functionality. I do plan on altering the existing Runtime APIs to extract their information from my new internal APIs when running containerized.
4. is_MP issue: I don't see this as an issue. His change will remove the optimized path where we were avoiding memory barriers. Other decisions based on the number of available processors will still work the same.
5. Implementing in stages: I will add some subtasks that will lay out the ordering of implementation. The order might look something like this:
A. Add container detection and CPU, Memory limit detection to the Hotspot VM
B. Alter the existing VM functions to use the new resource limit functions
C. Add resource statistic extraction functions in the VM
D. Add jdk.internal.Platform class in the core libraries
E. Add Unified Logging support
F. Add hotspot error dumping support
30-08-2017
Totally agree that we need better container resource support. Yes please to error log information - this is a place where we might benefit from the difference between what the host reports and what we read in cgroups (and possibly what we read initially vs. values at time of error log).
Couple of questions/comments:
1. Planning for the future. The current proposal is closely tied to linux-x64/cgroups v1. To make it easier to support cgroups v2, or other containers or other platforms - would it be worth designing a simple mechanism that tells what configuration information is available? This could be a set of capability bits for instance, unless there are obvious ways for all of the APIs to return "invalid value".
2. Initial information vs. dynamically changed information. In your table, it would help to be precise about which information will reflect the state when the runtime starts up and which information is gathered upon first request vs. each request.
3. F: Will the notifications also be in the jdk.internal.Platform module, i.e. for internal use only? Is there a plan to expose any of this information beyond the internal APIs?
4. David has an RFE right now to deprecate is_MP and the ability to assume running on a single processor. Are there other cases in which inconsistencies between initial information and later information might cause problems?
5. gray box: If we wanted to implement this in stages - it would be helpful to clarify which information would be needed and used internally first - clearly anything used to calculate number of CPUs or total memory.
Minor edits:
Success Metrics: with out-of-the-box options.
Motivation: “that the Java runtime is not aware of” -> “of which the Java runtime is unaware”.
Description: B. “that we be allocated” -> “that the process has been allocated”. “I intent on providing” -> “The goal is to provide”.
In the gray box: “CPU Quote” -> “CPU Quota”.
C. “The operating system functions … and does not” -> “and do not”. In the same sentence, “containers configuration” -> “container’s configuration” or "container configuration and limits".
28-08-2017
A number based on cpu quota and period will not yield anything close to an optimal configuration for the VM/libraries/application logic - quite the opposite IMHO. We need to understand exactly how quotas/periods are implemented within the container to even try and make some kind of reasonable determination. If you have 2 CPUs worth of resource on a 64 CPU system then you effectively have 1/32 of a timeslice on each CPU for a given unit of computation time. Or you may have 1 full timeslice on 2 CPUs. How does the container manage this? If you have, for the sake of argument, 64 important tasks in your application, is it better for 2 tasks to make progress at a time, or for all 64 tasks to make some progress? You can only take advantage of all 64 CPUs if you have 64 threads. There is no simple answer here IMHO, so the VM will not be able to implement a simple one-size-fits-all policy based on quota/period.
10-08-2017
I believe that doing something to limit the number of CPUs allocated based on the cpu_quota and cpu_period is much better than doing nothing. If there are 64 CPUs and the quota and period have been set to use 2 CPUs' worth of resource, don't you think it's better to have the VM configure itself based on 2 versus 64? I got feedback from the PaaS folks and they don't want to use cpusets. They much prefer to use quota and period.
09-08-2017
Reading about the proposed CPU limit support: --- number_of_cpus() = cpu_quota() / cpu_period(). Since it's not currently possible to understand the relative weight of the running container against all other containers, altering the cpu_shares of a running container will have no effect on altering Java's configuration. --- I don't think this is workable. In short, CPU quotas/shares/periods cannot be effectively used to "size" the number of threads you need to get the right "utilization" of CPU resources. If there are 10 cpus available but you only have a 50% share you effectively get 5 cpus at 100%, but that doesn't translate to needing 5 threads! If you only have 5 threads you can only use 5 cpus, but you will still only get 50% of the time on those 5 cpus - giving the effect of 2.5 cpus. IMHO only cpusets (the actual number of available processors) are relevant to sizing/configuring the VM itself, the libraries and even the application. Information about quotas/shares etc. would be needed by application logic if it wanted to reason about its own performance characteristics (throughput, latency etc.).
08-08-2017
Looks reasonable to me. Reading through this, I don't see any Java launcher implications. Just checking.
14-07-2017
Do you think we really need to provide simultaneous platform configuration for both Host and Container? The running Java process only needs to know what its resource limitations and configuration are. The primary objective of this JEP is to ensure that we have the most accurate information on which to base startup and runtime decisions. In my opinion, it's a bug that the kernel ABI mechanisms, such as sysconf, are providing host based information rather than Container specific content. This could be fixed in the future, making these two instances redundant. It's also possible that other operating systems won't have this anomaly. Also, for the Linux cgroups case, the exact same extraction logic works for both hosts and containers as long as cgroups is enabled on the host system. This further reduces the need for two types of statistics. As for the API, I believe we need a Java class based API for core libraries and applications to use, but we also need a low level C++ interface for the VM to use at startup. Decisions such as the number of compiler and GC threads are made long before any Java byte codes can be executed. Since all of this information is operating system specific, my suggested list of functions was based on the os.hpp interface in the VM. I used its naming convention. I was suggesting to provide an API layering jdk.internal.VM -> JVM_Functions -> os.hpp interface. I agree that jdk.internal.VM is not the best package and like your jdk.internal.Platform suggestion. Today, the number of available processors is exposed in java.lang.Runtime::availableProcessors -> JVM_AvailableProcessors -> os::active_processor_count(). I was attempting to be consistent with the current implementation.
28-06-2017