JDK-8228428 : OperatingSystemMXBean should be made container aware
  • Type: CSR
  • Component: core-svc
  • Sub-Component: java.lang.management
  • Priority: P3
  • Status: Closed
  • Resolution: Approved
  • Fix Versions: 14
  • Submitted: 2019-07-19
  • Updated: 2019-12-05
  • Resolved: 2019-12-05
Related Reports
CSR :  
Description
Summary
-------

Update the `com.sun.management.OperatingSystemMXBean` specification to make it clear that it reports values based on the current operating environment - such as a container environment. Replace methods that refer to "physical" or "system" values with ones that do not, by adding new (better-named) methods, deprecating the existing ones, and require that the existing methods return the same value as the new methods.

Problem
-------

When executing in a container, or other virtualized operating environment, we would like to know information specific to that environment - such as the amount of memory that is available to the JVM process. The existing `com.sun.management.OperatingSystemMXBean` interface was designed with only a native operating system in mind and its API reflects that in places. For example, some methods are named and specified in terms of "physical memory" as reported by the operating system. The notion of "physical memory" is out-dated in such operating environments, where we want to know the amount of memory available to the JVM process; further, information about "physical memory" may be unobtainable in these environments.

Solution
--------

Clarify that the `OperatingSystemMXBean` reports values specific to the actual operating environment of the JVM process, whether native OS or virtualized OS, or container etc.

Define new methods in the `OperatingSystemMXBean` to replace those that are expressed in terms of "physical" or "system" values; deprecate the existing methods and update them to return the same values as the new methods e.g:

 - `getFreeMemorySize()`
 - `getTotalMemorySize()`
 -  `getCpuLoad()`

By doing this we both provide the desired, useful, information to existing users of the API, whilst providing an updated API for them to migrate to, which avoids potentially confusing references to "physical" and "system" values.

Specification
-------------

    *** 28,37 ****
    --- 28,43 ----
      /**
       * Platform-specific management interface for the operating system
       * on which the Java virtual machine is running.
       *
       * <p>
    +  * This interface provides information about the operating environment
    +  * on which the Java virtual machine is running. That might be a native
    +  * operating system, a virtualized operating system environment, or a
    +  * container-managed environment.
    +  *
    +  * <p>
       * The {@code OperatingSystemMXBean} object returned by
       * {@link java.lang.management.ManagementFactory#getOperatingSystemMXBean()}
       * is an instance of the implementation class of this interface
       * or {@link UnixOperatingSystemMXBean} interface depending on
       * its underlying operating system.
    *** 80,117 ****
    --- 86,176 ----
          public long getProcessCpuTime();
      
          /**
           * Returns the amount of free physical memory in bytes.
           *
    +      * @deprecated Use {@link #getFreeMemorySize()} instead of
    +      * this historically named method.
    +      *
    +      * @implSpec This implementation must return the same value
    +      * as {@link #getFreeMemorySize()}.
    +      *
           * @return the amount of free physical memory in bytes.
           */
    +     @Deprecated(since="14")
          public default long getFreePhysicalMemorySize() { return getFreeMemorySize(); }
      
          /**
    +      * Returns the amount of free memory in bytes.
    +      *
    +      * @return the amount of free memory in bytes.
    +      * @since 14
    +      */
    +     public long getFreeMemorySize();
    + 
    +     /**
           * Returns the total amount of physical memory in bytes.
           *
    +      * @deprecated Use {@link #getMemorySize()} instead of
    +      * this historically named method.
    +      *
    +      * @implSpec This implementation must return the same value
    +      * as {@link #getMemorySize()}.
    +      *
           * @return the total amount of physical memory in  bytes.
           */
    +     @Deprecated(since="14")
          public default long getTotalPhysicalMemorySize() { return getTotalMemorySize(); }
      
          /**
    +      * Returns the total amount of memory in bytes.
    +      *
    +      * @return the total amount of memory in  bytes.
    +      * @since 14
    +      */
    +     public long getTotalMemorySize();
    + 
    +     /**
           * Returns the "recent cpu usage" for the whole system. This value is a
           * double in the [0.0,1.0] interval. A value of 0.0 means that all CPUs
           * were idle during the recent period of time observed, while a value
           * of 1.0 means that all CPUs were actively running 100% of the time
           * during the recent period being observed. All values betweens 0.0 and
           * 1.0 are possible depending of the activities going on in the system.
           * If the system recent cpu usage is not available, the method returns a
           * negative value.
           *
    +      * @deprecated Use {@link #getCpuLoad()} instead of
    +      * this historically named method.
    +      *
    +      * @implSpec This implementation must return the same value
    +      * as {@link #getCpuLoad()}.
    +      *
           * @return the "recent cpu usage" for the whole system; a negative
           * value if not available.
           * @since   1.7
           */
    +     @Deprecated(since="14")
          public default double getSystemCpuLoad() { return getCpuLoad(); }
      
          /**
    +      * Returns the "recent cpu usage" for the operating environment. This value
    +      * is a double in the [0.0,1.0] interval. A value of 0.0 means that all CPUs
    +      * were idle during the recent period of time observed, while a value
    +      * of 1.0 means that all CPUs were actively running 100% of the time
    +      * during the recent period being observed. All values betweens 0.0 and
    +      * 1.0 are possible depending of the activities going on.
    +      * If the recent cpu usage is not available, the method returns a
    +      * negative value.
    +      *
    +      * @return the "recent cpu usage" for the whole operating environment;
    +      * a negative value if not available.
    +      * @since 14
    +      */
    +     public double getCpuLoad();
    + 

 

Comments
Moving to Approved; thanks.
05-12-2019

The interface was updated to changing the existing methods that are being deprecated into default methods which call the new methods.
05-12-2019

[~dholmes], I think it is better structured to use the default methods even if there is only one expected implementation., but I don't think it is critical in this case. Moving to Approved. If you decide to use the default methods, update the CSR and I'll re-approve it.
02-12-2019

OperatingSystemMXBean is an interface but it is a platform managed object. There is only one implementor of this interface and we define it. Hence while a public interface would be evolved by adding default methods, there is no need to declare these as default methods in this case. There is no change needed for `getProcessCpuLoad`. Moving back to Finalized.
27-11-2019

As OperatingSystemMXBean is an interface, I'd recommend changing the existing methods that are being deprecated into default methods which call the new methods. For example, + * @deprecated Use {@link #getCpuLoad()} instead of + * this historically named method. + * + * @implSpec This implementation must return the same value + * as {@link #getCpuLoad()}. + * * @return the "recent cpu usage" for the whole system; a negative * value if not available. * @since 1.7 */ + @Deprecated(since="14") public default double getSystemCpuLoad() {return getCpuLoad();} Also, should getProcessCpuLoad be updated as part of this change too? Moving to Pended for tracking purposes; please re-finalize when ready.
27-11-2019

[~sgehwolf] The spec change itself spells out what we mean by operating environment: * <p> * This interface provides information about the operating environment * on which the Java virtual machine is running. That might be a native * operating system, a virtualized operating system environment, or a * container-managed environment. Thanks for all the Reviews. Moving to finalized.
25-11-2019

[~dholmes] Some wording changes: 1. "in such virtualized environments" => "in such operating environments" 2. " further, information about "physical memory" may be unobtainable in such environments. " => " further, information about "physical memory" may be unobtainable in these environments." Perhaps "operating environment" needs to be spelled out explicitly somewhere: It may refer to: physical machine, virtualized environment (VmWare, KVM, etc), container (perhaps some more I'm missing).
22-11-2019

[~sgehwolf] you edited this yesterday but I can't spot any difference - what did you change please?
22-11-2019

[~mchung] I have added the `(since="14")` - thanks. Could everyone please edit the issue and add themselves as Reviewers and I will move this to finalized. Thanks.
22-11-2019

[~dholmes] also thanks for doing this, I guess I was wrong about the complexity of the process towards agreement on a proposed solution! :)
21-11-2019

[~dholmes] Your proposed spec change looks good, simple and clean. Thanks for doing this. The deprecation should mention the release from which it deprecates, i.e. `@Deprecated(since="14")`.
21-11-2019

Looks good to me. Thanks for doing this David.
21-11-2019

Second pass now done. Simplified things by updating the interface level docs to establish we always return appropriate values.
21-11-2019

In order for these APIs to be consistent, I would only expect to see container specific total and free swap space when running the JVM in a container so the specification needs to be clarified. The reason for the use of getMemoryAndSwapLimit and getMemoryAndSwapUsage is that this is how cgroup exports swap limit and usage information. The limits and usage values available in the /proc file systems are the sum of memory+swap. The implementation in the webrev needs to be fixed to subtract the total and free memory from the returned values.
20-11-2019

Hi Bob, This was just the first pass for the most problematic methods. But three of the remaining methods refer to the JVM process so nothing further needs to be said there that I can see. That leaves `getTotalSwapSpaceSize` / `getFreeSwapSpaceSize` - which I am quite unsure about: what does swap mean in relation to a container? Do we need to be able to report the OS-level swap space configuration as well as a container specific value? (Assuming the former is available when in a container?) I see in the webrev it uses `getMemoryAndSwapLimit()` but am not sure exactly what that means - the 'and' concerns me.
20-11-2019

David, The updates look good for the three methods you describe but shouldn't this CSR also mention that we will clarify the specifications of other com.sun.management.OperatingSystemMXBean methods to allow for the values to be operating environment specific?
20-11-2019

> David Holmes The new proposal sounds reasonable to me. Thanks for getting this unblo +1
20-11-2019

[~dholmes] The new proposal sounds reasonable to me. Thanks for getting this unblocked!
20-11-2019

I've made an initial pass at rewriting this CSR request for the updates to the `com.sun.management.OperatingSystemMXBean`. The suggested names for the new methods are quite simple and should be uncontroversial. The actual specifications will be fairly minimal as per the existing methods - but updated to mention the operating environment. I have not included anything in relation to `java.lang.management.OperatingSystemMXBean.getSystemLoadAverage` as that is a core SE API and the process for modifying it would be different. Further it is unclear if we actually have a reasonable notion of what should be reported for that API while in a container. I did not see anything proposed in the webrev: http://cr.openjdk.java.net/~sgehwolf/webrevs/aazores/JDK-8226575/01/webrev/
20-11-2019

I have reset the Reviewers list so we can start afresh once this proposal has been updated.
19-11-2019

We have a new proposal for advancing this. The plan is to update the `com.sun.management.OperatingSystemMXBean` with a set of replacement methods for those that currently refer to "system" or "physical" values, and those methods will be fully container aware. The existing methods will be updated to call the new methods and at the same time will be deprecated in favor of the new methods. In that way we both fix the implementations to return the desired information and provide a migration path that allows for the eventual removal of the mis-named "physical" methods. Separately, and as a future project, we will look at whether any new beans are needed.
19-11-2019

Clearly container awareness is a worthwhile property for management systems of current JDKs. Given the lack of consensus for this particular request, I'm moving the CSR back to Draft state. Please continue the discussions and re-propose when there is more agreement on how to proceed.
16-10-2019

On a procedural note the "Problem" section as it stands is not describing the problem at all rather it is an attempt to justify the solution based on things that happen elsewhere. That rationale is not "the problem". The Problem statement should simply be the desire to return container-aware values from this API rather than system-level values, as the former are generally what the user wants anyway. I have discussed things further with Bob and a path forward is proposed. To restate my concerns here: > IMO OperatingSystemMXBean was written with a 1980's machine-view and it's not a good fit for the type of containerized environments we have today. To me trying to subvert it to mean "container" rather than "physical" (or some implied system-level view) is the wrong thing to do. It is wrong because the resulting API is still not good enough for reporting container metrics; and it is wrong because people may actually still want that physical/system-level view of things. The right thing to do is to define a new MXBean to report all the desired and appropriate container related metrics - and efforts are underway to provide that. In the meantime users only have the old `OperatingSystemMXBean` available for use and so as a migration path it is proposed that a flag be added (the default value of which is determined by the value of `UseContainerSupport`) so that the `OperatingSystemMXBean` APIs will report container related values where meaningful (the `getSystemCPULoad` method seems particularly problematic and may be better left untouched). The flag will be deprecated once the new MXBean is available. And likely the `OperatingSystemMXBean` should also be "deprecated". I'm not clear what this means for changing the current specifications of this API to allow this change in interpretation of the values reported. At most I would prefer an `@apiNote` or `@implNote`, though this seems complicated by the fact the implementation is in `com.sun.management.OperatingSystemMXBean`. [~darcy] do you have any suggestions on that front?
16-10-2019

David has a good point on the name of these methods and `getTotalAvailableMemorySize` makes more sense for the container environment. For java.lang.management.OperatingSystemMXBean, we should clarify the spec of getSystemLoadAverage. For com.sun.management.OperatingSystemMXBean, the container environment was neglected when such API was defined. As David suggests, I also think a better alternative is to define a new MXBean that provides the metrics that are sensible for container view and leaves c.s.m.OperatingSystemMXBean as a host view. It's unfortunate but not too bad. We can consider the object name for the new MXBean to have the same OS MXBean prefix followed by a new property e.g. java.lang:type=OperatingSystem,name=cgroupv1
16-10-2019

From the comment left by [~dholmes], I'm inferring there is not yet consensus on the how to proceed in this problem space. Given the description of the problem, I was expecting to see some specification updates to state "Now reports container-aware values when run in a container!" I suggest having [~alanb] or [~mchung] review the module-info updates. On a procedural note, some stand-alone representation of the changes needs to be attached to or listed in-line as part of the CSR. A link to a webrev is informative, but not sufficient on its own. Please work on determining what specification updates are needed to describe this work; I believe such spec work is needed for this CSR to advance. Separately, I'd also like to see agreement with [~dholmes] on how to move forward before this CSR progresses to later states.
16-10-2019

Moving this to forward for actual CSR review.
15-10-2019

[~dholmes] I've added a code example from hotspot for os::physical_memory() as it is now. It seems weird that the OperatingSystem Mbean couldn't or shouldn't be amended to work the same way as the JVM does. Isn't the whole point of a CSR to be able to change or update specification and/or implementations?
15-10-2019

The "issue" with getAvailableProcessors is that it keeps being used as a justification for making other changes: "we return the container value for available processors therefore we must return the container value for everything for consistency.". That would be true if all the API were specified to report what was available to the VM as I have stated, but that is not the case.
15-10-2019

Regardless of the name os::physical_memory is intended to inform the VM how much memory is available to it. It is irrelevant with regards to the existing OperatingSystemMBean API. If that API was specified to report the available memory then it would be fine. CSR requests are intended to ensure that any proposed changes have considered compatibility and do not break compatibility unduly. My view on this (and my view is of no more weight than others) is that the existing OS MBean API's that are specified to report system/host values are not suitable to be changed to report the available value and that a new API should be defined for that purpose. The OS MBean API is quite inappropriate for the environments in which we execute these days (and has been for some time). Rather than try to make a square API fit into a round hole it would be better IMO to define a new MBean that is more suited to actual execution environments.
15-10-2019

In response the David's comment related to getAvailableProcessors ... This is a non issue. This API already does the right thing in containers. The API ends up with a call to the VM to extract the JVM_ActiveProcessorCount.
14-10-2019

I have contrasting feelings on this, I think the API should reflect by default the constraints it's running on, but I agree this beans seems ill defined, or rather it doesn't really stands the test of time. Perhaps the best solution is to deprecate it and introduce better defined one. The question is if it really makes sense to introduce a container aware bean, I think in a container resources like memory and cpu are defined by the container, so for all purposes those are the "system". For this reason I concur with Bob and Severin, even if the names of the methods are misleading.
25-07-2019

> That's what I meant. If the JVM runs in a container, getAvailableProcessors() returns the quota. If the JVM runs outside a container it returns the physical number of processes. s/processes/processors/ No it does not return the physical number of processors. It returns the available number, which can be <= the physical number. The number of processors available to the JVM process can be controlled using various means: cpu affinity settings, tasksets or processor sets, running in a zone that itself is constrained in the number of processors available, etc. I think updating the existing methods that refer to "physical" or "system" is a non-starter. That will just cause confusion and for cases where you could provide either the "container" view or the OS view, you will not be able to provide both. I think this legacy MXBean is too ill-defined in our current environments and needs to be replaced by something more meaningful.
23-07-2019

> I firmly believe that we should update the existing MXBean implementation and spec to report container specific data. +1
22-07-2019

> The comment in the problem statement is incorrect. > "java.lang.management.OperatingSystemMXBean#getAvailableProcessors(), com.sun.management.OperatingSystemMXBean#getAvailableProcessors() return the physical number of processes outside a container, but the CPU quota inside a container." > The getAvailableProcessors method returns the number of processors "inside" a container so there's no inconsistency between this method and quota. [~bobv] That's what I meant. If the JVM runs in a container, `getAvailableProcessors()` returns the quota. If the JVM runs outside a container it returns the physical number of processes. I'll see if I can rephrase.
22-07-2019

I firmly believe that we should update the existing MXBean implementation and spec to report container specific data. Doing anything other than that causes too much confusion for app developers. I'm not convinced that it would be possible to provide a pure Host view from within a container without resorting to multiple processes and/or host based agents. As for the Swap comment, there are definitely container specific swap limits that are imposed by the OS.
22-07-2019

The comment in the problem statement is incorrect. "java.lang.management.OperatingSystemMXBean#getAvailableProcessors(), com.sun.management.OperatingSystemMXBean#getAvailableProcessors() return the physical number of processes outside a container, but the CPU quota inside a container." The getAvailableProcessors method returns the number of processors "inside" a container so there's no inconsistency between this method and quota.
22-07-2019

`getAvailableProcessors()` didn't change its meaning it changed its implementation. The meaning is "the number of processors available to the JVM" but the implementation failed to report that correctly in all cases. These shortcomings were not new with "containers" (we had earlier accounted for cpusets) but containers made the shortcomings more evident and exposed more people to them. Because it was a change in behaviour (even though technically bringing the behaviour inline with the specification) there were compatibility switches added so that the old behaviour could be restored.** ** There was also disagreement about how quotas etc should be interpreted as "available processors". I don't agree that a method like `getTotalPhysicalMemorySize()` can simply be re-interpreted to report container limits without causing confusion and compatibility issues. A new MXBean with `getTotalAvailableMemorySize()` would make much more sense for knowing what memory is available to the JVM whether in a container, a zone, a cgroup, or a raw OS. So this would not be duplicating the existing MXBean because it would report different things except when "total physical" was actually "available". I don't know if the "swap" API's make any sense in a container - does a container manage "swap" space?
22-07-2019

> I think these have their "expected" meanings in the context in which they are used - in particular there is no ambiguity or lack of clarity with regards to "available". FWIW, java.lang.management.OperatingSystemMXBean#getAvailableProcessors() changed its meaning with JDK-8146115. Prior JDK-8146115 within a container it reported the "available" value to the OS (i.e. the host/node value). After, it changed its meaning to "CPU quota in place" unless -XX:-UseContainerSupport is being used as a JVM switch. > One can argue whether "system" and "physical" are appropriate things to report but we cannot just decide to interpret these in a different way. Perhaps. The argument seems to be: Changing the meaning of "available" is OK, but changing the meaning of "system" or "physical" is not. I remain sceptical this is the approach of "least surprises". Note that the javadoc of java.lang.management.OperatingSystemMXBean#getAvailableProcessors() says: "Returns the number of processors available to the Java virtual machine. This method is equivalent to the Runtime.availableProcessors() method. This value may change during a particular invocation of the virtual machine.". It doesn't say that the meaning of it changes whether or not a JVM is running inside a container or outside. What's more, the JVM itself will "behave" inside a container after JDK-8146115 as if it runs on an OS with the container specs, not the OS specs. > And as I said the simplest thing to perhaps attempt as an initial partial solution is a MBean that provides information only about resources available to the JVM process. But again as I said each resource has to be considered individually to come up with a meaningful way to define that resource within a container. Wouldn't this duplicate most of what the OS MXBean return? Does any of the above positions change if there was a JDK switch (like -XX:-UseContainerSupport), a property perhaps, to not take container constraints into account in the OS MXbean for backwards compatibility?
22-07-2019

> One could argue that method names which include quantifiers like system, physical and available should pertain to system, physical an available resources. Yet, it's nowhere explicitly specified what physical, system and available mean in the context of containers. I think these have their "expected" meanings in the context in which they are used - in particular there is no ambiguity or lack of clarity with regards to "available". One can argue whether "system" and "physical" are appropriate things to report but we cannot just decide to interpret these in a different way. Ideally we would simply be reporting what the operating system itself reports for similarly named quantities. Unfortunately when it comes to containers and other virtualization environments, the OS API's themselves are a mess and inconsistent in how they interact with those technologies. I think we have to examine each of the methods in turn and consider whether what they claim to do is reasonable, and if the implementation actually does what the specification implies. (As noted `getSystemLoadAverage()` is logically inconsistent - it cannot provide a "system" average based on "available" processor use.) If we then desire additional container specific information then we need to add new API's to expose those. A further complexity is that although there is a loose hierarchy with regard to the scope of a query i.e. - what is available at the hardware level - what is available at the OS level - what is available at the "container" level - what is available to the JVM process it is not that simple as the hardware and the OS can themselves be virtualized before we even reach user-level container technologies. In addition, resources exposed to the JVM process can be constrained by means independent of any specific layer e.g. by using taskset to constrain CPU availability. In many ways reporting "available to the JVM process" is the simplest and most well-defined entity, but not necessarily the only thing that should be reported. A notion of "physical" or "system" resource becomes much harder to pin down - especially if there are no well-defined APIs for querying such resources. Further, once executing in a container we may not have access to API's that will tell us about resources outside the container as that breaks the isolation boundary that the container is trying to enforce. It may be that we need to reconsider these MXBean interfaces from scratch in light of the more complex world we now find ourselves, instead of trying to retrofit updated semantics on concepts that in a container are vague at best and meaningless at worst. I don't think it makes inherent sense to just try and make the existing `OperatingSystemMXBean` interfaces "container aware" - e.g. what does "swap space" mean in a container environment. The existing MBeans could then be deprecated. And as I said the simplest thing to perhaps attempt as an initial partial solution is a MBean that provides information only about resources available to the JVM process. But again as I said each resource has to be considered individually to come up with a meaningful way to define that resource within a container.
22-07-2019