Bug ID: JDK-8197867 Update CPU count algorithm when both cpu shares and quotas are used

Summary
-------

The algorithm that selects the number of processors the JVM uses on Linux systems with cgroups enabled is now controlled by a HotSpot flag (-XX:+PreferContainerQuotaForCPUCount).  If the flag is true (the default), the cpu quota setting, if set, determines the number of cpus, without exceeding the number of physical processors available on the host.  If no quota is set, the cpu shares setting, if set, is used instead.  If the flag is false, the JVM reverts to the previous behavior of selecting the smallest of cpu shares, cpu quota, or physical processors.

Problem
-------

JDK 10 included an enhancement (https://bugs.openjdk.java.net/browse/JDK-8146115) which allowed the JVM to determine the number of processors it should use based on information extracted from the Linux cgroup file system.   The algorithm used to select the number of processors was:

min(cpu shares, cpu quota/cpu period, physical cpus) *

*If cpu shares or quotas were not specified or configured, they were not included in the calculation.

This algorithm assumed that developers would select either cpu shares or quotas, but not both.  There are popular use cases, such as Kubernetes, which set both cpu shares and cpu quotas in order to provide a minimum-to-maximum range of cpu resource limits.  Since the formula always selected the minimum, the JVM sized itself to the low end of that range and was potentially underutilizing the host system cpu resources.
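The effect can be illustrated with a minimal standalone sketch of the JDK 10 formula. This is not the HotSpot source: the function name `legacy_processor_count` and its parameter list are hypothetical, and the inputs in the closing comment are illustrative values (1024 shares, a 400000/100000 quota and period, 8 physical cpus), not measurements. The constant 1024 matches the PER_CPU_SHARES divisor referenced in the specification below.

```cpp
#include <algorithm>
#include <cmath>

// Divisor used to convert cgroup cpu shares into a CPU count.
constexpr int PER_CPU_SHARES = 1024;

// Hypothetical helper sketching the JDK 10 formula:
//   min(cpu shares, cpu quota / cpu period, physical cpus)
// A value of -1 means "not set". Unset inputs fall back to the physical
// count, so they drop out of the minimum.
int legacy_processor_count(int shares, int quota, int period, int physical_cpus) {
  int share_count = physical_cpus;
  int quota_count = physical_cpus;
  if (shares > -1) {
    // Round up to the next whole CPU.
    share_count = static_cast<int>(std::ceil(static_cast<float>(shares) / PER_CPU_SHARES));
  }
  if (quota > -1 && period > 0) {
    quota_count = static_cast<int>(std::ceil(static_cast<float>(quota) / period));
  }
  return std::min(physical_cpus, std::min(share_count, quota_count));
}

// Illustrative inputs: shares = 1024, quota = 400000, period = 100000,
// physical cpus = 8. The quota alone would allow 4 CPUs, but the minimum
// is taken with the share count, so the function returns 1.
```

With both values set, the share count acts as a hard ceiling even though cpu shares only express a relative weight, which is the underutilization described above.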

Solution
--------

The solution is to alter the formula to provide two modes when both cpu shares and quotas are set.  If the flag PreferContainerQuotaForCPUCount is true, use the cpu quota value, without exceeding the number of physical cpus on the system.  If the flag is false, use the smaller of the cpu shares and cpu quota values, again without exceeding the number of physical cpus on the system.  If only one of cpu shares or cpu quotas is set, use that value, limited by the number of physical processors in the system.  If neither is set, use the number of physical processors.
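As a rough illustration of these rules, the following standalone sketch mirrors the selection logic outside of HotSpot. The function name `container_processor_count` and the `prefer_quota` parameter are hypothetical stand-ins for the VM code and the PreferContainerQuotaForCPUCount flag; the values in the closing comment are illustrative only.

```cpp
#include <algorithm>
#include <cmath>

// Divisor used to convert cgroup cpu shares into a CPU count.
constexpr int PER_CPU_SHARES = 1024;

// Hypothetical helper sketching the revised selection rules.
// A value of -1 for shares or quota means "not set".
int container_processor_count(int shares, int quota, int period,
                              int physical_cpus, bool prefer_quota) {
  int quota_count = 0;  // 0 means "no usable quota"
  int share_count = 0;  // 0 means "no usable shares"
  int limit_count = physical_cpus;

  if (quota > -1 && period > 0) {
    // Round up to the next whole CPU.
    quota_count = static_cast<int>(std::ceil(static_cast<float>(quota) / period));
  }
  if (shares > -1) {
    share_count = static_cast<int>(std::ceil(static_cast<float>(shares) / PER_CPU_SHARES));
  }

  if (quota_count != 0 && share_count != 0) {
    // Both set: the flag decides between the quota and the smaller value.
    limit_count = prefer_quota ? quota_count : std::min(quota_count, share_count);
  } else if (quota_count != 0) {
    limit_count = quota_count;
  } else if (share_count != 0) {
    limit_count = share_count;
  }

  // Never report more CPUs than the system actually has.
  return std::min(physical_cpus, limit_count);
}

// Illustrative inputs: shares = 1024, quota = 400000, period = 100000,
// physical cpus = 8.
//   prefer_quota = true  -> 4 (quota wins)
//   prefer_quota = false -> 1 (minimum of shares and quota, as in JDK 10)
```

Using 0 as the "not set" marker for the intermediate counts keeps the three cases (both set, one set, none set) explicit, rather than folding unset values into the physical cpu count as the earlier formula did.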

Specification
-------------

    diff --git a/src/hotspot/os/linux/globals_linux.hpp b/src/hotspot/os/linux/globals_linux.hpp
    --- a/src/hotspot/os/linux/globals_linux.hpp
    +++ b/src/hotspot/os/linux/globals_linux.hpp
    @@ -62,6 +62,11 @@
       product(bool, UseContainerSupport, true,                              \
               "Enable detection and runtime container configuration support") \
                                                                             \
    +  product(bool, PreferContainerQuotaForCPUCount, true,                  \
    +          "Calculate the container CPU availability based on the value" \
    +          " of quotas \(if set\), when true. Otherwise, use the CPU"    \
    +          " shares value, provided it is less than quota.")             \
    +                                                                        \
       diagnostic(bool, UseCpuAllocPath, false,                              \
                  "Use CPU_ALLOC code path in os::active_processor_count ")
     
    diff --git a/src/hotspot/os/linux/osContainer_linux.cpp b/src/hotspot/os/linux/osContainer_linux.cpp
    --- a/src/hotspot/os/linux/osContainer_linux.cpp
    +++ b/src/hotspot/os/linux/osContainer_linux.cpp
    @@ -499,11 +499,11 @@
     /* active_processor_count
      *
      * Calculate an appropriate number of active processors for the
    - * VM to use based on these three cgroup options.
    + * VM to use based on these three inputs.
      *
      * cpu affinity
    - * cpu quota & cpu period
    - * cpu shares
    + * cgroup cpu quota & cpu period
    + * cgroup cpu shares
      *
      * Algorithm:
      *
    @@ -512,43 +512,62 @@
      * If user specified a quota (quota != -1), calculate the number of
      * required CPUs by dividing quota by period.
      *
    - * If shares are in effect (shares != -1), calculate the number
    - * of cpus required for the shares by dividing the share value
    + * If shares are in effect (shares != -1), calculate the number 
    + * of CPUs required for the shares by dividing the share value
      * by PER_CPU_SHARES.
      *
      * All results of division are rounded up to the next whole number.
      *
    - * Return the smaller number from the three different settings.
    + * If neither shares or quotas have been specified, return the
    + * number of active processors in the system.
      *
    + * If both shares and quotas have been specified, the results are
    + * based on the flag PreferContainerQuotaForCPUCount.  If true,
    + * return the quota value.  If false return the smallest value 
    + * between shares or quotas.
    + *
    + * If shares and/or quotas have been specified, the resulting number
    + * returned will never exceed the the number of active processors.
    + * 
      * return:
    - *    number of cpus
    - *    OSCONTAINER_ERROR if failure occured during extract of cpuset info
    + *    number of CPUs
      */
     int OSContainer::active_processor_count() {
    -  int cpu_count, share_count, quota_count;
    -  int share, quota, period;
    +  int quota_count = 0, share_count = 0;
    +  int cpu_count, limit_count;
       int result;
     
    -  cpu_count = os::Linux::active_processor_count();
    +  cpu_count = limit_count = os::Linux::active_processor_count();
    +  int quota  = cpu_quota();
    +  int period = cpu_period();
    +  int share  = cpu_shares();
     
    -  share = cpu_shares();
    +  if (quota > -1 && period > 0) {
    +    quota_count = ceilf((float)quota / (float)period);
    +    log_trace(os, container)("CPU Quota count based on quota/period: %d", quota_count);
    +  }
       if (share > -1) {
         share_count = ceilf((float)share / (float)PER_CPU_SHARES);
    -    log_trace(os, container)("cpu_share count: %d", share_count);
    -  } else {
    -    share_count = cpu_count;
    +    log_trace(os, container)("CPU Share count based on shares: %d", share_count);
       }
     
    -  quota = cpu_quota();
    -  period = cpu_period();
    -  if (quota > -1 && period > 0) {
    -    quota_count = ceilf((float)quota / (float)period);
    -    log_trace(os, container)("quota_count: %d", quota_count);
    -  } else {
    -    quota_count = cpu_count;
    +  // If both shares and quotas are setup results depend
    +  // on flag PreferContainerQuotaForCPUCount.
    +  // If true, limit CPU count to quota
    +  // If false, use minimum of shares and quotas
    +  if (quota_count !=0 && share_count != 0) {
    +    if (PreferContainerQuotaForCPUCount) {
    +      limit_count = quota_count;
    +    } else {
    +      limit_count = MIN2(quota_count, share_count);
    +    }
    +  } else if (quota_count != 0) {
    +    limit_count = quota_count;
    +  } else if (share_count != 0) {
    +    limit_count = share_count;
       }
     
    -  result = MIN2(cpu_count, MIN2(share_count, quota_count));
    +  result = MIN2(cpu_count, limit_count);
       log_trace(os, container)("OSContainer::active_processor_count: %d", result);
       return result;
     }

Moving to Approved.
23-02-2018
Consensus reached. Reviewed.
22-02-2018
I'm voting to move this request to Provisional, but I share David's concerns that an additional option may be necessary to adequately configure the desired behavior. Additionally, I believe a change in the container policy merits a release note. Please finalize the request once there is more consensus on the proper approach and one or more engineers have reviewed the request.
15-02-2018
The initial assumption may have been that only one of shares or quotas was specified, but the logic simply took the minimum of all three values to ensure none of the three "limits" was exceeded - which would be bad. This change will now take the maximum of the shares/quotas value - even if the result exceeds the "limit" set by the other value. This would be a very observable change in behaviour for any existing container deployments IMHO. As I wrote in the original container JEP (JDK-8182070), neither quotas nor shares readily translate into a meaningful number that represents "available processors" such that we can use it to size the number of threads used by various subsystems in the VM and JDK. What we did was provide a simple heuristic calculation that was "better than nothing". Given that, we have even less of an idea how a combination of quotas and shares should be used to determine "available processors". Is it the max? The min? An average? Simple answer is that the VM does not know. We have to provide a reasonable default (and I'd argue the existing behaviour is more reasonable than the proposed change); but we also need a way for the user to tell us how to interpret these values. It is not sufficient to simply punt back to the next level up and say it is up to the Kubernetes deployment to deal with it. I think we have to be able to deal with it at the VM level. Bob has argued that the user can always override this by setting ActiveProcessorCount as desired, but I'm not sure the user deploying this will necessarily know, or can reasonably find out, the four values that the VM is using to make its calculation (shares, quotas, period and cpu-set). At a minimum I think we should have a flag that at least controls whether we use the max or min of quotas/shares when both are set. E.g.
boolean, MinimizeContainerSharesAndQuotas: When true, causes the VM to use the minimum of the 'available processors' calculated for container CPU shares and quotas in the overall available processor calculation. Else use the maximum.
14-02-2018