JDK-8254854 : [cgroups v1] Metric limits not properly detected on some join controller combinations
  • Type: Bug
  • Component: core-svc
  • Sub-Component: tools
  • Affected Version: openjdk8u272,11,15,16
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: linux
  • CPU: generic
  • Submitted: 2020-10-15
  • Updated: 2022-01-05
  • Resolved: 2020-10-22
JDK 11          JDK 13          JDK 15          JDK 16          Other
11.0.10 Fixed   13.0.6 Fixed    15.0.2 Fixed    16 b22 Fixed    openjdk8u292 Fixed
Description
This bug is similar to JDK-8217766. In fact, JDK-8217766 had an incomplete fix for the Metrics (Java) side.

On a system with join controllers, /proc/self/cgroup might look like this:

9:pids:/user.slice/user-1000.slice/session-2.scope
8:perf_event:/
7:blkio:/user.slice
6:rdma:/
5:cpuset:/
4:devices:/user.slice
3:cpu,cpuacct,memory,net_cls,net_prio,hugetlb:/user.slice/user-1000.slice/session-2.scope
2:freezer:/
1:name=systemd:/user.slice/user-1000.slice/session-2.scope
0::/user.slice/user-1000.slice/session-2.scope

Then, the Java code to set the path for the controller reads:

        try (Stream<String> lines =
                CgroupUtil.readFilePrivileged(Paths.get("/proc/self/cgroup"))) {

            lines.map(line -> line.split(":"))
                 .filter(line -> (line.length >= 3))
                 .forEach(line -> setSubSystemControllerPath(subsystem, line));

        } catch (IOException e) {
            return null;
        }


where setSubSystemControllerPath() reads:

/**
     * setSubSystemPath based on the contents of /proc/self/cgroup
     */
    private static void setSubSystemControllerPath(CgroupV1Subsystem subsystem, String[] entry) {
        String controllerName;
        String base;
        CgroupV1SubsystemController controller = null;
        CgroupV1SubsystemController controller2 = null;

        controllerName = entry[1];
        base = entry[2];
        if (controllerName != null && base != null) {
            switch (controllerName) {
                case "memory":
                    controller = subsystem.memoryController();
                    break;
                case "cpuset":
                    controller = subsystem.cpuSetController();
                    break;
                case "cpu,cpuacct":
                case "cpuacct,cpu":
                    controller = subsystem.cpuController();
                    controller2 = subsystem.cpuAcctController();
                    break;
                case "cpuacct":
                    controller = subsystem.cpuAcctController();
                    break;
                case "cpu":
                    controller = subsystem.cpuController();
                    break;
                case "blkio":
                    controller = subsystem.blkIOController();
                    break;
                // Ignore subsystems that we don't support
                default:
                    break;
            }
        }

        if (controller != null) {
            controller.setPath(base);
            if (controller instanceof CgroupV1MemorySubSystemController) {
                CgroupV1MemorySubSystemController memorySubSystem = (CgroupV1MemorySubSystemController)controller;
                boolean isHierarchial = getHierarchical(memorySubSystem);
                memorySubSystem.setHierarchical(isHierarchial);
                boolean isSwapEnabled = getSwapEnabled(memorySubSystem);
                memorySubSystem.setSwapEnabled(isSwapEnabled);
            }
            subsystem.setActiveSubSystems();
        }
        if (controller2 != null) {
            controller2.setPath(base);
        }
    }

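So for the example /proc/self/cgroup file it only sets the path for "blkio" and "cpuset". The others are not correctly set because they are on a joined path: the switch compares the whole controller list ("cpu,cpuacct,memory,net_cls,net_prio,hugetlb") against its cases, and the only joined combinations it recognizes are "cpu,cpuacct" and "cpuacct,cpu".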

The net effect of this is that Metrics doesn't report the container limits correctly on such systems.
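
The mismatch can be reproduced outside the JDK. The following is a minimal sketch, not the actual patch: the class and method names (JoinedControllerDemo, matchesWholeField, supportedControllers) are hypothetical, and it assumes that splitting the controller field on "," is an acceptable way to handle joined controllers. It also splits each line with a limit of 3, which differs slightly from the split(":") call quoted above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class; not part of the JDK sources.
public class JoinedControllerDemo {

    // Case labels of the switch quoted in the description, matched against the whole field.
    private static final Set<String> WHOLE_FIELD_CASES = new HashSet<>(Arrays.asList(
            "memory", "cpuset", "cpu,cpuacct", "cpuacct,cpu", "cpuacct", "cpu", "blkio"));

    // Individual controller names handled by that switch.
    private static final Set<String> SUPPORTED = new HashSet<>(Arrays.asList(
            "memory", "cpuset", "cpu", "cpuacct", "blkio"));

    /** Returns true if the controller field, taken as one string, hits a switch case. */
    static boolean matchesWholeField(String cgroupLine) {
        String[] entry = cgroupLine.split(":", 3);
        return entry.length == 3 && WHOLE_FIELD_CASES.contains(entry[1]);
    }

    /** Splits the controller field on ',' and returns the supported names in file order. */
    static List<String> supportedControllers(String cgroupLine) {
        List<String> found = new ArrayList<>();
        String[] entry = cgroupLine.split(":", 3);
        if (entry.length < 3) {
            return found; // malformed line, nothing to report
        }
        for (String name : entry[1].split(",")) {
            if (SUPPORTED.contains(name)) {
                found.add(name);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String joined = "3:cpu,cpuacct,memory,net_cls,net_prio,hugetlb:"
                + "/user.slice/user-1000.slice/session-2.scope";
        System.out.println(matchesWholeField(joined));    // false
        System.out.println(supportedControllers(joined)); // [cpu, cpuacct, memory]
    }
}
```

For the joined line of the example file, the whole-field comparison finds nothing, while the per-name split finds cpu, cpuacct and memory, each of which would then get the same path.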
Comments
Fix request (13u): Requesting backport to 13u for parity with 11u. The patch doesn't apply cleanly since 13u doesn't have cgroups v2 support (JDK-8231111), so it was reapplied manually to the Metrics::setSubSystemPath() method instead of CgroupV1Subsystem::setSubSystemControllerPath() from the original patch. Tested with tier1 and container tests. RFR approval: http://mail.openjdk.java.net/pipermail/jdk-updates-dev/2020-December/004515.html
23-12-2020

Fix Request (OpenJDK 8u): Please approve backporting this to OpenJDK 8u. JDK-8217766 has been backported to 8u282, which only has the partial fix. The same issue is present in current OpenJDK 8u as was the case for JDK 11, JDK 15 and 16. The JDK 11u patch applies cleanly after unshuffling (no module system in JDK 8u). Container tests pass as before this patch on OpenJDK 8u Linux with cgroup v1 (only version supported). Also verified manually that the fix works as expected (see the previous comment). Risk should be minimal as it's only changing some /proc/self/cgroup parsing code, which is well covered by container tests and the above manual test. Linux-only change too.
03-12-2020

Before/after for OpenJDK 8u:

$ sudo docker run --rm -ti --memory 200M <container-with-jdk8> /bin/bash

Before:

[root@4fd8366a9b63 /]# ./jdk-before/bin/java -XshowSettings:system -version
Operating System Metrics:
    Provider: cgroupv1
    Effective CPU Count: 1
    CPU Period: 0us
    CPU Quota: 0us
    CPU Shares: -1
    List of Processors, 1 total: 0
    List of Effective Processors, 1 total: 0
    List of Memory Nodes, 1 total: 0
    List of Available Memory Nodes, 1 total: 0
    CPUSet Memory Pressure Enabled: false
    Memory Limit: 0.00K
    Memory Soft Limit: 0.00K
    Memory & Swap Limit: 0.00K
    Kernel Memory Limit: 0.00K
    TCP Memory Limit: 0.00K
    Out Of Memory Killer Enabled: true

openjdk version "1.8.0_272-internal"
OpenJDK Runtime Environment (build 1.8.0_272-internal-b07)
OpenJDK 64-Bit Server VM (build 25.272-b07, mixed mode)

After:

[root@4fd8366a9b63 /]# ./jdk-after/bin/java -XshowSettings:system -version
Operating System Metrics:
    Provider: cgroupv1
    Effective CPU Count: 1
    CPU Period: 100000us
    CPU Quota: -1
    CPU Shares: -1
    List of Processors, 1 total: 0
    List of Effective Processors, 1 total: 0
    List of Memory Nodes, 1 total: 0
    List of Available Memory Nodes, 1 total: 0
    CPUSet Memory Pressure Enabled: false
    Memory Limit: 200.00M
    Memory Soft Limit: Unlimited
    Memory & Swap Limit: 400.00M
    Kernel Memory Limit: Unlimited
    TCP Memory Limit: Unlimited
    Out Of Memory Killer Enabled: true

openjdk version "1.8.0_282-internal"
OpenJDK Runtime Environment (build 1.8.0_282-internal-b01)
OpenJDK 64-Bit Server VM (build 25.282-b01, mixed mode)

[root@4fd8366a9b63 /]# grep memory /proc/self/cgroup
2:cpu,cpuacct,memory,net_cls,net_prio,hugetlb:/docker/4fd8366a9b63121781074e1617b0379736f3f21475853d24d45ccb38b3f085d2

Note the correct memory and memory swap limit of 200M and 400M, respectively, in the after case.
03-12-2020

Fix Request (OpenJDK 11u): Please approve backporting this to OpenJDK 11u. The same issue is present in 11.0.9 as with JDK 15 and 16. The patch is different since 11u doesn't have cgroups v2 support. Patch was reviewed by Paul Hohensee. Risk should be minimal. https://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8254854/01/webrev/ RFR: https://mail.openjdk.java.net/pipermail/jdk-updates-dev/2020-November/004060.html
23-11-2020

Before/After for OpenJDK 11:

$ sudo docker run --rm -ti --memory=200m fedora-32-jdk11 /bin/bash

[root@3d4de76de4ad /]# ./jdk-before/bin/java -XshowSettings:system -version
Operating System Metrics:
    Provider: cgroupv1
    Effective CPU Count: 1
    CPU Period: 0us
    CPU Quota: 0us
    CPU Shares: -1
    List of Processors, 1 total: 0
    List of Effective Processors, 1 total: 0
    List of Memory Nodes, 1 total: 0
    List of Available Memory Nodes, 1 total: 0
    CPUSet Memory Pressure Enabled: false
    Memory Limit: 0.00K
    Memory Soft Limit: 0.00K
    Memory & Swap Limit: 0.00K
    Kernel Memory Limit: 0.00K
    TCP Memory Limit: 0.00K
    Out Of Memory Killer Enabled: true

openjdk version "11.0.10-internal" 2021-01-19
OpenJDK Runtime Environment (build 11.0.10-internal+0-adhoc.sgehwolf.openjdk-11-dev)
OpenJDK 64-Bit Server VM (build 11.0.10-internal+0-adhoc.sgehwolf.openjdk-11-dev, mixed mode)

[root@3d4de76de4ad /]# ./jdk-after/bin/java -XshowSettings:system -version
Operating System Metrics:
    Provider: cgroupv1
    Effective CPU Count: 1
    CPU Period: 100000us
    CPU Quota: -1
    CPU Shares: -1
    List of Processors, 1 total: 0
    List of Effective Processors, 1 total: 0
    List of Memory Nodes, 1 total: 0
    List of Available Memory Nodes, 1 total: 0
    CPUSet Memory Pressure Enabled: false
    Memory Limit: 200.00M
    Memory Soft Limit: Unlimited
    Memory & Swap Limit: 400.00M
    Kernel Memory Limit: Unlimited
    TCP Memory Limit: Unlimited
    Out Of Memory Killer Enabled: true

openjdk version "11.0.10-internal" 2021-01-19
OpenJDK Runtime Environment (build 11.0.10-internal+0-adhoc.sgehwolf.openjdk-11-dev)
OpenJDK 64-Bit Server VM (build 11.0.10-internal+0-adhoc.sgehwolf.openjdk-11-dev, mixed mode)

[root@3d4de76de4ad /]# grep memory /proc/self/cgroup
2:cpu,cpuacct,memory,net_cls,net_prio,hugetlb:/docker/3d4de76de4ad0dcc7fe14daba689abe2a8622b142ffb7c2bf30d9acff145a517
29-10-2020

Fix request (15u): Please approve backporting this to 15u. Same issue applies and the patch applies cleanly. Container testing showed no regressions on cgroupv1. It's a cgroups v1 only fix.
23-10-2020

Changeset: a0b687bf
Author: Severin Gehwolf <sgehwolf@openjdk.org>
Date: 2020-10-22 16:36:29 +0000
URL: https://git.openjdk.java.net/jdk/commit/a0b687bf
22-10-2020

Proposed patch: https://github.com/jerboaa/jdk/commit/793950fc8a4d9119009d9fff7537b6ae39af688e

$ sudo docker run -ti --rm --memory=200m jdk16-f32:latest /bin/bash

[root@759704324c64 /]# ./jdk-before/bin/java -XshowSettings:system -version
Operating System Metrics:
    Provider: cgroupv1
    Effective CPU Count: 1
    CPU Period: -1
    CPU Quota: -1
    CPU Shares: -1
    List of Processors, 1 total: 0
    List of Effective Processors, 1 total: 0
    List of Memory Nodes, 1 total: 0
    List of Available Memory Nodes, 1 total: 0
    Memory Limit: Unlimited
    Memory Soft Limit: Unlimited
    Memory & Swap Limit: Unlimited

openjdk version "16-internal" 2021-03-16
OpenJDK Runtime Environment (fastdebug build 16-internal+0-adhoc.sgehwolf.jdk-jdk)
OpenJDK 64-Bit Server VM (fastdebug build 16-internal+0-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing)

[root@759704324c64 /]# ./jdk-after/bin/java -XshowSettings:system -version
Operating System Metrics:
    Provider: cgroupv1
    Effective CPU Count: 1
    CPU Period: 100000us
    CPU Quota: -1
    CPU Shares: -1
    List of Processors, 1 total: 0
    List of Effective Processors, 1 total: 0
    List of Memory Nodes, 1 total: 0
    List of Available Memory Nodes, 1 total: 0
    Memory Limit: 200.00M
    Memory Soft Limit: Unlimited
    Memory & Swap Limit: 400.00M

openjdk version "16-internal" 2021-03-16
OpenJDK Runtime Environment (build 16-internal+0-adhoc.sgehwolf.jdk-jdk)
OpenJDK 64-Bit Server VM (build 16-internal+0-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing)

[root@759704324c64 /]# grep memory proc/self/cgroup
2:cpu,cpuacct,memory,net_cls,net_prio,hugetlb:/docker/759704324c640ad8cbe1214811297f699b3782367a04a2ef6200531f8297da8b
21-10-2020

On an affected system I see this:

$ sudo docker run -ti --rm --memory=200m bce2e9ff6c4a /bin/bash

[root@a546c0639c8a /]# /jdk-before/bin/java -Xlog:os+container=info -version
[0.002s][info][os,container] Memory Limit is: 209715200
openjdk version "16-internal" 2021-03-16
OpenJDK Runtime Environment (fastdebug build 16-internal+0-adhoc.sgehwolf.jdk-jdk)
OpenJDK 64-Bit Server VM (fastdebug build 16-internal+0-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing)

[root@a546c0639c8a /]# /jdk-before/bin/java -XshowSettings:system -version
Operating System Metrics:
    Provider: cgroupv1
    Effective CPU Count: 1
    CPU Period: -1
    CPU Quota: -1
    CPU Shares: -1
    List of Processors, 1 total: 0
    List of Effective Processors, 1 total: 0
    List of Memory Nodes, 1 total: 0
    List of Available Memory Nodes, 1 total: 0
    Memory Limit: Unlimited
    Memory Soft Limit: Unlimited
    Memory & Swap Limit: Unlimited

openjdk version "16-internal" 2021-03-16
OpenJDK Runtime Environment (fastdebug build 16-internal+0-adhoc.sgehwolf.jdk-jdk)
OpenJDK 64-Bit Server VM (fastdebug build 16-internal+0-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing)

[root@a546c0639c8a /]# grep memory /proc/self/cgroup
2:cpu,cpuacct,memory,net_cls,net_prio,hugetlb:/docker/a546c0639c8af4a305a070c7e497f42e224777f1b8d4732764f81201b2346dee

So HotSpot correctly detects the memory limit (209715200 bytes, i.e. 200M), but the Java metrics don't: note "Memory Limit: Unlimited" in the -XshowSettings:system output.
21-10-2020

With JDK-8254001 implemented it would be easy to write a regression test independent of the underlying system. It's just too easy to regress in this area without any tests.
15-10-2020