JDK-8286212 : Cgroup v1 initialization causes NPE on some systems
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 17.0.3,18,19
  • Priority: P3
  • Status: Open
  • Resolution: Unresolved
  • Submitted: 2022-05-05
  • Updated: 2022-11-18
Other: tbd, Unresolved
Description
A user reports that WildFly tests run within a container fail to initialize with:

[ERROR] Failed to execute goal org.wildfly.plugins:wildfly-maven-plugin:2.0.1.Final:execute-commands (apply-elytron) on project wildfly-ts-integ-smoke: Failed to execute commands: Exception in thread "main"
 java.lang.NullPointerException
[ERROR]         at java.base/java.util.Objects.requireNonNull(Objects.java:208)
[ERROR]         at java.base/sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:263)
[ERROR]         at java.base/java.nio.file.Path.of(Path.java:147)
[ERROR]         at java.base/java.nio.file.Paths.get(Paths.java:69)
[ERROR]         at java.base/jdk.internal.platform.CgroupUtil.lambda$readStringValue$1(CgroupUtil.java:67)
[ERROR]         at java.base/java.security.AccessController.doPrivileged(AccessController.java:569)
[ERROR]         at java.base/jdk.internal.platform.CgroupUtil.readStringValue(CgroupUtil.java:69)
[ERROR]         at java.base/jdk.internal.platform.CgroupSubsystemController.getStringValue(CgroupSubsystemController.java:65)
[ERROR]         at java.base/jdk.internal.platform.CgroupSubsystemController.getLongValue(CgroupSubsystemController.java:124)
[ERROR]         at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.getLongValue(CgroupV1Subsystem.java:175)
[ERROR]         at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.getHierarchical(CgroupV1Subsystem.java:149)
[ERROR]         at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.initSubSystem(CgroupV1Subsystem.java:84)
[ERROR]         at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.getInstance(CgroupV1Subsystem.java:60)
[ERROR]         at java.base/jdk.internal.platform.CgroupSubsystemFactory.create(CgroupSubsystemFactory.java:116)
[ERROR]         at java.base/jdk.internal.platform.CgroupMetrics.getInstance(CgroupMetrics.java:167)
[ERROR]         at java.base/jdk.internal.platform.SystemMetrics.instance(SystemMetrics.java:29)
[ERROR]         at java.base/jdk.internal.platform.Metrics.systemMetrics(Metrics.java:58)
[ERROR]         at java.base/jdk.internal.platform.Container.metrics(Container.java:43)
[ERROR]         at jdk.management/com.sun.management.internal.OperatingSystemImpl.<init>(OperatingSystemImpl.java:182)
[ERROR]         at jdk.management/com.sun.management.internal.PlatformMBeanProviderImpl.getOperatingSystemMXBean(PlatformMBeanProviderImpl.java:280)
[ERROR]         at jdk.management/com.sun.management.internal.PlatformMBeanProviderImpl$3.nameToMBeanMap(PlatformMBeanProviderImpl.java:199)
[ERROR]         at java.management/java.lang.management.ManagementFactory.lambda$getPlatformMBeanServer$0(ManagementFactory.java:488)
[ERROR]         at java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:273)
[ERROR]         at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
[ERROR]         at java.base/java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1779)
[ERROR]         at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
[ERROR]         at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
[ERROR]         at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
[ERROR]         at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
[ERROR]         at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
[ERROR]         at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
[ERROR]         at java.management/java.lang.management.ManagementFactory.getPlatformMBeanServer(ManagementFactory.java:489)
[ERROR]         at org.jboss.modules.ModuleLoader$RealMBeanReg$1.run(ModuleLoader.java:1258)
[ERROR]         at org.jboss.modules.ModuleLoader$RealMBeanReg$1.run(ModuleLoader.java:1256)
[ERROR]         at java.base/java.security.AccessController.doPrivileged(AccessController.java:318)
[ERROR]         at org.jboss.modules.ModuleLoader$RealMBeanReg.<init>(ModuleLoader.java:1256)
[ERROR]         at org.jboss.modules.ModuleLoader$TempMBeanReg.installReal(ModuleLoader.java:1240)
[ERROR]         at org.jboss.modules.ModuleLoader.installMBeanServer(ModuleLoader.java:273)
[ERROR]         at org.jboss.modules.Main.main(Main.java:605)

The relevant mountinfo line is:

941 931 0:36 /user.slice/user-1000.slice/session-50.scope /sys/fs/cgroup/memory ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,memory

The relevant line for the memory controller in /proc/self/cgroup is:

9:memory:/user.slice/user-1000.slice/session-3.scope

The relevant code reads:

public void setPath(String cgroupPath) {
    if (root != null && cgroupPath != null) {
        if (root.equals("/")) {
            if (!cgroupPath.equals("/")) {
                path = mountPoint + cgroupPath;
            }
            else {
                path = mountPoint;
            }
        }
        else {
            if (root.equals(cgroupPath)) {
                path = mountPoint;
            }
            else {
                if (cgroupPath.startsWith(root)) {
                    if (cgroupPath.length() > root.length()) {
                        String cgroupSubstr = cgroupPath.substring(root.length());
                        path = mountPoint + cgroupSubstr;
                    }
                }
            }
        }
    }
}

This seems to be a case not covered in the setPath() method in CgroupV1SubsystemController. Here root == /user.slice/user-1000.slice/session-50.scope and cgroupPath == /user.slice/user-1000.slice/session-3.scope. root is neither "/" nor equal to cgroupPath, and cgroupPath does not start with root, so no branch assigns 'path'. It stays null, causing the NPE.
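
The uncovered case can be reproduced outside the JDK. The following is a minimal sketch, not the JDK class itself: SetPathRepro and resolvePath are hypothetical names, and the method only mirrors the branch structure of the setPath() code quoted above, returning the value that 'path' would end up with.

```java
// Hypothetical stand-alone reproduction; names are not part of the JDK.
final class SetPathRepro {

    // Mirrors the branches of the quoted setPath() and returns the resulting
    // path, or null when no branch assigns one.
    static String resolvePath(String root, String mountPoint, String cgroupPath) {
        String path = null;
        if (root != null && cgroupPath != null) {
            if (root.equals("/")) {
                path = cgroupPath.equals("/") ? mountPoint : mountPoint + cgroupPath;
            } else if (root.equals(cgroupPath)) {
                path = mountPoint;
            } else if (cgroupPath.startsWith(root)
                    && cgroupPath.length() > root.length()) {
                path = mountPoint + cgroupPath.substring(root.length());
            }
            // Otherwise: root and cgroupPath are unrelated, path stays null.
        }
        return path;
    }

    public static void main(String[] args) {
        // Values taken from the mountinfo and cgroup lines in this report.
        String root = "/user.slice/user-1000.slice/session-50.scope";
        String cgroupPath = "/user.slice/user-1000.slice/session-3.scope";
        String path = resolvePath(root, "/sys/fs/cgroup/memory", cgroupPath);
        System.out.println("path = " + path); // prints "path = null"
    }
}
```

With the reported values, the two session scopes differ, so neither the equality check nor the prefix check succeeds. The null result is what later reaches Paths.get() in the stack trace above.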

This was originally reported here:
https://bugzilla.redhat.com/show_bug.cgi?id=2082094

More detailed info here:
https://gist.github.com/gaol/4d96eace8290e6549635fdc0ea41d0b4
Comments
https://www.kernel.org/doc/html/v5.5/admin-guide/cgroup-v2.html#core-interface-files mentions:

"""
cgroup.procs

A read-write new-line separated values file which exists on all cgroups.

When read, it lists the PIDs of all processes which belong to the cgroup one-per-line. The PIDs are not ordered and the same PID may show up more than once if the process got moved to another cgroup and then back or the PID got recycled while reading.

A PID can be written to migrate the process associated with the PID to the cgroup. The writer should match all of the following conditions.

  • It must have write access to the “cgroup.procs” file.
  • It must have write access to the “cgroup.procs” file of the common ancestor of the source and destination cgroups.

When delegating a sub-hierarchy, write access to this file should be granted along with the containing directory.

In a threaded cgroup, reading this file fails with EOPNOTSUPP as all the processes belong to the thread root. Writing is supported and moves every thread of the process to the cgroup.
"""

So we might find a process more than once, but I'm not sure we need to account for it. Moving JVM processes around isn't really something that's supported. See also JDK-8286991
01-06-2022

[~sgehwolf] that seems like a good approach.
31-05-2022

[~iklam] Hi Ioi! I'd like to reboot the fix for this bug and would like to propose the following. I'd like to hear your feedback before I go off implementing it:

1. Keep the current logic for short-circuiting the cgroup path for the "host" and "container" cases
2. For the cases where we don't know (like substring match or this bug) use the scanning approach you've proposed:
   2.a) If the controller is enabled to begin with for the current process, there must be a 'cgroup.procs' file under the controller hierarchy (mount point) that contains the current pid. If it is found, use the basename of that 'cgroup.procs' file as the cgroup path.
   2.b) If there is no such 'cgroup.procs' file (which seems to be the case here), leave the cgroup path unset and/or disable the controller for future lookup.

Note that case 2) above seems to be exceptionally rare cases. Otherwise we'd have noticed it earlier as this code has been around for some JDK releases. The scanning part is quite expensive, but it should be correct to use in those cases. What do you think?
24-05-2022
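
For illustration, step 2.a of the proposal above could be sketched roughly as follows. This is not code from the pull request: CgroupProcsScan and findCgroupDir are hypothetical names, and the sketch assumes that "basename" in the proposal means the directory containing the matching 'cgroup.procs' file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical helper; not part of the JDK.
final class CgroupProcsScan {

    // Walks the controller mount point and returns the directory whose
    // cgroup.procs file lists the given pid, or empty if none does.
    // Files.walk may throw UncheckedIOException for unreadable directories.
    static Optional<Path> findCgroupDir(Path mountPoint, long pid) throws IOException {
        String needle = Long.toString(pid);
        try (Stream<Path> files = Files.walk(mountPoint)) {
            return files
                .filter(p -> p.getFileName() != null
                        && p.getFileName().toString().equals("cgroup.procs"))
                .filter(p -> containsPid(p, needle))
                .map(Path::getParent)
                .findFirst();
        }
    }

    // True if any line of the file is exactly the pid. A pid that appears
    // more than once is still a single match.
    private static boolean containsPid(Path procsFile, String pid) {
        try {
            return Files.readAllLines(procsFile).stream()
                        .anyMatch(line -> line.trim().equals(pid));
        } catch (IOException e) {
            return false; // treat unreadable files as "not found"
        }
    }
}
```

A caller could pass ProcessHandle.current().pid() as the pid. An empty result corresponds to case 2.b, where the cgroup path is left unset or the controller is disabled. The full tree walk is what makes this approach expensive.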

A pull request was submitted for review. URL: https://git.openjdk.java.net/jdk/pull/8629 Date: 2022-05-10 12:29:10 +0000
10-05-2022

Relevant hotspot code is:

/*
 * Set directory to subsystem specific files based
 * on the contents of the mountinfo and cgroup files.
 */
void CgroupV1Controller::set_subsystem_path(char *cgroup_path) {
  char buf[MAXPATHLEN+1];
  if (_root != NULL && cgroup_path != NULL) {
    if (strcmp(_root, "/") == 0) {
      int buflen;
      strncpy(buf, _mount_point, MAXPATHLEN);
      buf[MAXPATHLEN-1] = '\0';
      if (strcmp(cgroup_path,"/") != 0) {
        buflen = strlen(buf);
        if ((buflen + strlen(cgroup_path)) > (MAXPATHLEN-1)) {
          return;
        }
        strncat(buf, cgroup_path, MAXPATHLEN-buflen);
        buf[MAXPATHLEN-1] = '\0';
      }
      _path = os::strdup(buf);
    } else {
      if (strcmp(_root, cgroup_path) == 0) {
        strncpy(buf, _mount_point, MAXPATHLEN);
        buf[MAXPATHLEN-1] = '\0';
        _path = os::strdup(buf);
      } else {
        char *p = strstr(cgroup_path, _root);
        if (p != NULL && p == _root) {
          if (strlen(cgroup_path) > strlen(_root)) {
            int buflen;
            strncpy(buf, _mount_point, MAXPATHLEN);
            buf[MAXPATHLEN-1] = '\0';
            buflen = strlen(buf);
            if ((buflen + strlen(cgroup_path) - strlen(_root)) > (MAXPATHLEN-1)) {
              return;
            }
            strncat(buf, cgroup_path + strlen(_root), MAXPATHLEN-buflen);
            buf[MAXPATHLEN-1] = '\0';
            _path = os::strdup(buf);
          }
        }
      }
    }
  }
}

It has the same problem. _path will be a NULL pointer in that case, but it'll be handled in src/hotspot/os/linux/cgroupSubsystem_linux.hpp template 'subsystem_file_line_contents' here:

if (c->subsystem_path() == NULL) {
  log_debug(os, container)("subsystem_file_line_contents: subsystem path is NULL");
  return OSCONTAINER_ERROR;
}

So it's just returning unlimited.
06-05-2022
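
For comparison, the same kind of NULL check could in principle be applied on the Java side before the path is handed to Paths.get(). The following is only a sketch under assumptions, not the fix adopted for this bug: NullPathGuard, readLongOrUnlimited and the UNLIMITED sentinel are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper; not part of the JDK.
final class NullPathGuard {

    // Sentinel for "no limit known"; an assumption made for this sketch only.
    static final long UNLIMITED = -1L;

    // Reads the first line of <controllerPath>/<fileName> as a long.
    // A null controller path, a missing or unreadable file, or non-numeric
    // content yields UNLIMITED instead of an exception.
    static long readLongOrUnlimited(String controllerPath, String fileName) {
        if (controllerPath == null) {
            return UNLIMITED; // analogous to the NULL check in hotspot
        }
        try {
            List<String> lines = Files.readAllLines(Paths.get(controllerPath, fileName));
            return lines.isEmpty() ? UNLIMITED : Long.parseLong(lines.get(0).trim());
        } catch (IOException | NumberFormatException e) {
            return UNLIMITED;
        }
    }
}
```

The design choice here is to degrade to "no limit detected" rather than fail initialization, which matches the hotspot behavior described in this comment.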

What happens on the hotspot side in this case needs to be investigated too.
05-05-2022