JDK 24 | JDK 25 |
---|---|
24.0.2 Fixed | 25 b13 Fixed |
The following issue was found in the Linux cgroup subsystem implementation. The cgroup v1 subsystem fails to initialize mounted controllers properly in certain cases, which may leave controllers undetected or inactive. We observed the behavior in CloudFoundry deployments, but it also affects host systems.

In cases where the JVM isn't PID 1 (for example, when it is started from a shell) and the shell process has been moved from one cgroup path to another, the JVM might set the subsystem path to null (on cg v1):

[0.001s][trace][os,container] OSContainer::init: Initializing Container Support
[0.001s][debug][os,container] Detected optional pids controller entry in /proc/cgroups
[0.002s][debug][os,container] Detected cgroups hybrid or legacy hierarchy, using cgroups v1 controllers
[0.002s][trace][os,container] Adjusting controller path for memory: (null)
[0.002s][debug][os,container] read_string: subsystem path is null
[0.002s][trace][os,container] Memory Limit failed: -2
[0.002s][debug][os,container] read_string: subsystem path is null
[0.002s][trace][os,container] Memory Limit failed: -2
[0.002s][trace][os,container] No lower limit found for memory in hierarchy /sys/fs/cgroup/memory, adjusting to original path /test
[0.002s][debug][os,container] OSContainer::init: is_containerized() = true because all controllers are mounted read-only (container case)
[0.003s][trace][os,container] Path to /cpu.cfs_quota_us is /sys/fs/cgroup/cpu,cpuacct/cpu.cfs_quota_us
[0.003s][trace][os,container] CPU Quota is: -1
[0.003s][trace][os,container] Path to /cpu.cfs_period_us is /sys/fs/cgroup/cpu,cpuacct/cpu.cfs_period_us
[0.003s][trace][os,container] CPU Period is: 100000
[0.003s][trace][os,container] OSContainer::active_processor_count: 12
[0.003s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 12
[0.003s][trace][os,container] total physical memory: 67163226112
[0.003s][debug][os,container] read_string: subsystem path is null
[0.003s][trace][os,container] Memory Limit failed: -2
[0.005s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 12
[0.021s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 12
openjdk 24-internal 2025-03-18
OpenJDK Runtime Environment (build 24-internal-adhoc.sgehwolf.jdk-jdk)
OpenJDK 64-Bit Server VM (build 24-internal-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing)

On the Java Metrics side, this is observable as an NPE, for example when the application code uses some MXBean code. This test code:

public class Test {
    public static void main(String[] args) {
        java.lang.management.ManagementFactory.getPlatformMBeanServer();
        System.out.println("PASSED.");
    }
}

would result in the following NPE on affected systems:

Exception in thread "main" java.lang.NullPointerException
    at java.base/java.util.Objects.requireNonNull(Objects.java:220)
    at java.base/sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:296)
    at java.base/java.nio.file.Path.of(Path.java:148)
    at java.base/java.nio.file.Paths.get(Paths.java:69)
    at java.base/jdk.internal.platform.CgroupUtil.lambda$readStringValue$0(CgroupUtil.java:67)
    at java.base/java.security.AccessController.doPrivileged(AccessController.java:571)
    at java.base/jdk.internal.platform.CgroupUtil.readStringValue(CgroupUtil.java:69)
    at java.base/jdk.internal.platform.CgroupSubsystemController.getStringValue(CgroupSubsystemController.java:65)
    at java.base/jdk.internal.platform.CgroupSubsystemController.getLongValue(CgroupSubsystemController.java:124)
    at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.getLongValue(CgroupV1Subsystem.java:190)
    at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.getHierarchical(CgroupV1Subsystem.java:160)
    at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.initSubSystem(CgroupV1Subsystem.java:85)
    at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.getInstance(CgroupV1Subsystem.java:61)
    at java.base/jdk.internal.platform.CgroupSubsystemFactory.create(CgroupSubsystemFactory.java:119)
    at java.base/jdk.internal.platform.CgroupSubsystemFactory.create(CgroupSubsystemFactory.java:89)
    at java.base/jdk.internal.platform.CgroupMetrics.getInstance(CgroupMetrics.java:198)
    at java.base/jdk.internal.platform.SystemMetrics.instance(SystemMetrics.java:29)
    at java.base/jdk.internal.platform.Metrics.systemMetrics(Metrics.java:58)
    at java.base/jdk.internal.platform.Container.metrics(Container.java:43)
    at jdk.management/com.sun.management.internal.OperatingSystemImpl.<init>(OperatingSystemImpl.java:175)
    at jdk.management/com.sun.management.internal.PlatformMBeanProviderImpl.getOperatingSystemMXBean(PlatformMBeanProviderImpl.java:316)
    at jdk.management/com.sun.management.internal.PlatformMBeanProviderImpl$4.nameToMBeanMap(PlatformMBeanProviderImpl.java:235)
    at java.management/java.lang.management.ManagementFactory.lambda$getPlatformMBeanServer$0(ManagementFactory.java:489)
    at java.base/java.util.stream.ReferencePipeline$7$1FlatMap.accept(ReferencePipeline.java:289)
    at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:197)
    at java.base/java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1788)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:570)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:560)
    at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:153)
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:176)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:265)
    at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:636)
    at java.management/java.lang.management.ManagementFactory.getPlatformMBeanServer(ManagementFactory.java:490)
    at Test.main(Test.java:3)

The relevant /proc/self/mountinfo line is:
---
2207 2196 0:43 /system.slice/garden.service/garden/good/2f57368b-0eda-4e52-64d8-af5c /sys/fs/cgroup/cpu,cpuacct ro,nosuid,nodev,noexec,relatime master:25 - cgroup cgroup rw,cpu,cpuacct
---

/proc/self/cgroup:
---
11:cpu,cpuacct:/system.slice/garden.service/garden/bad/2f57368b-0eda-4e52-64d8-af5c
---

Note that by default, containers run with cgroupns=host on cg v1 systems and with cgroupns=private on cg v2 systems. The issue has been observed with the default configs, in unprivileged containers where the JVM runs.

Steps to reproduce on a cgroup v1 system (using --cgroupns=host for clarity):

$ sudo podman run -ti --cgroupns=host --rm --volume=$(pwd)/build/linux-x86_64-server-release/images/jdk:/jdk:z --memory 400m fedora:39 bash -c 'bash'
[root@5aee0ffdd70b /]# /jdk/bin/java -Xlog:os+container=trace --version
[0.000s][trace][os,container] OSContainer::init: Initializing Container Support
[0.000s][debug][os,container] Detected optional pids controller entry in /proc/cgroups
[0.001s][debug][os,container] Detected cgroups hybrid or legacy hierarchy, using cgroups v1 controllers
[0.001s][debug][os,container] OSContainer::init: is_containerized() = true because all controllers are mounted read-only (container case)
[0.001s][trace][os,container] Path to /cpu.cfs_quota_us is /sys/fs/cgroup/cpu,cpuacct/cpu.cfs_quota_us
[0.001s][trace][os,container] CPU Quota is: -1
[0.001s][trace][os,container] Path to /cpu.cfs_period_us is /sys/fs/cgroup/cpu,cpuacct/cpu.cfs_period_us
[0.001s][trace][os,container] CPU Period is: 100000
[0.001s][trace][os,container] OSContainer::active_processor_count: 12
[0.001s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 12
[0.001s][trace][os,container] total physical memory: 67163238400
[0.001s][trace][os,container] Path to /memory.limit_in_bytes is /sys/fs/cgroup/memory/memory.limit_in_bytes
[0.001s][trace][os,container] Memory Limit is: 419430400
[0.002s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 12
[0.014s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 12
openjdk 24-internal 2025-03-18
OpenJDK Runtime Environment (build 24-internal-adhoc.sgehwolf.jdk-jdk)
OpenJDK 64-Bit Server VM (build 24-internal-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing)

In a separate terminal, find the PID of the shell in the container (10391 in this case) and move it to a different path, /sys/fs/cgroup/memory/test, for example like so:

$ sudo mkdir /sys/fs/cgroup/memory/test
# echo 10391 > /sys/fs/cgroup/memory/test/cgroup.procs

In the shell where the container runs, run 'java --version' again and observe the null subsystem paths:

[root@5aee0ffdd70b /]# /jdk/bin/java -Xlog:os+container=trace --version
[0.000s][trace][os,container] OSContainer::init: Initializing Container Support
[0.000s][debug][os,container] Detected optional pids controller entry in /proc/cgroups
[0.001s][debug][os,container] Detected cgroups hybrid or legacy hierarchy, using cgroups v1 controllers
[0.001s][trace][os,container] Adjusting controller path for memory: (null)
[0.001s][debug][os,container] read_string: subsystem path is null
[0.001s][trace][os,container] Memory Limit failed: -2
[0.001s][debug][os,container] read_string: subsystem path is null
[0.001s][trace][os,container] Memory Limit failed: -2
[0.001s][trace][os,container] No lower limit found for memory in hierarchy /sys/fs/cgroup/memory, adjusting to original path /test
[0.001s][debug][os,container] OSContainer::init: is_containerized() = true because all controllers are mounted read-only (container case)
[0.001s][trace][os,container] Path to /cpu.cfs_quota_us is /sys/fs/cgroup/cpu,cpuacct/cpu.cfs_quota_us
[0.001s][trace][os,container] CPU Quota is: -1
[0.001s][trace][os,container] Path to /cpu.cfs_period_us is /sys/fs/cgroup/cpu,cpuacct/cpu.cfs_period_us
[0.001s][trace][os,container] CPU Period is: 100000
[0.001s][trace][os,container] OSContainer::active_processor_count: 12
[0.001s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 12
[0.001s][trace][os,container] total physical memory: 67163238400
[0.001s][debug][os,container] read_string: subsystem path is null
[0.001s][trace][os,container] Memory Limit failed: -2
[0.003s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 12
[0.020s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 12
openjdk 24-internal 2025-03-18
OpenJDK Runtime Environment (build 24-internal-adhoc.sgehwolf.jdk-jdk)
OpenJDK 64-Bit Server VM (build 24-internal-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing)
[root@5aee0ffdd70b /]# grep memory /proc/self/mountinfo
1476 1473 0:43 /machine.slice/libpod-5aee0ffdd70b215ba4115f31e5438fa4708be8fd3a11ad75cbc93b0869788dfd.scope/container /sys/fs/cgroup/memory ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,memory
[root@5aee0ffdd70b /]# grep memory /proc/self/cgroup
11:memory:/test

For the NPE issue, the reproducer steps are similar.
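
To check by hand whether a system is in the state described above, it may help to compare the mount root (4th field of the /proc/self/mountinfo line) with the cgroup path (3rd field of the /proc/self/cgroup line). The following is a minimal sketch of that comparison, not the JDK's actual path-resolution code: the class name CgroupPathCheck and its helper methods are hypothetical, and the "under root" test is a simplified assumption about what a matching pair looks like.

```java
public class CgroupPathCheck {

    /** Returns the mount root, i.e. the 4th whitespace-separated field of a mountinfo line. */
    public static String mountRoot(String mountInfoLine) {
        String[] fields = mountInfoLine.trim().split("\\s+");
        return fields[3];
    }

    /** Returns the cgroup path, i.e. the 3rd colon-separated field of a /proc/self/cgroup line. */
    public static String cgroupPath(String cgroupLine) {
        // Limit of 3 keeps any ':' inside the path itself intact.
        return cgroupLine.trim().split(":", 3)[2];
    }

    /**
     * Simplified assumption: the pair "matches" when the mount root is "/",
     * equals the cgroup path, or is a parent directory of the cgroup path.
     */
    public static boolean isUnderRoot(String root, String cgroupPath) {
        if (root.equals("/") || root.equals(cgroupPath)) {
            return true;
        }
        String prefix = root.endsWith("/") ? root : root + "/";
        return cgroupPath.startsWith(prefix);
    }

    public static void main(String[] args) {
        // Lines taken from the report above (CloudFoundry case).
        String mountInfo = "2207 2196 0:43 "
            + "/system.slice/garden.service/garden/good/2f57368b-0eda-4e52-64d8-af5c "
            + "/sys/fs/cgroup/cpu,cpuacct ro,nosuid,nodev,noexec,relatime master:25 "
            + "- cgroup cgroup rw,cpu,cpuacct";
        String cgroup = "11:cpu,cpuacct:"
            + "/system.slice/garden.service/garden/bad/2f57368b-0eda-4e52-64d8-af5c";

        String root = mountRoot(mountInfo);
        String path = cgroupPath(cgroup);
        System.out.println("mount root : " + root);
        System.out.println("cgroup path: " + path);
        System.out.println("cgroup path under mount root: " + isUnderRoot(root, path));
    }
}
```

For both cases in this report (the garden "good" vs. "bad" paths, and the libpod scope root vs. /test), the cgroup path is not located under the mount root, which is the situation in which the null subsystem path was observed.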