JDK-8293472 : Incorrect container resource limit detection if manual cgroup fs mounts present
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 11.0.16, 17.0.4, 20
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: linux
  • CPU: generic
  • Submitted: 2022-09-07
  • Updated: 2023-01-09
  • Resolved: 2022-09-15
The Version table provides details related to the release in which this issue/RFE will be addressed.

Unresolved: Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed: Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

JDK 11: 11.0.18 (Fixed)
JDK 17: 17.0.6 (Fixed)
JDK 20: 20 b16 (Fixed)
Other: openjdk8u372 (Fixed)
Description
On systems with multiple cgroup fs mount entries in /proc/self/mountinfo, the detected resource limits may be wrong because the path to the cgroup interface files may be resolved incorrectly.

The symptom on cg1 with a debug VM is similar to JDK-8253435; the VM asserts:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/data/openjdk/jdk/src/hotspot/os/linux/cgroupSubsystem_linux.cpp:335), pid=578, tid=583
#  assert(cg_infos[3]._mount_path == __null) failed: stomping of _mount_path
#
# JRE version:  (20.0) (fastdebug build )
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 20-internal-adhoc.root.jdk, mixed mode, sharing, tiered, unknown gc, linux-amd64)
# Core dump will be written. Default location: Core dumps may be processed with "/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e" (or dumping to /data1/test/java/2022-09-06-21-58-28/core.578)
#
#

---------------  S U M M A R Y ------------

Command Line: Test

Host: VM-235-31-centos, AMD EPYC 7K62 48-Core Processor, 16 cores, 31G, Ubuntu 20.04.4 LTS
Time: Wed Sep  7 15:06:04 2022 CST elapsed time: 0.002658 seconds (0d 0h 0m 0s)

---------------  T H R E A D  ---------------

Current thread is native thread

Stack: [0x00007ffff569b000,0x00007ffff579c000],  sp=0x00007ffff5794290,  free space=996k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x19cded2]  VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x1a2  (cgroupSubsystem_linux.cpp:335)
V  [libjvm.so+0x19ced8f]  VMError::report_and_die(Thread*, void*, char const*, int, char const*, char const*, __va_list_tag*)+0x2f  (vmError.cpp:1466)
V  [libjvm.so+0xac790b]  report_vm_error(char const*, int, char const*, char const*, ...)+0x11b  (debug.cpp:284)
V  [libjvm.so+0x8d465b]  CgroupSubsystemFactory::determine_type(CgroupInfo*, char const*, char const*, char const*, unsigned char*)+0xabb  (cgroupSubsystem_linux.cpp:335)
V  [libjvm.so+0x8d5236]  CgroupSubsystemFactory::create()+0xe6  (cgroupSubsystem_linux.cpp:53)
V  [libjvm.so+0x14f7011]  OSContainer::init()+0x71  (osContainer_linux.cpp:57)
V  [libjvm.so+0x64eacc]  Arguments::parse_vm_init_args(JavaVMInitArgs const*, JavaVMInitArgs const*, JavaVMInitArgs const*, JavaVMInitArgs const*)+0x16c  (os.hpp:243)
V  [libjvm.so+0x64ef5d]  Arguments::parse(JavaVMInitArgs const*)+0x47d  (arguments.cpp:4014)
V  [libjvm.so+0x1913b5a]  Threads::create_vm(JavaVMInitArgs*, bool*)+0x9a  (threads.cpp:453)
V  [libjvm.so+0x1016539]  JNI_CreateJavaVM+0x99  (jni.cpp:3628)
C  [libjli.so+0x40fa]  JavaMain+0x8a  (java.c:1457)
C  [libjli.so+0x7859]  ThreadJavaMain+0x9  (java_md.c:650)

On cg2 the VM may continue, but it uses an incorrect container limit value. See the comment below.

On a cg1 system an additional symptom is a warning on 'java -version':

[0.000s][warning][os,container] Duplicate cpuset controllers detected. Picking /sys/fs/cgroup/cpuset, skipping /cgroup-in/cpuset.
Comments
Withdrawn jdk8u-fix-request label pending PR review.
20-12-2022

A pull request was submitted for review. URL: https://git.openjdk.org/jdk8u-dev/pull/216 Date: 2022-12-20 12:14:12 +0000
20-12-2022

Fix Request (8u): Please consider this for 8u cgroups v2 support. The backport is not clean: log_debug → tty->print_cr and Files.writeString → Files.write() conversions are needed, as well as adjustments for the lack of JDK-8266490 (PID controller support) in 8u. Thanks!
20-12-2022

Fix Request (OpenJDK 11u): Please approve getting this backported to 11u. On some systems this leads to incorrectly detected container limits (and the fix also resolves issues with tooling that doesn't expect the warning log output). Patch applies cleanly. Container tests pass for me on cg1 and cg2. Risk should be low because the fix ignores cgroup mounts outside the /sys/fs/cgroup hierarchy. Linux only.
17-11-2022

A pull request was submitted for review. URL: https://git.openjdk.org/jdk11u-dev/pull/1525 Date: 2022-11-17 14:02:06 +0000
17-11-2022

Fix Request (OpenJDK 17u): Please approve getting this backported to 17u. On some systems this leads to incorrectly detected container limits (and the fix also resolves issues with tooling that doesn't expect the warning log output). Patch applies cleanly. Container tests pass for me on cg1 and cg2. Risk should be low because the fix ignores cgroup mounts outside the /sys/fs/cgroup hierarchy.
13-10-2022

A pull request was submitted for review. URL: https://git.openjdk.org/jdk17u-dev/pull/783 Date: 2022-10-12 12:32:58 +0000
12-10-2022

Changeset: 8f3bbe95 Author: casparcwang <casparcwang@tencent.com> Committer: Severin Gehwolf <sgehwolf@openjdk.org> Date: 2022-09-15 08:47:05 +0000 URL: https://git.openjdk.org/jdk/commit/8f3bbe950fb5a3d9f6cae122209df01df0f342f0
15-09-2022

A similar issue is present on cgroups v2, but the symptom is different:

$ sudo podman run --rm -ti --memory=300M --memory-swap=300M -v /sys/fs/cgroup:/cgroup-in:ro -v $(pwd)/jdk20-jdk:/opt/jdk:z fedora:36
[root@302ab86dcff8 /]# /opt/jdk/bin/java -Xlog:os+container=trace -version
[0.000s][trace][os,container] OSContainer::init: Initializing Container Support
[0.001s][debug][os,container] Detected optional pids controller entry in /proc/cgroups
[0.001s][debug][os,container] Detected cgroups v2 unified hierarchy
[0.001s][trace][os,container] Path to /cpu.max is /cgroup-in/cpu.max
[0.001s][debug][os,container] Open of file /cgroup-in/cpu.max failed, No such file or directory
[0.001s][trace][os,container] CPU Quota is: -2
[0.001s][trace][os,container] Path to /cpu.max is /cgroup-in/cpu.max
[0.001s][debug][os,container] Open of file /cgroup-in/cpu.max failed, No such file or directory
[0.001s][trace][os,container] CPU Period is: -2
[0.001s][trace][os,container] OSContainer::active_processor_count: 4
[0.001s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 4
[0.001s][trace][os,container] total physical memory: 5031587840
[0.001s][trace][os,container] Path to /memory.max is /cgroup-in/memory.max
[0.001s][debug][os,container] Open of file /cgroup-in/memory.max failed, No such file or directory
[0.001s][trace][os,container] Memory Limit is: -2
[0.001s][debug][os,container] container memory limit failed: -2, using host value 5031587840
[0.003s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 4
[0.049s][trace][os,container] Path to /cpu.max is /cgroup-in/cpu.max
[0.049s][debug][os,container] Open of file /cgroup-in/cpu.max failed, No such file or directory
[0.049s][trace][os,container] CPU Quota is: -2
[0.049s][trace][os,container] Path to /cpu.max is /cgroup-in/cpu.max
[0.049s][debug][os,container] Open of file /cgroup-in/cpu.max failed, No such file or directory
[0.049s][trace][os,container] CPU Period is: -2
[0.049s][trace][os,container] OSContainer::active_processor_count: 4
openjdk version "20-internal" 2023-03-21
OpenJDK Runtime Environment (fastdebug build 20-internal-adhoc.sgehwolf.jdk-jdk)
OpenJDK 64-Bit Server VM (fastdebug build 20-internal-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing)
[0.067s][trace][os,container] total physical memory: 5031587840
[0.067s][trace][os,container] Path to /memory.max is /cgroup-in/memory.max
[0.067s][debug][os,container] Open of file /cgroup-in/memory.max failed, No such file or directory
[0.067s][trace][os,container] Memory Limit is: -2
[0.067s][debug][os,container] container memory limit failed: -2, using host value 5031587840

# cat /sys/fs/cgroup/memory.max
314572800

The VM fails to detect the correct memory limit since it uses the /cgroup-in path (it should use /sys/fs/cgroup). On cg1 the VM asserts; on cg2 it continues and assumes a wrong limit. The reason no warning is shown is that, as soon as any `cgroup2` entry is found in /proc/self/mountinfo, setting the controller mount paths stops (even though the wrong one may have been picked). This can be reproduced on cg2 with an additional cgroup fs mount using the test test/hotspot/jtreg/containers/docker/TestMemoryAwareness.java.
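The log above shows the "-2" (failed) limit falling back to the host's physical memory. A minimal sketch of that fallback behavior, assuming a simplified, hypothetical `read_memory_limit` helper (not the HotSpot implementation):

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>

// Sketch (hypothetical helper, not HotSpot code): read the cgroup v2
// memory.max interface file under the detected cgroup path. If the path is
// wrong (e.g. /cgroup-in instead of /sys/fs/cgroup), the open fails and the
// host's physical memory is used instead of the container limit.
long long read_memory_limit(const std::string& cgroup_path, long long host_mem) {
  std::string path = cgroup_path + "/memory.max";
  FILE* f = std::fopen(path.c_str(), "r");
  if (f == nullptr) {
    return host_mem;  // "Open of file ... failed" -> fall back to host value
  }
  long long limit = host_mem;
  char buf[64];
  if (std::fgets(buf, sizeof(buf), f) != nullptr &&
      std::strncmp(buf, "max", 3) != 0) {  // "max" means unlimited
    limit = std::atoll(buf);
  }
  std::fclose(f);
  return limit;
}
```

With the wrong path the open fails silently at the debug log level, which is why the run above reports the 5031587840-byte host value instead of the 314572800-byte container limit.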
13-09-2022

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/10193 Date: 2022-09-07 08:35:47 +0000
07-09-2022