JDK-8302744 : Refactor Hotspot container detection code
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 21
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: linux
  • CPU: generic
  • Submitted: 2023-02-17
  • Updated: 2024-06-11
  • Resolved: 2024-05-29
Fixed in: JDK 23 (build b25)
Description
Currently, most of the metrics retrieval logic in Hotspot is implemented around the GET_CONTAINER_INFO macros and the corresponding subsystem_file_line_contents() template.

There are a couple of issues with this approach:
- The macros aren't portable, as they return early from the enclosing function in error cases.
- The macros only support string literals for some of their arguments. This prevents using the macros when the input isn't a string literal.
- The combination of macro and template makes the code harder to read and makes it harder to understand what's going on.

On top of that, they prevent further refactoring. For example, introducing generic helper functions for certain sanity checks, as was done in JDK-8292083, is harder.
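The issues listed above can be illustrated with a minimal sketch. This is not the HotSpot code: READ_LONG_OR_RETURN, read_long, kReadError and the file paths are hypothetical names invented for this example. The macro returns from the enclosing function on error and only compiles when its path argument is a string literal (because of literal concatenation in the log message), while the plain function reports failure through its return value and accepts any path.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical error value, chosen to match the -2 seen in container logs.
static const long kReadError = -2;

// Hypothetical macro in the style the description criticizes. On failure it
// returns from the *enclosing* function, and the literal concatenation in the
// log message means `path` must be a string literal.
#define READ_LONG_OR_RETURN(path, var)                          \
  long var = 0;                                                 \
  {                                                             \
    FILE* fp_ = std::fopen(path, "r");                          \
    if (fp_ == nullptr) {                                       \
      std::fprintf(stderr, "Open of file " path " failed\n");   \
      return kReadError;                                        \
    }                                                           \
    int matched_ = std::fscanf(fp_, "%ld", &var);               \
    std::fclose(fp_);                                           \
    if (matched_ != 1) return kReadError;                       \
  }

// Function-based alternative: failure is visible in the return value, and the
// path can be computed at run time.
bool read_long(const std::string& path, long* result) {
  assert(result != nullptr);
  FILE* fp = std::fopen(path.c_str(), "r");
  if (fp == nullptr) {
    std::fprintf(stderr, "Open of file %s failed\n", path.c_str());
    return false;
  }
  int matched = std::fscanf(fp, "%ld", result);
  std::fclose(fp);
  return matched == 1;
}

// Caller using the macro: the early return is hidden inside the macro body.
long limit_via_macro() {
  READ_LONG_OR_RETURN("/nonexistent/hypothetical.max", value);
  return value;
}

// Caller using the function: control flow is explicit at the call site.
long limit_via_function(const std::string& path) {
  long value = 0;
  if (!read_long(path, &value)) {
    return kReadError;
  }
  return value;
}
```

With the function-based form, a shared helper (for example a sanity check applied to every value read) can wrap read_long directly, which is the kind of refactoring the macro form makes awkward.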
Comments
I created https://bugs.openjdk.org/browse/JDK-8333967 for the containers/cgroup/PlainRead.java test issue.
11-06-2024

Yes, it might be related to the change. We changed the log message for this failure from "CPU Quota is: -2" to "CPU Quota failed: -2" for the case where cpu.max is not found at that path. It should be a simple test fix.
10-06-2024

On Linux x86_64 and Linux aarch64, the test containers/cgroup/PlainRead.java has been failing since this change went in (starting 30th May); before that, the test was running fine. Do you think it is related? Example stderr from a test failure:

stderr ---- snip ---------------
Thu May 30 02:29:44 CEST 2024
stdout: [
[0.000s][trace][os,container] OSContainer::init: Initializing Container Support
[0.001s][debug][os,container] Detected optional pids controller entry in /proc/cgroups
[0.001s][debug][os,container] Detected cgroups v2 unified hierarchy
[0.001s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup/user.slice/user-3670.slice/session-1002.scope/cpu.max
[0.001s][debug][os,container] Open of file /sys/fs/cgroup/user.slice/user-3670.slice/session-1002.scope/cpu.max failed, No such file or directory
[0.001s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup/user.slice/user-3670.slice/session-1002.scope/cpu.max
[0.001s][debug][os,container] Open of file /sys/fs/cgroup/user.slice/user-3670.slice/session-1002.scope/cpu.max failed, No such file or directory
[0.001s][trace][os,container] CPU Period failed: -2
[0.001s][trace][os,container] OSContainer::active_processor_count: 16
[0.001s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 16
[0.001s][trace][os,container] total physical memory: 33631973376
[0.001s][trace][os,container] Path to /memory.max is /sys/fs/cgroup/user.slice/user-3670.slice/session-1002.scope/memory.max
[0.001s][trace][os,container] Memory Limit is: -1
[0.001s][debug][os,container] container memory limit unlimited: -1, using host value 33631973376
[0.003s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 16
[0.112s][trace][os,container] total physical memory: 33631973376
[0.112s][trace][os,container] Path to /memory.max is /sys/fs/cgroup/user.slice/user-3670.slice/session-1002.scope/memory.max
[0.112s][trace][os,container] Memory Limit is: -1
[0.112s][debug][os,container] container memory limit unlimited: -1, using host value 33631973376
[0.203s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup/user.slice/user-3670.slice/session-1002.scope/cpu.max
[0.203s][debug][os,container] Open of file /sys/fs/cgroup/user.slice/user-3670.slice/session-1002.scope/cpu.max failed, No such file or directory
[0.203s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup/user.slice/user-3670.slice/session-1002.scope/cpu.max
[0.203s][debug][os,container] Open of file /sys/fs/cgroup/user.slice/user-3670.slice/session-1002.scope/cpu.max failed, No such file or directory
[0.203s][trace][os,container] CPU Period failed: -2
[0.203s][trace][os,container] OSContainer::active_processor_count: 16
[0.314s][trace][os,container] total physical memory: 33631973376
[0.314s][trace][os,container] Path to /memory.max is /sys/fs/cgroup/user.slice/user-3670.slice/session-1002.scope/memory.max
[0.314s][trace][os,container] Memory Limit is: -1
[0.314s][debug][os,container] container memory limit unlimited: -1, using host value 33631973376
];
stderr: [
openjdk version "23-internal" 2024-05-30
OpenJDK Runtime Environment (fastdebug build 23-internal-adhoc.jenkinsi.jdk)
OpenJDK 64-Bit Server VM (fastdebug build 23-internal-adhoc.jenkinsi.jdk, mixed mode, sharing)
]
exitValue = 0
java.lang.RuntimeException: '^.*CPU Quota is: *(\d+|-1|-2|Unlimited).*$' missing from stdout/stderr
    at jdk.test.lib.process.OutputAnalyzer.shouldMatch(OutputAnalyzer.java:371)
    at PlainRead.match(PlainRead.java:43)
    at PlainRead.isContainer(PlainRead.java:57)
    at PlainRead.main(PlainRead.java:75)
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
    at java.base/java.lang.reflect.Method.invoke(Method.java:580)
    at com.sun.javatest.regtest.agent.MainWrapper$MainTask.run(MainWrapper.java:138)
    at java.base/java.lang.Thread.run(Thread.java:1575)
JavaTest Message: Test threw exception: java.lang.RuntimeException: '^.*CPU Quota is: *(\d+|-1|-2|Unlimited).*$' missing from stdout/stderr
JavaTest Message: shutting down test
10-06-2024

[~mbaesken] Please create a bug for this test issue and I'll try to take a look. Edit: I see JDK-8333326 got filed for this.
31-05-2024

After the change we now get build errors on Linux Alpine; it seems there is something special about basename on musl, see also https://bugzilla.mozilla.org/show_bug.cgi?id=1041962.

/priv/jenkins/client-home/workspace/openjdk-jdk-dev-linux_alpine_x86_64-opt/jdk/test/hotspot/gtest/runtime/test_cgroupSubsystem_linux.cpp: In member function 'virtual void cgroupTest_read_numerical_key_value_success_cases_Test::TestBody()':
/priv/jenkins/client-home/workspace/openjdk-jdk-dev-linux_alpine_x86_64-opt/jdk/test/hotspot/gtest/runtime/test_cgroupSubsystem_linux.cpp:139:19: error: 'basename' was not declared in this scope; did you mean 'rename'?
  139 |   const char* b = basename(test_file);
      |                   ^~~~~~~~
      |                   rename
/priv/jenkins/client-home/workspace/openjdk-jdk-dev-linux_alpine_x86_64-opt/jdk/test/hotspot/gtest/runtime/test_cgroupSubsystem_linux.cpp: In member function 'virtual void cgroupTest_read_number_tests_Test::TestBody()':
/priv/jenkins/client-home/workspace/openjdk-jdk-dev-linux_alpine_x86_64-opt/jdk/test/hotspot/gtest/runtime/test_cgroupSubsystem_linux.cpp:239:19: error: 'basename' was not declared in this scope; did you mean 'rename'?
  239 |   const char* b = basename(test_file);
      |                   ^~~~~~~~
      |                   rename
... (rest of output omitted)
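For context on the error above: the POSIX basename() is declared in <libgen.h>, and musl only declares it there, whereas glibc also exposes a GNU variant through <string.h>. The sketch below shows one portable way to call it; it is not necessarily the fix that was applied to the gtest, and file_basename is a hypothetical helper name. Because the POSIX version is allowed to modify its argument, the helper passes a mutable copy of the path.

```cpp
#include <libgen.h>  // POSIX basename(); the header that declares it on musl

#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: returns the final path component of `path`.
std::string file_basename(const std::string& path) {
  // POSIX basename() may modify its argument, so work on a mutable,
  // NUL-terminated copy instead of the caller's string.
  std::vector<char> buf(path.begin(), path.end());
  buf.push_back('\0');
  char* base = basename(buf.data());
  assert(base != nullptr);  // POSIX basename() always returns a valid pointer
  // Copy the result out before `buf` goes out of scope.
  return std::string(base);
}
```

For example, file_basename("/sys/fs/cgroup/cpu.max") yields "cpu.max".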
30-05-2024

Changeset: 3d4eb159
Author: Severin Gehwolf <sgehwolf@openjdk.org>
Date: 2024-05-29 08:46:27 +0000
URL: https://git.openjdk.org/jdk/commit/3d4eb159e6d597f37081faf21b7e3f0f1af299e5
29-05-2024

A pull request was submitted for review.
URL: https://git.openjdk.org/jdk/pull/19060
Date: 2024-05-02 12:36:11 +0000
03-05-2024