JDK-8257746 : Regression introduced with JDK-8250984 - memory might be null in some machines
Type:Bug
Component:hotspot
Sub-Component:runtime
Affected Version:8,11
Priority:P3
Status:Resolved
Resolution:Fixed
Submitted:2020-12-04
Updated:2025-01-16
Resolved:2021-01-28
Fix request (15u)
Requesting backport to 15u as a follow-up fix for JDK-8250984, which is already included in 15u.
The patch applies cleanly.
Tested with tier1 and container tests.
13-05-2021
Fix request (13u)
Requesting backport to 13u for parity with 11u.
The patch doesn't apply cleanly since 13u doesn't have cgroups v2 support (JDK-8231111), so it was reapplied manually to the corresponding places in cgroupv1/Metrics.java.
Tested with tier1 and container tests.
RFR: http://mail.openjdk.java.net/pipermail/jdk-updates-dev/2021-March/005150.html
01-03-2021
Fix Request (16u)
Backporting this small low-risk fix prevents this bug from occurring in JDK-16u. The original bug fix patch applied cleanly. After applying the patch to a JDK-16u repo, the fix was regression tested by running Mach5 tiers 1 and 2 on Linux, Windows, and Mac OS, and running tiers 3-5 on Linux x64.
01-03-2021
Fix Request (OpenJDK 11u):
Please approve backporting this to OpenJDK 11u. Patch doesn't apply cleanly due to the cgroups v2 patch in JDK 17. Rewritten for JDK 11u and reviewed by Matthias Baesken. Risk should be low as it's only adding null checks before actually using the controller. Matthias tested the patch and confirmed it fixes the regression on an affected system. I've also tested using the container tests (cgroups v1) which pass too.
webrev: https://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8257746/jdk11/01/webrev/
RFR: https://mail.openjdk.java.net/pipermail/jdk-updates-dev/2021-February/005110.html
25-02-2021
Thanks Matthias! I'll propose for review then.
RFR: https://mail.openjdk.java.net/pipermail/jdk-updates-dev/2021-February/005110.html
24-02-2021
Hi Severin, I did a quick check and the proposed change solves the issue we noticed with OpenJDK 11 on the SLES11 Linux x86_64 machine.
24-02-2021
Candidate webrev for 11u:
https://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8257746/jdk11/01/webrev/
24-02-2021
Ah, OK. Thanks! It's jtreg itself which fails not the test. It also confirms that it's OperatingSystemMXBean which triggers that code path. Work-around is to use -XX:-UseContainerSupport on these systems.
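For reference, a minimal sketch of a standalone probe for that code path, assuming (as the stack trace in this issue suggests) that getTotalSwapSpaceSize() on the com.sun.management OperatingSystemMXBean is what reaches the cgroup v1 Metrics code. The class name SwapSizeProbe is made up for illustration.

```java
import java.lang.management.ManagementFactory;
import com.sun.management.OperatingSystemMXBean;

// Hypothetical probe class; not part of the JDK or of jtreg.
class SwapSizeProbe {

    // Asks the platform OperatingSystemMXBean for the total swap size in bytes.
    // On an affected system this call is expected to throw the NullPointerException.
    static long totalSwap() {
        OperatingSystemMXBean os =
                ManagementFactory.getPlatformMXBean(OperatingSystemMXBean.class);
        return os.getTotalSwapSpaceSize();
    }

    public static void main(String[] args) {
        System.out.println("total swap bytes: " + totalSwap());
    }
}
```

Running the same class with -XX:-UseContainerSupport should avoid the container metrics code and therefore the exception on an affected machine.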
24-02-2021
Interesting. Would you have the full stack trace of the jdk/jdk/jfr/event/sampling/TestNative.java failure?
Yes, the 11u backport would need some rewrite as there is no cgroups v2 support there. I can do it, but since I've no real way of reproducing/testing it I'd rely on somebody else for confirming the fix.
24-02-2021
My colleague noticed the error on a SLES11 Linux x86_64 box when running the test jdk/jdk/jfr/event/sampling/TestNative.java.
SLES11 is rather old, and for some reason /proc/self/cgroup lacks the memory entry there; that entry is present on the newer SLES versions I checked (e.g. SLES12).
I would like to have the fix in jdk11 as well, because the issue is present there.
The jdk17 change does not apply directly to jdk11 (because the change touches different files in 11 and 17). A separate backport request is still to be done.
FWIW, I have JDK-8254001 in the pipeline; once that is in, I will add a regression test for this issue. See my WIP patch here: https://bugs.openjdk.java.net/browse/JDK-8254001?focusedCommentId=14399868&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14399868
24-02-2021
[~mbaesken] Do you know how to reproduce? Could you provide us with the following info: a) What code is triggering the issue? I'm guessing OperatingSystemMXBean is involved, but would like to confirm. b) What do the relevant cgroups files look like on the affected system: /proc/cgroups, /proc/self/cgroup and /proc/self/mountinfo? Thanks!
24-02-2021
Looks like we are facing the issue too in OpenJDK11 on an older SLES11 based machine (where memory is NULL).
> Error: Unexpected exception occurred! java.lang.NullPointerException
> java.lang.NullPointerException
> at java.base/jdk.internal.platform.cgroupv1.Metrics.getMemoryAndSwapLimit(Metrics.java:484)
24-02-2021
The full stack trace my colleague got when running the jtreg test with OpenJDK11 was:
Error: Unexpected exception occurred! java.lang.NullPointerException
java.lang.NullPointerException
at java.base/jdk.internal.platform.cgroupv1.Metrics.getMemoryAndSwapLimit(Metrics.java:484)
at jdk.management/com.sun.management.internal.OperatingSystemImpl.getTotalSwapSpaceSize(OperatingSystemImpl.java:57)
at com.sun.javatest.regtest.config.OS.<init>(OS.java:160)
at com.sun.javatest.regtest.config.OS.current(OS.java:59)
at com.sun.javatest.regtest.config.RegressionContext.<init>(RegressionContext.java:77)
at com.sun.javatest.regtest.config.RegressionContext.getDefault(RegressionContext.java:52)
at com.sun.javatest.regtest.config.RegressionTestFinder.<init>(RegressionTestFinder.java:93)
at com.sun.javatest.regtest.config.RegressionTestSuite.createTestFinder(RegressionTestSuite.java:100)
at com.sun.javatest.regtest.config.RegressionTestSuite.<init>(RegressionTestSuite.java:82)
at com.sun.javatest.regtest.config.RegressionTestSuite.open(RegressionTestSuite.java:65)
at com.sun.javatest.regtest.config.TestManager.getTestSuites(TestManager.java:165)
at com.sun.javatest.regtest.tool.Tool.run(Tool.java:1127)
at com.sun.javatest.regtest.tool.Tool.run(Tool.java:1078)
at com.sun.javatest.regtest.tool.Tool.main(Tool.java:147)
at com.sun.javatest.regtest.Main.main(Main.java:58)
24-02-2021
JDK-8253797 has landed since then and contains code equivalent to JDK-8250984 in the cgroup v2 branch. Has it been investigated whether cgroups v2 could face the same issue?
After the backport of JDK-8250984, there are places where memory.isSwapEnabled() is called. For example:
    public long getMemoryAndSwapFailCount() {
        if (!memory.isSwapEnabled()) {
            return getMemoryFailCount();
        }
        return SubSystem.getLongValue(memory, "memory.memsw.failcnt");
    }
But memory could be null on some machines that have cgroup entries for CPU but not for memory. In that case, dereferencing memory throws a NullPointerException.
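The kind of guard that avoids this can be sketched as follows. This is a self-contained illustration, not the actual patch: MemoryController, readLong and the -1 sentinel are hypothetical stand-ins for the JDK-internal classes and return conventions, and only the null check before memory.isSwapEnabled() is the point.

```java
// Sketch only: MemoryController and readLong are hypothetical stand-ins,
// not the real jdk.internal.platform classes.
class NullSafeMetricsSketch {

    // Assumed sentinel meaning "metric not available".
    static final long UNAVAILABLE = -1L;

    // Stand-in for the cgroup v1 memory controller.
    static final class MemoryController {
        private final boolean swapEnabled;
        MemoryController(boolean swapEnabled) { this.swapEnabled = swapEnabled; }
        boolean isSwapEnabled() { return swapEnabled; }
    }

    // null when the system exposes no memory controller.
    private final MemoryController memory;

    NullSafeMetricsSketch(MemoryController memory) { this.memory = memory; }

    // Placeholder for reading a controller file; returns fixed demo values.
    private static long readLong(MemoryController controller, String file) {
        return file.equals("memory.memsw.failcnt") ? 2L : 1L;
    }

    long getMemoryFailCount() {
        if (memory == null) {
            return UNAVAILABLE;
        }
        return readLong(memory, "memory.failcnt");
    }

    long getMemoryAndSwapFailCount() {
        // Guard first: without it, memory.isSwapEnabled() throws when memory is null.
        if (memory == null) {
            return UNAVAILABLE;
        }
        if (!memory.isSwapEnabled()) {
            return getMemoryFailCount();
        }
        return readLong(memory, "memory.memsw.failcnt");
    }

    public static void main(String[] args) {
        // No memory controller: prints -1 instead of throwing.
        System.out.println(new NullSafeMetricsSketch(null).getMemoryAndSwapFailCount());
        // Controller present, swap accounting off: prints 1.
        System.out.println(new NullSafeMetricsSketch(new MemoryController(false)).getMemoryAndSwapFailCount());
        // Controller present, swap accounting on: prints 2.
        System.out.println(new NullSafeMetricsSketch(new MemoryController(true)).getMemoryAndSwapFailCount());
    }
}
```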