Bug ID: JDK-8279484 Runtime.availableProcessors reports incorrect processor count

JDK 19
19Resolved

-XX:ActiveProcessorCount=xx can be used as a workaround. Another option is to explicitly set Kubernetes CPU limits for the deployments. Google recommends always setting resource limits: https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-resource-requests-and-limits <quote> Conclusion While your Kubernetes cluster might work fine without setting resource requests and limits, you will start running into stability issues as your teams and projects grow. Adding requests and limits to your Pods and Namespaces only takes a little extra effort, and can save you from running into many headaches down the line! </quote> When using Kubernetes, this bug happens only when CPU resource limits are not set.
23-02-2022
Same issue as JDK-8281181
10-02-2022
OK, reproduced. The issue is that kubernetes in such an environment spawns the container with an --cpu-shares value of 2. That is, "... or 2" applies here from the Kubernetes docs: The spec.containers[].resources.requests.cpu is converted to its core value, which is potentially fractional, and multiplied by 1024. The greater of this number or 2 is used as the value of the --cpu-shares flag in the docker run command. $ kubectl get pod jdk17-test-df6b46fbc-l2brt -o custom-columns=CPU_LIMIT:.spec.containers[0].resources.limits.cpu,CPU_REQUESTS:.spec.containers[0].resources.requests.cpu CPU_LIMIT CPU_REQUESTS <none> <none> I.e. no limit and requests are being set, thus defaults kick in (which is different than not setting it at all!) When we inspect the container we see 'CpuShares' is being set to 2: root@minikube:/# docker inspect 815ba0288cc7 \| grep -A3 Entrypoint "Entrypoint": [ "bash", "-c", "echo 'System.out.println(\"getParallelism() = \" + ForkJoinPool.commonPool().getParallelism())' \| jshell - -J-showversion 2>&1 \| cat > /tmp/log.txt; sleep 1000000" root@minikube:/# docker inspect 815ba0288cc7 \| grep CpuShares "CpuShares": 2, This results in a raw value of 1 in cgroup v2's 'cpu.weight' file. Re-creating this in a plain container run we see: $ sudo docker run --cpu-shares=2 -ti --rm -v $(pwd)/jdk-17:/opt/jdk:z fedora:35 /opt/jdk/bin/java -Xlog:os+container=trace --version \| grep Raw [0.001s][trace][os,container] Raw value for memory limit is: max [0.002s][trace][os,container] Raw value for CPU quota is: max [0.002s][trace][os,container] Raw value for CPU shares is: 1 [0.022s][trace][os,container] Raw value for memory limit is: max [0.042s][trace][os,container] Raw value for memory limit is: max
07-02-2022
Going by https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource/#if-you-do-not-specify-a-cpu-limit there could be a default limit in place.
03-02-2022
If this is cgroups v2 specific it's certainly not a regression. Only JDK 15+ have cgroups v2 support.
03-02-2022
FWIW, without specifying a cpu-shares value using docker... $ sudo docker run -ti --rm -v $(pwd)/jdk17:/opt/jdk:z fedora:35 [root@8e2a4a8d35fb /]# /opt/jdk/bin/java -Xlog:os+container=trace --version [0.014s][trace][os,container] OSContainer::init: Initializing Container Support [0.014s][debug][os,container] Detected cgroups v2 unified hierarchy [0.014s][trace][os,container] Path to /memory.max is /sys/fs/cgroup//memory.max [0.014s][trace][os,container] Raw value for memory limit is: max [0.014s][trace][os,container] Memory Limit is: Unlimited [0.014s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup//cpu.max [0.014s][trace][os,container] Raw value for CPU quota is: max [0.014s][trace][os,container] CPU Quota is: -1 [0.014s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup//cpu.max [0.014s][trace][os,container] CPU Period is: 100000 [0.015s][trace][os,container] Path to /cpu.weight is /sys/fs/cgroup//cpu.weight [0.015s][trace][os,container] Raw value for CPU shares is: 100 [0.015s][debug][os,container] CPU Shares is: -1 [0.015s][trace][os,container] OSContainer::active_processor_count: 4 [0.015s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 4 [0.015s][debug][os,container] container memory limit unlimited: -1, using host value [0.015s][debug][os,container] container memory limit unlimited: -1, using host value [0.019s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 4 [0.026s][debug][os,container] container memory limit unlimited: -1, using host value [0.089s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup//cpu.max [0.089s][trace][os,container] Raw value for CPU quota is: max [0.089s][trace][os,container] CPU Quota is: -1 [0.089s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup//cpu.max [0.089s][trace][os,container] CPU Period is: 100000 [0.089s][trace][os,container] Path to /cpu.weight is /sys/fs/cgroup//cpu.weight [0.089s][trace][os,container] Raw value for CPU shares is: 100 [0.089s][debug][os,container] CPU Shares is: -1 [0.089s][trace][os,container] OSContainer::active_processor_count: 4 [0.105s][trace][os,container] Path to /memory.max is /sys/fs/cgroup//memory.max [0.105s][trace][os,container] Raw value for memory limit is: max [0.105s][trace][os,container] Memory Limit is: Unlimited [0.105s][debug][os,container] container memory limit unlimited: -1, using host value openjdk 17.0.2-internal 2022-01-18 OpenJDK Runtime Environment (build 17.0.2-internal+0-adhoc.sgehwolf.jdk17u) OpenJDK 64-Bit Server VM (build 17.0.2-internal+0-adhoc.sgehwolf.jdk17u, mixed mode, sharing) $ nproc 4 I'd have to set it to a value via --cpu-shares in order to get a value of one. Note that the runtime (runc for docker) doesn't set it to the value specified on the command line. It maps it to an appropriate value given a CLI value from cgroups v1. See for example: https://github.com/containers/crun/blob/main/crun.1.md#cpu-controller For me a value of --cpu-shares=1024 on CLI maps to a value of 39 in cpu.weight.
03-02-2022
Am I correct that the user is not specifying any cpu limits, right? Neither of: spec.containers[].resources.limits.cpu spec.containers[].resources.requests.cpu are set? If so this bit is interesting: [0.001s][trace][os,container] Path to /cpu.weight is /sys/fs/cgroup//cpu.weight [0.001s][trace][os,container] Raw value for CPU shares is: 1 The raw value is the cpu.weight read from the cgroups interface file. It's set to 1, why? The default is supposed to be 100 according to [1] and is properly handled in the JDK via [2]. Is minicube/kubernetes setting a cpu shares value? [1] https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#cpu """ cpu.weight A read-write single value file which exists on non-root cgroups. The default is “100”. The weight in the range [1, 10000]. """ [2] https://github.com/iklam/jdk/blame/35172cdaf38d83cd3ed57a5436bf985dde2d802b/src/hotspot/os/linux/cgroupV2Subsystem_linux.cpp#L41
03-02-2022
Is /bin/stress cgroups aware? many tools are not. For example 'top' is not.
03-02-2022
[~iklam] Do I understand correctly the issue is cgroups v2 specific? If so this not being reproducible with JDK 11 is probably caused by missing cgroups v2 support there and it using host values.
03-02-2022
I can't say I understand the logic but this: if (share > -1) { share_count = ceilf((float)share / (float)PER_CPU_SHARES); log_trace(os, container)("CPU Share count based on shares: %d", share_count); } will produce a share_count of 1, which becomes the limit_count and hence the active processor count. EDIT: for an explanation of this calculation see this thread (thanks [~sgehwolf]): http://mail.openjdk.java.net/pipermail/hotspot-dev/2019-January/036093.html
03-02-2022
It seems like the JVM's calculation of OSContainer::active_processor_count is incorrect. The code is here: https://github.com/iklam/jdk/blame/35172cdaf38d83cd3ed57a5436bf985dde2d802b/src/hotspot/os/linux/cgroupV2Subsystem_linux.cpp#L46 I added -J-Xlog:os+container=trace to the jshell comment-line above. When running inside minikube: [0.001s][trace][os,container] OSContainer::init: Initializing Container Support [0.001s][debug][os,container] Detected cgroups v2 unified hierarchy [0.001s][trace][os,container] Path to /memory.max is /sys/fs/cgroup//memory.max [0.001s][trace][os,container] Raw value for memory limit is: max [0.001s][trace][os,container] Memory Limit is: Unlimited [0.001s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup//cpu.max [0.001s][trace][os,container] Raw value for CPU quota is: max [0.001s][trace][os,container] CPU Quota is: -1 [0.001s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup//cpu.max [0.001s][trace][os,container] CPU Period is: 100000 [0.001s][trace][os,container] Path to /cpu.weight is /sys/fs/cgroup//cpu.weight [0.001s][trace][os,container] Raw value for CPU shares is: 1 [0.001s][trace][os,container] Scaled CPU shares value is: 28 [0.001s][debug][os,container] CPU Shares is: 28 [0.001s][trace][os,container] CPU Share count based on shares: 1 [0.001s][trace][os,container] OSContainer::active_processor_count: 1 [0.001s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 1 root@jdk17-test-df6b46fbc-xdb5p:/# cat /sys/fs/cgroup//cpu.weight 1 However, by using the Linux "stress" command, I am able to use up to 32 CPUs (the number of physical threads on my host): root@jdk17-test-df6b46fbc-xdb5p:/# time /bin/stress -c 1 -t 1 stress: info: [252] dispatching hogs: 1 cpu, 0 io, 0 vm, 0 hdd stress: info: [252] successful run completed in 1s real 0m1.003s user 0m0.998s sys 0m0.002s root@jdk17-test-df6b46fbc-xdb5p:/# time /bin/stress -c 10 -t 1 stress: info: [254] dispatching hogs: 10 cpu, 0 io, 0 vm, 0 hdd stress: info: [254] successful run completed in 1s real 0m1.004s user 0m9.836s sys 0m0.003s root@jdk17-test-df6b46fbc-xdb5p:/# time /bin/stress -c 32 -t 1 stress: info: [265] dispatching hogs: 32 cpu, 0 io, 0 vm, 0 hdd stress: info: [265] successful run completed in 1s real 0m1.006s user 0m30.029s sys 0m0.026s root@jdk17-test-df6b46fbc-xdb5p:/# time /bin/stress -c 64 -t 1 stress: info: [298] dispatching hogs: 64 cpu, 0 io, 0 vm, 0 hdd stress: info: [298] successful run completed in 1s real 0m1.010s user 0m29.614s sys 0m0.023s
01-02-2022
OK, I managed to reproduce the reported scenario with minikube. TL/DR; "getParallelism() = 1" is printed when openjdk 17.0.2 is executed inside minikube ======================================================================== [1] Set up minikube using instructions here: https://minikube.sigs.k8s.io/docs/start/ [2] Baseline: outside of minikube, run this command -- it runs a docker container to run jshell which calls ForkJoinPool.commonPool().getParallelism(): $ docker run --rm -it docker.io/library/openjdk:17-jdk-slim bash -c "echo 'System.out.println(\"getParallelism() = \" + ForkJoinPool.commonPool().getParallelism())' \| jshell - -J-showversion" openjdk version "17.0.2" 2022-01-18 OpenJDK Runtime Environment (build 17.0.2+8-86) OpenJDK 64-Bit Server VM (build 17.0.2+8-86, mixed mode, sharing) Feb 01, 2022 7:56:55 AM java.util.prefs.FileSystemPreferences$1 run INFO: Created user preferences directory. getParallelism() = 31 [3] Do the following to deploy the same JDK image inside minikube (inspired from examples in [1]): $ kubectl create deployment jdk17-test --image=docker.io/library/openjdk:17-jdk-slim -- bash -c "echo 'System.out.println(\"getParallelism() = \" + ForkJoinPool.commonPool().getParallelism())' \| jshell - -J-showversion 2>&1 \| cat > /tmp/log.txt; sleep 1000000" deployment.apps/jdk17-test created $ kubectl get po -A \| grep jdk17 default jdk17-test-df6b46fbc-xdb5p 1/1 Running 0 7m39s [4] The minikube "node" is actually a docker container. Get into it first: $ docker ps CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 5f440059606e gcr.io/k8s-minikube/kicbase:v0.0.29 "/usr/local/bin/entr\u2026" 4 hours ago Up 2 hours 127.0.0.1:49187->22/tcp, 127.0.0.1:49186->2376/tcp, 127.0.0.1:49185->5000/tcp, 127.0.0.1:49184->8443/tcp, 127.0.0.1:49183->32443/tcp minikube $ docker exec -it 5f440059606e /bin/bash root@minikube:/# whoami root [5] jdk17-test runs as a (nested) container inside 5f440059606e root@minikube:/# docker ps \| grep jdk17 b547e9eb8647 2a9873d16f48 "bash -c 'echo 'Syst…" 11 minutes ago Up 11 minutes k8s_openjdk_jdk17-test-df6b46fbc-xdb5p_default_240b87e7-9380-4d0b-93bc-ab316b5c41d8_0 [6] Now get inside the b547e9eb8647 container and see our log file root@minikube:/# docker exec -it b547e9eb8647 bash root@jdk17-test-df6b46fbc-xdb5p:/# cat /tmp/log.txt openjdk version "17.0.2" 2022-01-18 OpenJDK Runtime Environment (build 17.0.2+8-86) OpenJDK 64-Bit Server VM (build 17.0.2+8-86, mixed mode, sharing) Feb 01, 2022 7:51:55 AM java.util.prefs.FileSystemPreferences$1 run INFO: Created user preferences directory. getParallelism() = 1
01-02-2022
Please ask submitter to also add "-Xlog:os+container=trace" to the JVM command-line and send us the output. That will help provide additional information for diagnosing this problem.
31-01-2022
Not sure if this should be in core-svc or hotspot->runtime, but as [~iklam] is now looking at container support I've assigned to him for initial evaluation.
30-01-2022
The cause of the issue may be that Runtime.availableProcessors on certain K8s environments is reporting an incorrect processor count. The fork join pool derives its default parallelism from the result of Runtime.availableProcessors. Container metrics support was added by JDK-8203357. There might be a regression and/or something specific to the K8s version or environment. The java command line option -XshowSettings:system can be used to display the system or container configuration on linux systems. That could be useful to help diagnose the problem.
27-01-2022
I am glad `-XX:ActiveProcessorCount` works around the problem. I would like to keep this issue open and allow for the submitter to test 17.0.2 available from https://www.oracle.com/java/technologies/downloads/
19-01-2022
Further information from the submitter: 1. The issue is not specific to minikube, it is observed in OKE too – where the fork join parallelism is computed 2 or below. 2. The issue was observed with the following JDK: bash-4.4# java --version openjdk 17.0.1 2021-10-19 OpenJDK Runtime Environment (build 17.0.1+12-39) 3. Using the jvm flag -XX:ActiveProcessorCount=xx does fix the problem and that is the workaround we are using for now.
19-01-2022
I think there is a difference in testing, since the submitter says "Deploy this app in kubernetes infrastructure ...", and it may be reproducible using minikube rather than just docker. Using ` -XX:ActiveProcessorCount=1` could be a way to short cut use of kubernetes. JDK-8274349, which likely fixes the issue, has been backported to 17.0.2. I am unsure when that will be available, but it may be possible to verify with 18-ea (https://jdk.java.net/18/) in the interim.
12-01-2022
Is the issue issue here is that the container in the minikube environment is a uniprocessor and the computed parallelism for the common pool is 0? If so, can the submitter try the JDK 18 EA builds as JDK 18 has a change for JDK-8274349 that will ensure that the parallelism for the common pool is at least 1?
10-01-2022
Paul - can you give this a quick look/triage?
07-01-2022
The observations with docker desktop on Windows 10: docker run --rm -it -v "c:/Users/TONGWAN/:/tmp/external" debian:bookworm bash JDK 11: Passed, ForkJoinPool parallelism : 15 JDK 17 ea+16: Failed, ForkJoinPool parallelism : 1 JDK 17 ea+17: Passed JDK 17 ea+18: Passed JDK 17.0.1: Passed JDK 17.0.2: Passed Looks like a duplicate of JDK-8264572.
07-01-2022
Requested more details of the reproducer from the submitter.
30-12-2021

Duplicate :	JDK-8281181 - Do not use CPU Shares to compute active processor count
Relates :	JDK-8264572 - ForkJoinPool.getCommonPoolParallelism() reports always 1
Relates :	JDK-8274349 - ForkJoinPool.commonPool() does not work with 1 CPU