JDK-8296125 : Add a command line option to set a refresh rate of the OS cached metrics in Linux
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 20
  • Priority: P4
  • Status: Closed
  • Resolution: Won't Fix
  • OS: linux
  • Submitted: 2022-10-31
  • Updated: 2024-04-03
  • Resolved: 2024-04-03
Related Reports
CSR :  
Relates :  
Description
A user has an environment with 50+ containerized Java (JDK 17) applications running simultaneously on Linux. A performance issue is observed, and the suspected root cause is the hardcoded 20 ms timeout between re-reads of the cached OS metrics (memory limit and active processor count); re-reading them is expensive in a containerized environment. This timeout was introduced in JDK-8232207. The request is therefore for a way to set the refresh rate of the cached OS metrics at launch time.
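To make the request concrete, here is a minimal sketch of a time-based cache whose refresh interval is a parameter rather than a constant. It is an illustration only, not the HotSpot implementation (which is C++); the class name CachedMetric, the injectable clock, and the constructor parameters are all assumptions made for this example.

```java
import java.util.function.LongSupplier;

/** Hypothetical illustration of a metric cache with a configurable refresh interval. */
public class CachedMetric {
    private final LongSupplier reader;    // the expensive read, e.g. parsing a cgroup file
    private final LongSupplier nanoClock; // injectable clock, so the behaviour can be tested
    private final long refreshNanos;      // how long a cached value stays valid
    private long cachedValue;
    private long nextRefreshAt;
    private boolean initialized;

    public CachedMetric(LongSupplier reader, long refreshMillis, LongSupplier nanoClock) {
        this.reader = reader;
        this.refreshNanos = refreshMillis * 1_000_000L;
        this.nanoClock = nanoClock;
    }

    /** Returns the cached value, re-reading it only once the interval has elapsed. */
    public synchronized long get() {
        long now = nanoClock.getAsLong();
        // Compare via subtraction so the check stays correct if the nano clock wraps.
        if (!initialized || now - nextRefreshAt >= 0) {
            cachedValue = reader.getAsLong();
            nextRefreshAt = now + refreshNanos;
            initialized = true;
        }
        return cachedValue;
    }
}
```

With a 20 ms interval the reader is invoked at most once per 20 ms no matter how often get() is called; a larger interval trades freshness of the value for fewer expensive reads. In real use the clock argument would be System::nanoTime.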
Comments
Reclosing as "Won't Fix" instead of Resolving as...
03-04-2024

It turned out that using -XX:-UseDynamicNumberOfCompilerThreads "fixed" the issue for the user. See the review.
31-01-2023

From the user:
15GB memory per container with how much memory available on the host? - The host machine has 768GB memory.
Do you disable swap? - We disable swap there.
It was also suggested to test with -XX:-UseContainerSupport (and possibly setting -XX:ActiveProcessorCount=n and -XX:MaxRAM=m), and doing so removed the performance issue. Waiting for results of testing the eval build.
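When experimenting with these flags, it can help to confirm what the JVM actually reports for CPU count and maximum heap. The following is a small sketch using only the standard Runtime API; the class name JvmResourceView is hypothetical.

```java
/** Hypothetical helper that prints the resources the running JVM reports. */
public class JvmResourceView {
    /** Number of processors the JVM believes it may use. */
    static int processors() {
        return Runtime.getRuntime().availableProcessors();
    }

    /** Maximum heap size the JVM will attempt to use, in bytes. */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        System.out.printf("availableProcessors=%d maxHeapMiB=%d%n",
                processors(), maxHeapBytes() / (1024 * 1024));
    }
}
```

As an example, using the per-container limits reported elsewhere in this thread (2 CPUs, 15GB), one could run it as: java -XX:-UseContainerSupport -XX:ActiveProcessorCount=2 -XX:MaxRAM=15g JvmResourceView. The concrete values are an assumption for illustration; the comment above leaves n and m unspecified.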
14-11-2022

Uploaded a patched JDK 17 build for the user; waiting for results. The further questions were also passed on to the user. Thanks!
10-11-2022

Thanks, but we need some more evidence that the caching timeout is indeed the issue. What led the user to conclude that the caching timeout is the issue? It would be best to have a reproducer or, alternatively, to patch it downstream, test it, and then propose it for integration in mainline once it has been verified. The system appears to be over-provisioned: 50*2 = 100 or more CPUs requested on a 96 CPU system (depending on the actual number of containers). 15GB memory per container with how much memory available on the host? Do they disable swap? In theory, it could be a thrashing system as well.
09-11-2022

From the user:
Is the performance issue observed on that single system (where the 50+ containers are running) or individually on those apps running in containers? - It happens to both. The single system experiences lockups, but the adjacent containers (apps) in there also experience issues.
What specs does the system have and what shares are given to each container? - The host system has 96 CPUs. We don't use shares. Each container had 2 CPUs.
Does each container run with certain resource limits or not? If yes, which? - We also have a memory limit for containers. It is 15GB per container in this case.
08-11-2022

[~omikhaltcova] What exactly does "environment with 50+ containerized java running simultaneously on Linux" mean? Are they running 50+ containers on a single system? If so, is the performance issue observed on that single system (where the 50+ containers are running) or individually on those apps running in containers? What specs does the system have and what shares are given to each container? Does each container run with certain resource limits or not? If yes, which? It would be good to get some evidence that the re-reading of metrics is indeed the issue.
03-11-2022

Yes, they run those 50+ containers on a single system. As for the other questions, I need to check with the user.
03-11-2022

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/10918 Date: 2022-10-31 13:27:13 +0000
31-10-2022