JDK-8302264 : Improve dynamic compiler threads creation
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 17,18,19,20,21
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • OS: generic
  • CPU: generic
  • Submitted: 2023-02-10
  • Updated: 2024-02-06
Other: tbd (Unresolved)
Description
I found a few issues while investigating an RSS memory increase when UseDynamicNumberOfCompilerThreads is enabled (currently the default).

The first is the short idle time for C2 and C1 threads (100 and 500 ms, respectively), which leads to cycles of removing and re-adding compiler threads (example from the TestOverloadCompileQueues.java test):

Added initial compiler thread C2 CompilerThread0
Added initial compiler thread C1 CompilerThread0
Added compiler thread C1 CompilerThread1 (available memory: 13537MB, available profiled code cache: 115MB)
Added compiler thread C1 CompilerThread2 (available memory: 13537MB, available profiled code cache: 115MB)
Added compiler thread C1 CompilerThread3 (available memory: 13537MB, available profiled code cache: 115MB)
Added compiler thread C1 CompilerThread4 (available memory: 13537MB, available profiled code cache: 115MB)
Added compiler thread C1 CompilerThread5 (available memory: 13537MB, available profiled code cache: 115MB)
Added compiler thread C2 CompilerThread1 (available memory: 14222MB, available non-profiled code cache: 115MB)
Removing compiler thread C2 CompilerThread1 after 137 ms idle time
Removing compiler thread C1 CompilerThread5 after 515 ms idle time
Removing compiler thread C1 CompilerThread4 after 517 ms idle time
Added compiler thread C2 CompilerThread1 (available memory: 14221MB, available non-profiled code cache: 115MB)
Added compiler thread C2 CompilerThread2 (available memory: 14221MB, available non-profiled code cache: 115MB)
Added compiler thread C2 CompilerThread3 (available memory: 14221MB, available non-profiled code cache: 115MB)
Removing compiler thread C2 CompilerThread3 after 121 ms idle time
Added compiler thread C2 CompilerThread3 (available memory: 14187MB, available non-profiled code cache: 115MB)

The second is that we allocate only 2 compiler threads during VM initialization and add more later, in response to compilation requests, as you can see in the example above.

Both of these lead to memory fragmentation and an increase in RSS.
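
To put a number on the churn in a log like the one above, the add/remove events can be tallied. The following is a minimal sketch, assuming the log lines keep the exact prefixes shown in the example; `ThreadChurn` and `count_churn` are hypothetical helper names, not part of HotSpot.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper, not part of HotSpot: tallies compiler thread
// lifecycle events from log text in the format shown in the example.
struct ThreadChurn {
    int initial = 0;  // "Added initial compiler thread ..." lines
    int added = 0;    // "Added compiler thread ..." lines
    int removed = 0;  // "Removing compiler thread ..." lines
};

// True if `line` begins with `prefix`.
static bool starts_with(const std::string& line, const std::string& prefix) {
    return line.compare(0, prefix.size(), prefix) == 0;
}

// Scans the log line by line and counts each kind of event.
ThreadChurn count_churn(const std::string& log_text) {
    ThreadChurn churn;
    std::istringstream in(log_text);
    std::string line;
    while (std::getline(in, line)) {
        if (starts_with(line, "Added initial compiler thread")) {
            ++churn.initial;
        } else if (starts_with(line, "Added compiler thread")) {
            ++churn.added;
        } else if (starts_with(line, "Removing compiler thread")) {
            ++churn.removed;
        }
    }
    return churn;
}
```

Applied to the excerpt above, this tally gives 2 initial threads, 10 threads added later, and 4 removed, all within a short stretch of a single test run.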

I suggest the following:
  • Increase the number of threads created during VM initialization.
  • Switch off UseDynamicNumberOfCompilerThreads if CICompilerCount is small (4?).
  • Keep a ratio of 1 C1 thread to 2 C2 threads when allocating them (the example shows that we allocate too many C1 threads).
  • Increase the idle times before threads are removed.
We need to experiment to find the best thread counts and idle times.
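
The effect of the idle time on thread churn can be explored with a toy model before experimenting on a real VM. The sketch below is a deliberate simplification and not HotSpot code: it assumes a single extra compiler thread, ignores how long each compilation takes, and treats the thread as removed whenever the gap between two consecutive compile requests exceeds the idle limit. The name `count_recreations` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model, not HotSpot code: one extra compiler thread stays
// alive while requests keep arriving, is removed once it has been idle for
// longer than idle_limit_ms, and is created again when the next request
// arrives. arrivals_ms must be sorted in ascending order.
int count_recreations(const std::vector<long>& arrivals_ms, long idle_limit_ms) {
    int recreations = 0;
    for (std::size_t i = 1; i < arrivals_ms.size(); ++i) {
        assert(arrivals_ms[i] >= arrivals_ms[i - 1]);
        long gap = arrivals_ms[i] - arrivals_ms[i - 1];
        if (gap > idle_limit_ms) {
            ++recreations;  // the thread timed out during the gap and is added again
        }
    }
    return recreations;
}
```

For requests arriving at 0, 50, 200, 250 and 900 ms, an idle limit of 100 ms causes two re-creations, 500 ms causes one, and 1000 ms causes none, which is the trade-off the experiments would need to quantify against the memory held by idle threads.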

Comments
I'm not against tuning these parameters, but I'm still missing a more general solution for the glibc malloc arena problems (JDK-8193521). GC threads are also dynamic by default and may have similar issues. In addition, I'd prefer to keep it as dynamic as possible because there are so many different workloads, and optimizing for one may have drawbacks for others. It seems natural that at times one compile queue has many more compile jobs than the other, so a fixed ratio doesn't make sense to me.
06-02-2024

(I deleted my comment about RSS usage in jython because I cannot reproduce the experiment anymore, so the difference may be due to other uncontrolled factors).
13-04-2023

[~dholmes] thanks, good to know there is standard terminology for it. I agree with your comment above, we should look at how thread pools are managed to inform this work.
06-04-2023

[~rcastanedalo] what you are describing is basically the notion of "core pool size" for a thread-pool.
05-04-2023

A possible refinement of the dynamic compiler thread creation policy is to set a threshold (in number of compiler threads) below which compiler threads are not removed:

diff --git a/src/hotspot/share/compiler/compileBroker.cpp b/src/hotspot/share/compiler/compileBroker.cpp
index 7829b196cee..6c6365c6e2d 100644
--- a/src/hotspot/share/compiler/compileBroker.cpp
+++ b/src/hotspot/share/compiler/compileBroker.cpp
@@ -273,6 +273,8 @@ bool CompileBroker::can_remove(CompilerThread *ct, bool do_it) {
   // Keep at least 1 compiler thread of each type.
   if (compiler_count < 2) return false;
 
+  if (compiler_count < CompilerThreadRetainThreshold) return false;
+
   // Keep thread alive for at least some time.
   if (ct->idle_time_millis() < (c1 ? 500 : 100)) return false;

This threshold is essentially a soft version of ReduceNumberOfCompilerThreads, where CompilerThreadRetainThreshold=2 is equivalent to -XX:+ReduceNumberOfCompilerThreads and CompilerThreadRetainThreshold=max_jint is equivalent to -XX:-ReduceNumberOfCompilerThreads. The default value could be set to 2 or to some value relative to the initial number of compiler threads.

A quick experiment on DaCapo-jython shows that setting e.g. CompilerThreadRetainThreshold to 4 avoids almost 90% of compiler thread creation in this benchmark:

- default policy (-XX:CompilerThreadRetainThreshold=2):

  $ java -XX:CompilerThreadRetainThreshold=2 -jar dacapo-9.12-MR1-bach.jar --thread-count 6 --iterations 10 --size large --no-pre-iteration-gc jython
  34 compiler threads added, 32 compiler threads removed

- preserve two additional compiler threads (-XX:CompilerThreadRetainThreshold=4):

  $ java -XX:CompilerThreadRetainThreshold=4 -jar dacapo-9.12-MR1-bach.jar --thread-count 6 --iterations 10 --size large --no-pre-iteration-gc jython
  4 compiler threads added, 0 compiler threads removed
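
The removal check with the proposed threshold can be modelled outside the VM to see how the three conditions interact. This is a stand-alone sketch that covers only the checks visible in the patch; the real CompileBroker::can_remove performs further checks that are not shown in this report. `RemovalPolicy` and its field names are hypothetical, and the idle limits are made fields purely for illustration.

```cpp
#include <cassert>

// Hypothetical stand-alone model of the removal check; names are
// illustrative and this is not the HotSpot implementation.
struct RemovalPolicy {
    int retain_threshold;   // models the proposed CompilerThreadRetainThreshold
    long c1_idle_limit_ms;  // 500 in the patched code
    long c2_idle_limit_ms;  // 100 in the patched code
};

// Returns true if a thread of the given type may be removed, given the
// current number of threads of that type and how long it has been idle.
bool can_remove(const RemovalPolicy& policy, int compiler_count,
                bool is_c1, long idle_ms) {
    // Keep at least 1 compiler thread of each type.
    if (compiler_count < 2) return false;
    // Proposed addition: never shrink while below the retain threshold.
    if (compiler_count < policy.retain_threshold) return false;
    // Keep the thread alive for at least some time.
    long limit = is_c1 ? policy.c1_idle_limit_ms : policy.c2_idle_limit_ms;
    if (idle_ms < limit) return false;
    return true;
}
```

With a threshold of 4, a count of 4 still permits one removal but a count of 3 does not, so three threads of that type are retained instead of one, which matches the "two additional compiler threads" described above.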
05-04-2023

Yes you were right, David. We need to re-evaluate this code.
15-02-2023

https://bugs.openjdk.org/browse/JDK-8198756?focusedCommentId=14167091&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14167091
"The more you push on this in terms of when to start and when to die, the more you enter the realm of thread-pools. Take a look at java.util.concurrent.ThreadPoolExecutor to get an idea about all the various policy decisions regarding when to start new threads and when to allow one to terminate. This is non-trivial to manage well and any "simple" approaches will simply cause further work down the line."
15-02-2023

I think it would be useful to add a JVM option for configuring the idle time of compiler threads.
14-02-2023

We also have weird scaling of CICompilerCount based on the CPU count:

> java -XX:+PrintFlagsFinal -XX:ActiveProcessorCount=14 -version | grep CICompilerCount
     intx CICompilerCount = 4 {product} {ergonomic}
> java -XX:+PrintFlagsFinal -XX:ActiveProcessorCount=16 -version | grep CICompiler
     intx CICompilerCount = 12 {product} {ergonomic}
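
One plausible explanation for the jump is integer rounding in the ergonomic formula. The sketch below assumes the default is computed as log2(n) * log2(log2(n)) * 3 / 2 with floored integer logarithms and a minimum of 2. That formula is an assumption here and should be checked against the compilation policy source of the JDK version in question; `floor_log2` and `ergonomic_compiler_count` are hypothetical names. Under that assumption the sketch reproduces both values above.

```cpp
#include <algorithm>
#include <cassert>

// Floor of log2(x) for x >= 1.
int floor_log2(int x) {
    assert(x >= 1);
    int result = 0;
    while (x > 1) {
        x >>= 1;
        ++result;
    }
    return result;
}

// Assumed ergonomic formula (not confirmed by this report):
// log2(cpus) * log2(log2(cpus)) * 3 / 2 with floored logs, at least 2.
int ergonomic_compiler_count(int active_cpus) {
    int log_cpu = floor_log2(active_cpus);
    int loglog_cpu = floor_log2(std::max(log_cpu, 1));
    return std::max(log_cpu * loglog_cpu * 3 / 2, 2);
}
```

In this model the inner logarithm stays at 1 for 8 to 15 CPUs and becomes 2 at 16 CPUs, so the result goes from 3 * 1 * 3 / 2 = 4 to 4 * 2 * 3 / 2 = 12 in a single step.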
11-02-2023