JDK-8198756 : Lazy allocation of compiler threads
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 11
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2018-02-27
  • Updated: 2019-11-19
  • Resolved: 2018-04-20
JDK 11
11 b11 Fixed
Sub Tasks
JDK-8201189 :  
Description
The VM currently starts a large number of compiler threads on systems with many CPUs, regardless of the code cache size. They consume memory even if they are idle almost all of the time.
This doesn't make sense for very small code cache sizes. A simple approach would be to limit the number of threads depending on the code cache size.

A more complex approach is to allocate the compiler threads on demand depending on the compile queue lengths.

New implementation: -XX:+UseDynamicNumberOfCompilerThreads, active by default
It starts only one compiler thread per type (C1 and C2/Graal) at startup; the value determined by CICompilerCount is used only as an upper limit on the number of compiler threads.
Additional threads are started depending on the compile queue sizes and the available memory. C2/Graal threads are started up to a count of half their compile queue length. A start is prevented when the operating system reports that the available memory is less than 200 MB per C2/Graal compiler thread.
Similarly, C1 threads are started up to a count of one fourth of their compile queue length, and a start is prevented when the available memory is less than 100 MB per C1 compiler thread.
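The sizing rule described above can be sketched as a small function. This is only an illustration under stated assumptions, not the HotSpot implementation: target_compiler_threads, its parameters and the MB constant are hypothetical names, and treating the memory limit as "available memory divided by the per-thread amount" is one reading of the description.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

constexpr std::size_t MB = 1024 * 1024;

// Hypothetical helper (not HotSpot code): how many compiler threads of one
// type the described policy would allow.
//   max_threads      upper limit (the role played by the CICompilerCount-derived value)
//   queue_length     current length of this compiler's compile queue
//   queue_divisor    2 for C2/Graal, 4 for C1
//   available_bytes  available memory as reported by the operating system
//   bytes_per_thread 200 MB for C2/Graal, 100 MB for C1
inline int target_compiler_threads(int max_threads, int queue_length, int queue_divisor,
                                   std::size_t available_bytes, std::size_t bytes_per_thread) {
  assert(max_threads >= 1 && queue_length >= 0 && queue_divisor > 0 && bytes_per_thread > 0);
  const int by_queue = queue_length / queue_divisor;
  // Clamp before narrowing so a huge memory size cannot overflow int.
  const std::size_t by_memory_raw = available_bytes / bytes_per_thread;
  const int by_memory = by_memory_raw > static_cast<std::size_t>(max_threads)
                            ? max_threads
                            : static_cast<int>(by_memory_raw);
  const int wanted = std::min({max_threads, by_queue, by_memory});
  // One thread of each type is always kept.
  return std::max(1, wanted);
}
```

For example, with a limit of 8 threads, a C2 queue of 10 tasks and 4000 MB available, the queue term (10 / 2 = 5) is the smallest, so the sketch yields 5 threads.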
Idle compiler threads die after some time (500 ms for C1, 100 ms for C2/Graal) in reverse order of their creation. One thread of each type is kept alive. It is also possible to keep all started threads alive by using the diagnostic flag -XX:-ReduceNumberOfCompilerThreads.
CompilerThread creation and death can be traced by activating the diagnostic flag -XX:+TraceCompilerThreads.
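The removal rule (newest thread first, never the last remaining one) can be sketched as follows. This is a minimal model under assumptions, not the HotSpot code: CompilerThreadPool, its fields and reap_idle are hypothetical names, and idle durations are passed in directly instead of being measured.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

using Millis = std::chrono::milliseconds;

// Hypothetical bookkeeping for the threads of one compiler type,
// kept in creation order (index 0 is the initial thread).
struct CompilerThreadPool {
  Millis idle_timeout;            // e.g. 500 ms for C1, 100 ms for C2/Graal
  bool reduce_threads;            // models ReduceNumberOfCompilerThreads
  std::vector<Millis> idle_time;  // how long each thread has been idle

  // Removes idle threads starting with the most recently created one.
  // Stops at the first thread that is still busy, and always keeps one thread.
  int reap_idle() {
    assert(!idle_time.empty());   // the initial thread must exist
    int removed = 0;
    while (reduce_threads && idle_time.size() > 1 && idle_time.back() >= idle_timeout) {
      idle_time.pop_back();
      ++removed;
    }
    return removed;
  }
};
```

Because only the newest thread is ever a removal candidate, an older idle thread stays alive as long as a newer one is still busy, which keeps the thread numbering dense.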

Comments
Testing of webrev.05 passed clean.
19-04-2018

Release notes are also updated.
18-04-2018

I updated the CSR to have only one product flag, UseDynamicNumberOfCompilerThreads, based on the latest discussion and changes: http://cr.openjdk.java.net/~mdoerr/8198756_CompilerCount/webrev.05/
18-04-2018

Hi David,
>Why do we have TraceCompilerThreads instead of using unified logging?
compileBroker doesn't use unified logging. It would be strange to have the TraceCompilerThreads output in unified logging while the remaining code prints directly to tty. I think this could be improved, but better in a separate RFE, as this change is already large enough.
>Does InjectCompilerCreationFailure really need to be diagnostic rather than develop? Is it really expected to be enabled in product builds?
That's a good question. I implemented it like InjectGCWorkerCreationFailure, as I'd like to keep the implementation close to the GC code. I'm not sure if anybody really wants to use them in product builds.
>do we really need multiple flags here?
I like UseDynamicNumberOfCompilerThreads more because it's very close to UseDynamicNumberOfGCThreads. The second flag was proposed by Vladimir as a workaround in case problems with removing threads show up. I think ReduceNumberOfCompilerThreads should only be used for such a case.
06-04-2018

I guess conversion to UL is a future RFE. :) I never looked at the GC thread handling when it was introduced. I wouldn't feel compelled to follow it just because it already exists. There are other areas where we have allowed "policies" to be specified. I'd give it some consideration rather than introducing multiple flags - and also thinking as to how you may want to vary things in the future. As I've mentioned earlier the closer you move to a thread pool like situation the more complexities there are that you may want to control.
06-04-2018

There needs to be a CSR request for this in relation to the proposed new product flags.
With my CSR group member hat on ...
Why do we have TraceCompilerThreads instead of using unified logging?
Does InjectCompilerCreationFailure really need to be diagnostic rather than develop? Is it really expected to be enabled in product builds?
UseDynamicNumberOfCompilerThreads / ReduceNumberOfCompilerThreads: do we really need multiple flags here? Would it not be better and more flexible in the future to define one flag that takes an int that represents a thread management policy:
- 0: static thread creation at VM init (default)
- 1: dynamic startup and terminate when idle
- 2: dynamic startup but never terminate
- 3: whatever you want to try next ...
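To illustrate how a single integer policy could cover what the two boolean flags express, here is a minimal sketch. It is not part of any webrev: CompilerThreadPolicy, policy_from_int and policy_from_flags are hypothetical names, and only the values 0 to 2 from the list above are modeled.

```cpp
#include <cassert>

// Hypothetical policy values following the numbering proposed above.
enum class CompilerThreadPolicy : int {
  StaticAtInit = 0,            // static thread creation at VM init
  DynamicWithTermination = 1,  // dynamic startup, terminate when idle
  DynamicNoTermination = 2     // dynamic startup, never terminate
};

inline bool is_valid_policy(int value) { return value >= 0 && value <= 2; }

inline CompilerThreadPolicy policy_from_int(int value) {
  assert(is_valid_policy(value));
  return static_cast<CompilerThreadPolicy>(value);
}

// Maps the two boolean flags discussed in this thread onto the single policy.
inline CompilerThreadPolicy policy_from_flags(bool use_dynamic, bool reduce) {
  if (!use_dynamic) return CompilerThreadPolicy::StaticAtInit;
  return reduce ? CompilerThreadPolicy::DynamicWithTermination
                : CompilerThreadPolicy::DynamicNoTermination;
}

inline bool starts_threads_on_demand(CompilerThreadPolicy p) {
  return p != CompilerThreadPolicy::StaticAtInit;
}

inline bool terminates_idle_threads(CompilerThreadPolicy p) {
  return p == CompilerThreadPolicy::DynamicWithTermination;
}
```

The mapping also shows that the combination "static creation plus reduce" has no distinct meaning with two booleans, which is one argument for a single policy value.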
06-04-2018

Latest webrev which includes all suggested changes: http://cr.openjdk.java.net/~mdoerr/8198756_CompilerCount/webrev.04/
05-04-2018

Performance results show no significant difference between removing unused compiling threads and keeping them. Based on this I am fine to remove them if needed as in Martin's webrev.
05-04-2018

Regression testing passed for webrev.03 plus additional fix for log files. I ran tier1 and tier2 Hotspot, tier1 JDK, Hotspot test with -Xcomp flag and testing with Graal as JIT compiler (instead of C2). I submitted performance testing for both cases: flag ReduceNumberOfCompilerThreads is on and off.
04-04-2018

http://cr.openjdk.java.net/~mdoerr/8198756_CompilerCount/webrev.03/ plus
diff -r 51a762273af1 src/hotspot/share/compiler/compileBroker.cpp
--- a/src/hotspot/share/compiler/compileBroker.cpp
+++ b/src/hotspot/share/compiler/compileBroker.cpp
@@ -1691,10 +1691,14 @@
   // Find Compiler number by its threadObj.
   jobject* compiler_objects = c1 ? _compiler1_objects : _compiler2_objects;
+  assert(compiler_objects != NULL, "must be initialized at this point");
+  CompileLog** logs = c1 ? _compiler1_logs : _compiler2_logs;
+  assert(logs != NULL, "must be initialized at this point");
+  int count = c1 ? _c1_count : _c2_count;
   oop compiler_obj = ct->threadObj();
   int compiler_number = 0;
   bool found = false;
-  for (; compiler_number < (c1 ? _c1_count : _c2_count); compiler_number++) {
+  for (; compiler_number < count; compiler_number++) {
     if (oopDesc::equals(JNIHandles::resolve_non_null(compiler_objects[compiler_number]), compiler_obj)) {
       found = true;
       break;
@@ -1703,10 +1707,7 @@
   assert(found, "Compiler must exist at this point");
   // Determine pointer for this threads log.
-  assert(_compiler1_logs != NULL, "must be initialized at this point");
-  assert(_compiler2_logs != NULL, "must be initialized at this point");
-  CompileLog** log_ptr = c1 ? &_compiler1_logs[compiler_number]
-                            : &_compiler2_logs[compiler_number];
+  CompileLog** log_ptr = &logs[compiler_number];
   // Return old one if it exists.
   CompileLog* log = *log_ptr;
04-04-2018

http://cr.openjdk.java.net/~mdoerr/8198756_CompilerCount/webrev.02/
29-03-2018

Here are my changes to Martin's webrev.01:
diff -r 2dec2a948dff src/hotspot/share/compiler/compileBroker.cpp
--- a/src/hotspot/share/compiler/compileBroker.cpp
+++ b/src/hotspot/share/compiler/compileBroker.cpp
@@ -300,6 +300,8 @@
  */
 static bool can_remove(CompilerThread *ct, bool do_it) {
   assert(UseDynamicNumberOfCompilerThreads, "or shouldn't be here");
+  if (!ReduceNumberOfCompilerThreads) return false;
+
   AbstractCompiler *compiler = ct->compiler();
   int compiler_count = compiler->num_compiler_threads();
   // Keep at least 1 compiler thread of each type.
@@ -906,7 +908,7 @@
   if (_c1_compile_queue != NULL) {
     int old_c1_count = _compilers[0]->num_compiler_threads();
-    int new_c1_count = MIN3(_c1_compile_queue->size() / 2,
+    int new_c1_count = MIN3(_c1_compile_queue->size() / 4,
         CompilationPolicy::policy()->compiler_count(CompLevel_simple),
         (int)(available_memory / 100*M));
diff -r 2dec2a948dff src/hotspot/share/runtime/globals.hpp
--- a/src/hotspot/share/runtime/globals.hpp
+++ b/src/hotspot/share/runtime/globals.hpp
@@ -2422,6 +2422,9 @@
   product(bool, UseDynamicNumberOfCompilerThreads, true, \
           "Dynamically choose the number of parallel compiler threads") \
   \
+  product(bool, ReduceNumberOfCompilerThreads, false, \
+          "Reduce the number of parallel compiler threads when they are not used") \
+  \
   diagnostic(bool, TraceCompilerThreads, false, \
           "Trace creation and removal of compiler threads") \
   \
diff -r 2dec2a948dff src/hotspot/share/runtime/thread.cpp
--- a/src/hotspot/share/runtime/thread.cpp
+++ b/src/hotspot/share/runtime/thread.cpp
@@ -3356,7 +3356,7 @@
 CompilerThread::~CompilerThread() {
   // Delete objects which were allocated on heap.
   delete _counters;
-  delete _log;
+  // _log is referenced in global CompileLog::_first chain and used on exit.
 }
 bool CompilerThread::can_call_java() const {
27-03-2018

On the other hand, it could be what we need. If there are no compilation tasks in the queue, why start a lot of compiler threads?
27-03-2018

Another issue is with -Xbatch (and -Xcomp, which also sets -Xbatch): the number of compiler threads does not grow because Java threads wait for completion of their compilation request and, as a result, the compilation queue has only one request per running Java thread. -Xbatch, -Xcomp and -XX:+CompileTheWorld are special JIT compiler testing options. Maybe we should disable the dynamic number of compiler threads in these cases.
27-03-2018

It looks like removing threads has some unexpected issues. Maybe for the first implementation we don't delete them and we reduce the rate of growth.
27-03-2018

Another problem with removing compiler threads: runtime/whitebox/WBStackSize.java failed because, it looks like, a Java thread reuses the OS thread created by a compiler thread. The test asks for a 512K stack (-Xss512k), but compiler threads use 1 MB stacks by default. As a result the test failed:
Removing compiler thread C1 CompilerThread3
Removing compiler thread C1 CompilerThread2
Removing compiler thread C1 CompilerThread1
Added compiler thread C1 CompilerThread1 (available memory: 36613MB)
Added compiler thread C1 CompilerThread2 (available memory: 36613MB)
Added compiler thread C1 CompilerThread3 (available memory: 36613MB)
Added compiler thread C1 CompilerThread4 (available memory: 36613MB)
Added compiler thread C1 CompilerThread5 (available memory: 36613MB)
ThreadStackSize VM option: 524288
Size of protected shadow pages: 90112
Full stack size: 1052672
STDERR:
java.lang.RuntimeException: getThreadFullStackSize value [1052672] should be within 90%..110% of the value returned by HotSpotDiagnosticMXBean
    at WBStackSize.main(WBStackSize.java:95)
When I use -XX:CompilerThreadStackSize=512 (value in KB), the test passes.
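The numbers in the failure message can be checked with a little arithmetic. The function below is a hypothetical restatement of the 90%..110% rule quoted in the exception, not the test's actual code, and it assumes the reference value is the 524288 bytes printed as the ThreadStackSize VM option.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical restatement of the quoted rule: actual must lie within
// 90%..110% of expected. Integer arithmetic avoids floating-point rounding.
inline bool within_ten_percent(std::uint64_t actual, std::uint64_t expected) {
  assert(expected > 0);
  return actual * 10 >= expected * 9 && actual * 10 <= expected * 11;
}
```

With expected = 524288 the accepted upper bound is about 576717 bytes, so the reported 1052672 bytes (1 MB plus 4 KB, roughly twice the requested size) is far outside the window, which is consistent with the thread having a compiler-thread-sized stack.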
27-03-2018

Thank you, David. I will look.
27-03-2018

The more you push on this in terms of when to start and when to die, the more you enter the realm of thread-pools. Take a look at java.util.concurrent.ThreadPoolExecutor to get an idea about all the various policy decisions regarding when to start new threads and when to allow one to terminate. This is non-trivial to manage well and any "simple" approaches will simply cause further work down the line.
27-03-2018

I also see that C1 compiler threads are removed too soon, which causes their re-activation. This may eat memory:
$ java -XX:+TraceCompilerThreads -XX:+LogCompilation t
Added initial compiler thread C2 CompilerThread0
Added initial compiler thread C1 CompilerThread0
Warning: TraceDependencies results may be inflated by VerifyDependencies
Added compiler thread C1 CompilerThread1 (available memory: 37040MB)
Added compiler thread C1 CompilerThread2 (available memory: 37033MB)
Added compiler thread C1 CompilerThread3 (available memory: 37032MB)
Removing compiler thread C1 CompilerThread3
Removing compiler thread C1 CompilerThread2
Removing compiler thread C1 CompilerThread1
Added compiler thread C1 CompilerThread1 (available memory: 37027MB)
Maybe we should take into account how long these threads have not been used.
27-03-2018

We can't delete _log when deleting a CompilerThread. The log is referenced globally and used on VM exit to generate the final log file when -XX:+LogCompilation is specified. The compilercontrol tests passed after I changed it:
+CompilerThread::~CompilerThread() {
+  // Delete objects which were allocated on heap.
+  delete _counters;
+  // _log is referenced in global CompileLog::_first chain and used on exit.
+}
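The ownership issue can be shown in isolation with a simplified model. This is a sketch under assumptions, not HotSpot code: Log, Worker, g_first_log, register_log and finish_all_logs are hypothetical stand-ins for CompileLog, CompilerThread, the CompileLog::_first chain and the exit-time log processing.

```cpp
#include <cassert>

// Hypothetical stand-in for CompileLog: nodes form a global singly linked chain.
struct Log {
  Log* next;
  bool finished;
};

static Log* g_first_log = nullptr;  // plays the role of the CompileLog::_first chain

inline Log* register_log() {
  Log* log = new Log{g_first_log, false};
  g_first_log = log;
  return log;
}

// Hypothetical stand-in for CompilerThread.
struct Worker {
  int* counters;  // owned by the worker
  Log* log;       // not owned: also reachable from g_first_log

  Worker() : counters(new int[4]()), log(register_log()) {
    assert(log != nullptr);
  }
  Worker(const Worker&) = delete;
  Worker& operator=(const Worker&) = delete;
  ~Worker() {
    delete[] counters;
    // log is intentionally not deleted: finish_all_logs() still walks it at exit.
  }
};

// Models the exit-time pass over all logs; returns the number of logs visited.
inline int finish_all_logs() {
  int visited = 0;
  for (Log* l = g_first_log; l != nullptr; l = l->next) {
    l->finished = true;
    ++visited;
  }
  return visited;
}
```

If the destructor deleted log, the chain would keep a dangling pointer and the exit-time walk would touch freed memory. In this sketch the logs are simply never freed, which is acceptable for an illustration because they are needed until process exit.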
27-03-2018

It crashes on exit with the simple -XX:+LogCompilation flag:
$ java -XX:+TraceCompilerThreads -XX:+LogCompilation t
Added initial compiler thread C2 CompilerThread0
Added initial compiler thread C1 CompilerThread0
Warning: TraceDependencies results may be inflated by VerifyDependencies
Added compiler thread C1 CompilerThread1 (available memory: 37040MB)
Added compiler thread C1 CompilerThread2 (available memory: 37033MB)
Added compiler thread C1 CompilerThread3 (available memory: 37032MB)
Removing compiler thread C1 CompilerThread3
Removing compiler thread C1 CompilerThread2
Removing compiler thread C1 CompilerThread1
Added compiler thread C1 CompilerThread1 (available memory: 37027MB)
Error: Could not find or load main class t
Caused by: java.lang.ClassNotFoundException: t
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f4241e8b0cc, pid=444259, tid=444266
#
26-03-2018

Testing found issues with compiler/compilercontrol tests:
FAILED: compiler/compilercontrol/commandfile/LogTest.java
FAILED: compiler/compilercontrol/commands/LogTest.java
FAILED: compiler/compilercontrol/directives/LogTest.java
FAILED: compiler/compilercontrol/jcmd/AddLogTest.java
FAILED: compiler/compilercontrol/jcmd/StressAddMultiThreadedTest.java
FAILED: compiler/compilercontrol/logcompilation/LogTest.java
# V [libjvm.so+0xa940cc] CompileLog::finish_log_on_error(outputStream*, char*, int)+0x5c
V [libjvm.so+0xa940cc] CompileLog::finish_log_on_error(outputStream*, char*, int)+0x5c
V [libjvm.so+0xa9444c] CompileLog::finish_log(outputStream*)+0x1c
V [libjvm.so+0x14a2c78] defaultStream::finish_log()+0x28
V [libjvm.so+0x14a40b0] ostream_exit()+0x120
V [libjvm.so+0x176052d] Threads::destroy_vm()+0x5bd
V [libjvm.so+0xf95912] jni_DestroyJavaVM+0x182
C [libjli.so+0x3bbb] JavaMain+0x26b
26-03-2018

First version from Martin: http://cr.openjdk.java.net/~mdoerr/8198756_CompilerCount/webrev.01/
23-03-2018

I want to change this RFE to implement dynamic allocation and activation of JIT compiler threads. My proposal:
- Use the current code which calculates the number of compiler threads to set the max number (it is based on the number of active processors and the CICompilerCount flag value): http://hg.openjdk.java.net/jdk/hs/file/527a563046d0/src/hotspot/share/compiler/compileBroker.cpp#l531
- Start with 1 of each type of compiler thread (1 C1 and 1 C2), or calculate the minimum/start count as a function of other system parameters, for example the size of the CodeCache: http://cr.openjdk.java.net/~mdoerr/8198756_CompilerCount/webrev.00/
- Allocate and activate more threads based on the number of compilation tasks in the queues and/or maybe other parameters. See GC code example JDK-8190426.
- Possibly deactivate/deallocate threads if collected data (number of tasks in queues/sec and/or other) for some period of time shows that they are not needed. It could be application specific: there could be cases with compilation "waves". We may do this part in a separate RFE.
- For Tiered Compilation, keep the ratio of C1:C2 threads as 1:2 as the max ratio for the number of C2 threads (otherwise we can request too many C2 threads), or have a flag to set this parameter: http://hg.openjdk.java.net/jdk/hs/file/527a563046d0/src/hotspot/share/runtime/advancedThresholdPolicy.cpp#l78
- Add a new product flag UseDynamicNumberOfCompilerThreads to activate this optimization.
- Add a new diagnostic flag InjectCompilerCreationFailure to stress test the new code (similar to GC's InjectGCWorkerCreationFailure).
Note, the above should work with Graal JIT as well (Graal replaces C2 when UseJVMCICompiler is set).
21-03-2018