JDK-8215555 : TieredCompilation C2 threads can excessively block handshakes
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 11,12
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2018-12-18
  • Updated: 2020-05-12
  • Resolved: 2018-12-20
  • JDK 11: 11.0.8-oracle (Fixed)
  • JDK 12: 12 b26 (Fixed)
  • JDK 13: 13 (Fixed)
Description
ThreadLocalHandshakes introduced a diagnostic flag that aborts the VM if a handshake takes longer than a set time:

java -XX:+UnlockDiagnosticVMOptions -XX:HandshakeTimeout=20 -version

Setting this to a low value, such as 5 or 10 (ms), makes the VM likely to abort on startup on my machine:

$ java -XX:+UnlockDiagnosticVMOptions -XX:HandshakeTimeout=10 -version
#
# A fatal error has been detected by the Java Runtime Environment:
#

This does not happen with -Xint, -XX:-TieredCompilation or even -XX:TieredStopAtLevel=1; all of these get through startup just fine, even with -XX:HandshakeTimeout=1.

This indicates that a handshake is blocked by a compiler thread for an excessive amount of time, which consistently happens during startup. This appears to be a major contributor to increased work during startup, since the handshake attempt causes excessive spinning.

This turns out to be due to interaction between the NMethod sweeper thread - which is started very early and attempts early handshakes in NMethodSweeper::do_stack_scanning - and the stub code generation in OptoRuntime::generate(ciEnv* env) which compiles code in VM mode for ~10ms. Moving this to native mode resolves the issue, but a simpler and safer fix is to instruct the NMethod Sweeper thread not to do the scan on first run.
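The "do not scan on first run" idea can be illustrated with a toy model. This is only a sketch and not HotSpot code: `ToySweeper`, `request_sweep()` and `run_once()` are hypothetical names. The only detail taken from this report is a `_should_sweep`-style flag whose initial value decides whether the very first sweeper pass performs a stack scan, which in the real VM would require a handshake.

```cpp
// Toy model only: these names are hypothetical and do not mirror
// HotSpot's actual NMethodSweeper implementation.
class ToySweeper {
 public:
  // initial_should_sweep plays the role of the flag's initial value.
  explicit ToySweeper(bool initial_should_sweep)
      : should_sweep_(initial_should_sweep) {}

  // Something (for example code cache pressure) asks for a sweep.
  void request_sweep() { should_sweep_ = true; }

  // One iteration of the sweeper loop. Returns true if a stack scan
  // was performed; in the real VM that scan would need a handshake.
  bool run_once() {
    if (!should_sweep_) {
      return false;  // nothing requested: skip the scan entirely
    }
    should_sweep_ = false;  // consume the request
    ++scans_;
    return true;
  }

  // Number of scans performed so far.
  int scans() const { return scans_; }

 private:
  bool should_sweep_;
  int scans_ = 0;
};
```

In this model, `ToySweeper(true).run_once()` scans immediately on the first pass, while `ToySweeper(false).run_once()` does nothing until `request_sweep()` has been called, so no early scan competes with compiler initialization.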
Comments
jdk11 backport request: I would like to have the patch in OpenJDK11 as well (for better parity with 11.0.8_oracle). The patch applies cleanly.
07-05-2020

Too late for the reviews but very nice find!
03-01-2019

The _should_sweep check was added by the JDK-8211129 fix in JDK12, which is very recent. The reason was a failure after the changes for JDK-8132849. Initializing _should_sweep to false should not affect that fix, but please run the sweeper jtreg tests.
20-12-2018

Can confirm initializing to false instead gets rid of the timeout issue and has about the same effect as -XX:-MethodFlushing on startup (of course starting the thread costs a little, but that disappears in the noise compared to the ~50 million cycles we spent spinning on the handshake attempt).
20-12-2018

Could it be that _should_sweep is initialized to true?

volatile bool NMethodSweeper::_should_sweep = true; // Indicates if we should invoke the sweeper
20-12-2018

Yes, it does not make sense to run the sweeper during initialization. But that should happen automatically: nothing should trigger sweeping during compiler initialization. That is why I am asking which condition triggers it. I am not against adding an additional condition (initialization is not complete) in possibly_sweep(), but I want to know what is wrong with the existing conditions.
20-12-2018

Either way it seems reasonable to delay initialization of the NMethod Sweeper thread at least until after C2 has completed initialization (including stub generation).
20-12-2018

Good point - maybe there's a bug there during initialization?
20-12-2018

Maybe some conditions in possibly_sweep() are not correct during startup?
20-12-2018

It is strange that NMethodSweeper::do_stack_scanning() is called early. It should be called only when no space is left in the CodeCache: http://hg.openjdk.java.net/jdk/jdk/file/747d29313e5a/src/hotspot/share/runtime/sweeper.cpp#l447
20-12-2018

While the NMethod sweeper likely has interacted negatively with C2 initialization for some time, this became more pronounced with TLH changes in 12, so I consider this a regression. If we can produce a simple and trivial fix we should get it into 12.
20-12-2018

This appears to be due to an interaction between the NMethod sweeper thread - which is started very early and attempts early handshakes in NMethodSweeper::do_stack_scanning - and the stub code generation in OptoRuntime::generate(ciEnv* env), which runs compilation in VM mode for ~10ms (typically compilations run in native mode, so that handshakes will be executed by the VM thread and the compiler thread will not block them). Turning off NMethod sweeping using -XX:-MethodFlushing removes this interaction completely, making it possible to run with -XX:HandshakeTimeout=1. This also improves startup on minimal programs by a couple of milliseconds (reducing cycles/instruction count by ~20%). A reasonable fix thus seems to be to delay the start of the NMethod sweeper thread, which should be harmless to everyone since it's about. We should also consider moving more of the stub compilation steps from VM to native mode so that they don't excessively block attempts to do handshakes during initialization.
20-12-2018
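
The VM-mode versus native-mode distinction in the analysis above can be sketched with a small deterministic model. This is a deliberate simplification and not HotSpot code: `ToyThreadState`, `ToyPhase`, `handshake_wait_ms()` and `would_time_out()` are hypothetical names. The model assumes that a handshake requested at time 0 cannot complete while the target thread is in VM mode and completes at once when the target is in native mode. The strict "greater than" timeout comparison is also an assumption.

```cpp
#include <vector>

// Toy model only: hypothetical names, not HotSpot's thread states.
enum class ToyThreadState { InVM, InNative };

// A stretch of time the target thread spends in one state.
struct ToyPhase {
  ToyThreadState state;
  int duration_ms;
};

// How long a handshake requested at time 0 waits, under the assumption
// that it can only complete once the target thread is in native mode
// (or has run through its whole timeline).
int handshake_wait_ms(const std::vector<ToyPhase>& timeline) {
  int waited = 0;
  for (const ToyPhase& phase : timeline) {
    if (phase.state == ToyThreadState::InNative) {
      return waited;  // handshake can be processed on the thread's behalf
    }
    waited += phase.duration_ms;  // requester keeps spinning
  }
  return waited;
}

// Models a HandshakeTimeout-style check; the strict comparison is assumed.
bool would_time_out(const std::vector<ToyPhase>& timeline, int timeout_ms) {
  return handshake_wait_ms(timeline) > timeout_ms;
}
```

With a timeline of 10 ms in VM mode followed by native mode (roughly the stub generation case described above), the modelled wait is 10 ms, which exceeds a 5 ms timeout but not a 20 ms one. A thread that is in native mode from the start yields a wait of 0 ms.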