JDK-8086045 : Improve the java.lang.invoke first initialization costs
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.lang.invoke
  • Affected Version: 9
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2015-06-09
  • Updated: 2020-08-15
  • Resolved: 2016-08-31
JDK 9: Fixed in 9
Sub Tasks
  • JDK-8163369
  • JDK-8163370
  • JDK-8163371
  • JDK-8163878
  • JDK-8164044
  • JDK-8164451
  • JDK-8164483
  • JDK-8164525
  • JDK-8164569
  • JDK-8164739
  • JDK-8164858
Description
It is known from Indify String Concat, Lambdas, and probably Jigsaw that the initialization costs of the java.lang.invoke infrastructure take significant time on first access to the infrastructure.

See a simple experiment here:
  http://cr.openjdk.java.net/~shade/8086045/indy-startup.txt

The goal for this task is to investigate whether we can improve this, or whether we should conclude that the first initialization overheads can be justifiably ignored.
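
For readers without access to the linked experiment, the kind of measurement described above can be sketched roughly as follows. This is not the linked test: the class name FirstUseTiming and its methods are hypothetical, and the sketch only assumes that the first use of a MethodHandle pays one-time setup costs that a plain call does not. Absolute numbers will vary by JDK version and machine.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical sketch: compare a plain call with the first and second
// use of a MethodHandle that targets the same method.
class FirstUseTiming {
    // String.concat is used instead of '+' so that this method does not
    // itself go through an invokedynamic-based string concatenation.
    static String greet(String name) {
        return "Hello, ".concat(name);
    }

    static String callDirect(String name) {
        return greet(name);
    }

    static String callViaMethodHandle(String name) {
        try {
            MethodHandle mh = MethodHandles.lookup().findStatic(
                    FirstUseTiming.class, "greet",
                    MethodType.methodType(String.class, String.class));
            return (String) mh.invokeExact(name);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        callDirect("world");
        long t1 = System.nanoTime();
        callViaMethodHandle("world");   // first use: includes one-time setup
        long t2 = System.nanoTime();
        callViaMethodHandle("world");   // second use: setup already done
        long t3 = System.nanoTime();

        System.out.print("direct call (ns):        ");
        System.out.println(t1 - t0);
        System.out.print("first MethodHandle (ns): ");
        System.out.println(t2 - t1);
        System.out.print("second MethodHandle (ns):");
        System.out.println(t3 - t2);
    }
}
```

A single in-process timing like this is noisy; comparing total wall-clock time of two separate JVM runs, as the comments below do for HelloVirtual and HelloMH, is closer to what a user observes at startup.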
Comments
By adding capabilities to jlink to pre-generate common forms - tied together with minimal build-time profiling to ensure the default configuration evolves with common use - as well as a series of cleanup and micro-optimizations to the runtime internals, the cost of initializing java.lang.invoke using the attached test is down to less than 3 ms on my machine, executing a mere 64K bytecode (compared to 690K in 8u), and loading only 21 classes statically. There's more work to be done to improve on more advanced use cases, but I think we're done here.
31-08-2016

With the patches for JDK-8164483 and JDK-8164569 applied (currently undergoing testing) I've gotten down to having no class generation during bootstrap for the HelloMH test linked in the description of this bug. This brings the total runtime difference between HelloVirtual and HelloMH down to ~5ms, which can be explained by HelloMH loading an additional 35 classes and doing a fair amount of setup ceremony. I think after simplifying how to determine and configure the set of classes to generate (JDK-8163371) that the work here is done. JDK-8163372 could be moved to a separate RFE.
22-08-2016

Prototyped code to generate DirectMethodHandles ahead of time, getting further improvements on these tests:
  http://cr.openjdk.java.net/~redestad/scratch/dmh_gen.txt
Together with previous improvements, the total overhead of running Shipilev's HelloMH test compared to HelloVirtual is down to 12ms. Only 4 MethodHandles are still generated at runtime (getObjectField, identity_L, zero_L and invokeExact_MT), which could possibly be dealt with separately. I intend to supply the patch to generate DMHs ahead of time in jlink with this RFE ID.
05-08-2016
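
As background for the comment above: DirectMethodHandle is an internal class, but the public API exposes the notion of a direct method handle through Lookup.revealDirect. The following is a minimal sketch under that assumption; the class DirectHandles and its field are hypothetical and are not taken from the prototype.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandleInfo;
import java.lang.invoke.MethodHandles;

// Hypothetical sketch: obtain a direct method handle for a field getter
// and inspect it with the public revealDirect API.
class DirectHandles {
    int value = 42;

    static String describeGetter() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            // Lookup.find* methods return direct method handles.
            MethodHandle getter =
                    lookup.findGetter(DirectHandles.class, "value", int.class);
            MethodHandleInfo info = lookup.revealDirect(getter);
            return info.getName() + ":"
                    + MethodHandleInfo.referenceKindToString(info.getReferenceKind());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    static int readValue(DirectHandles target) {
        try {
            MethodHandle getter = MethodHandles.lookup()
                    .findGetter(DirectHandles.class, "value", int.class);
            return (int) getter.invokeExact(target);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(describeGetter());
        System.out.println(readValue(new DirectHandles()));
    }
}
```

Which internal forms back such a handle, and whether they are spun at runtime or pre-generated at link time, is an implementation detail that this sketch does not show.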

Experiment to generate BMH$Species classes with jlink plugin: http://cr.openjdk.java.net/~redestad/scratch/bmh_species_gen.txt
22-03-2016

Startup applications can observe a hit of ~50-100ms from using indy compared to other alternatives, depending on the extent of the usage. Most of it is one-time setup cost that might be possible to optimize at link time (for example with a jlink plugin) in the JDK 9 timeframe, but when comparing the cost of j.l.invoke infra initialization in isolation we're already in a better place than JDK 8, mostly due to allowing more of the infrastructure to be initialized lazily. New features such as ISC make the initialization happen earlier or at more inopportune times, though. Coincidentally, I was looking at related things and noticed that desugaring the initializer in StringConcatFactory cuts back on some of the parts of indy initialized when doing string concatenation, see JDK-8152074. I'll assess how much this helps JDK-8151887 and update that bug.
17-03-2016

How are our startup benchmarks affected by this? What are our acceptance criteria for startup for JDK 9? We see extreme slowdowns in some situations, e.g. JDK-8151887, and given that a lot of testing is done in fastdebug this can't just be dismissed.
17-03-2016

Aleksey, thank you for the data! So, initializing the java.lang.invoke infrastructure causes more methods to be executed and also more methods to be compiled, both of which can contribute to the startup overhead of the VM. So I'm wondering: Is this a problem for us? We have to execute more methods, so we lose time with that anyway. We can try to make the execution fast, e.g., by AOT compiling the whole or part of the infrastructure. I'm wondering what fraction of the java.lang.invoke infrastructure an AOT compiler can successfully deal with (given that some part of the infrastructure is generated at runtime, if I understand well how the infrastructure works).
26-02-2016

I agree, Vladimir. JDK-8148940 is a test case that shows the performance effect of bootstrapping/more compilations in an extreme case. The performance effect is much less in more realistic cases (but there is an effect, nevertheless, that is why we have this issue).
26-02-2016

I'd say JDK-8148940 is orthogonal to general improvements in j.l.i bootstrap. The real problem is the bad interaction between -Xcomp mode and the test case: the more pressure there is on the JIT compilers, the longer (non-linearly) it takes to make progress on test execution.
26-02-2016

FTR, a simple -XX:+PrintCompilation experiment with a single String concatenation, with and without -Xcomp:
  http://cr.openjdk.java.net/~shade/8150717/compilation-oob.txt
  http://cr.openjdk.java.net/~shade/8150717/compilation-Xcomp.txt
While going through the java.lang.invoke infra of course causes more compilations, it is unlikely we can or should do anything here. In the oob case, we seem to be mostly tripping already warm methods into hot ones; and in the -Xcomp case we just go through the many methods that constitute the java.lang.invoke infra.
26-02-2016
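
A program of the kind used in the experiment above might look like the following sketch. The class name SingleConcat is hypothetical, and the sketch assumes a javac that compiles string concatenation to invokedynamic (the default since JDK 9); the actual program behind the linked logs is not shown in this issue.

```java
// Hypothetical sketch: a single String concatenation. When compiled by a
// JDK 9+ javac with default settings, the '+' below becomes an
// invokedynamic call site that is linked on first execution.
class SingleConcat {
    static String concat(String prefix, int value) {
        return prefix + value;
    }

    public static void main(String[] args) {
        System.out.println(concat("args: ", args.length));
    }
}
```

Running it once as `java -XX:+PrintCompilation SingleConcat` and once with `-Xcomp` added gives the two compilation logs to compare.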

An example of this problem appearing in a rather extreme way is JDK-8148940. I've filed JDK-8150717 to investigate (in general) the problem behind the costs of bootstrapping. But Aleksey has told me about this issue. So I'll close JDK-8150717 as a duplicate of this issue.
26-02-2016

Assigned to Claes as he is looking into these issues.
01-12-2015

JDK-8142334, JDK-8142487 and a few others help reduce the java.lang.invoke initialization cost by making the laziness more fine-grained, reducing the number of LambdaForms that are created during initialization and avoiding loading and initialization of various helper classes, such as Byte$ByteCache: http://cr.openjdk.java.net/~redestad/scratch/8086045.txt
18-11-2015
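
The "more fine-grained laziness" mentioned above can be illustrated with the initialization-on-demand holder idiom. This is a generic sketch, not code from the referenced patches; the class LazyTable and its members are hypothetical. The point is that a helper class is loaded and initialized only when a code path first touches it.

```java
// Hypothetical sketch of the holder idiom: the table is built only when
// square() is first called, not when LazyTable itself is initialized.
class LazyTable {
    static int buildCount = 0;

    private static final class Holder {
        // Runs when Holder is first initialized, i.e. on first access.
        static final int[] SQUARES = buildSquares();
    }

    private static int[] buildSquares() {
        buildCount++;
        int[] squares = new int[256];
        for (int i = 0; i < squares.length; i++) {
            squares[i] = i * i;
        }
        return squares;
    }

    static int square(int i) {
        return Holder.SQUARES[i];
    }

    public static void main(String[] args) {
        System.out.println(buildCount);   // 0: Holder not initialized yet
        System.out.println(square(12));   // 144: triggers Holder initialization
        System.out.println(buildCount);   // 1: table built exactly once
    }
}
```

Splitting eagerly initialized state into several such holders means that a simple use case pays only for the parts it actually reaches.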

I did a couple more runs where I lowered the stack depth of what I sampled to reduce the overhead. Combining these with the full stack sampling helps show which flows are the largest time consumers. These three (*deep.svg) are all timelines.

Overhead (total runtime of code):
  • tiered: 0.22s
  • int: 0.24s
  • int + profile_6deep: 0.36s
  • int + profile_10deep: 0.40s
  • int + profile_15deep: 0.50s
  • int + profile_all: 1.15s
06-07-2015

Did a simple HelloWorld application that instantiates a lambda and ran it with -Xint using a modified JVM that prints the timing for each method entry and exit. From the collected methods I built up stack traces and used Flamegraph to build the attached visualization. The attached file is lambdatest.svg. While small methods are certainly going to be impacted performance-wise by -Xint as well as by the two printouts for each call, this should give an understanding of which areas to look at. Note that the graph is not a timeline; it just shows the distribution of stack traces. To limit the size of the graph, methods with the same timestamp for entry and exit have been filtered out.
01-07-2015
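
The application profiled above is not attached to this issue. A HelloWorld that instantiates a lambda could be as small as the following sketch; the class name LambdaHello is hypothetical.

```java
import java.util.function.Supplier;

// Hypothetical sketch: the lambda below is created through an
// invokedynamic call site, so running this program exercises the
// java.lang.invoke bootstrap path on first execution.
class LambdaHello {
    static String runLambda() {
        Supplier<String> greeting = () -> "Hello World";
        return greeting.get();
    }

    public static void main(String[] args) {
        System.out.println(runLambda());
    }
}
```

Running it as `java -Xint LambdaHello` keeps the JIT compilers out of the picture, as in the measurement described above; the per-method timing output required a modified JVM and is not reproduced by this sketch.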

This file (lambdatest_timeline.svg) shows the same data but laid out as a timeline instead. It also filters out all methods with the same start and exit time, but it was generated with a different script, so it is a bit uglier and doesn't support zoom.
01-07-2015

Michael Haupt expressed interest in this, assigning.
09-06-2015