JDK-8053904 : JVM Warmup: Investigate class installation overhead
  • Type: Task
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 9
  • Priority: P3
  • Status: Closed
  • Resolution: Fixed
  • Submitted: 2014-07-29
  • Updated: 2014-10-10
  • Resolved: 2014-10-10
Related Reports
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Description
When weaving and installing a lot of bytecode in Nashorn, the problem isn't so much the code regeneration as the class installation that is a bottleneck we can't do anything about. Even if we install many classes in bulk inside one doPrivileged operation it is still quite slow and about 15-25% of the compile time overhead.

For example:

cd nashorn/make
ant externals
java -jar dist/nashorn.jar -scripting --log=time ../test/script/basic/compile-octane.js 

will show the class installation time as part of the code generation time as an exit hook.

Is there any trick to minimize class installation overhead or low hanging fruit inside the JVM to pick at regarding class installation overhead
Comments
Since we understood a lot about class installation in Nashorn, fixed most of the quirks, and the Nashorn performance work exploded into looking at things outside the class installation, I am closing this issue. Interested parties are welcome to track JDK-8059760 that collects the Nashorn-related performance issues.
10-10-2014

JDK-8059357 is submitted for more performance work in Verifier.
29-09-2014

I think my work is done here for at least a few weeks. I'll keep this issue open to re-profile things after all known issues are fixed.
09-09-2014

JDK-8057936 is submitted for URLClassLoader using the exceptions excessively
09-09-2014

About that case (c) and SymbolTable bottleneck: apart from actual time spent in hashing/equal-ing strings, the SymbolTable itself is overloaded. Here is the run with different SymbolTableSize-s: http://cr.openjdk.java.net/~shade/8053904/class-install-symbol-table-size-1.txt Dynamically-resized SymbolTables should help us here.
09-09-2014

JDK-8057846 is submitted for ClassVerifier::change_sig_to_verificationType bottleneck.
08-09-2014

JDK-8057845 is submitted for ClassVerifier:verify_exception_handler_targets bottleneck.
08-09-2014

So, I put together a classloader benchmark: http://cr.openjdk.java.net/~shade/8053904/benchmarks.jar Marcus had put together the Nashorn feature to dump the classes, I captured them here: http://cr.openjdk.java.net/~shade/8053904/classes.jar If you run the benchmark with jdk9 and: $ java -jar benchmarks.jar -wi 0 -i 30 -r 10s -p file=classes.jar ...it will yield this profile: http://cr.openjdk.java.net/~shade/8053904/class-install-calltree-2.txt There are obvious things to look at: a) ClassVerifier::change_sig_to_verificationType seems to copy the symbols a lot: +- 30.740 (9%) ClassVerifier::change_sig_to_verificationType(SignatureStream*,VerificationType*,Thread*) | +- 15.070 (4%) SignatureStream::as_symbol(Thread*) | +- 14.850 (4%) ClassVerifier::create_temporary_symbol(const Symbol*,int,int,Thread*) b) ClassVerifier::verify_exception_handler_targets seems to spend a lot of time reconstructing the exception table: +- 30.470 (9%) ClassVerifier::verify_exception_handler_targets(unsigned short,bool,StackMapFrame*,StackMapTable*,Thread*) | +- 11.350 (3%) ConstMethod::exception_table_length()const | +- 11.270 (3%) ConstMethod::exception_table_start()const c) ClassFileParser seems to bottleneck at SymbolTable: +- 45.230 (13%) ClassFileParser::parseClassFile(Symbol*,ClassLoaderData*,Handle,KlassHandle,GrowableArray<Handle>*,TempNewSymbol&,bool,Thread*) | +- 32.080 (9%) ClassFileParser::parse_constant_pool(Thread*) | | +- 19.680 (6%) ClassFileParser::parse_constant_pool_entries(int,Thread*) | | | +- 15.600 (5%) SymbolTable::lookup_only(const char*,int,unsigned&) I can file RFEs against (a) and (b), if others concur with these. Case (c) seems to be generic to SymbolTable.
08-09-2014

Dump all classes to disk patch
04-09-2014

Until I've submitted it to the open JDK you can apply the attached diff to get every class generated to dump to disk Then just run nashorn with the -d=<dir> flag, for example jjs -scripting -d=dingo ../test/script/basic/run-octane.js -- crypto --verbose --iterations 10 will dump all generated classes during the crypto benchmark run to the directory ./dingo which will be created if it doesn't exist.
04-09-2014

Yes, -Xverify:none makes the entire branch of Verifier::verify to go away, and "Class Installation" phase in Nashorn becomes faster 2x.
26-08-2014

If you run with -verify:none, do you see the 51% spent in bytecode verifier completely vanish? It should. This would be a good sanity check to validate your measurements.
26-08-2014

The call tree for Class Installation Phase in Nashorn: http://cr.openjdk.java.net/~shade/8053904/class-install-calltree-1.txt VM parts are clearly dominating: - 51.0% is spent in bytecode Verifier - 26.2% is spent in parsing the class file - 11.2% is spent dancing at java.lang.Classloader side - 3.5% is spent dancing around java.lang.Class cache Note that these percents are relative to 3% spent in class installation phase. With the headlines like that, we might invest into polishing the Verifier and ClassFileParser.
26-08-2014

Now that many low-hanging fruits are fixed in both generic Nashorn, and the class installation phase, some performance recap. I have done 10 --log=time'd executions of compile-octane.js with: for I in `seq 1 10`; do ~/trunks/jdk9-dev/build/linux-x86_64-normal-server-release/images/j2sdk-image/bin/java -jar ~/trunks/jdk9-dev/build/linux-x86_64-normal-server-release/images/j2sdk-image/jre/lib/ext/nashorn.jar -Dnashorn.typeInfo.disabled=false --class-cache-size=0 --persistent-code-cache=false -scripting --log=time test/script/basic/compile-octane.js -- --iterations 1 2>&1 | grep "\[time\]"; done ...and here are the aggregated results. Before we started the work yesterday: Total time: (64.1 +- 2.5) sec Class Installation time: (4.03 +- 0.84) sec Class Installation %: 6.2% Current Nashorn (989:fa79d912da1b): Total time: (54.6 +- 2.4) sec Class Installation time: (1.57 +- 0.05) sec Class Installation %: 2.9% That is, we are below 3% overhead, which seems to be split in 2% for verifying the bytecode (see JDK-8055909), and 1% of actual classloading. We will quantify those two parts more rigorously later.
26-08-2014

So, the bottom-line so far: most of the issues at this point are Nashorn-specific. Because of JDK-8055923, the --log=time is probably lying about the execution time. We can cross-reference the timing with external profile information though, and have --log=time as semi-reliable performance metric. There are four major contributors to class installation performance in Nashorn * JDK-8055954 sets back the class installation performance by overusing parallelism * JDK-8055909 forces the class linkage, and we spend considerable time there * JDK-8056025 tells the internal Nashorn state book-keeping consumes significant time * The fourth contributor is the classloading itself, but there is nothing terribly suspicious there, we do spend time parsing the bytecode. We can re-measure and dive further into the fourth one, if after fixing the first three the class installation phase is still problematic.
26-08-2014

Note that there is overhead for the anonymous internal classes created by the LambdaForms as well. It is a significant part of the class installation overhead.
20-08-2014