JDK-8057967 : CallSite dependency tracking scales devastatingly poorly
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 8u60,9
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2014-09-09
  • Updated: 2019-08-08
  • Resolved: 2015-04-17
JDK 9
9 b64: Fixed
Description
If you go ahead and profile Nashorn doing the entire suite of benchmarks:

~/trunks/jdk9-dev/build/linux-x86_64-normal-server-release/images/j2sdk-image/bin/java -jar ~/trunks/jdk9-dev/build/linux-x86_64-normal-server-release/images/j2sdk-image/jre/lib/ext/nashorn.jar -Dnashorn.typeInfo.disabled=false --class-cache-size=0 --persistent-code-cache=false -scripting --log=time test/script/basic/run-octane.js 

...then you will see this profile:
 http://cr.openjdk.java.net/~shade/8057967/native-call-tree-1.txt

There are lots of <Unknown> entries, and that is expected: Java code hides there. But notice how bad we are at tracking the dependencies: out of 3070 seconds of CPU time, we spend 442 seconds (~15%!) trying to identify the dependencies to flush. It seems we walk the InstanceKlass::_dependencies linked list of nmethods and ask each nmethod whether it needs deoptimizing. This is a linear search in the best case, and a very quick instrumentation with:
  http://cr.openjdk.java.net/~shade/8057967/ik-trace-1.patch

...shows an immense miss rate when walking through that list. Notice the list size is also *growing*:
  http://cr.openjdk.java.net/~shade/8057967/instrumented.log

We need to figure out whether we can:
 a) Get fewer nmethods in the InstanceKlass::_dependencies list;
 b) Re-index InstanceKlass dependency information based on the DepChange type and/or CallSite instance;
 c) Provide better statistics for dependency tracking events (akin to the brain-dead patch used above).
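The linear walk and the kind of statistics asked for in (c) can be illustrated with a toy model. This is only a sketch in Java, not HotSpot code: `DependencyScanSketch`, `Dependent`, `DependencyList`, and the counters are hypothetical names, and it assumes each dependent depends on exactly one call site.

```java
import java.util.ArrayList;
import java.util.List;

final class DependencyScanSketch {
    /** Hypothetical stand-in for an nmethod that depends on exactly one call site. */
    static final class Dependent {
        final Object callSite;
        boolean markedForDeopt;

        Dependent(Object callSite) { this.callSite = callSite; }
    }

    /** Hypothetical stand-in for a per-class dependency list, with counters. */
    static final class DependencyList {
        private final List<Dependent> dependents = new ArrayList<>();
        long visited; // entries examined across all changes
        long marked;  // entries that actually needed deoptimization

        void add(Dependent d) { dependents.add(d); }

        /** Linear walk: every dependent is asked whether the change affects it. */
        int markDependents(Object changedCallSite) {
            int hits = 0;
            for (Dependent d : dependents) {
                visited++;
                if (d.callSite == changedCallSite && !d.markedForDeopt) {
                    d.markedForDeopt = true;
                    hits++;
                }
            }
            marked += hits;
            return hits;
        }

        /** Fraction of visited entries that did not need deoptimization. */
        double missRate() {
            return visited == 0 ? 0.0 : 1.0 - (double) marked / visited;
        }
    }

    public static void main(String[] args) {
        DependencyList list = new DependencyList();
        List<Object> callSites = new ArrayList<>();
        for (int i = 0; i < 1_000; i++) {
            Object cs = new Object();
            callSites.add(cs);
            list.add(new Dependent(cs));
        }
        // One change per call site: each walk visits all 1000 entries to find one hit.
        for (Object cs : callSites) {
            list.markDependents(cs);
        }
        System.out.println("visited=" + list.visited
                + " marked=" + list.marked
                + " missRate=" + list.missRate());
    }
}
```

In this model the visited count grows with the product of list size and number of changes, while the marked count grows only with the number of changes, which is the shape of the miss rate described above.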
Comments
It seems it's too late for the fix to go into 8u40. The fix is more complicated than I initially thought. Will fix it in 9.
26-11-2014

Experimented with an alternative approach: use the target method's holder as a context. Box2D shows the following (-XX:CallSiteDependencyType=1):
 - 40k call site updates;
 - significantly reduces the number of visited dependencies (2.6M => 650k);
 - still far from the precise solution (650k vs 1.2k); there are still populated contexts (~200 call site deps), related to popular LambdaForms (e.g. exactInvoker, guards).
Patch: call_site_deps.all.01.hotspot.patch + call_site_deps.precise.jdk
Configurations:
 (1) -XX:CallSiteDependencyType=0: use call site class as a context (original behavior)
 (2) -XX:CallSiteDependencyType=1: use target method holder class as a context
 (3) -XX:CallSiteDependencyType=2 -Djava.lang.invoke.MethodHandle.USE_CALL_SITE_CONTEXT=true: dedicated context per call site
Tracing info: -XX:+TraceCallSiteChanges
Sample output:
  CallSiteDepChange 42404 cs{0x00000005c9d1c7c0} java/lang/invoke/LambdaForm$MH::guard (0x00000007c01cf428) 1 / 239 (104 / 684318)
Fields: event count, call site oop, holder::method, deps (marked/visited), accumulated visited / total.
20-11-2014
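A rough model of why the choice of context matters: dependencies are bucketed by context, and a target change only walks the bucket of the changed call site. The sketch below is illustrative Java, not the patch itself. `ContextMode`, `Site`, and `Tracker` are hypothetical names, the three modes only mirror the three configurations listed in the comment above, and each site's target holder is kept fixed for simplicity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CallSiteContextSketch {
    /** Hypothetical modes mirroring the three configurations in the comment above. */
    enum ContextMode { CALL_SITE_CLASS, TARGET_HOLDER, PER_CALL_SITE }

    /** Hypothetical call site model; the names are plain strings in this sketch. */
    static final class Site {
        final String callSiteClass;
        final String targetHolder;

        Site(String callSiteClass, String targetHolder) {
            this.callSiteClass = callSiteClass;
            this.targetHolder = targetHolder;
        }
    }

    static final class Tracker {
        private final ContextMode mode;
        private final Map<Object, List<Site>> depsByContext = new HashMap<>();
        long visited; // dependencies examined across all target changes

        Tracker(ContextMode mode) { this.mode = mode; }

        /** Picks the bucket key; Site has identity equality, so PER_CALL_SITE is one bucket per site. */
        private Object contextOf(Site s) {
            switch (mode) {
                case CALL_SITE_CLASS: return s.callSiteClass;
                case TARGET_HOLDER:   return s.targetHolder;
                default:              return s;
            }
        }

        void addDependency(Site s) {
            depsByContext.computeIfAbsent(contextOf(s), k -> new ArrayList<>()).add(s);
        }

        /** Walks only the bucket of the changed site and returns the number of matches. */
        int onTargetChange(Site changed) {
            int hits = 0;
            List<Site> bucket =
                    depsByContext.getOrDefault(contextOf(changed), Collections.<Site>emptyList());
            for (Site s : bucket) {
                visited++;
                if (s == changed) {
                    hits++;
                }
            }
            return hits;
        }
    }

    public static void main(String[] args) {
        for (ContextMode mode : ContextMode.values()) {
            Tracker tracker = new Tracker(mode);
            List<Site> sites = new ArrayList<>();
            for (int i = 0; i < 100; i++) {
                // All sites share one class; holders are spread over 10 made-up names.
                Site s = new Site("SiteClass", "Holder" + (i % 10));
                sites.add(s);
                tracker.addDependency(s);
            }
            for (Site s : sites) {
                tracker.onTargetChange(s);
            }
            System.out.println(mode + " visited=" + tracker.visited);
        }
    }
}
```

The coarser the context, the more unrelated dependencies share a bucket with the changed call site, which matches the ordering of visited counts reported for the three configurations.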

More accurate performance data on Nashorn: http://cr.openjdk.java.net/~shade/8057967/csdependency.txt TL;DR: Immensely helps the Typescript benchmark (~14% faster). Mandreel seems to be improving, but the measurement errors are too large. Other benchmarks seem largely unaffected.
14-09-2014

Prototyped precise call site dependencies tracking (call_site_deps.precise.hotspot, call_site_deps.precise.jdk): attach a custom class to each MutableCallSite instance and use it as a context for tracking dependencies. New logic is turned off by default. Use -Djava.lang.invoke.MethodHandle.USE_CALL_SITE_CONTEXT=true.
12-09-2014

Vladimir's patch makes the entire subtree under setCallSiteTargetNormal disappear from the hotspots:
  0.310 (0%) MHN_setCallSiteTargetNormal
  + 0.200 (0%) Universe::flush_dependents_on(Handle,Handle)
There is anecdotal evidence of significantly better warmup on typescript (up to 20% faster).
12-09-2014

ILW: MMH => P3
10-09-2014

Another data point: running the same scenario with -Djava.lang.invoke.MethodHandle.COMPILE_THRESHOLD=0 yields a similar problem from Unsafe_DefineAnonymousClass:
  1639.820 <Total>
  + 128.900 Unsafe_DefineAnonymousClass
  + 122.490 SystemDictionary::parse_stream(Symbol*,Handle,Handle,ClassFileStream*,KlassHandle,GrowableArray<Handle>*,Thread*)
  + 50.540 Universe::flush_dependents_on(instanceKlassHandle)
  + 49.600 CodeCache::mark_for_deoptimization(DepChange&)
  + 48.930 InstanceKlass::mark_dependent_nmethods(DepChange&)
  + 44.810 ClassFileParser::parseClassFile(Symbol*,ClassLoaderData*,Handle,KlassHandle,GrowableArray<Handle>*,TempNewSymbol&,bool,Thread*)
  + 22.270 InstanceKlass::link_class(Thread*)
Notice the different entry method in Universe, but it still falls back to the same InstanceKlass::mark_dependent_nmethods. We need better dependency tracking statistics to triage this.
10-09-2014

Possibility: Add a plain-Java private Class<?> field to at least MutableCallSite and VolatileCallSite. (ConstantCallSite doesn't need it.) The JIT would set it to C when the MutableCallSite becomes a dependee of C. An initial value of null means no dependee. Multiple conflicting deps need a flag; it could be Object.class, meaning brute force. The list of dependees would then begin in a tight scope. Current dependency mechanisms do not support scopes smaller than a class or method. A method is less natural to express in Java-land; we could consider using a MemberName.
09-09-2014
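The three states of the proposed field (null, a single class C, or Object.class as the brute-force flag) can be sketched as a small state machine. This is a hypothetical Java model only: `ContextFieldSketch`, `Site`, `recordDependee`, and `flushScope` are made-up names, not JDK API, and the model ignores the case where the dependee class is itself Object.class.

```java
final class ContextFieldSketch {
    /** Hypothetical stand-in for a call site carrying the proposed Class<?> field. */
    static final class Site {
        // null: no dependee yet; a class C: a single dependee C;
        // Object.class: conflicting dependees, fall back to brute force.
        private Class<?> context;

        /** In this model, called whenever a dependency on this site is recorded for dependee. */
        void recordDependee(Class<?> dependee) {
            if (context == null) {
                context = dependee;
            } else if (context != dependee) {
                context = Object.class;
            }
        }

        Class<?> context() { return context; }

        /** Describes where a target change would have to look for dependents. */
        String flushScope() {
            if (context == null) {
                return "none";
            }
            if (context == Object.class) {
                return "brute-force";
            }
            return "class:" + context.getName();
        }
    }

    public static void main(String[] args) {
        Site site = new Site();
        System.out.println(site.flushScope());
        site.recordDependee(String.class);
        System.out.println(site.flushScope());
        site.recordDependee(Integer.class);
        System.out.println(site.flushScope());
    }
}
```

The transition is one-way: once two different dependees are seen, the site stays in the brute-force state, since the single field can no longer name a unique scope.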