Bug ID: JDK-8059399 Dynalink constants can be shorter

Type: Enhancement
Component: core-libs
Sub-Component: jdk.nashorn
Affected Version: 9

Priority: P4
Status: Resolved
Resolution: Duplicate
OS: generic
CPU: generic

Submitted: 2014-09-29
Updated: 2015-11-02
Resolved: 2015-11-02

If you run Nashorn/Octane benchmarks with a simple instrumentation that prints out the symbols looked up from VM's Symbol table...

~/trunks/jdk9-dev/build/linux-x86_64-normal-server-release/images/j2sdk-image/bin/java  -jar ~/trunks/jdk9-dev/build/linux-x86_64-normal-server-release/images/j2sdk-image/jre/lib/ext/nashorn.jar -Dnashorn.typeInfo.disabled=false --class-cache-size=0 --persistent-code-cache=false -scripting --log=time test/script/basic/run-octane.js -- --iterations 1 

Then you will get the output like this:
 http://cr.openjdk.java.net/~shade/8059399/all-symbols.log.gz

Sifting through this output yields an interesting observation (the log is sorted and uniq'ed):

$ zcat all-symbols.log.gz | wc
60051   63459 3515831

$ zcat all-symbols.log.gz | grep dyn: | wc
9190    9190  345292

That is, around 10% of all symbol space is consumed by "dyn:..." constants, and the average length of a "dyn:..." constant is around 37 characters. These constants have a regular structure, e.g. "dyn:setProp|setElem:xrefstms". It seems profitable to condense the control prefixes of these constants into something denser, e.g. "D:P|E:xrefstms", which would save ~14 bytes per constant. This adds up to ~128K characters saved in metaspace memory in the Nashorn/Octane case. A careful redesign may also improve constant parsing time, and therefore contribute to better warmup.
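
The estimate above can be reproduced with a few lines of arithmetic. The sketch below is only a check of the numbers quoted in this report: the class and method names are made up for illustration, and it assumes (as the estimate does) that every "dyn:" constant saves as much as the one example prefix shown.

```java
final class PrefixSavings {
    // Control prefix of the example constant quoted in the report.
    static final String LONG_PREFIX = "dyn:setProp|setElem:";
    // Condensed form suggested in the report.
    static final String SHORT_PREFIX = "D:P|E:";

    // Characters saved by swapping the long prefix for the short one.
    static int savedPerConstant() {
        return LONG_PREFIX.length() - SHORT_PREFIX.length();
    }

    // Total saving if every constant saved the same amount (an assumption).
    static long totalSaved(int constantCount) {
        return (long) constantCount * savedPerConstant();
    }

    public static void main(String[] args) {
        int dynConstants = 9190;   // lines matching "dyn:" in the log (from wc)
        long dynChars = 345292L;   // characters in those lines (from wc, newlines included)

        System.out.println("average length: " + (double) dynChars / dynConstants);
        System.out.println("saved per constant: " + savedPerConstant());
        System.out.println("total saved: " + totalSaved(dynConstants));
    }
}
```

Real savings would vary per constant, since other prefixes (for example the three-operation "getProp|getElem|getMethod" form) are longer than the one used here.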

Upon further thinking, I realized we don't need a string encoding for the operations at all. We could take a page from the design of java.nio.file, which defines a typesafe yet open set of operations for various purposes in its CopyOption and OpenOption interfaces, with standard ones being available as enums implementing the interfaces (StandardCopyOption, StandardOpenOption), but allowing for any additional filesystem-specific ones to extend it; see http://docs.oracle.com/javase/8/docs/api/java/nio/file/package-summary.html
As a matter of fact, it looks like accommodating a standard but extensible set of operations on diverse filesystems is a problem not unlike accommodating a standard but extensible set of operations on diverse object models :-) So we could have an empty interface named "Operation" in Dynalink with a standard implementation "enum StandardOperation implements Operation" providing GET_PROPERTY, SET_PROPERTY, GET_ELEMENT, SET_ELEMENT, GET_METHOD, CALL_METHOD, CALL, and NEW values. For Dynalink composite operations, we can have a "class CompositeOperation implements Operation" to express a composition of e.g. GET_PROPERTY, GET_ELEMENT, GET_METHOD. A CallSiteDescriptor constructor could then look like this: CallSiteDescriptor(Lookup lookup, Operation operation, MethodType methodType), and instead of the funky business with getNameToken() etc. it'd just have an "Operation getOperation()" method. Named operations (mostly getProp/setProp carrying their property name) could be expressed as "class NamedOperation implements Operation", expressing a pair of (Operation, String).
The ultimate beauty of the approach? We no longer need to prescribe how the operations should be encoded in the invokedynamic instructions! Nashorn can choose to encode the operations in the bytecode any way it wants; it just needs to decode them accordingly in the bootstrap method. Both emitting invokedynamic instructions and bootstrapping them are responsibilities that always remain within a single language runtime, so there's no need to enforce an encoding in the Dynalink specification. The operations can be encoded in the call site name, in the static parameters to the bootstrap method, or in a combination thereof, whatever makes most sense for the language runtime being implemented!
This makes a lot of sense to me. Instead of "dyn:getProp|getElem|getMethod:x", we can now just leave maybe the operand name "x" in the method name, and move the description of the operation into the single "flags" static parameter Nashorn normally takes. Nashorn basically only uses 6 dynamic operations: the usual ".", "[]", "()", and "new ()", as well as "callee" variants of "." and "[]" that are used when we know from source code that whatever they return will be followed by "()" (these favor GET_METHOD over GET_PROPERTY or GET_ELEMENT in their compositions); that's 3 bits required. The bootstrap method will materialize the appropriate Operation objects from the flags (and maybe property names).
09-10-2015
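
The proposal in the comment above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Dynalink API: the type names Operation, StandardOperation, CompositeOperation and NamedOperation come from the comment, but the accessor names, the OperationFlags helper, the numeric flag codes, and the exact ordering inside each composite are hypothetical. Only the read-side operations are shown.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Empty marker interface, as proposed in the comment.
interface Operation {}

enum StandardOperation implements Operation {
    GET_PROPERTY, SET_PROPERTY, GET_ELEMENT, SET_ELEMENT,
    GET_METHOD, CALL_METHOD, CALL, NEW
}

// An ordered composition of operations, tried in priority order.
final class CompositeOperation implements Operation {
    private final List<Operation> operations;

    CompositeOperation(Operation... operations) {
        this.operations = Collections.unmodifiableList(Arrays.asList(operations.clone()));
    }

    List<Operation> getOperations() { return operations; }
}

// A pair of (Operation, String), e.g. a property get carrying its name.
final class NamedOperation implements Operation {
    private final Operation baseOperation;
    private final String name;

    NamedOperation(Operation baseOperation, String name) {
        this.baseOperation = baseOperation;
        this.name = name;
    }

    Operation getBaseOperation() { return baseOperation; }
    String getName() { return name; }
}

// Hypothetical decoder a bootstrap method might use: the low 3 bits of
// the flags select one of the six operations named in the comment.
final class OperationFlags {
    static final int GET_PROP = 0;         // "."
    static final int GET_ELEM = 1;         // "[]"
    static final int CALL = 2;             // "()"
    static final int NEW = 3;              // "new ()"
    static final int GET_PROP_CALLEE = 4;  // "." followed by "()"
    static final int GET_ELEM_CALLEE = 5;  // "[]" followed by "()"
    static final int MASK = 0b111;

    static Operation decode(int flags, String name) {
        Operation base;
        switch (flags & MASK) {
            case GET_PROP:
                base = new CompositeOperation(StandardOperation.GET_PROPERTY,
                        StandardOperation.GET_ELEMENT, StandardOperation.GET_METHOD);
                break;
            case GET_ELEM:
                base = new CompositeOperation(StandardOperation.GET_ELEMENT,
                        StandardOperation.GET_PROPERTY, StandardOperation.GET_METHOD);
                break;
            case CALL:
                base = StandardOperation.CALL;
                break;
            case NEW:
                base = StandardOperation.NEW;
                break;
            case GET_PROP_CALLEE:  // callee variants favor GET_METHOD
                base = new CompositeOperation(StandardOperation.GET_METHOD,
                        StandardOperation.GET_PROPERTY, StandardOperation.GET_ELEMENT);
                break;
            case GET_ELEM_CALLEE:
                base = new CompositeOperation(StandardOperation.GET_METHOD,
                        StandardOperation.GET_ELEMENT, StandardOperation.GET_PROPERTY);
                break;
            default:
                throw new IllegalArgumentException("unknown operation code: " + (flags & MASK));
        }
        // Attach the operand name only when the call site carries one.
        return (name == null || name.isEmpty()) ? base : new NamedOperation(base, name);
    }
}
```

With this shape, a call site that used to be named "dyn:getProp|getElem|getMethod:x" would carry only "x" as its name, and the bootstrap method would call something like OperationFlags.decode(flags, "x") to rebuild the Operation object.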
Deferring this from 8u40 to 9.
15-10-2014
Or possibly move partially tokenized operation specifiers into extra bootstrap arguments, leaving the name part less variable. So, e.g., the current INVOKEDYNAMIC "dyn:getProp|getElem|getMethod:foo" [] (where square brackets denote extra bootstrap args) would become INVOKEDYNAMIC "dyn:getProp|getElem|getMethod:" ["foo"]. The benefit of this is that we'll have, on average, fewer constants: the worst-case scenario for m operations and n names when they aren't separated is m*n constants, while with separation it's always m+n.
15-10-2014
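
The m*n versus m+n argument in the comment above can be checked with a small count of distinct strings. The sketch below uses made-up operation prefixes and property names and a hypothetical ConstantCount class; it shows the worst case, where every operation is used with every name.

```java
import java.util.HashSet;
import java.util.Set;

final class ConstantCount {
    // Operation and name fused into one call site name: one constant per pair.
    static int fused(String[] ops, String[] names) {
        Set<String> constants = new HashSet<>();
        for (String op : ops) {
            for (String name : names) {
                constants.add(op + name);
            }
        }
        return constants.size();
    }

    // Operation prefix and name kept as separate constants.
    static int separated(String[] ops, String[] names) {
        Set<String> constants = new HashSet<>();
        for (String op : ops) {
            constants.add(op);
        }
        for (String name : names) {
            constants.add(name);
        }
        return constants.size();
    }

    public static void main(String[] args) {
        // Illustrative inputs only; not taken from the Octane log.
        String[] ops = {
            "dyn:getProp|getElem|getMethod:",
            "dyn:setProp|setElem:",
            "dyn:getMethod|getProp|getElem:"
        };
        String[] names = {"foo", "bar", "baz", "qux"};

        System.out.println("fused:     " + fused(ops, names));      // m * n = 12
        System.out.println("separated: " + separated(ops, names));  // m + n = 7
    }
}
```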
We could use a special Unicode character as the Dynalink prefix instead of "dyn:" ;-) Capital Delta maybe?
29-09-2014

Blocks :	JDK-8059760 - VM/JDK fixes for Nashorn performance (warmup/footprint, indy)
Duplicate :	JDK-8139931 - Introduce Operation objects in Dynalink instead of string encoding