Blocks :
|
|
Blocks :
|
|
Relates :
|
|
Relates :
|
|
Relates :
|
|
Relates :
|
|
Relates :
|
|
Relates :
|
|
Relates :
|
|
Relates :
|
|
Relates :
|
JDK-8148483 :
|
|
JDK-8148604 :
|
|
JDK-8148730 :
|
Summary ------- Change the static `String`-concatenation bytecode sequence generated by `javac` to use `invokedynamic` calls to JDK library functions. This will enable future optimizations of `String` concatenation without requiring further changes to the bytecode emitted by `javac`. Goals ----- Lay the groundwork for building optimized `String` concatenation handlers, implementable without the need to change the Java-to-bytecode compiler. Multiple translation strategies should be tried as the motivational examples for this work. Producing (and possibly switching to) the optimized translation strategies is a stretch goal for this JEP. Non-Goals --------- It is not a goal to: Introduce any new `String` and/or `StringBuilder` APIs that might help to build better translation strategies, revisit the JIT compiler's support for optimizing `String` concatenation, support advanced `String` interpolation use cases, or explore other changes to the Java programming language. Success Metrics --------------- - A stable, future-proof bytecode sequence is generated by `javac`. The intent is to change the bytecode shape at most once in any major Java release. - `String` concatenation performance does not regress. - Startup time and time to performance do not regress beyond reasonable levels. Motivation ---------- Currently `javac` translates `String` concatenation into `StringBuilder::append` chains. Sometimes this translation is not optimal, and sometimes we need to presize the `StringBuilder` appropriately. The `-XX:+OptimizeStringConcat` option enables aggressive optimizations in the JIT compiler, which recognize `StringBuilder` append chains and do the presizing and in-place copying. Those optimizations, while fruitful, are fragile and difficult to extend and maintain; see, e.g., [JDK-8043677](https://bugs.openjdk.java.net/browse/JDK-8043677), [JDK-8076758](https://bugs.openjdk.java.net/browse/JDK-8076758), and [JDK-8136469](https://bugs.openjdk.java.net/browse/JDK-8136469). It is tempting to change the translation in `javac` itself; see, e.g., the [recent proposal on the compiler-dev list](http://mail.openjdk.java.net/pipermail/compiler-dev/2015-March/009356.html). When we consider changing anything in `javac`, it seems inconvenient to change the compiler and the bytecode shape every time we want to improve performance. Users generally expect the same bytecode to run faster on newer JVMs. Asking users to recompile their Java programs for performance is not friendly, and it also explodes the testing matrix, since JVMs should recognize all the variants of the generated bytecode. Therefore, we might want to employ some trick to declare the intent of concatenating strings in bytecode, and then expand that intent in runtime. In other words, we are introducing something *similar* to a new bytecode, "string concat". As in the Lambda work, this closes the gap between the JLS allowing a language feature (string concatentation, in this case) and the JVMS not having the appropriate facilities for it, forcing `javac` to translate it into low-level code, and further forcing VM implementations to recognize all low-level translated forms. `invokedynamic` allows us to overcome this impedance mismatch with one leap by providing a guaranteed interface to string concatenation at the level of the core libraries. The actual implementation of the interface can even be implemented using (potentially unsafe) private JVM APIs for concatenation, such as fixed-length string builders, tied to a particular use case. Making use of these APIs directly in `javac` would mean exposing them to the public. Description ----------- We will use the power of `invokedynamic`: It offers the facilities for a lazy linkage, by providing the means to bootstrap the call target once, during the initial invocation. This approach is nothing new, and we borrow extensively from current code that [translates lambda expressions](http://cr.openjdk.java.net/~briangoetz/lambda/lambda-translation.html). The idea is to replace the entire `StringBuilder` append dance with a simple `invokedynamic` call to `java.lang.invoke.StringConcatFactory`, that will accept the values in the need of concatenation. For example, String m(String a, int b) { return a + "(" + b + ")"; } is currently compiled to: java.lang.String m(java.lang.String, int); 0: new #2 // class java/lang/StringBuilder 3: dup 4: invokespecial #3 // Method java/lang/StringBuilder."<init>":()V 7: aload_1 8: invokevirtual #4 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder; 11: ldc #5 // String ( 13: invokevirtual #4 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder; 16: iload_2 17: invokevirtual #6 // Method java/lang/StringBuilder.append:(I)Ljava/lang/StringBuilder; 20: ldc #7 // String ) 22: invokevirtual #4 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder; 25: invokevirtual #8 // Method java/lang/StringBuilder.toString:()Ljava/lang/String; 28: areturn But even with the naive indy translation, available in the proposed implementation via `-XDstringConcat=indy`, it can be significantly simpler: java.lang.String m(java.lang.String, int); 0: aload_1 1: ldc #2 // String ( 3: iload_2 4: ldc #3 // String ) 6: invokedynamic #4, 0 // InvokeDynamic #0:makeConcat:(Ljava/lang/String;Ljava/lang/String;ILjava/lang/String;)Ljava/lang/String; 11: areturn BootstrapMethods: 0: #19 invokestatic java/lang/invoke/StringConcatFactory.makeConcat:(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite; Notice how we pass an `int` argument without boxing. At runtime, the bootstrap method (BSM) runs and links in the actual code doing the concatenation. It rewrites the `invokedynamic` call with an appropriate `invokestatic` call. This loads the constant string from the constant pool, but we can leverage the BSM static args to pass these and other constants straight to the BSM call. This is what a proposed `-XDstringConcat=indyWithConstants` flavor does: java.lang.String m(java.lang.String, int); 0: aload_1 1: iload_2 2: invokedynamic #2, 0 // InvokeDynamic #0:makeConcat:(Ljava/lang/String;I)Ljava/lang/String; 7: areturn BootstrapMethods: 0: #15 invokestatic java/lang/invoke/StringConcatFactory.makeConcatWithConstants:(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite; Method arguments: #16 \u0001(\u0001) Notice how we only pass the dynamic arguments (`"a"` and `"b"`) to the BSM. The static constants would be already handled during linkage. The BSM method is provided with a recipe that tells in what order to concatenate the dynamic and static arguments, and what the static arguments are. This strategy also treats nulls, primitives, and empty strings separately. It is an open question which bytecode flavor should be the default. More details on bytecode shapes, concat flavors, and performance data can be found in the [experimental notes](http://cr.openjdk.java.net/~shade/8085796/notes.txt). A proposed bootstrap method API can be found [here](http://cr.openjdk.java.net/~shade/8085796/javadocs/). A complete implementation can be found in the sandbox branch: $ hg clone http://hg.openjdk.java.net/jdk9/sandbox sandbox $ cd sandbox/ $ sh ./common/bin/hgforest.sh up -r JDK-8085796-indyConcat $ sh ./configure $ make images One can see the difference between the baseline and patched runtime by using: $ hg diff -r default:JDK-8085796-indyConcat $ cd langtools/ $ hg diff -r default:JDK-8085796-indyConcat $ cd jdk/ $ hg diff -r default:JDK-8085796-indyConcat The benchmarks can be found here: [http://cr.openjdk.java.net/~shade/8085796/](http://cr.openjdk.java.net/~shade/8085796/) The proposed implementation successfully builds the JDK, runs the regression tests (including new ones that test `String` concat), and passes the smoke tests on all platforms. Multiple strategies can be used for the actual concatenation. Our proposed implementation shows there are no throughput hits when we move the bytecode sequence generated by `javac` into the BSM that injects the same bytecode, which validates the approach. Optimized strategies are shown to perform as well or better than the baseline, especially when default `StringBuilder` length is not enough and/or the VM's optimizations fail. See the [experimental notes](http://cr.openjdk.java.net/~shade/8085796/notes.txt) for more data. Alternatives ------------ There are three alternatives to this proposal. *First, obvious alternative:* Change the bytecode sequence in `javac` itself, by transplanting one of the bytecode-generation strategies from the proposed implementation. While simple in the short-term, it breaks away from the unspoken contract with the Java community at large: JVMs are smart enough to recognize and optimize the existing bytecode. We cannot easily coerce users to recompile their Java programs every time we want to tune the `String`-concatenation translation. Every time a bytecode shape changes the JIT compilers are forced to recognize a new form, and that explodes the test matrix considerably, as well as sets up unsuspecting users for strange performance (mis)behaviors. If we are to change the bytecode shape for performance reasons anyway, we better make it once in a major release, and in a future-proof way. Using private APIs for optimized string contatentation is out of the question as well: `javac`-generated code should use only the public APIs. *Second alternative:* Introduce a vararg `StringConcat.concat(Object... args)` method, and use it to concatenate the arguments. The major downside is the boxing of primitives, and then boxing all the arguments into the `Object[]`. While advanced JITs may recognize these idioms, we are putting ourselves at the mercy of new and untested optimizations in already-complex optimizing compilers. `invokedynamic` does roughly the same, but it supports various target method signatures out of the box. *Third alternative:* Continue doing as we do at present, and extend `-XX:+OptimizeStringConcat` to cover more cases. The downside is that optimizing in the JIT compiler requires spelling out the `toString` conversions and storage management with a compiler IR, which makes it hard to extend and requires rare expertise to get right. Also, the minute differences and oddities in the bytecode shape break `-XX:+OptimizeStringConcat`, and we have to be extra careful in `javac` to avoid this. See e.g. [JDK-8043677](https://bugs.openjdk.java.net/browse/JDK-8043677), [JDK-8076758](https://bugs.openjdk.java.net/browse/JDK-8076758), [JDK-8136469](https://bugs.openjdk.java.net/browse/JDK-8136469). All three alternatives basically put us at the mercy of fragile compiler optimizations. Testing ------- String concatenation is exercised routinely by most Java code. In addition to regular functional and performance tests, we plan to investigate the performance model of this change with microbenchmarks. New regression tests for different strategies may need to be developed. Risks and Assumptions --------------------- *Other `String`-related changes.* This change has the potential to collide with other `String`-related changes, notably with [Compact Strings (JEP 254)](http://openjdk.java.net/jeps/254). The mitigation plan is to generate exactly the same bytecode sequence as the current JDK does, then a "Compact Strings"-enabled JVM would exercise the same code path, oblivious of any `String`-concat translation changes. Indeed, our preliminary experiments show that our proposal plays nicely with Compact Strings. *java.base exemptions.* Since this feature uses a core library feature (`java.lang.invoke`) to implement a core language feature (`String` concatenation), we have to exempt the `java.base` module from using the indified `String` concat. Otherwise a circularity happens when the `java.lang.invoke.*` machinery requires `String` concat to work, which in turn requires the `java.lang.invoke.*` machinery. This exemption may limit the performance improvements observed from this feature, since many `java.base` classes will not be able to use it. We consider this an acceptable downside, which should be covered by the VM's optimizing compilers. *Maintenance cost.* The legacy `String`-concat translation is not going away, and therefore we have to maintain both `-XX:+OptimizeStringConcat` and the optimizations for newer translation strategies in the VM. `javac` will still need to provide a fallback for the cases where indified `String` concat cannot be used, notably some system classes. This does not pose a significant problem, since our performance story includes involving `-XX:+OptimizeStringConcat` in most strategies. *Compatibility: Platforms that don't spell `invokedynamic`.* What should they do once they see the indified String concat? There is experience with Lambdas already, where the non-indy-spelling platforms have to desugar indys away, and re-generate statically that very same code that indy would emit on the fly. While the code sequence generated for naive `String` concat is simpler than `LambdaFactory`, it still creates more complexity in the VM code. Since `javac` still retains the code that emits the legacy `String` concat sequence anyway, the aforementioned platforms can use this fallback. *Compatibility: Other compilers.* Should other (non-`javac`) compilers emit the same bytecode? Do we propose to change all compiler implementations to use the new translation strategy? Since the legacy `String` concat does not go away, other compilers are free to emit the same bytecode sequence they emit today. They can incrementally change some of the translations to the proposed indy-based translation scheme, as they see fit. *Compatibility: Bytecode manipulators/weavers.* Tools would need to handle new indys in places they are not used to even in absence of source code changes, just by the virtue of recompilation. We don't see this as the major problem, since tools are expected to handle Lambdas already, and so they need to recognize `invokedynamic`. Corrections would be needed for the tools that used to instrument the `StringBuilder::append` chains generated by `javac`. *Static footprint.* Classfile size will probably be an issue in `String`-concat-heavy code. There are machine-generated files out there which use up nearly all the entries in the CP and do a huge number of string concatenations. It seems that the method bytecode footprint is much lower for indified `String` concat, and there is additional static overhead in the constant pool for the `java.lang.invoke` machinery, plus a bit of overhead per each concat shape. See the notes above for more data. *Startup overhead.* There is a risk that startup overhead would be prohibitive for this work to move forward. The implementation of Lambdas introduced these kinds of translation strategies, but you only pay the price if you use Lambdas. With String concat, once `invokedynamic` is compiled into the bytecode, there is no escape for uses of `+` in existing code (unless you change the code to avoid `+` and invoke `StringBuilder` explicitly). This risk is associated with the initialization of the `invokedynamic` machinery (see [JDK-8086045](https://bugs.openjdk.java.net/browse/JDK-8086045)), and therefore accepting the startup regression here would mean the absence of startup regressions in other changes that require `invokedynamic` to be initialized, notably the [module system (JEP 261)](http://openjdk.java.net/jeps/261). Conversely, if the module system lands before this change then the startup costs will be amortized. Dependencies ----------- - [JEP 254: Compact Strings](http://openjdk.java.net/jeps/254) - [java.lang.invoke initialization costs (JDK-8086045)](https://bugs.openjdk.java.net/browse/JDK-8086045)
|