JDK-8027827 : Improve performance of catchException combinator
  • Type: Enhancement
  • Component: core-libs
  • Sub-Component: java.lang.invoke
  • Affected Version: 8
  • Priority: P2
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2013-11-05
  • Updated: 2016-05-27
  • Resolved: 2014-02-28
The Version table provides details about the release in which this issue/RFE will be addressed.

JDK 8: 8u20 (Fixed)
JDK 9: 9 b04 (Fixed)
Related Reports
Relates :  
Relates :  
Description
We're currently working on an OSR (on-stack replacement) compilation framework for Nashorn. We compile functions to bytecode with optimistic assumptions, and when an assumption is invalidated during execution, we jump out of the function, recompile it without the invalidated assumption, and then resume it.

(The resumption mechanism itself is not important for the purposes of this discussion.)

The "jump out of the function" part is important, though: we can only implement it by throwing an exception in the executing method and by linking all such methods with a catchException combinator that performs the recompilation/resume magic in its exception handler.
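A minimal sketch of this linking pattern using the public MethodHandles.catchException API. Everything specific here is a hypothetical illustration: the exception type OptimismFailedException, the methods optimisticAdd and deoptimizedAdd, and the use of int overflow as the invalidated assumption. Where Nashorn's handler would recompile the function and resume it, this handler simply redoes the operation with a wider type.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class OptimisticLinkSketch {

    // Hypothetical exception signalling that an optimistic assumption was invalidated.
    static final class OptimismFailedException extends RuntimeException {
        OptimismFailedException() {
            // No stack trace: the exception is used purely for control flow.
            super(null, null, false, false);
        }
    }

    // "Optimistically compiled" code: assumes int arithmetic never overflows.
    static Object optimisticAdd(Object self, int a, int b) {
        try {
            return Math.addExact(a, b);
        } catch (ArithmeticException overflow) {
            throw new OptimismFailedException();
        }
    }

    // Stand-in for "recompile without the assumption and resume":
    // here it just redoes the work with a wider type.
    static Object deoptimizedAdd(OptimismFailedException e, Object self, int a, int b) {
        return (long) a + b;
    }

    // The function as the call site sees it: the target wrapped by catchException.
    static final MethodHandle LINKED = link();

    private static MethodHandle link() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType type = MethodType.methodType(Object.class, Object.class, int.class, int.class);
            MethodHandle target = lookup.findStatic(OptimisticLinkSketch.class, "optimisticAdd", type);
            // The handler takes the exception first, then the target's arguments,
            // and must return the target's return type.
            MethodHandle handler = lookup.findStatic(OptimisticLinkSketch.class, "deoptimizedAdd",
                    type.insertParameterTypes(0, OptimismFailedException.class));
            return MethodHandles.catchException(target, OptimismFailedException.class, handler);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static Object add(Object self, int a, int b) {
        try {
            return (Object) LINKED.invokeExact(self, a, b);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(add(null, 1, 2));                 // prints 3 (optimistic path)
        System.out.println(add(null, Integer.MAX_VALUE, 1)); // prints 2147483648 (handler path)
    }
}
```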

Since all of our optimistically compiled functions are linked using catchException, we absolutely need it to be fast, which means it needs fast, non-boxing hot paths.

Optimistic compilation yields the most performance when we can force all arguments to be ints, so an educated guess is that Nashorn would benefit most from hot paths of the form (Object, int, int, ..., int) and (Object, Object, int, int, ..., int). (In Nashorn's case, the leading one or two Object arguments are either "this" or "callee, this".)
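To make the two proposed hot-path shapes concrete, the sketch below builds them as java.lang.invoke.MethodType values and tests whether a given call-site type matches one of them. The class and method names are hypothetical, and leaving the return type unconstrained is an assumption of the sketch, not something stated above.

```java
import java.lang.invoke.MethodType;
import java.util.Arrays;

final class HotPathShapes {

    // Builds (Object, int, ..., int) when leadingObjects == 1, or
    // (Object, Object, int, ..., int) when leadingObjects == 2.
    static MethodType shape(Class<?> returnType, int leadingObjects, int intCount) {
        if (leadingObjects < 1 || leadingObjects > 2 || intCount < 0) {
            throw new IllegalArgumentException("unsupported shape");
        }
        Class<?>[] params = new Class<?>[leadingObjects + intCount];
        Arrays.fill(params, 0, leadingObjects, Object.class);
        Arrays.fill(params, leadingObjects, params.length, int.class);
        return MethodType.methodType(returnType, params);
    }

    // True if the parameter list matches one of the two shapes above;
    // the return type is deliberately not checked.
    static boolean isHotPathShape(MethodType type) {
        int n = type.parameterCount();
        if (n == 0 || type.parameterType(0) != Object.class) {
            return false;
        }
        // Skip an optional second Object (the "callee, this" case).
        int firstInt = (n > 1 && type.parameterType(1) == Object.class) ? 2 : 1;
        for (int i = firstInt; i < n; i++) {
            if (type.parameterType(i) != int.class) {
                return false;
            }
        }
        return true;
    }
}
```

For example, shape(Object.class, 2, 3) yields the type (Object,Object,int,int,int)Object, while a type such as (Object,int,double)Object falls outside both shapes.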
Comments
Attached a fix that introduces a shared invoker per arity (8027827.catchException.shared_arities.patch).
LF shape:
guardWithCatch=Lambda(a0:L,a1:L,a2:L,a3:L)=>{
    t4:L=BoundMethodHandle$Species_LLL.argL2(a0:L);
    t5:L=BoundMethodHandle$Species_LLL.argL1(a0:L);
    t6:L=BoundMethodHandle$Species_LLL.argL0(a0:L);
    t7:L=ValueConversions.array(a1:L,a2:L,a3:L);
    t8:L=MethodHandleImpl.invokeWithCatch(t6:L,t5:L,t4:L,t7:L);t8:L}
Performance results:
Benchmark                                    Score    Error  Units
j.CatchException.testNormal                  1.630    0.025  nsec/op
j.CatchException.testNormal_GWC              2.656    0.083  nsec/op
j.CatchException.testNormal1                 1.639    0.044  nsec/op
j.CatchException.testNormal1_GWC             2.667    0.047  nsec/op
j.CatchException.testNormal5int              1.631    0.052  nsec/op
j.CatchException.testNormal5int_GWC          4.637    0.134  nsec/op
j.CatchException.testNormal10                1.636    0.018  nsec/op
j.CatchException.testNormal10_GWC            2.756    0.349  nsec/op
j.CatchException.testNormal10int             1.675    0.077  nsec/op
j.CatchException.testNormal10int_GWC         7.656    0.292  nsec/op
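A simplified, Java-level sketch of the idea behind this LF: the target, exception class and catcher are read from the bound fields, the arguments are boxed into an Object[] (the ValueConversions.array step), and one helper per arity performs the call inside a try/catch. This is not the JDK's MethodHandleImpl.invokeWithCatch, which uses the internal invokeBasic; the public invokeWithArguments stands in for it here, and divide/onArithmetic are hypothetical target and catcher methods.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class SharedCatchInvoker {

    // One helper signature regardless of the target's parameter types: the
    // arguments arrive already boxed in an Object[].
    static Object invokeWithCatch(MethodHandle target, Class<? extends Throwable> exType,
                                  MethodHandle catcher, Object[] args) throws Throwable {
        try {
            return target.invokeWithArguments(args);
        } catch (Throwable t) {
            if (!exType.isInstance(t)) {
                throw t; // not the guarded exception type: propagate unchanged
            }
            // The catcher receives the exception followed by the original arguments.
            Object[] catcherArgs = new Object[args.length + 1];
            catcherArgs[0] = t;
            System.arraycopy(args, 0, catcherArgs, 1, args.length);
            return catcher.invokeWithArguments(catcherArgs);
        }
    }

    // Hypothetical target and catcher with an (Object, int, int) shape.
    static int divide(Object self, int a, int b) {
        return a / b;
    }

    static int onArithmetic(ArithmeticException e, Object self, int a, int b) {
        return Integer.MIN_VALUE;
    }

    static Object demo(int a, int b) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType type = MethodType.methodType(int.class, Object.class, int.class, int.class);
            MethodHandle target = lookup.findStatic(SharedCatchInvoker.class, "divide", type);
            MethodHandle catcher = lookup.findStatic(SharedCatchInvoker.class, "onArithmetic",
                    type.insertParameterTypes(0, ArithmeticException.class));
            // a and b are boxed into Integer objects here and the int result is boxed
            // on the way out: the per-call cost a non-boxing fast path avoids.
            return invokeWithCatch(target, ArithmeticException.class, catcher, new Object[] {null, a, b});
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```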
22-11-2013

The next step is to introduce a generic invoker shared across method handles of the same arity. Judging by the performance results of the existing implementation, the special cases for arities < 8 have no performance problems.
22-11-2013

Updated the fix: cleaned up the implementation a little and got rid of the assertion failures. It passes the tests with -esa.
22-11-2013

Attila, thanks for the feedback. These loads go away during JIT compilation (see the generated code and the benchmark results). I didn't embed the target and catcher MHs and the exception class because I'll need to share invokers later.
22-11-2013

Looks good overall. One thing I'm curious about is the storage of the target, catcher, and exception class in instance fields of some object. When I whipped up a small catch combinator yesterday for internal Nashorn use, I just ended up using LDC to load method handle constants, e.g.:
 0 ldc "I_PLACEHOLDER"
 2 checkcast MethodHandle
 5 aload 0
 6 aload 1
 7 invokevirtual MethodHandle.invokeExact(ScriptFunction;Object;)Object;
10 areturn
(Disregard that I used invokeExact and a custom reference type ScriptFunction; I know about invokeBasic.) Wouldn't an ALOAD 0/GETFIELD/ASTORE/ALOAD sequence prevent HotSpot from realizing that the method handle is constant? I'd think using LDC instead would be better for that purpose.
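The two shapes contrasted here can be shown at source level. In the sketch below (class and method names are hypothetical), one handle lives in a static final field, which HotSpot's JIT treats as a constant much like an LDC-loaded handle; the other is read from an ordinary instance field, which in general it does not. The code only shows the shapes: whether a particular load is folded is decided by the JIT and is not observable from this program.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class ConstantHandleSketch {

    static int twice(int x) {
        return 2 * x;
    }

    private static MethodHandle lookupTwice() {
        try {
            return MethodHandles.lookup().findStatic(ConstantHandleSketch.class, "twice",
                    MethodType.methodType(int.class, int.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Shape 1: a static final handle. The JIT can treat it as a constant
    // and inline through it, similar to a handle loaded with LDC.
    static final MethodHandle CONSTANT = lookupTwice();

    static int viaConstant(int x) {
        try {
            return (int) CONSTANT.invokeExact(x);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // Shape 2: the handle is read from an instance field (ALOAD 0 / GETFIELD).
    // Unless the JIT can also prove the receiver and the field value constant,
    // it generally cannot inline through the handle.
    private final MethodHandle handle = lookupTwice();

    int viaField(int x) {
        try {
            return (int) handle.invokeExact(x);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```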
22-11-2013

A rough prototype fix is attached (approach: a custom LF and custom bytecode per catchException instance). The performance gap for arities > 8 is fixed:
Benchmark                                    Score    Error  Units
j.CatchException.testNormal10                1.623    0.069  nsec/op
j.CatchException.testNormal10_GWC            1.654    0.133  nsec/op
j.CatchException.testNormal10int             1.685    0.108  nsec/op
j.CatchException.testNormal10int_GWC         1.621    0.040  nsec/op
The bytecode shape is optimal, but the LF representation is ugly (and hence the bytecode generation code contains some dirty hacks). Strictly speaking, the LF representation is not correct: the function type of ValueConversions.array does not match the types of its arguments.
LF SHAPE:
guardWithCatch=Lambda(a0:L,a1:L,a2:I,a3:I,a4:I,a5:I,a6:I)=>{
    t7:L=BoundMethodHandle$Species_LLL.argL2(a0:L);
    t8:L=BoundMethodHandle$Species_LLL.argL1(a0:L);
    t9:L=BoundMethodHandle$Species_LLL.argL0(a0:L);
    t10:L=ValueConversions.array(a1:L,a2:I,a3:I,a4:I,a5:I,a6:I);
    t11:L=MethodHandleImpl.invokeWithCatch(t9:L,t8:L,t7:L,t10:L);
    t12:I=MethodHandle(Object)int(t11:L);t12:I}
where:
    t7:  catcher MH
    t8:  exception class
    t9:  target MH
    t10: boxing of parameters
    t11: invocation
    t12: result unboxing, if necessary
BYTECODE:
    ALOAD 0
    CHECKCAST java/lang/invoke/BoundMethodHandle$Species_LLL
    GETFIELD java/lang/invoke/BoundMethodHandle$Species_LLL.argL2 : Ljava/lang/Object;
    ASTORE 7
    ALOAD 0
    CHECKCAST java/lang/invoke/BoundMethodHandle$Species_LLL
    GETFIELD java/lang/invoke/BoundMethodHandle$Species_LLL.argL1 : Ljava/lang/Object;
    ASTORE 8
    ALOAD 0
    CHECKCAST java/lang/invoke/BoundMethodHandle$Species_LLL
    GETFIELD java/lang/invoke/BoundMethodHandle$Species_LLL.argL0 : Ljava/lang/Object;
    ASTORE 9
    TRYCATCHBLOCK L0 L1 L2 java/lang/Throwable
  L0
    ALOAD 9
    CHECKCAST java/lang/invoke/MethodHandle
    ALOAD 1
    ILOAD 2
    ILOAD 3
    ILOAD 4
    ILOAD 5
    ILOAD 6
    INVOKEVIRTUAL java/lang/invoke/MethodHandle.invokeBasic (Ljava/lang/Object;IIIII)I
  L1
    GOTO L3
  L2
    DUP
    ALOAD 8
    CHECKCAST java/lang/Class
    SWAP
    INVOKEVIRTUAL java/lang/Class.isInstance (Ljava/lang/Object;)Z
    IFEQ L4
    ALOAD 7
    CHECKCAST java/lang/invoke/MethodHandle
    SWAP
    ALOAD 1
    ILOAD 2
    ILOAD 3
    ILOAD 4
    ILOAD 5
    ILOAD 6
    INVOKEVIRTUAL java/lang/invoke/MethodHandle.invokeBasic (Ljava/lang/Object;Ljava/lang/Object;IIIII)I
    GOTO L3
  L4
    ATHROW
  L3
    IRETURN
Complete results:
Benchmark                                    Score    Error  Units
j.CatchException.testExceptional          1661.077   89.297  nsec/op
j.CatchException.testExceptional_GWC      1602.973   62.182  nsec/op
j.CatchException.testExceptional1         1739.222  107.465  nsec/op
j.CatchException.testExceptional1_GWC     1629.994  146.241  nsec/op
j.CatchException.testExceptional10        1685.451  306.015  nsec/op
j.CatchException.testExceptional10_GWC    1609.218  108.329  nsec/op
j.CatchException.testExceptional10int     1573.367  130.111  nsec/op
j.CatchException.testExceptional10int_GWC 1677.120  109.548  nsec/op
j.CatchException.testExceptional5int      1555.665   79.697  nsec/op
j.CatchException.testExceptional5int_GWC  1603.100  109.214  nsec/op
j.CatchException.testNormal                  1.644    0.064  nsec/op
j.CatchException.testNormal_GWC              1.628    0.070  nsec/op
j.CatchException.testNormal1                 1.626    0.047  nsec/op
j.CatchException.testNormal1_GWC             1.627    0.060  nsec/op
j.CatchException.testNormal10                1.623    0.069  nsec/op
j.CatchException.testNormal10_GWC            1.654    0.133  nsec/op
j.CatchException.testNormal10int             1.685    0.108  nsec/op
j.CatchException.testNormal10int_GWC         1.621    0.040  nsec/op
j.CatchException.testNormal5int              1.643    0.041  nsec/op
j.CatchException.testNormal5int_GWC          1.636    0.065  nsec/op
TESTING:
MethodHandlesTest:
=== catchException: 1488 positive test cases run
MethodHandlesTest: COMPILE_THRESHOLD=0
=== catchException: 1488 positive test cases run
Fails with -esa due to the inconsistency in the LF representation.
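The generated bytecode corresponds roughly to the Java below, shown for the same (Object, int, int, int, int, int)int signature. This is a sketch under stated assumptions: invokeBasic is internal to java.lang.invoke, so the public invokeExact is used instead; Bound is a hypothetical stand-in for BoundMethodHandle$Species_LLL; and sum5/onOverflow are made-up target and catcher methods.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class GuardWithCatchSketch {

    // Hypothetical stand-in for BoundMethodHandle$Species_LLL:
    // argL0 = target MH, argL1 = exception class, argL2 = catcher MH.
    static final class Bound {
        final MethodHandle argL0;
        final Class<? extends Throwable> argL1;
        final MethodHandle argL2;

        Bound(MethodHandle target, Class<? extends Throwable> exType, MethodHandle catcher) {
            this.argL0 = target;
            this.argL1 = exType;
            // Widen the catcher's first parameter to Throwable so the invokeExact call
            // below type-checks; asType casts back to the declared exception type.
            this.argL2 = catcher.asType(catcher.type().changeParameterType(0, Throwable.class));
        }
    }

    // Mirrors the generated bytecode: three field loads, a try block around the
    // target call, and an isInstance check that either calls the catcher or rethrows.
    static int guardWithCatch(Bound self, Object a1, int a2, int a3, int a4, int a5, int a6)
            throws Throwable {
        MethodHandle catcher = self.argL2;               // t7
        Class<? extends Throwable> exType = self.argL1;  // t8
        MethodHandle target = self.argL0;                // t9
        try {
            // No boxing: the ints are passed straight through (ILOAD 2..6).
            return (int) target.invokeExact(a1, a2, a3, a4, a5, a6);
        } catch (Throwable t) {
            if (exType.isInstance(t)) {
                return (int) catcher.invokeExact(t, a1, a2, a3, a4, a5, a6);
            }
            throw t; // the ATHROW path
        }
    }

    // Hypothetical target and catcher used to exercise the sketch.
    static int sum5(Object self, int a, int b, int c, int d, int e) {
        return Math.addExact(Math.addExact(Math.addExact(Math.addExact(a, b), c), d), e);
    }

    static int onOverflow(ArithmeticException ex, Object self, int a, int b, int c, int d, int e) {
        return -1;
    }

    static final Bound BOUND = bind();

    private static Bound bind() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType type = MethodType.methodType(int.class, Object.class,
                    int.class, int.class, int.class, int.class, int.class);
            MethodHandle target = lookup.findStatic(GuardWithCatchSketch.class, "sum5", type);
            MethodHandle catcher = lookup.findStatic(GuardWithCatchSketch.class, "onOverflow",
                    type.insertParameterTypes(0, ArithmeticException.class));
            return new Bound(target, ArithmeticException.class, catcher);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static int call(int a, int b, int c, int d, int e) {
        try {
            return guardWithCatch(BOUND, null, a, b, c, d, e);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```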
22-11-2013

The problematic part is the fast path for high-arity cases:
Benchmark                                    Score    Error  Units
j.CatchException.testNormal10                1.644    0.088  nsec/op
j.CatchException.testNormal10_GWC           36.639    6.740  nsec/op
j.CatchException.testNormal10int             1.625    0.091  nsec/op
j.CatchException.testNormal10int_GWC        40.729    1.728  nsec/op
In this situation, the framework falls back to the generic invoker, which slows invocation down considerably (more than 20x compared with the corresponding non-GWC benchmark).
19-11-2013

Scores (JMH config: -wi 5 -i 5 -r 1 -w 1):
Benchmark                                    Score    Error  Units
j.CatchException.testNormal                  1.584    0.034  nsec/op
j.CatchException.testNormal_GWC              1.594    0.049  nsec/op
j.CatchException.testNormal1                 1.643    0.102  nsec/op
j.CatchException.testNormal1_GWC             1.598    0.043  nsec/op
j.CatchException.testNormal5int              1.603    0.042  nsec/op
j.CatchException.testNormal5int_GWC          3.196    0.106  nsec/op
j.CatchException.testNormal10                1.644    0.088  nsec/op
j.CatchException.testNormal10_GWC           36.639    6.740  nsec/op
j.CatchException.testNormal10int             1.625    0.091  nsec/op
j.CatchException.testNormal10int_GWC        40.729    1.728  nsec/op
j.CatchException.testExceptional          1684.907  125.331  nsec/op
j.CatchException.testExceptional_GWC      1732.043  157.733  nsec/op
j.CatchException.testExceptional1         1593.607  192.898  nsec/op
j.CatchException.testExceptional1_GWC     1840.828  433.357  nsec/op
j.CatchException.testExceptional5int      1535.928  114.934  nsec/op
j.CatchException.testExceptional5int_GWC  1643.549   72.015  nsec/op
j.CatchException.testExceptional10        1504.518   55.241  nsec/op
j.CatchException.testExceptional10_GWC    1743.941  130.572  nsec/op
j.CatchException.testExceptional10int     1532.347   82.212  nsec/op
j.CatchException.testExceptional10int_GWC 1926.935  125.884  nsec/op
19-11-2013

Microbenchmarks attached (sources and binaries).
19-11-2013