JDK-8267532 : C2: Profile and prune untaken exception handlers
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 16,17
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2021-05-21
  • Updated: 2024-01-12
  • Resolved: 2023-11-28
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 22
22 b26Fixed
Related Reports
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Description
Disclaimer: while this investigation was motivated by Project Panama, I can reproduce the findings on a vanilla JDK.

There seems to be an issue when calling a close() method in a try/catch or try/finally block. In other words, this idiom:

Resource r = createResource()
doSomething(r);
r.close();

Is significantly faster than this other idiom:

Resource r = createResource()
try {
   doSomething(r);
   r.close();
} finally {
   r.close();
}

or this idiom:

Resource r = createResource()
try {
   doSomething(r);
   r.close();
} catch (Throwable t) {
   r.close();
   throw t;
}

The attached benchmark can be used to reproduce these findings:

Benchmark                                   Mode  Cnt   Score   Error  Units
ResourceScopeCloseMin.confined_close        avgt   30  14.398 ? 0.175  ns/op
ResourceScopeCloseMin.confined_close_notry  avgt   30   6.375 ? 0.053  ns/op

I've tried to remove all factors which could lead to potential confusion in JIT, such as interfaces, or javac-generated try-with-resources blocks (which contain extra calls to add suppressed exceptions which might have skewed the benchmark).

If I run with -XX:+PrintInlining I always see something like this:

                              @ 25   org.openjdk.bench.jdk.incubator.foreign.ResourceScopeCloseMin$ConfinedScope::close (48 bytes)   already compiled into a medium method

And, inspecting the bytecode reveals that @25 is indeed the BCI of the close() call inside the catch block. Commenting out that call makes everything fast again.

---

The issue is essentially that the second call to close() in the finally/catch block is never reached, so C2 doesn't inline the call, which makes the Resource object escape, and not scalarizable. This leads then to the difference in performance. C2 normally replaces infrequently taken branches with an uncommon trap, but doesn't use an uncommon trap for the finally/catch blocks, since there is currently no information available to indicate whether finally/catch blocks are taken or not. They are entered through exception dispatch rather than a regular branch through the 'goto' bytecode (which does have profiling implemented for it).

To proposed solution is to implement profiling for exception handlers (which is the underlying bytecode implementation of the finally/catch blocks), and then use that profiling information in C2 to emit uncommon traps in place of exception handlers that are never entered. In the examples above, this would eliminate the second call to close() altogether, meaning the resource object also does not escape and can be scalar replaced. This brings the performance on par with the base case.
Comments
Changeset: a5ccd3be Author: Jorn Vernee <jvernee@openjdk.org> Date: 2023-11-28 10:17:58 +0000 URL: https://git.openjdk.org/jdk/commit/a5ccd3beaf069bdfe81736f6c62e5b4b9e18b5fe
28-11-2023

[~jcao] See the linked issue: https://bugs.openjdk.org/browse/JDK-8310011
01-11-2023

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/16416 Date: 2023-10-30 14:10:33 +0000
01-11-2023

I can see this investigation is motivated by Panama. Do we have any links to real Panama code with similar code shape?
13-09-2023

the reason 25 Allocate is GlobalEscape because it escapes from 'scope.close()' in catch(Throwable t) block. ======== Connection graph for ResourceScopeCloseMin::confined_close invocation #1: 3 iterations and 0.000032 sec to build connection graph with 1610 nodes and worklist size 56 JavaObject(13) ArgEscape(GlobalEscape) [ 289F 213F 222F [ 37 42 ]] 25 Allocate === 5 6 7 8 1 (23 21 22 1 1 10 1 1 ) [[ 26 27 28 35 36 37 ]] rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top, bool ) ResourceScopeCloseMin::confined_close @ bci:0 (line 41) Type:{0:control, 1:abIO, 2:memory, 3:rawptr:BotPTR, 4:return_address, 5:rawptr:NotNull} !jvms: ResourceScopeCloseMin::confined_close @ bci:0 (line 41) LocalVar(34) ArgEscape(GlobalEscape) [ 25P [ 42 ]] 37 Proj === 25 [[ 38 42 ]] #5 Type:rawptr:NotNull !jvms: ResourceScopeCloseMin::confined_close @ bci:0 (line 41) LocalVar(43) ArgEscape(GlobalEscape) [ 37 25P [ 289b 213b 222b ]] 42 CheckCastPP === 39 37 [[ 643 546 1296 1258 1241 1215 1196 289 1160 1435 629 91 91 91 1056 1043 1030 289 222 213 213 222 ]] Oop:ResourceScopeCloseMin$ConfinedScope:NotNull:exact * !orig=1409 !jvms: ResourceScopeCloseMin::confined_close @ bci:0 (line 41) See 1409 in https://bugs.openjdk.org/secure/attachment/101269/ResourceScopeCloseMin_after_Parse.png it's supposed to materialize in my PEA RFC. I will see if I can make it happens.
01-11-2022

> Resource r = createResource(); > try { > r = new Resource(field1, field2,...fieldN); // initialize it with the current state. > doSomething(r); // inlined > r.close(); // inlined > } catch (Throwable r) { > r = new Resource(field1, field2, ... fieldN); // initialize it with the state of exception. > r.close(); // not inlined >} It doesn't look equivalent to the original code unless the allocation in try block is reliably scalarized. (Imagine a situation when an exception is thrown. There'll be 2 instances allocated: one in try block and the other in catch block.)
26-10-2022

> I see your point. try-finally is nothing special. Inliner should bias to high-frequent calls. finally-block shares the same frequent with try-block. Ideally, inliner should inline r.close() in finally-block like it does in try-block. try-finally is just a syntactic sugar. Javac turns it into try-catch when translating into bytecodes: 0: invokestatic #7 // Method createResource:()LResource; 3: astore_1 4: aload_1 5: invokestatic #13 // Method doSomething:(LResource;)V 8: aload_1 9: invokevirtual #17 // Method Resource.close:()V 12: goto 22 15: astore_2 16: aload_1 17: invokevirtual #17 // Method Resource.close:()V 20: aload_2 21: athrow 22: return Exception table: from to target type 4 8 15 any try-with-resources is even more complicated: https://docs.oracle.com/javase/specs/jls/se19/html/jls-14.html#jls-14.20.3.1
26-10-2022

I see your point. try-finally is nothing special. Inliner should bias to high-frequent calls. finally-block shares the same frequent with try-block. Ideally, inliner should inline r.close() in finally-block like it does in try-block. Try-catch is the new case for C2. The idiom separates into hot/cold paths in nature. If inliner doesn't have special treatment for the cold path, we may resort to partial escape analysis. We literally split the original object into two. One for the hot path and the other one is for cold path. The divider is the point where it throws an exception. I think it's a new application of PEA. Resource r = createResource(); try { r = new Resource(field1, field2,...fieldN); // initialize it with the current state. doSomething(r); // inlined r.close(); // inlined } catch (Throwable r) { r = new Resource(field1, field2, ... fieldN); // initialize it with the state of exception. r.close(); // not inlined }
25-10-2022

try-finally/try-with-resources introduce 2 calls (on normal and exceptional exits). Currently, both have to be inlined in order for scalarization to happen. It's commonly the case on normal exit (hot path), but exceptional case is usually very rarely taken and inlining doesn't happen there. Resource r = createResource(); try { doSomething(r); // inlined r.close(); // inlined } catch (Throwable r) { r.close(); // not inlined } If there are no other escape points for r, the allocation can be placed right before close() call which already has JVM state associated with it.
25-10-2022

> Actually, considering the call already keeps the JVM state associated with it, it may be quite straightforward to extend current EA to rematerialize scalar replaced allocations at such escape points (calls on infrequently taken branches). Do you mean C2 generates code to 'rematerialize' for r.close()? catch (Throwable t) { // r = rematerialize() from the SafepointScalarObjectNode r.close(); throw t; } let's assume it works, how to process the final-block? we will need to 'rematerialize' r 100%. it means we lose the point to do scalar replacement. Resource r = createResource() try { doSomething(r); r.close(); } finally { r.close(); }
21-10-2022

I added EA label to keep track of this case.
06-10-2022

Actually, considering the call already keeps the JVM state associated with it, it may be quite straightforward to extend current EA to rematerialize scalar replaced allocations at such escape points (calls on infrequently taken branches).
06-10-2022

Even if profiling data is available, we can't do much to help EA when exceptions are rarely encountered. Looks like a perfect match for partial EA (moving allocation into the rarely taken catch block would solve the problem).
06-10-2022

[~kvn] yes, non-inlined close() inside catch block definitely hinders EA. If either the block is not pruned or the call is not inlined, the allocation is marked as non-scalar-replaceable.
06-10-2022

[~vlivanov] Thank you for explanation. I thought the reason is Resource object escapes into not-inlined close() method (it could be the case in try/finally test). May be we should profile `catch` blocks.
06-10-2022

[~kvn] the actual problem is that the catch block is not pruned. The best option is to put a trap there, since there are no exceptions thrown by the benchmark. But since there are no profiling info available (catch blocks aren't profiled) and the exception class (Throwable) is loaded, the block is not pruned and an unresolved call site is encountered.
06-10-2022

"already compiled into a medium method" indicates that it is related to -XX:InlineSmallCode=value limitation used by C2 to avoid inlining a method which was previously compiled into big code.
06-10-2022

Converting this to an enhancement for now because it's not a regression but probably never worked.
21-05-2021

Actually, even calling Scope::toString in the catch block is gonna lead to same performance degradation. I'm starting to think that maybe this is escape analysis-related?
21-05-2021