JDK-8276998 : [REDO] Optimization of Box nodes in uncommon_trap
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 17,18,19
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • OS: generic
  • CPU: generic
  • Submitted: 2021-11-11
  • Updated: 2024-02-07
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
Other
tbdUnresolved
Related Reports
Cloners :  
Duplicate :  
Relates :  
Relates :  
Relates :  
Description
JDK-8075052 has removed useless autobox. However, in some cases, the box is still saved. For instance:

@Benchmark
public void testMethod(Blackhole bh) {
  int sum = 0;
  for (int i = 0; i < data.length; i++) {
      Integer ii = Integer.valueOf(data[i]);
      if (i < data.length) {
          sum += ii.intValue();
      }
  }
  bh.consume(sum);
}

Although the variable ii is only used at ii.intValue(), it cannot be eliminated as a result of being used by a hidden uncommon_trap.
The uncommon_trap is generated by the optimized "if", because its condition is always true.

We can postpone box in uncommon_trap in this situation. We treat box as a scalarized object by adding a SafePointScalarObjectNode in the uncommon_trap node,
and deleting the use of box:

```
@@ -603,6 +603,51 @@ void CallGenerator::do_late_inline_helper() {
     C->remove_macro_node(call);
   }
 
+  if (callprojs.resproj != NULL && call->is_CallStaticJava() &&
+      call->as_CallStaticJava()->is_boxing_method()) {
+    GraphKit kit(call->jvms());
+    PhaseGVN& gvn = kit.gvn();
+
+    Node_List delay_boxes;
+    bool no_use = true;
+    for (DUIterator_Fast imax, i = callprojs.resproj->fast_outs(imax); i < imax; i++) {
+      Node* m = callprojs.resproj->fast_out(i);
+      if (m->is_CallStaticJava() &&
+          m->as_CallStaticJava()->uncommon_trap_request() != 0) {
+        delay_boxes.push(m);
+      } else {
+        no_use = false;
+        break;
+      }
+    }
+
+    if (no_use) {
+      // delay box node in uncommon_trap runtime, treat box as a scalarized object
+      while (delay_boxes.size() > 0) {
+        ProjNode* res = callprojs.resproj->as_Proj();
+        Node* uncommon_trap_node = delay_boxes.pop();
+        int in_edge = uncommon_trap_node->find_edge(res);
+        assert(in_edge > 0, "sanity");
+
+        ciInstanceKlass* klass = call->as_CallStaticJava()->method()->holder();
+        int n_fields = klass->nof_nonstatic_fields();
+        assert(n_fields == 1, "sanity");
+
+        uint first_ind = (uncommon_trap_node->req() - uncommon_trap_node->jvms()->scloff());
+        Node* sobj = new SafePointScalarObjectNode(gvn.type(res)->isa_oopptr(),
+#ifdef ASSERT
+                                                  NULL,
+#endif // ASSERT
+                                                  first_ind, n_fields);
+        sobj->init_req(0, C->root());
+        uncommon_trap_node->add_req(call->in(res->_con));
+        sobj = gvn.transform(sobj);
+        uncommon_trap_node->jvms()->set_endoff(uncommon_trap_node->req());
+        uncommon_trap_node->set_req(in_edge, sobj);
+      }
+    }
+  }
+
```
It is shown in the following table that the patch improves the performance on x64 and  aarch64.
The following table shows results of the performance test on x64 and aarch64:
x64:
base: 
Benchmark                     Mode  Samples   Score  Score error  Units
o.s.MyBenchmark.testMethod    avgt       30  50.374        3.927  us/op
delay box in uncommon_trap:
Benchmark                     Mode  Samples  Score  Score error  Units
o.s.MyBenchmark.testMethod    avgt       30  4.571        0.004  us/op
aarch64:
base:
Benchmark                     Mode  Samples   Score  Score error  Units
o.s.MyBenchmark.testMethod    avgt       30  51.782        0.357  us/op
delay box in uncommon_trap:
Benchmark                     Mode  Samples  Score  Score error  Units
o.s.MyBenchmark.testMethod    avgt       30  3.876        0.002  us/op

There is no additional fail/error(s) of jtreg after this patch.
Comments
[~kvn] mentioned in the PR for JDK-8317299 (https://github.com/openjdk/jdk/pull/17500#issuecomment-1913364161) that it could resolve the issue JDK-8276112 (and so, this [REDO]). Unfortunately I couldn't reproduce the issue in JDK-8276112 to check if this is really the case but it is probably something to check/keep in mind.
07-02-2024

> Sounds reasonable to me but I wonder if this would be a problem for other tools/components looking at debug information and expecting a valid value for this local at bci 11. I don't think external tools will probe uncommon_traps. they are stubs given by the nature of speculative compilation. they exist or don't exit based on profiling data. Besides, I have another implementation which retains bci at do_if and do_ifnull. I just pull liveness of citypeflow using next bci. it has a corner case that if bytecodes themself use references which are dead at next bci. handling this corner case makes the patch ugly. It's easier to change bci directly of JVMState. > As [~jrose] pointed out in a private conversation, it's already the case that dead locals are removed from the JVM state. I think ciEnv::should_retain_local_variables takes care of this. from my reading, current c2 can't remove local ii at bci 11: if_icmple 21. Could you take a look at my Test.java of JDK-8286104. I also convert it to IRTest.
05-05-2022

As [~jrose] pointed out in a private conversation, it's already the case that dead locals are removed from the JVM state. I think ciEnv::should_retain_local_variables takes care of this.
05-05-2022

Sounds reasonable to me but I wonder if this would be a problem for other tools/components looking at debug information and expecting a valid value for this local at bci 11. For example, what happens when we attach a Java level debugger right after deoptimization and inspect the local?
04-05-2022

hi, Tobias and Yan, I take a look at the Test.java. I come up another idea for this issue. 'ii' is one of uncommon_trap's inputs because GraphKit::kill_dead_locals can't kill it at bci=11. public static int foo(int); Code: 0: iload_0 1: invokestatic #4 // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer; 4: astore_1 5: iconst_0 6: istore_2 7: iload_0 8: sipush 999 11: if_icmple 21 14: iload_2 15: aload_1 16: invokevirtual #5 // Method java/lang/Integer.intValue:()I 19: iadd 20: istore_2 21: invokestatic #6 // Method black:()V 24: iload_2 25: ireturn Interpreter will start over at bci 11, 'ii' is indeed alive at 11. however, "unstable if" is a special case. We know that we will take the road "less taken by" this time, so we can use liveness information for that path. In this case, ii happens to be dead at bci=21. By changing that, I change the debuginfo from call,static wrapper for: uncommon_trap(reason='unstable_if' action='reinterpret' debug_id='0') # Test::foo @ bci:11 (line 15) L[0]=_ L[1]=#ScObj0 L[2]=#0 STK[0]=RBP STK[1]=#999 # ScObj0 java/lang/Integer={ [value :0]=RBP } # OopMap {off=104/0x68} to: call,static wrapper for: uncommon_trap(reason='unstable_if' action='reinterpret' debug_id='0') # Test::foo @ bci:11 (line 15) L[0]=_ L[1]=_ L[2]=#0 STK[0]=RBP STK[1]=#999 # OopMap {off=60/0x3c} Not only it helps to escape analysis in this case, it is also helpful when deoptimization does happen. we can save the space and effort for 'ii' which is dead. What do you think about this?
28-04-2022

Yes, this RFE is for re-implementing the optimization. The broken implementation from JDK-8261137 will be removed by JDK-8284198. [~whuang] do you plan to work on this? If not, please un-assign.
11-04-2022

This RFE is exactly for such purpose: [REDO]. But we may need different approach and can't reuse JDK-8261137 implementation which had issues.
08-04-2022

Should we also create a new issue to fix and re-enable the optimization?
08-04-2022

I will undo the optimization and have created a new issue for that: https://bugs.openjdk.java.net/browse/JDK-8284198
02-04-2022

Yes, that makes sense to me. Let's revert the changes with another (cleanup) RFE and leave this open for future investigations.
01-04-2022

[~thartmann] AFAIK, this has not been investigated yet. The optimization has been disabled by JDK-8276112, but the code changes are still there and not test-covered for several months. So I doubt whether the code changes are still valid. Shall we revert the original patch for now?
01-04-2022

Thank you for your[~thartmann] comments. We will recheck this problem and redo that.
15-11-2021

JDK-8261137 has been disabled by JDK-8276112 and needs to be fixed/re-implemented.
11-11-2021