JDK-8343377 : Performance regression in reflective invocation of native methods
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.lang:reflect
  • Affected Version: 18,21
  • Priority: P3
  • Status: In Progress
  • Resolution: Unresolved
  • OS: generic
  • CPU: generic
  • Submitted: 2024-10-24
  • Updated: 2024-11-15
JDK 24: Unresolved
Description
ADDITIONAL SYSTEM INFORMATION :
Windows 10
i7 14700k

A DESCRIPTION OF THE PROBLEM :
Calling cloneMethod.invoke is about 10 times slower in JDK 21 than in JDK 11.

REGRESSION : Last worked in version 11.0.25

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
JDK 21 should perform the same as or better than JDK 11.
ACTUAL -
JDK 21 is 10 times slower than JDK 11.

---------- BEGIN SOURCE ----------
See comment below for source code
---------- END SOURCE ----------

FREQUENCY : always



Comments
A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk/pull/22169 Date: 2024-11-15 22:17:10 +0000
15-11-2024

I prepared a preliminary patch to use the native accessor only for the known signature-polymorphic native methods and use MH for the rest. Preliminary results show the fix works and the times for the 3 methods should be close. (I tested with `jshell -R--add-opens -Rjava.base=ALL-UNNAMED` and pasted the source code in.) Since this change touches infrastructure, I will submit it to CI for extensive testing to ensure there is no regression or other bad side effects.
15-11-2024
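
As a rough illustration of the distinction drawn in the comment above, the sketch below separates signature-polymorphic native methods from ordinary native methods such as Object.clone. The helper `isSignaturePolymorphic` and the class name are hypothetical and follow the definition in JLS 15.12.3; this is not the code from the pull request.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.VarHandle;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class SigPolyCheck {
    // Hypothetical helper (assumption, not the actual patch): a method is treated as
    // signature-polymorphic if it is declared in MethodHandle or VarHandle, is native,
    // and has a single variable-arity parameter of type Object[].
    public static boolean isSignaturePolymorphic(Method m) {
        Class<?> owner = m.getDeclaringClass();
        if (owner != MethodHandle.class && owner != VarHandle.class) {
            return false;
        }
        Class<?>[] params = m.getParameterTypes();
        return Modifier.isNative(m.getModifiers())
                && m.isVarArgs()
                && params.length == 1
                && params[0] == Object[].class;
    }

    // Looks up a declared method and converts the checked exception to an unchecked one.
    public static Method find(Class<?> c, String name, Class<?>... params) {
        try {
            return c.getDeclaredMethod(name, params);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Method invokeExact = find(MethodHandle.class, "invokeExact", Object[].class);
        Method clone = find(Object.class, "clone");
        System.out.println("MethodHandle.invokeExact: " + isSignaturePolymorphic(invokeExact));
        System.out.println("Object.clone: " + isSignaturePolymorphic(clone));
    }
}
```

Under this check, Object.clone is native but not signature-polymorphic, so it falls into the group that the comment proposes to route through a MethodHandle accessor.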

I think I know where the problem is: the Object.clone method is a native method, and as a result the method accessor created for it is a native accessor instead of a regular MethodHandle accessor. Can you try reflecting on Object.hashCode (which should have the same slowness problem) and Object.toString (which should not have this problem at all) to verify my hypothesis?
15-11-2024
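
The premise of the hypothesis above can be checked with a few lines of reflection. The sketch below (the class name `NativeMethodCheck` is made up for illustration) only reports which of the three Object methods carry the native modifier; it does not measure invocation speed.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class NativeMethodCheck {
    // Returns true if the no-argument method of java.lang.Object with the given name
    // is declared native. Reading modifiers does not require setAccessible or --add-opens.
    public static boolean isNative(String name) {
        try {
            Method m = Object.class.getDeclaredMethod(name);
            return Modifier.isNative(m.getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"clone", "hashCode", "toString"}) {
            System.out.println("Object." + name + " native: " + isNative(name));
        }
    }
}
```

On OpenJDK builds, clone and hashCode are declared native and toString is not, which matches the grouping suggested in the comment.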

Additional Information from submitter:
======================================
Yes, I changed SimpleTest.java, moving the fetching of the clone method from the beginning of the benchmark methods to the static initializer.
-------------------CODE BEGIN------------------
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class SimpleTest {

    // Method objects are resolved once and kept in static final fields.
    public static final Method cloneMethod1;
    public static final Method cloneMethod2;
    public static final Method cloneMethod3;

    static {
        try {
            cloneMethod1 = Object.class.getDeclaredMethod("clone");
            cloneMethod1.setAccessible(true);
            cloneMethod2 = Object.class.getDeclaredMethod("clone");
            cloneMethod2.setAccessible(true);
            cloneMethod3 = DataWithClone.class.getDeclaredMethod("clone");
            cloneMethod3.setAccessible(true);
        } catch (NoSuchMethodException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // prepare data
        List<Data> dataList = new ArrayList<Data>();
        for (int i = 0; i < 100; i++) {
            dataList.add(new Data());
        }
        List<DataWithClone> dataWithCloneList = new ArrayList<DataWithClone>();
        for (int i = 0; i < 100; i++) {
            dataWithCloneList.add(new DataWithClone());
        }
        // warnUp c2 compile
        System.out.println("starting warnUp");
        warnUp(dataList, dataWithCloneList);
        System.out.println("finish warnUp");
        // real test
        System.out.println("starting test");
        runTest(dataList, dataWithCloneList);
        System.out.println("finish test");
    }

    public static void warnUp(List<Data> dataList, List<DataWithClone> dataWithCloneList) {
        for (int i = 0; i < 200; i++) {
            case1(dataList, true);
            case2(dataWithCloneList, true);
            case3(dataWithCloneList, true);
        }
    }

    public static void runTest(List<Data> dataList, List<DataWithClone> dataWithCloneList) {
        case1(dataList, false);
        case2(dataWithCloneList, false);
        case3(dataWithCloneList, false);
    }

    // case1: Object.clone invoked on Data (no clone override)
    private static void case1(List<Data> dataList, boolean warmUp) {
        int loopTime = warmUp ? 1 : 1000000;
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < loopTime; i++) {
            for (int j = 0; j < dataList.size(); j++) {
                Data data = dataList.get(j);
                try {
                    Object clone = cloneMethod1.invoke(data);
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }
        }
        if (!warmUp) {
            System.out.println("case1:" + (System.currentTimeMillis() - startTime));
        }
    }

    // case2: Object.clone invoked on DataWithClone (has a clone override)
    private static void case2(List<DataWithClone> dataList, boolean warmUp) {
        int loopTime = warmUp ? 1 : 1000000;
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < loopTime; i++) {
            for (int j = 0; j < dataList.size(); j++) {
                DataWithClone data = dataList.get(j);
                try {
                    Object clone = cloneMethod2.invoke(data);
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }
        }
        if (!warmUp) {
            System.out.println("case2:" + (System.currentTimeMillis() - startTime));
        }
    }

    // case3: DataWithClone.clone invoked on DataWithClone
    private static void case3(List<DataWithClone> dataList, boolean warmUp) {
        int loopTime = warmUp ? 1 : 1000000;
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < loopTime; i++) {
            for (int j = 0; j < dataList.size(); j++) {
                DataWithClone data = dataList.get(j);
                try {
                    Object clone = cloneMethod3.invoke(data);
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }
        }
        if (!warmUp) {
            System.out.println("case3:" + (System.currentTimeMillis() - startTime));
        }
    }
}
-----------------------CODE END-------------------------
New Result:
case1:12421
case2:8309
case3:82
We can see that with the clone method obtained from Object.class the performance is unchanged, but with the clone method obtained from DataWithClone.class the performance is back to normal.
15-11-2024

I believe that since JEP 416, we recommend that users fetch a Method object and store it in a static final field to enable constant folding. Can you try moving the fetching of the clone method from the beginning of the benchmark methods to the static initializer, like `static final Method cloneMethod; static { try { cloneMethod = ...} catch {} }`, and see if the issue persists?
04-11-2024
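
A minimal sketch of the static final pattern suggested above, reduced to a public method so that it runs without `--add-opens`. The class name `StaticFinalMethodHolder` and the choice of String.length are assumptions made for illustration only and are not part of the reported test case.

```java
import java.lang.reflect.Method;

public class StaticFinalMethodHolder {
    // Resolved once in the static initializer and never reassigned, so the JIT
    // can treat the Method object as a constant.
    private static final Method LENGTH;

    static {
        try {
            LENGTH = String.class.getMethod("length");
        } catch (NoSuchMethodException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Invokes String.length reflectively through the cached Method object.
    public static int lengthOf(String s) {
        try {
            return (Integer) LENGTH.invoke(s);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(lengthOf("hello")); // prints 5
    }
}
```

The contrast is with looking up the Method inside the benchmark method, where it lives in a local variable rather than a static final field.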

Moving to generic OS and CPU as this is reproduced on other platforms as well.
04-11-2024

JEP 416 integrated in jdk-18+b22.
04-11-2024

[~alanb] Yes Alan, thanks that is it. -Djdk.reflect.useDirectMethodHandle=false regained the perf in b22.
04-11-2024

[~ecaspole] JEP 416 re-implemented core reflection in JDK 18. Running with -Djdk.reflect.useDirectMethodHandle=false on JDK 18 will use the old implementation so maybe you could establish if this is the issue. The old implementation (and this system property) have since been removed.
04-11-2024

Additional Information from submitter:
=================================
Attached is the test case raw source code from the submitter. This project includes 3 files.
--------------------------BEGIN SOURCE -----------------
SimpleTest.java:

import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class SimpleTest {

    public static void main(String[] args) {
        // prepare data
        List<Data> dataList = new ArrayList<Data>();
        for (int i = 0; i < 100; i++) {
            dataList.add(new Data());
        }
        List<DataWithClone> dataWithCloneList = new ArrayList<DataWithClone>();
        for (int i = 0; i < 100; i++) {
            dataWithCloneList.add(new DataWithClone());
        }
        // warnUp c2 compile
        System.out.println("starting warnUp");
        warnUp(dataList, dataWithCloneList);
        System.out.println("finish warnUp");
        // real test
        System.out.println("starting test");
        runTest(dataList, dataWithCloneList);
        System.out.println("finish test");
    }

    public static void warnUp(List<Data> dataList, List<DataWithClone> dataWithCloneList) {
        for (int i = 0; i < 200; i++) {
            case1(dataList, true);
            case2(dataWithCloneList, true);
            case3(dataWithCloneList, true);
        }
    }

    public static void runTest(List<Data> dataList, List<DataWithClone> dataWithCloneList) {
        case1(dataList, false);
        case2(dataWithCloneList, false);
        case3(dataWithCloneList, false);
    }

    private static void case1(List<Data> dataList, boolean warmUp) {
        int loopTime = warmUp ? 1 : 1000000;
        Method cloneMethod = null;
        try {
            cloneMethod = Object.class.getDeclaredMethod("clone");
            cloneMethod.setAccessible(true);
        } catch (NoSuchMethodException e) {
            throw new RuntimeException(e);
        }
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < loopTime; i++) {
            for (int j = 0; j < dataList.size(); j++) {
                Data data = dataList.get(j);
                try {
                    Object clone = cloneMethod.invoke(data);
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }
        }
        if (!warmUp) {
            System.out.println("case1:" + (System.currentTimeMillis() - startTime));
        }
    }

    private static void case2(List<DataWithClone> dataList, boolean warmUp) {
        int loopTime = warmUp ? 1 : 1000000;
        Method cloneMethod = null;
        try {
            cloneMethod = Object.class.getDeclaredMethod("clone");
            cloneMethod.setAccessible(true);
        } catch (NoSuchMethodException e) {
            throw new RuntimeException(e);
        }
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < loopTime; i++) {
            for (int j = 0; j < dataList.size(); j++) {
                DataWithClone data = dataList.get(j);
                try {
                    Object clone = cloneMethod.invoke(data);
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }
        }
        if (!warmUp) {
            System.out.println("case2:" + (System.currentTimeMillis() - startTime));
        }
    }

    private static void case3(List<DataWithClone> dataList, boolean warmUp) {
        int loopTime = warmUp ? 1 : 1000000;
        Method cloneMethod = null;
        try {
            cloneMethod = DataWithClone.class.getDeclaredMethod("clone");
        } catch (NoSuchMethodException e) {
            throw new RuntimeException(e);
        }
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < loopTime; i++) {
            for (int j = 0; j < dataList.size(); j++) {
                DataWithClone data = dataList.get(j);
                try {
                    Object clone = cloneMethod.invoke(data);
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }
        }
        if (!warmUp) {
            System.out.println("case3:" + (System.currentTimeMillis() - startTime));
        }
    }
}

DataWithClone.java:

import java.util.concurrent.ThreadLocalRandom;

public class DataWithClone implements Cloneable {
    private int number1;
    private int number2;
    private long number3;
    private long number4;
    private float number5;
    private float number6;
    private double number7;
    private double number8;

    public DataWithClone() {
        number1 = ThreadLocalRandom.current().nextInt();
        number2 = ThreadLocalRandom.current().nextInt();
        number3 = ThreadLocalRandom.current().nextLong();
        number4 = ThreadLocalRandom.current().nextLong();
        number5 = ThreadLocalRandom.current().nextInt();
        number6 = ThreadLocalRandom.current().nextInt();
        number7 = ThreadLocalRandom.current().nextDouble();
        number8 = ThreadLocalRandom.current().nextDouble();
    }

    public int getNumber1() { return number1; }
    public void setNumber1(int number1) { this.number1 = number1; }
    public int getNumber2() { return number2; }
    public void setNumber2(int number2) { this.number2 = number2; }
    public long getNumber3() { return number3; }
    public void setNumber3(long number3) { this.number3 = number3; }
    public long getNumber4() { return number4; }
    public void setNumber4(long number4) { this.number4 = number4; }
    public float getNumber5() { return number5; }
    public void setNumber5(float number5) { this.number5 = number5; }
    public float getNumber6() { return number6; }
    public void setNumber6(float number6) { this.number6 = number6; }
    public double getNumber7() { return number7; }
    public void setNumber7(double number7) { this.number7 = number7; }
    public double getNumber8() { return number8; }
    public void setNumber8(double number8) { this.number8 = number8; }

    @Override
    public Object clone() {
        try {
            return super.clone();
        } catch (CloneNotSupportedException e) {
            throw new RuntimeException(e);
        }
    }
}

Data.java:

import java.util.concurrent.ThreadLocalRandom;

public class Data implements Cloneable {
    private int number1;
    private int number2;
    private long number3;
    private long number4;
    private float number5;
    private float number6;
    private double number7;
    private double number8;

    public Data() {
        number1 = ThreadLocalRandom.current().nextInt();
        number2 = ThreadLocalRandom.current().nextInt();
        number3 = ThreadLocalRandom.current().nextLong();
        number4 = ThreadLocalRandom.current().nextLong();
        number5 = ThreadLocalRandom.current().nextInt();
        number6 = ThreadLocalRandom.current().nextInt();
        number7 = ThreadLocalRandom.current().nextDouble();
        number8 = ThreadLocalRandom.current().nextDouble();
    }

    public int getNumber1() { return number1; }
    public void setNumber1(int number1) { this.number1 = number1; }
    public int getNumber2() { return number2; }
    public void setNumber2(int number2) { this.number2 = number2; }
    public long getNumber3() { return number3; }
    public void setNumber3(long number3) { this.number3 = number3; }
    public long getNumber4() { return number4; }
    public void setNumber4(long number4) { this.number4 = number4; }
    public float getNumber5() { return number5; }
    public void setNumber5(float number5) { this.number5 = number5; }
    public float getNumber6() { return number6; }
    public void setNumber6(float number6) { this.number6 = number6; }
    public double getNumber7() { return number7; }
    public void setNumber7(double number7) { this.number7 = number7; }
    public double getNumber8() { return number8; }
    public void setNumber8(double number8) { this.number8 = number8; }
}
--------------------------END SOURCE -----------------
Introduction:
This project compares the performance of invoking the clone method through reflection in Java 11 and Java 21.

Dependencies:
1. Oracle JDK 11
2. Oracle JDK 21

How to test:
Copy the raw source code into an empty project in an IDE, run SimpleTest in Java 11 and Java 21 separately, then record and compare the console output.
Note: the JVM argument `--add-opens java.base/java.lang=ALL-UNNAMED` is needed to run correctly.

Case descriptions:
case1: Reflect on Object.class's clone method and invoke it on Data instances (Data has no clone implementation).
case2: Reflect on Object.class's clone method and invoke it on DataWithClone instances (DataWithClone has a clone implementation).
case3: Reflect on DataWithClone.class's clone method and invoke it on DataWithClone instances (DataWithClone has a clone implementation).

Reference result (i7 14700K):
Java 11: case1:94 case2:138 case3:191
Java 21: case1:12286 case2:7437 case3:970
04-11-2024