JDK-8251318 : search of secondary_supers does not scale well
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 16
  • Priority: P4
  • Status: Closed
  • Resolution: Duplicate
  • Submitted: 2020-08-07
  • Updated: 2024-04-22
  • Resolved: 2024-04-22
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
Other
tbdResolved
Related Reports
Duplicate :  
Description
Searching Klass::secondary_supers is slower than it should be,
and this shows up as a performance problem with some specialized
workloads.

There are two sources of slowness:

1. We use the SCASQ instruction on x86 to scan the supers array.
We should put in a platform-specific option to use a plain loop
(MOVQ/CMPQ/JCCNE) instead, because SCASQ may not
be well-supported on newer platforms.

2. Rarely, a class might have a very long secondary_supers array,
*and* tests against it (either positive/unstable or negative) might
be in the application's set of hot operations.  In such applications,
the O(N) linear search of secondary_supers appears in an ugly way.

Suggested improvements (in order):

A. Supply an option to replace SCASQ, maybe turn it on by default
on newer systems.

B. Sort the items in the array by metadata address (they don't
move anymore, unless maybe when classes are redefined).
Re-sort when they might change (after CDS, for example).
Use an out-of-line binary search routine to traverse the array.
Use the inline code (from A) for small arrays, certainly of
size 0 or 1, to avoid useless branching.

C. Add a negative test cache, managed similarly to the
secondary_super_cache (see JDK-8180450) to avoid
searching the whole array on misses which are predictable.

This would help with long decision chains (reported with Scala)
which look like this:

   //Class<?> T0 = x.getClass();
   if (x instanceof T1)  doT1(x);  // usually not taken after O(N) search!
   if (x instanceof T2)  doT2(x);  // usually not taken after O(N) search!
   if (x instanceof T3)  doT3(x);  // usually not taken after O(N) search!
   if (x instanceof T4)  doT4(x);  // OK, maybe it's a T4
   ... // or maybe not, so more O(N) searches...

Here, it might be useful to store on T1 and T2 and T3
that there's a frequent query of them for T0, and note
that T0 is a "guaranteed no".  Such a cache should have
multiple entries (and thus spin up only if hot) to cover
cases where T0 varies frequently.

Or, it might be better to store on T0 that T1 and T2 and T3
are all "guaranteed no".  This requires enough cache entries
on T0 to cover the depth of the decision chain given above.
Since the length of the chain could be very long (like the
number of frequent T0 values in the previous paragraph),
the cache should size itself to the heat of the testing.

Or (third try, and maybe the best), there could be a global
negative cache indexed by pairs of T0/Ti.  It could be fixed
in size and leaky, in the sense that interfering queries would
evict each other (after some hysteresis to avoid memory
thrashing, as in JDK-8180450).
Comments
Overview of subtype check rewrite experiments: https://bugs.openjdk.org/secure/attachment/108173/Fast_Subtype_Checks_20240206.pdf Covers the code in ssc.cuckoo.2seed branch: https://github.com/iwanowww/jdk/tree/ssc.cuckoo.2seed
06-02-2024

Could this be approached on the compiler end? For example if the method contains many `x instanceof Xxxx` for the same value `x`, perhaps it could just perform one single null check and one single sweep over the supers for `x` and record the results for each check, looking for all of them at once, in a boolean vector of some sort. Then the actual `instanceof` would reduce to checking the corresponding slot in the boolean vector. While this doesn't solve the complexity issue in a mathematical sense, it could improve matters at the least with better cache locality.
02-12-2022

Background information: http://cr.openjdk.java.net/~jrose/jvm/checktype-2001.txt https://bugs.openjdk.java.net/browse/JDK-8180450 https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-August/041056.html https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-August/041051.html
07-08-2020