JDK-8301007 : [lworld] Handle mismatches of the preload attribute in the calling convention
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: repo-valhalla
  • Priority: P2
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2023-01-24
  • Updated: 2025-10-08
  • Resolved: 2023-04-19
Fix Version: repo-valhalla (Fixed)
Description
C2 passes inline type arguments in scalarized form if the calling convention of the resolved method supports this. That means that the calling conventions of the overriding methods need to support scalarization as well. The preload attribute added with JDK-8281116 does not guarantee any consistency between the overridden and overriding method. We need to fix the calling convention accordingly.

Below is a summary I wrote a while ago.

-------------------------------------------------------------------------------------------

Not following the "global cache" version of option (3) for now, I was thinking about other ways to
make it work.

In general, we would always stick to the calling convention of the parent method. Our "multiple
interface example" would be handled like this:

interface I1 {
  void m(L*MyValue); // Scalarized
}

interface I2 {
  void m(LMyValue); // Non-scalarized
}

class C implements I1, I2 {
  void m(LMyValue) { }
}

Once C::m is linked, we detect the mismatch and force C::m to always use the non-scalarized calling
convention. If the method has no other scalarized arguments, its scalarized entry point would then be
unused and would simply point to the non-scalarized entry point. In this case, however, we could
always create it and add code that triggers deoptimization and re-execution in the caller.

That's similar to option (2) in that we resolve the mismatch with deoptimization but it is easier to
implement and deoptimization would only be triggered in edge cases (otherwise we would just use the
non-scalarized calling convention right from the beginning). Also, I found several flaws with option
(2) and resolving them would add lots of additional complexity.

Potentially, we could also implement it the other way around: Always use the scalarized calling
convention for C::m but there are some technical challenges to make that work.
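
The link-time rule described above can be sketched roughly as follows. This is only an illustrative model, not HotSpot code: the class `LinkTimeConventionSketch`, its `Convention` enum and both methods are hypothetical names, and the model decides per method rather than per argument.

```java
import java.util.List;

// Hypothetical model of the link-time decision; these names do not exist in HotSpot.
final class LinkTimeConventionSketch {
    enum Convention { SCALARIZED, NON_SCALARIZED }

    // Choose the convention of a method from the conventions of the methods it
    // overrides or implements. With no parent, the method's own preference is used.
    static Convention select(List<Convention> parents, Convention ownPreference) {
        if (parents.isEmpty()) {
            return ownPreference;
        }
        Convention first = parents.get(0);
        for (Convention parent : parents) {
            if (parent != first) {
                // Parents disagree (I1 vs. I2): force the non-scalarized convention.
                return Convention.NON_SCALARIZED;
            }
        }
        return first; // all parents agree: stick to the parent's convention
    }

    // After a forced fallback, the scalarized entry point is modeled as a stub that
    // deoptimizes the caller so that the call is re-executed with non-scalarized arguments.
    static boolean scalarizedEntryDeoptimizes(List<Convention> parents, Convention selected) {
        return selected == Convention.NON_SCALARIZED
                && parents.contains(Convention.SCALARIZED);
    }
}
```

For the example, `select` with parents (scalarized, non-scalarized) yields the non-scalarized convention for C::m, and only callers that still arrive through the scalarized entry point pay for a deoptimization.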


-------------------------------------------------------------------------------------------


I had a Zoom session with John this morning. I'm trying to summarize some of what we discussed.

----------------
Argument passing
----------------

- The (optimized) calling convention is fixed at method link time (when the holder klass is loaded).
- The interpreter and C1 always pass arguments in non-scalarized form.
- C2 passes inline type arguments in scalarized form if the calling convention of the resolved
method supports this. That means that the calling conventions of the overriding methods need to
support scalarization as well. Otherwise, we have a mismatch that can only be detected at runtime.
With Q, such a mismatch is not possible but with L* it is.
- Translations between non-scalarized and scalarized arguments are performed in the method entry
points for compiled-to-compiled calls and in the adapter for compiled-to-interpreted or
interpreted-to-compiled calls. It's not feasible to emit multiple entry points and adapters to
handle all the possible mismatches.

Options for handling mismatches:
(1) Reject classes that introduce mismatches during class loading.
(2) Detect mismatches at runtime during resolution of a C2 compiled call when the calling convention
used by the resolved method differs from the calling convention expected by the selected method. To
resolve the mismatch, deoptimize the C2 compiled caller and re-execute the call in the interpreter.
Potentially re-compile the caller without using the scalarized calling convention.
(3) Always use the calling convention of the overridden method and ignore the stars attached to the
arguments of the overriding method.

As I understand it, (1) is not an option because we need to support migration.

For (2), we would need to implement deoptimization from call resolution and re-execution of a call
in the interpreter. Without prototyping, it's hard to tell how complicated that would be. This also
has the side effect that once a mismatch is detected for a (virtual) call, we would fall back to
using the non-scalarized calling convention for that call (i.e. for all arguments / callees).
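
To make the control flow of option (2) concrete, here is a small model of a single C2-compiled call site. It is a sketch under assumptions: `CallResolutionSketch`, `CallSite` and `Action` are made-up names, and after the fallback the call is assumed to reach every callee through an entry point that accepts non-scalarized arguments, as interpreter and C1 calls already do.

```java
// Hypothetical model of option (2); it does not mirror any HotSpot data structure.
final class CallResolutionSketch {
    enum Convention { SCALARIZED, NON_SCALARIZED }
    enum Action { PROCEED, DEOPTIMIZE_AND_REEXECUTE }

    static final class CallSite {
        private final Convention resolvedConvention; // convention of the resolved method
        private boolean fallenBack;                  // set once a mismatch has been seen

        CallSite(Convention resolvedConvention) {
            this.resolvedConvention = resolvedConvention;
        }

        // Called when the call is resolved against the selected (actual) method.
        Action resolve(Convention selectedConvention) {
            if (fallenBack || selectedConvention == resolvedConvention) {
                return Action.PROCEED;
            }
            // Mismatch: deoptimize the caller and re-execute the call in the
            // interpreter. From now on this call site stays non-scalarized.
            fallenBack = true;
            return Action.DEOPTIMIZE_AND_REEXECUTE;
        }

        // Convention the (re-)compiled caller uses to pass arguments at this site.
        Convention passing() {
            return fallenBack ? Convention.NON_SCALARIZED : resolvedConvention;
        }
    }
}
```

The `fallenBack` flag models the side effect mentioned above: after one mismatch, the call site no longer scalarizes for any callee.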

Option (3) would make mismatches impossible. But there's one scenario that we can't handle, namely a
class implementing two interfaces both defining the same method with mismatching star settings:

interface I1 {
  void m(L*MyValue); // Scalarized
}

interface I2 {
  void m(LMyValue); // Non-scalarized
}

class C implements I1, I2 {
  void m(LMyValue) { }
}

No matter what calling convention we choose for C::m, a mismatch is possible through I1 (scalarized)
or I2 (non-scalarized) interface calls.
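
The dilemma can be spelled out by checking both possible choices for C::m against what callers through each interface expect. The following is just a toy enumeration; `TwoInterfaceMismatchSketch` and its methods are hypothetical and only encode the example above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy enumeration of the I1/I2 example; all names are illustrative.
final class TwoInterfaceMismatchSketch {
    enum Convention { SCALARIZED, NON_SCALARIZED }

    // Convention that callers going through each interface expect for m.
    static Map<String, Convention> expectedByInterface() {
        Map<String, Convention> expected = new LinkedHashMap<>();
        expected.put("I1", Convention.SCALARIZED);      // void m(L*MyValue)
        expected.put("I2", Convention.NON_SCALARIZED);  // void m(LMyValue)
        return expected;
    }

    // Interfaces whose callers disagree with the convention chosen for C::m.
    static List<String> mismatchingInterfaces(Convention chosenForC) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Convention> entry : expectedByInterface().entrySet()) {
            if (entry.getValue() != chosenForC) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

Whichever convention is passed in, the returned list is non-empty, which is exactly the scenario option (3) cannot handle on its own.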

One variant of option (3) that John and I discussed would be a dictionary that keeps track of the
calling convention used for each class.

- If I1 is loaded first, 'MyValue' would be loaded early due to the * and the dictionary would keep
track of 'MyValue' being scalarized. When I2::m and C::m are then loaded, we would consult the
dictionary and find that the scalarized calling convention should be used.
- If I2 is loaded first, 'MyValue' would not be loaded early and the dictionary would keep track of
'MyValue' being non-scalarized. When I1::m and C::m are then loaded, we would consult the dictionary
and find that the non-scalarized calling convention should be used.
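
A first-decision-wins dictionary along these lines could look roughly like the sketch below. It is a simplified model with hypothetical names (`ConventionDictionarySketch`, `decide`); a `ConcurrentHashMap` is used only to keep the sketch itself thread-safe and says nothing about how HotSpot would deal with concurrent class loading.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-class dictionary; the first recorded decision is permanent.
final class ConventionDictionarySketch {
    enum Convention { SCALARIZED, NON_SCALARIZED }

    private final ConcurrentHashMap<String, Convention> decisions = new ConcurrentHashMap<>();

    // Called when a method mentioning className is loaded. 'starred' says whether
    // this particular method marks the class with a star (L*).
    Convention decide(String className, boolean starred) {
        Convention wanted = starred ? Convention.SCALARIZED : Convention.NON_SCALARIZED;
        // putIfAbsent returns the earlier decision, or null if this is the first one.
        Convention previous = decisions.putIfAbsent(className, wanted);
        return previous == null ? wanted : previous;
    }
}
```

Calling `decide("MyValue", true)` before `decide("MyValue", false)` gives the scalarized convention both times, and the reverse order gives the non-scalarized convention both times, which illustrates the dependence on class loading order.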

I'm not sure how feasible that is though. Open questions are:
- Where would we keep track of this?
- Is the footprint overhead acceptable?
- What about concurrent loading of classes?

Also, this option has the side effect that the star setting is often ignored and the class loading
order has an impact on the calling convention of methods of completely unrelated classes.


-----------------
Return processing
-----------------

To avoid costly runtime checks in C2 compiled code to handle scalarized and non-scalarized returns,
the interpreter, C1 and C2 *always* return Q's in a scalarized form. Because the interpreter
and C1 do not support scalarization, an additional "field" is returned that either contains a
pointer to a heap buffer or a specially encoded Klass*. The interpreter and C1 can then check that
field and either use the pointer to the buffer or re-buffer.
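
The check on that additional field can be illustrated as follows. The encoding used here is purely an assumption for the sake of the example (a set low bit marks a Klass*, relying on aligned addresses); the text above does not describe the actual encoding, and `ReturnFieldSketch` is a made-up name.

```java
// Illustrative tagged-field check; the low-bit tag is an assumed encoding.
final class ReturnFieldSketch {
    static final long KLASS_TAG = 1L;

    // Encode a (suitably aligned) Klass* address so it can be told apart from a buffer pointer.
    static long encodeKlass(long klassAddress) {
        return klassAddress | KLASS_TAG;
    }

    // Remove the tag again to recover the Klass* address.
    static long decodeKlass(long field) {
        return field & ~KLASS_TAG;
    }

    // An untagged field is a pointer to an existing heap buffer.
    static boolean isBuffer(long field) {
        return (field & KLASS_TAG) == 0;
    }

    // What the interpreter or C1 would do with the returned field.
    static String handle(long field) {
        return isBuffer(field)
                ? "use the existing heap buffer"
                : "re-buffer from the scalarized field values";
    }
}
```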

Now current code partially relies on the fact that a Q return is always scalarized and an L return
is never scalarized. With the L* proposal, that would no longer be guaranteed because there can be a
mismatch between the resolved method having an L (non-scalarized) return and the selected method
having an L* (scalarized) return.

Our options are similar to the argument passing ones. For (2) we would detect the mismatch by adding
additional runtime checks to the interpreter and C1 (as I described in my earlier emails). I think
the overhead would be negligible because we already need these checks and would just apply them in
some additional cases. For (3) we would always choose the return convention the dictionary tells us
to use and mismatches would be impossible.
Comments
A pull request was submitted for review. Branch: lworld URL: https://git.openjdk.org/valhalla/pull/834 Date: 2023-04-18 11:12:13 +0000
08-10-2025

In the given example, C <: I1, I2. And I1 disagrees with I2 in Q-ness (in Q-folding terms of JDK-8303095, or in star-ness in L* vs. L in Preload terms). In the worst case, I1 and I2 have been loaded for many milliseconds and both have thriving separate implementation hierarchies. Then, suddenly, C is loaded and C requires those hierarchies to overlap. Then there is a consistency problem; one of I1 or I2 is wrong, where both were right a millisecond ago.

The proposal for Q-folding in JDK-8303095 may allow option (1), ICCE, in more cases, since an explicit Q-marked descriptor is stronger grounds for throwing ICCE than associated data in a Preload attribute or a TR (type restriction). We can expect that C.class is a witness to consistent I1.class and I2.class files. The only way for the error to happen is for I1.class or I2.class to be recompiled after C.class. If preloads are used, then C.class is no longer a full witness to I1 and I2 agreeing, because the *-ness comes indirectly from the Preload resolution. But with Q-folding, the witnessing of C.class is directly encoded in C.class.

Option (3) seems best, however, if we can limit the dictionary complexity and footprint. Option (3), either as a backup to (1) in some cases, or stand-alone, would allow Preload (or similar tactics) to flatten even pure L-types.

Here are some thoughts in the dictionary direction:

The dictionary can be named `SignatureUnfolding`. (HotSpot uses the non-standard term “signature” for what the JVMS calls “method descriptors”.) `SignatureUnfolding` records L-flavored descriptors and how they are allowed to be related to Q-unfolded Q-flavored descriptors. (Or alternatively, individual field types, single L-types and a bit which says whether they are allowed to unfold into Q-types.) The items in `SignatureUnfolding` can be designed as either types or whole signatures (folded to all L-types) or even as name/signature pairs. Storing more information may or may not increase dictionary size; more information makes conflicts less likely. Each item is optionally accompanied by an unfolding (full or partial) of L-types to Q-types, or alternatively by an indication that there is no unfolding allowed.

Once an item is entered into the dictionary it is never changed. (Perhaps class unloading could eventually remove it if the symbols go completely unused. Perhaps that is not worth the effort.) Items are added to the dictionary early enough so that they implement a global, permanent decision on whether any method (anywhere, any time in the future execution of this JVM) can unfold its calling sequence to scalarize some of its arguments or returns as Q-types. Note that a method descriptor may contain multiple independently foldable Q-types along with a mix of non-folded L-types. This whole exercise assumes that any single method descriptor (or name+descriptor pair, alternatively) is subject to a global decision as to exactly which L-types may be unfolded to Q-types, even for unrelated API hierarchies that happen to mention the same method names and/or types.

The footprint of the `SignatureUnfolding` table is one or two words per entry (two or three for pairs). (There may also be overhead for a hash table, a skip-list, etc.) The entries can be Symbol pointers (or pairs of them), since we are just talking about CONSTANT_Utf8 items here (or pairs). (Idea of the day: Perhaps we should consider compressing our `Symbol` pointers in HotSpot, along the lines of the way we compress oops. We never need more than 4G's worth of them, surely, even if we “waste” index space by continuing to lay out the chars alongside the headers. And they are ubiquitous.)

The dictionary would hold folded (L-only) types, signatures, or name/signature pairs, plus a global decision about whether there is an unfolded Q-flavored signature (or type) that matches the unfolded signature (or type). Those items would be added to the dictionary *for each interface method*. (This is potentially a lot. Nearly all of them will have no Q-flavored component.)

The number of entries in `SignatureUnfolding` could be reduced by having a secondary dictionary `ByReferenceOnly` that lists just classes (perhaps class names only) which are confirmed to never fold, like L-Object, L-String, L-List. Adding a class (or class name) to `ByReferenceOnly` is a permanent decision. Loading a value class of the exact same name would cause the JVM not only to fold all Q-types of that class to L-types, but also to avoid scalarizing those values in interface method APIs. Thus, no entries would be added to `SignatureUnfolding` which mention any of those classes. If an interface method signature is considered for entry in the `SignatureUnfolding` dictionary, its types are first evaluated to see if they might possibly (in the future) support Q-folding; if not, they are put in the smaller dictionary. If all types in the signature fail to support Q-folding, then there is no need to make an entry in the larger dictionary.

This process of evaluating individual L-types seems tricky and complex. But it might naturally piggy-back on top of the class loader constraint checks that we do during class linking. The idea would be to store the decisions of `ByReferenceOnly` in the class loader constraint dictionary (as an extra bit somewhere), instead of in a physically separate table. (Or not: Just loading an interface does not record CLC records; I think you have to load two classes in two CLs to trigger CLC activity.)

In the example above, in Q-folding terms, the main dictionary might hold the name/signature pair `m`, `(LMyValue;)V` (or just the signature, or just `LMyValue;`). If I1 is loaded first, since I1 uses the Q-flavored descriptor, the dictionary will contain this entry:

  `m`, `(LMyValue;)V`, `(QMyValue;)V`

Note that the Q-type (or a corresponding Preload entry or TR) would have caused loading of MyValue.class, ensuring that the Q was well-formed. This is the good case. But if I2 is loaded first, which is the less-good case, the dictionary will contain this entry:

  `m`, `(LMyValue;)V`, null (I hope we compress this common case to 2 words)

In that case, when I1 is loaded later, the JVM will refuse to scalarize the argument. Alternatively, if I2 is loaded first, *and* the JVM decides that `MyValue` will never be Q-flavored, the side dictionary `ByReferenceOnly` can be loaded with `MyValue`, and the `SignatureUnfolding` dictionary will not need an entry.

When loading C.class, I1.class and I2.class are fully loaded (as C’s supers) and so all the decisions are made already. When linking C and setting up v-tables, C will respect the standing decisions in `SignatureUnfolding`.

Alternatively, we could give methods up to three distinct entry points. Every method would have a fully folded entry point with all L-types. Methods defined in classes with Q-types might have scalarized entry points for those types, based only on super class methods (single inheritance). Methods defined in interfaces with Q-types would use the `SignatureUnfolding` to make consistent scalarization decisions. Methods which override/implement both kinds of methods (as in the example of C above) would have all three entry points.
24-02-2023
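
As a rough illustration of the two dictionaries discussed in the comment above, the following sketch keys `SignatureUnfolding` entries by name/descriptor pair and consults a `ByReferenceOnly` set first. Everything here is an assumption made for illustration: the class `SignatureUnfoldingSketch`, its methods, the use of strings instead of `Symbol` pointers, and the regular expression that extracts class names from a descriptor.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of the SignatureUnfolding and ByReferenceOnly dictionaries.
final class SignatureUnfoldingSketch {
    private static final Pattern L_TYPE = Pattern.compile("L([^;]+);");

    // (name, folded descriptor) -> unfolded descriptor, or empty for "no unfolding".
    private final Map<List<String>, Optional<String>> unfoldings = new HashMap<>();
    private final Set<String> byReferenceOnly = new HashSet<>();

    // Permanently mark a class name as never unfolding to a Q-type.
    void markByReferenceOnly(String className) {
        byReferenceOnly.add(className);
    }

    // True if at least one L-type in the descriptor is not known to be by-reference-only.
    private boolean mayUnfold(String foldedDescriptor) {
        Matcher matcher = L_TYPE.matcher(foldedDescriptor);
        while (matcher.find()) {
            if (!byReferenceOnly.contains(matcher.group(1))) {
                return true;
            }
        }
        return false;
    }

    // Enter an interface method; unfoldedOrNull is its Q-flavored descriptor, if any.
    // Returns the standing global decision, which the first caller fixes permanently.
    Optional<String> enter(String name, String foldedDescriptor, String unfoldedOrNull) {
        if (!mayUnfold(foldedDescriptor)) {
            return Optional.empty(); // no entry needed in the larger dictionary
        }
        List<String> key = Arrays.asList(name, foldedDescriptor);
        Optional<String> wanted = Optional.ofNullable(unfoldedOrNull);
        Optional<String> previous = unfoldings.putIfAbsent(key, wanted);
        return previous == null ? wanted : previous;
    }

    int size() {
        return unfoldings.size();
    }
}
```

With I1 entered first, `enter("m", "(LMyValue;)V", "(QMyValue;)V")` fixes the unfolded descriptor and a later entry for I2 receives the same answer. With I2 entered first (passing `null`), both receive an empty result, which corresponds to refusing to scalarize.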