JDK-8221828 : New Invoke Bindings
  • Type: JEP
  • Component: hotspot
  • Sub-Component: compiler
  • Priority: P3
  • Status: Draft
  • Resolution: Unresolved
  • Submitted: 2019-04-02
  • Updated: 2022-06-13
Related Reports
Blocks :  
Duplicate :  
Relates :  
Relates :  
Description
Summary
-------

Replace the machine code bindings for the invoke bytecodes so that they no longer rely on type-speculating inline caches or racy machine code patching.

Goals
-----

The primary goal of this JEP is to improve long-term maintainability of the code. The secondary goal is to make megamorphic calls fast.

Non-Goals
---------

It is not a goal to improve the overall performance of Java applications with this new dispatch mechanism. For some applications with hot megamorphic calls, that might be a bonus, but it is not an explicit goal.

Success Metrics
---------------

Today's dynamically monomorphic calls should not regress noticeably, and the new mechanism has to remove more code complexity than it introduces.

Motivation
----------

Our baseline megamorphic calls are inefficient. To hide their overhead, the type-speculation machinery known as inline caches hopes for dynamically monomorphic call sites. When we are lucky and type speculation succeeds, a more efficient monomorphic call can be made. But dynamically megamorphic calls are still slow, causing unpredictable performance characteristics, because the cost difference between monomorphic and megamorphic calls is significant. This type speculation logic is also very complicated and has caused many bugs over the years. The complicated life cycle of Just-In-Time (JIT) compiled machine code is a consequence of inline caches, and dealing with the various races that inline caches bring is a significant complication for class unloading.

Description
-----------

To replace type speculating inline caches, the baseline megamorphic calls are to be optimized to be nearly as fast as speculatively monomorphic calls. That way, the speculation machinery (~10,000 lines of low-level code involving racy machine code patching) can be removed.

The proposal for invokevirtual calls is to flatten the vtable to have direct code pointers at a negative offset from the class pointer. This allows dispatching with a single indirect branch (compared to two direct branches for dynamically monomorphic calls today).

The proposal for invokeinterface calls is to give each interface method a unique number called "selector", and create a cuckoo-style hash table mapping selectors to code pointers for the concrete implementations of the interface method. The table is embedded straight into the native class. This allows dispatching the majority of the time with one indirect branch and one direct branch (compared to two direct branches for dynamically monomorphic calls today).

As for direct calls: today they emit a call straight to the target compiled code when the target method is compiled, but go through a stub that fills in a method register when transitioning to interpreted mode. This elides the cost of setting a register when calling compiled code, but causes many races and introduces a lot of code complexity, involving racy machine code patching. The cost of setting this register unconditionally and removing the stub code is considered minimal compared to the spilling and stack banging that we usually perform when calling a method.

One tricky thing this JEP has to deal with is the invokeinterface bytecode. Today the JVM cannot be certain that the receiver implements the given interface, and must instead perform a dynamic type check that throws IncompatibleClassChangeError when the receiver does not implement the interface. In particular, the JVMS states:

"For assignments, interfaces are treated like Object." - JVMS 14, §4.10.1.2

This relaxation of the verifier type system means that we don't know anything about variables with interface types. They could be anything. The impact on invokeinterface is:

"Otherwise, if the class of objectref does not implement the resolved interface, invokeinterface throws an IncompatibleClassChangeError." - JVMS 14, §6.5

While bytecodes violating interface type safety are not generated by javac or any other reasonable compiler, it is possible for hand-written bytecodes to violate interface type integrity.

This is problematic for multiple reasons.

Problem 1: Performance.

The current design relies on the monomorphic inline caches implicitly performing this type check as well; the megamorphic implementation is slow. Since the idea of this JEP is to optimize megamorphic calls so that monomorphic calls going through the megamorphic code path are on par with a speculative inline cache, special care needs to be taken with this dynamic type check. The naive solution of performing the REFC type check for every invokeinterface results in inappropriate performance characteristics; there are noticeable regressions.

Problem 2: Complexity.

An invokeinterface bytecode can result in an itable dispatch, a vtable dispatch (for default methods), or a direct call (when provable through CHA, etc.). Therefore, all these types of calls need to perform the REFC check if the raw bytecode is invokeinterface. This causes some unfortunate complexities.

Problem 3: Type integrity.

Allowing bytecodes to pass non-Foo instances into Foo-typed variables is quite nasty. Other than causing bugs in the JVM because we cannot trust our own type system regarding interfaces, and forcing us to explicitly opt out of optimizations because we do not trust the type system, it is also problematic for users of Java. Since Java programmers might not have read the JVMS, they might not be aware that an invokeinterface bytecode might throw ICCE. From a Java language perspective, this seems like an impossible event. But from the JVM perspective, it is fully possible. This mismatch of expectations around type integrity can hide security bugs in Java code, where a malicious person can alter the control flow in ways that most Java programmers have no idea are possible, and arguably should not be possible.

The current solution under this JEP has been to avoid the REFC check at any cost. Initially, this has taken the form of giving the verifier two modes: sane and insane. In the sane mode, the verifier checks that interface type safety is ensured. As long as the verifier is in the sane mode (which holds for pretty much all reasonable programs, except corner-case JVMS test cases), the REFC checks are elided from all JIT-compiled invokeinterface bytecodes. In the unlikely event that a class is linked with insane bytecodes that violate basic type safety, a deoptimization event is triggered. This is a big-hammer event that deoptimizes all JIT-compiled methods in the process and enters insane mode. New compilations then reinstate the dynamic REFC check, so that we can follow the JVMS.

However, the JVMS was specified this way partially so that interfaces need not be loaded during verification; an optimization. But loading interfaces is precisely what I do in order to elide the REFC check from invokeinterface; also an optimization, yet a much more important one. In other words, the JVMS has looser type checking so that we can optimize, but if we really want to optimize what matters, we cannot use that verifier optimization. The result is supporting two modes: a sane mode and an additional insane mode, used only to comply with the JVMS. Looked at this way, there is little reason left for the JVMS to enforce the weaker type system on the verifier and prevent us from doing the intuitive thing. Therefore, I propose to change the JVMS to at the very least allow a JVM to implement proper type checks for interfaces in the verifier. This has the following benefits:

Benefit 1: Performance

Now invokeinterface never needs to perform the REFC check, because the verifier has already proven that insane bytecodes do not exist. This allows the megamorphic calls to be very fast with my proposed itable dispatch mechanism.

Benefit 2: Maintenance

No need for compilers to explicitly distrust their own type system and opt out of various optimizations, or run into bugs due to the type system being broken. Code such as DirectMethodHandles (in Java code) can also remove such checks.

Benefit 3: Type integrity

User code can suddenly trust the Java code they write to do something closer to what they think it will do. The more strictly enforced type system used by the verifier leaves no opportunities for malicious programmers to exploit insane control flows that should not be possible from a Java programmer's perspective.

The major risk with this proposal to change the JVMS is backward compatibility. While unlikely, it is theoretically possible for some user to have custom-written bytecodes (not from javac) that intentionally do things that violate basic type safety, and even intentionally throw ICCE (resulting in deoptimizations). I consider this risk small compared to the possibly larger risk of having more realistic Java code be susceptible to strange runtime conditions that should never happen.

In conclusion, while changing the JVMS is not strictly needed, since we could instead maintain both a sane and an insane mode of the verifier for JVMS compatibility, I believe the right thing to do is to change the JVMS to allow only the sane mode, completely opting out of the insane mode. It seems more likely that such bytecodes indicate a serious bug in the program that should be fixed than that they are intentionally used in that way.

Alternatives
------------

An alternative invokeinterface dispatch mechanism performs graph colouring of interface methods to optimize the call further. That is currently not being considered, because the primary goal of this JEP is to simplify the code and make it more maintainable.

Testing
-------

The usual jtreg tests must pass, especially the ones concerning invoke bytecodes. Fortunately, there is already a lot of testing for invoke bytecodes that can be reused in this effort to change their machine code bindings.

Risks and Assumptions
---------------------

An assumption of this work is that dynamically monomorphic call sites using a well-predicted indirect branch should be as performant as a direct branch, due to branch target buffering in hardware. In other words, the software type speculation trick is nowadays already done in the hardware as well. It might regress on machines without well-performing branch prediction, but the assumption is that such hardware is largely out of fashion today.
Comments
[~jvernee] this JEP doesn't really rely on updating the JVMS. In the current implementation state, the verifier is updated to initially speculatively verify that interfaces are sane. If they are not, we deopt, start doing less efficient stuff, and start accepting crappy bytecodes in the verifier. Almost no programs violate the speculative constraint in practice, and unless your app is doing something extremely weird with buggy hand-written bytecodes violating basic type safety, I aggressively optimize, removing all runtime type checks. Having said that, this JEP seems to add yet one more reason why it would be less painful if we could just trust our own type system always, and not have to deal with the case when it can't be trusted. There are more reasons for that.
13-06-2022

Reading this after the CMC writeup [1]. Most of the description section is about the need/motivation for changing the JVMS (starting from the 5th paragraph). I suggest maybe making a separate JEP for the JVMS change, and focus this JEP on describing the changes to invoke bindings (and maybe link to this JEP from the JVMS JEP). I think the JVMS change in itself could be a maintainability gain, because it allows the VM code to reason within a saner type system. I also think that it would help if the description section contained a description of the status quo, and then compared that with the proposed solutions. The paragraph on direct calls already does this nicely I think. (As someone who is not that familiar with the current state of the implementation, it's hard to say, after reading this, how the new implementation would be different/improve things). [1]: http://cr.openjdk.java.net/~jrose/jvm/hotspot-cmc.html
13-06-2022

[~hseigel] That is an excellent question. In my current prototype, I activate "insane mode" whenever a class uses the old verifier. But as discussed, I would prefer to not have such a mode at all. Therefore, I definitely think it is desirable to fix this in both verifiers, if we agree about changing the JVMS to expect interface type checking in bytecodes. The same reasoning and motivation seemingly applies to them both. I don't think we want a situation where we ensure interface type safety... except when we don't because reasons. Having said that, I have not attempted to do this in the old verifier yet, and don't have a complete picture regarding what that would look like.
04-08-2020

Would the changes affect all class file versions, requiring changes to both verifiers?
04-08-2020

[~vlivanov] Thank you for pointing that out. That code indeed also needs to check interfaces, just like other class types, with my proposal.
04-08-2020

> While bytecodes violating interface type safety are not generated by javac or any other reasonable compiler, it is possible for hand written bytecodes to violate interface type integrity.. Would like to point out that it is exposed through public API as well: java.lang.invoke.MethodHandles.explicitCastArguments * <li>If <em>T0</em> and <em>T1</em> are references, and <em>T1</em> is an interface type, * then the value of type <em>T0</em> is passed as a <em>T1</em> without a cast. * (This treatment of interfaces follows the usage of the bytecode verifier.) public static MethodHandle explicitCastArguments(MethodHandle target, MethodType newType) {
04-08-2020

I agree that it would be really nice to change the JVMS to allow the verifier reject classes violating interface types!
03-08-2020

I agree with this proposal, but it should address any concerns and reach an agreement before moving further.
07-10-2019

[~gdub] Regarding class unloading, we are shifting over to doing concurrent class unloading, because we strongly dislike doing it in safepoints. We have it already in ZGC since JDK 12, which was prioritized as it cares about latencies. A big part of the motivation for this work is to simplify that code, because it is quite tricky due to inline cache and complex nmethod life-cycle interactions. But I am working on making it more available to other collectors as well. The nmethod sweeping is running concurrently, and will not pose a latency problem. However, with the new simplified model where call sites don't have stale pointers lying around, I'm planning on removing a few states from nmethods, and sweeping them faster than ever. In particular, we won't need the unloaded or zombie states any longer, and can instead nuke them immediately after stack scanning (which runs in thread-local handshakes). We first unlink them, handshake, and then purge them; that is it. As for deoptimization, it shouldn't happen so frequently... if it does, then perhaps the particular optimization that rides on the deopt should be reconsidered. It already has to do fun things like jumping into the interpreter where performance is blown out of the window, so again, I'm not too worried about the throughput of deoptimizations. If it really turns out to be a problem though, clearing can be done lazily by the sweeper if we have to, but I'd resist that unless it empirically turns out to matter. At least that is my current standpoint. I will gather some measurements eventually to try to nail down how much deopt slowdowns really matter due to poking around at a few more tables, compared to the interpreter transitions it suffers today.
04-04-2019

OK, understood. Thank you for the clarifications. Regarding the point at which we pay the cost: I agree with you that at code installation time, this is probably an acceptable cost compared to the cost of compiling a method. However, can't this also be called in more time-sensitive places such as safepoints (in particular clear_code: nmethod sweeping, deoptimization...)? There, the worst-case scenario of having to change one of the methods of j.l.Object has to be taken into account.
04-04-2019

[~gdub] There is a trade-off indeed, between making either JIT-compiled calls, or code installation of their JIT-compiled code, faster. By flattening the structures, every call gets faster, but changing the installed JIT-compiled code takes the cost of jumping through a few extra hoops to install the code instead. Note though, that apart from default methods (which I will deal with separately), I only need to walk the method holder and its subclasses up until the next override, to update itables and vtables. I have tried a few applications so far, and the average number of subclasses to traverse per set_code and clear_code invocation has ranged from 3 - 6 in the applications I have had time to measure so far, with one exception. Compared to the other crazy stuff compilers have to do when JIT-compiling an nmethod, I don't think the extra overhead of jumping through a handful of extra tables to stick in some code pointers is going to ruin anyone's day. It seems more important to me to optimize the cost of the compiled calls themselves, which could be invoked millions of times. For the class of applications that have a crazy number of classes, my gut feeling is that they will also benefit from the faster megamorphic calls. In fact I have seen one outlier: jython, which seems to create a crazy class hierarchy and on average has to walk 25 classes on set_code/clear_code. However, even in such extreme cases, jumping through 25 classes to insert code pointers still seems insignificant compared to the overall time of the compilation that causes such code insertion. And indeed, in this particular benchmark, the existing mechanism creates a large number of megamorphic call sites, to the extent that it frequently runs out of ICStub buffer area, and provokes a number of global safepoints in the whole system to reclaim ICStubs, so that it can continue creating more and more poorly optimized megamorphic call sites.
This also has latency implications, when forcing ICStub reclamation safepoints potentially close to GC safepoints, causing long global hiccups. Nevertheless, as I continue investigating, I will make more measurements to see if my gut feeling is right about this trade-off.
04-04-2019

The existing vtables and itables have the indirection of pointing to Method objects, which means the _from_compiled_entry can be changed in one place. Is there some other indirection I'm not seeing in the new vtable containing direct code pointers, or in the proposed data structure for invokeinterface? Otherwise it looks like Method::clear_code and Method::set_code will have to patch pointers in many places. vtables should be rather simple, but patching all of them is still O(number of classes loaded). For interfaces and default methods it's a bit more complicated and would have at least the same big-O complexity. Are you rather envisioning some patching of those data structures when a call lands in a non-entrant nmethod (similar to what happens to ICs right now)? If so, I guess this limits the simplifications you can make to the nmethod life cycle, since you will have to scan all vtables & interface mappings when freeing an nmethod.
03-04-2019