The current InvocationCounter-related code is complex and is becoming more complex still. The need to improve the profiling code stems from John Rose's comment on a related issue, JDK-8059606. In the following description of the enhancement I'll (re-)use (some of) John's text.
This enhancement targets the following two simplifications.
1) Currently, there are two different profiling mechanisms in the template-based interpreter. One profiling mechanism is specific to the tiered mode of operation of the VM, the other mechanism is specific to the non-tiered mode of operation.
In the tiered mode of operation, the interpreter (assembly code) periodically notifies the runtime; the runtime code then checks if a method should be compiled or not. Notifications can happen at:
- method entry points (in the InterpreterGenerator::generate_counter_incr() method) and at
- loop backedges (in the TemplateTable::branch() method)
In the non-tiered mode of operation the decision to compile the method happens implicitly in the interpreter: Once an invocation (or backedge) counter overflows, the interpreter passes control to the runtime. The runtime then triggers compilation of the method. As a result, the code in the interpreter handling the non-tiered mode of operation is overly complicated as it contains two paths performing essentially the same thing:
if (TieredCompilation) {
// profiling specific to tiered mode of operation
} else {
// profiling specific to non-tiered mode of operation
}
The first task of this enhancement is to unify the interpreter code specific to the tiered and non-tiered modes of operation. In the end, there should be no code specific to a mode of operation, that is, the large if-then-else block in the above-mentioned methods should disappear. Moreover, the interpreter should use notification-based code similar to the code currently used in the tiered mode of operation.
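As an illustration of what "notification-based" could mean for a single, mode-independent counter path, here is a minimal C++ sketch. It is not HotSpot code: ProfileCounter, increment_and_check() and notifications_after() are hypothetical names, and the assumption that notifications fire whenever the count crosses a power-of-two period (expressed as a mask) is mine.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a per-method counter; not a HotSpot type.
struct ProfileCounter {
    uint32_t count = 0;
};

// Bumps the counter and reports whether the runtime should be notified.
// Assumption: notify_mask has the form 2^k - 1, so a notification fires
// once every 2^k events. The compile/no-compile decision itself would be
// made by the runtime, not here.
inline bool increment_and_check(ProfileCounter& c, uint32_t notify_mask) {
    assert((notify_mask & (notify_mask + 1)) == 0 && "mask must be 2^k - 1");
    c.count += 1;
    return (c.count & notify_mask) == 0;
}

// Helper for experimentation: how many notifications occur over n events.
inline int notifications_after(uint32_t n, uint32_t notify_mask) {
    ProfileCounter c;
    int notified = 0;
    for (uint32_t i = 0; i < n; ++i) {
        if (increment_and_check(c, notify_mask)) {
            ++notified;
        }
    }
    return notified;
}
```

With a scheme like this, the same increment-and-notify path could serve both method entries and loop backedges, and only the runtime side would need to know whether the VM runs tiered or non-tiered.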
2) The second task is to reduce the memory footprint of the profiling code by shrinking the Method layout so that all counter bookkeeping is handled with a single word instead of the current two. Also, low-count, non-looping methods do not need a full MethodCounter struct, just a simple inline count with a low tag bit and CAS-based state changes. This would delay MC (method counters) and MD (method data) allocation until a method has been used a non-trivial amount, which would reduce the footprint if the "non-trivial amount" turns out to be large.
counters = union {
uintptr_t simple_count; // c = (invocation_count << 16 | notify_mask << 8 | other_flags_we_might_like << 1 | 1)
uintptr_t method_counters; // c = ((intptr_t)method_counters_addr | 0)
uintptr_t method_data; // c = ((intptr_t)method_data_addr | 0)
}
(Code assumes that method_data and method_counters can be distinguished suitably by their contents.)
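The tagged-word idea above can be sketched in C++ roughly as follows. This is only an illustration under assumptions: the field widths (8 bits for notify_mask, 7 bits for flags) are inferred from the shifts in the layout comment, MethodCountersStub and all function names are hypothetical, and counter overflow as well as the method_counters/method_data distinction are ignored.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the full out-of-line counter structure.
struct MethodCountersStub {
    uint32_t invocation_count;
};

constexpr uintptr_t kTagBit = 1;  // low bit set => inline simple_count
constexpr int kFlagsShift = 1;    // bits 1..7  (assumed width)
constexpr int kMaskShift = 8;     // bits 8..15 (assumed width)
constexpr int kCountShift = 16;   // bits 16 and up

inline uintptr_t make_simple(uintptr_t count, uintptr_t notify_mask, uintptr_t flags) {
    assert(notify_mask <= 0xFF && flags <= 0x7F);
    return (count << kCountShift) | (notify_mask << kMaskShift) |
           (flags << kFlagsShift) | kTagBit;
}

inline bool is_simple(uintptr_t w) { return (w & kTagBit) != 0; }
inline uintptr_t simple_count(uintptr_t w) { return w >> kCountShift; }
inline uintptr_t simple_notify_mask(uintptr_t w) { return (w >> kMaskShift) & 0xFF; }

// Increments the inline count with a CAS loop. Returns false if the word
// no longer holds an inline count (it was promoted to a pointer).
inline bool try_increment(std::atomic<uintptr_t>& word) {
    uintptr_t old_w = word.load(std::memory_order_relaxed);
    while (is_simple(old_w)) {
        uintptr_t new_w = old_w + (uintptr_t{1} << kCountShift);
        if (word.compare_exchange_weak(old_w, new_w)) {
            return true;
        }
        // On failure old_w has been reloaded; re-check the tag and retry.
    }
    return false;
}

// Replaces the inline count with a pointer to a full structure, carrying
// the count over. Returns false if the word changed concurrently or was
// already promoted; the caller may retry or discard mc.
inline bool try_promote(std::atomic<uintptr_t>& word, MethodCountersStub* mc) {
    uintptr_t old_w = word.load();
    if (!is_simple(old_w)) {
        return false;
    }
    uintptr_t ptr_bits = reinterpret_cast<uintptr_t>(mc);
    assert((ptr_bits & kTagBit) == 0 && "pointer must be at least 2-byte aligned");
    mc->invocation_count = static_cast<uint32_t>(simple_count(old_w));
    return word.compare_exchange_strong(old_w, ptr_bits);
}
```

The low tag bit works because an aligned pointer always has its lowest bit clear, so a single load is enough to tell an inline count from a pointer.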
In my (zmajo) opinion, Simplification (1) is more urgent than (2). So if it's not possible to handle the two simplifications at once (within the same issue), then I would recommend splitting off Simplification (2) into a separate enhancement and taking care of it later.