The current nmethod entry barrier is good, but it could be a bit better. In particular, this enhancement targets the following ideas.
1. The alignment of the cmp instruction is 8 bytes. However, we only patch 4 bytes and the instruction length is always 8 bytes. So if we align the start of the instruction to 4 bytes only, that is enough to ensure that the immediate part of the instruction is 4 byte aligned, which is all we need (cf. http://cr.openjdk.java.net/~jrose/jvm/hotspot-cmc.html).
2. Today the fast path (conditionally) jumps over a call to a stub. It is not uncommon for the branch not taken path being better optimized, making it favourable to move the call to a stub out-of-line. This has the additional benefit of not polluting the instruction caches at the nmethod entry with instructions not used in the fast path. A bit messy but we can do it for C2.
3. For C1 and native wrappers, I don't think they are hot enough to warrant the stub machinery. But at least the jump that jumps over the cold stuff, can be shortened. I can get behind that.
Before addressing this, turning nmethod entry barriers on with G1 (e.g. by enabling loom) leads to a regression in DaCapo tradesoap-large. With this enhancement, the regression goes away, so that the cost of nmethod entry barriers is not visible.