Small performance regression in ConcurrentHashMap on C1 since the integration of CR 7036559: "ConcurrentHashMap footprint and contention improvements".
Paraphrasing Doug's comments:
On ARM/C1, the cost of using Unsafe and the cost of a read fence each seem
to be in the 10-30ns range, and using them together adds the two costs. I
suspect that the Unsafe cost is actually mainly due to C1 not inlining
the small methods that encapsulate the accesses.
The solution is to manually expand these accessors inline in get(). This
seems to balance things out, so that the change is performance-neutral or
maybe even a bit faster than the old CHM, even on ARM. It is not the
first time we've done this for such reasons in collections.
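A minimal sketch of the two source shapes being compared, under stated assumptions: the class, `Node`, `tabAt`, `getWithHelper`, `getInlined` and `putForDemo` are hypothetical and greatly simplified (this is not the real ConcurrentHashMap code), and `java.lang.invoke.VarHandle` is swapped in for `sun.misc.Unsafe` so the example runs on current JDKs. In the first style the volatile bucket read sits in a tiny helper method; in the second the same read is written out directly in get(), so a compiler that declines to inline small methods no longer pays for the extra call.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical, simplified hash table; NOT the real ConcurrentHashMap.
class InlinedGetSketch {
    static final class Node {
        final int hash;
        final Object key;
        volatile Object val;
        final Node next; // immutable link keeps the sketch simple
        Node(int hash, Object key, Object val, Node next) {
            this.hash = hash; this.key = key; this.val = val; this.next = next;
        }
    }

    // Volatile array-element access; stands in for the Unsafe calls (assumption).
    static final VarHandle TAB = MethodHandles.arrayElementVarHandle(Node[].class);

    volatile Node[] table = new Node[16];

    static int spread(int h) { return (h ^ (h >>> 16)) & 0x7fffffff; }

    // Style 1: the bucket read lives in a tiny helper method. A JIT that does not
    // inline it pays a call on every lookup on top of the volatile read.
    static Node tabAt(Node[] tab, int i) {
        return (Node) TAB.getVolatile(tab, i);
    }

    Object getWithHelper(Object key) {
        int h = spread(key.hashCode());
        Node[] tab = table;
        for (Node e = tabAt(tab, (tab.length - 1) & h); e != null; e = e.next) {
            if (e.hash == h && key.equals(e.key)) return e.val;
        }
        return null;
    }

    // Style 2: the same volatile read written out directly in get(), so the
    // helper-method call disappears even if small methods are not inlined.
    Object getInlined(Object key) {
        int h = spread(key.hashCode());
        Node[] tab = table;
        for (Node e = (Node) TAB.getVolatile(tab, (tab.length - 1) & h); e != null; e = e.next) {
            if (e.hash == h && key.equals(e.key)) return e.val;
        }
        return null;
    }

    // Single-threaded insert, only to populate the table for the demo.
    void putForDemo(Object key, Object val) {
        int h = spread(key.hashCode());
        Node[] tab = table;
        int i = (tab.length - 1) & h;
        Node head = (Node) TAB.getVolatile(tab, i);
        for (Node e = head; e != null; e = e.next) {
            if (e.hash == h && key.equals(e.key)) { e.val = val; return; }
        }
        TAB.setVolatile(tab, i, new Node(h, key, val, head));
    }

    public static void main(String[] args) {
        InlinedGetSketch m = new InlinedGetSketch();
        m.putForDemo("a", 1);
        m.putForDemo("b", 2);
        // prints "1 1 null"
        System.out.println(m.getWithHelper("a") + " " + m.getInlined("a") + " " + m.getInlined("zzz"));
    }
}
```

Both lookups are semantically identical; only the call structure differs. Whether the manual expansion actually helps depends on how a given JIT (C1 vs C2) compiles the access, so it is worth confirming with a benchmark on the target platform.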
The new CHM is also 20% faster than the old CHM for #threads==1 on C1 on SPARC,
so it looks worth doing (mainly for get and containsKey; the other methods
entail calls anyway). It is no faster on C2 on any machine I have, so it is not
likely to make much of a difference in practice (since most systems that use
CHM heavily will be running C2 anyway), except for helping speed up class
loading etc. a bit.