While following up on where the rest of the performance hit lies in C1 for String.charAt after the C1 intrinsics arrived, I realized that we blow the static inlining threshold in C1 when trying to inline String.charAt() -- the method is 73 bytes of bytecode, while the threshold (-XX:MaxInlineSize) is 35 bytes.
Unfortunately, massaging the code so that it works well with both C1 and C2 yields a somewhat odd (all right, borderline horrible) peeling of charAt, but with tremendous boosts on C1:
Alas, moving the range checks into String(Latin1|UTF16) would not really help, since we would need to peel out the OOBE thrower there anyway. It would also require materializing the checks in the intrinsics.
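To make the "peeling" idea concrete, here is a minimal sketch of the general shape: the hot path keeps only the range check and the load, while the exception construction moves into a separate cold method. This is not the actual java.lang.String change; the class PeeledCharAt, the helper outOfBounds, and the Latin-1-only storage are hypothetical, chosen purely for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the real java.lang.String code.
public final class PeeledCharAt {
    // Assumption: Latin-1 storage only, to keep the sketch short.
    private final byte[] value;

    public PeeledCharAt(String s) {
        this.value = s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Hot path: just the range check and the load, so the bytecode stays
    // small and has a better chance of fitting a static inlining budget.
    public char charAt(int index) {
        if (index < 0 || index >= value.length) {
            throw outOfBounds(index); // cold path lives in a separate method
        }
        return (char) (value[index] & 0xff);
    }

    // Cold path: building the exception and its message is peeled out,
    // so its bytecode does not count against the hot method's size.
    private StringIndexOutOfBoundsException outOfBounds(int index) {
        return new StringIndexOutOfBoundsException(
                "index " + index + ", length " + value.length);
    }
}
```

The point of the split is only that the bytecode size of charAt shrinks; whether a given split actually gets under the C1 threshold has to be checked against the compiled bytecode.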
Or, we might shrug it off and let users know that -XX:MaxInlineSize=74 or more is needed to recover from the performance regression. C2 avoids this by inlining charAt as a hot method, since -XX:FreqInlineSize is 325 bytes by default.
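For anyone who wants to check which thresholds a given JVM is actually running with, a small sketch along these lines should work on HotSpot. It assumes the com.sun.management.HotSpotDiagnosticMXBean API is available (it is HotSpot-specific); the class name InlineThresholds and the helper flag are hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: prints the inlining thresholds of the running JVM.
public final class InlineThresholds {
    // Reads a numeric -XX flag by name via the HotSpot diagnostic MXBean.
    static int flag(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return Integer.parseInt(bean.getVMOption(name).getValue());
    }

    public static void main(String[] args) {
        System.out.println("MaxInlineSize  = " + flag("MaxInlineSize"));
        System.out.println("FreqInlineSize = " + flag("FreqInlineSize"));
    }
}
```

Running it once with default settings and once with -XX:MaxInlineSize=74 on the command line should show whether the override took effect. FreqInlineSize is platform-dependent, so the printed value may differ from the 325 quoted above.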