With the recursive lightweight locking change in JDK-8319796, there is still a 3-6% regression in SPECjbb2005 on AMD (*not* on Intel) with LM_LIGHTWEIGHT. This needs to be resolved before returning the default to LM_LIGHTWEIGHT.

Charlie Hunt: In short, Erik is right that there is a higher regression with LW locking as the hyper-thread of a core becomes active. There is also high variability in frontend stalls, along with high variability in branches. The lack of a large difference in branch-misses (less than 5%) may imply that the implementation differences between Legacy and LW locking on SPECjbb2005 introduce a different sequence, or priority, of if conditions being executed, and that, depending on the sequence or the condition being checked, some branch(es) fall out into a shorter instruction path.

Roman: I did some more experiments. I added a very simple recursive implementation on top of PR16606; I only added a few instructions to check the entries on the lock-stack. This caused a regression of ~10%. So I took this a little further and added the same (or similar) instructions on top of mainline, where they really do nothing because mainline has no recursive support. I also narrowed it down by selectively adding/removing some of those instructions, and it looks very much like it is the loading of lock-stack stuff in the LW-unlock path that is causing most of the AMD regressions. It is not the branches (I removed them) and not the stuff in the locking path (I removed that, too). My theory is that loading the lock-stack evicts something else from the L1 cache, which causes stalls somewhere else. This is also what the profiles look like. I don’t know a way around that problem, though.

Stefan: I remember seeing similar odd things. Just adding unnecessary loads in the unlock path caused disproportionate regressions.

Charlie: Nice experiment by Roman. His observation, along with what you recalled, could offer an explanation for the high variability I noticed in both frontend stalls and backend stalls. I am assuming that we wouldn't always be loading the "lock-stack stuff" on every recursive lock execution in SPECjbb2005. Are there some cases where we wouldn't be doing the "lock-stack stuff" and others where we would?
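
For readers following the thread, the kind of unlock-path check being discussed can be pictured with a minimal sketch. This is not HotSpot's actual lock-stack code and not the change from JDK-8319796 or PR16606; the class name `LockStackSketch`, the fixed capacity, and the method names are all hypothetical, and the model assumes a per-thread array of object pointers in which a recursive enter pushes the same object again. In this simplified model, every exit has to load the stack top and the entry below it to decide whether the exit was recursive, which is the sort of extra load in the unlock path that Roman and Stefan describe.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified model of a per-thread lock-stack.
// Illustration only; it does not mirror the real HotSpot data structure.
class LockStackSketch {
  static constexpr std::size_t kCapacity = 8;  // assumed capacity
  const void* entries_[kCapacity] = {};        // locked objects, oldest first
  std::size_t top_ = 0;                        // number of live entries

 public:
  // Lock path: record the object. A recursive enter simply pushes the
  // same object a second time on top of itself.
  void enter(const void* obj) {
    assert(top_ < kCapacity);
    entries_[top_++] = obj;
  }

  // Unlock path: pop the top entry, then load the entry below it to see
  // whether the same object is still held (a recursive exit).
  // Returns true if the object remains locked after this exit.
  bool exit_is_recursive(const void* obj) {
    assert(top_ > 0 && entries_[top_ - 1] == obj);
    --top_;
    return top_ > 0 && entries_[top_ - 1] == obj;
  }

  std::size_t depth() const { return top_; }
};
```

In this sketch the loads of `top_` and `entries_[top_ - 1]` happen on every exit, recursive or not, so the model itself offers no path that skips them. Whether the real implementation has such a path is exactly the open question above.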