JDK-8238766 : Perf regression on promo benchmark with 15-b9
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 15
  • Priority: P3
  • Status: Closed
  • Resolution: Won't Fix
  • CPU: x86_64
  • Submitted: 2020-02-10
  • Updated: 2020-04-13
  • Resolved: 2020-04-13
Description
AFAICT, tracing back through the intermediate builds, this is due to something in build 262, which includes:

2020-02-05 17:33
erikj: 023df1 - OpenJDK
8238225  Issues reported after replacing symlink at Contents/MacOS/libjli.dylib with binary
2020-02-05 16:40
dcubed: b35341 - OpenJDK
8235795  replace monitor list mux{Acquire,Release}(&gListLock) with spin locks
2020-02-05 16:39
dcubed: 00470b - OpenJDK
8236035  refactor ObjectMonitor::set_owner() and _owner field setting
2020-02-05 16:38
dcubed: 175867 - OpenJDK
8235931  add OM_CACHE_LINE_SIZE and use smaller size on SPARCv9 and X64
Build 

This is seen on Linux and Windows in our OCI perf machines.
Comments
[~ecaspole] - please read thru my notes and let me know if the analysis is clear (or not).
10-04-2020

- Eric C. created a JMH'ed version of DaCapo-h2 that I was able to get running on my Linux-X64 server in my lab (Thanks Eric!):

  8238766_base:
    (size)  Mode  Cnt      Score      Error  Units
     large    ss   30  50180.954 ± 2852.083  ms/op

  8238766_8235795_merge:
    (size)  Mode  Cnt      Score      Error  Units
     large    ss   30  43219.959 ± 2701.425  ms/op

- That looks like about 13% performance improvement since lower is better. It's entirely possible that this performance regression is purely Skylake related.
- I had meant to add SpinPause() calls in the monitor list functions that can loop so I went ahead and did that experiment via a temporary option:

  8238766_8235795_merge w/ temporary -XX:+UseOMSpinPauseInListLoops:
    (size)  Mode  Cnt      Score      Error  Units
     large    ss   30  50850.122 ± 1929.179  ms/op

- Slightly worse than the baseline, but with less variation. While that's interesting, it looks like we don't need the SpinPause() calls in the monitor list functions that can loop.
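The "about 13%" figure can be checked against the two JMH scores above. This is a minimal sketch of that arithmetic, assuming the usual relative-change definition (base - exp) / base for a lower-is-better score. The names `relative_improvement` and `print_h2_improvement` are hypothetical and are not part of JMH or HotSpot.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: fractional improvement of `exp` over `base` for a
// lower-is-better metric such as JMH single-shot time (ms/op).
// Positive means exp is faster; negative means exp regressed.
double relative_improvement(double base, double exp) {
    assert(base > 0.0);  // a zero or negative baseline makes the ratio meaningless
    return (base - exp) / base;
}

// Prints the improvement for the two scores quoted in the comment above.
void print_h2_improvement() {
    const double base_ms  = 50180.954;  // 8238766_base
    const double merge_ms = 43219.959;  // 8238766_8235795_merge
    std::printf("improvement: %.2f%%\n",
                100.0 * relative_improvement(base_ms, merge_ms));
}
```

With the quoted scores the ratio comes out at roughly 13.9%, which is consistent with the "about 13%" estimate. The sketch ignores the reported error bounds.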
24-03-2020

I did some analysis with my own Aurora perf runs:

- base vs. exp:
  - DaCapo-h2: -13.34%, Volano: 12.24%
  - 8235795 made DaCapo-h2 very unhappy, but Volano very happy!
- base SpinPauseNop vs. exp SpinPauseNop:
  - DaCapo-h2: -5.44%, Volano: 3.58%
  - 8235795 made DaCapo-h2 unhappy, but Volano happy!
- It's interesting that SpinPauseNop made both the regression and the performance improvement smaller.
- I remembered that I had planned to put SpinPause() calls into the new monitor list functions that loop, so I've conditionally done that with a temporary option as part of a better SpinPauseNop patch.
- I'll check out the latest SpinPauseNop patch next week.

SpinPauseNop is a patch I'm using for debugging the PAUSE instruction slowdown that we're seeing with Skylake machines. I recently updated the patch to include two new temporary options:

    $ hg diff -r qparent src/hotspot/share/runtime/globals.hpp
    diff -r b353416faedf src/hotspot/share/runtime/globals.hpp
    --- a/src/hotspot/share/runtime/globals.hpp Wed Feb 05 11:40:20 2020 -0500
    +++ b/src/hotspot/share/runtime/globals.hpp Tue Mar 24 13:21:48 2020 -0400
    @@ -695,6 +695,12 @@
               "Use LWP-based instead of libthread-based synchronization "       \
               "(SPARC only)")                                                   \
                                                                                 \
    +  product(bool, UseOMSpinPauseInListLoops, false,                           \
    +          "Use ObjectMonitor SpinPause in list loops")                      \
    +                                                                            \
    +  product(bool, UseOMSpinPauseNop, false,                                   \
    +          "Use ObjectMonitor SpinPauseNop instead of SpinPause")            \
    +                                                                            \
       product(intx, MonitorBound, 0, "(Deprecated) Bound Monitor population")   \
               range(0, max_jint)

I reran the experiments after rebuilding with the latest SpinPauseNop patch:

- base vs. exp with options off:
  - DaCapo-h2: -11.63%, Volano: 12.61%
  - About the same DaCapo-h2 regression and Volano improvement as the original experiment.
- base vs. exp with -XX:+UseOMSpinPauseInListLoops:
  - DaCapo-h2: 0.93%, Volano: -0.17%
  - These are both flagged as non-significant.
  - Looks like adding OMSpinPause in the monitor list functions that loop is not helpful.
- base vs. exp with -XX:+UseOMSpinPauseNop:
  - DaCapo-h2: -6.43%, Volano: 2.90%
  - About the same DaCapo-h2 regression and Volano improvement as the original experiment.
- base vs. exp with -XX:+UseOMSpinPauseNop -XX:+UseOMSpinPauseInListLoops:
  - DaCapo-h2: -6.86%, Volano: 2.64%
  - About the same as the previous experiment.
  - Looks like adding OMSpinPause in the monitor list functions that loop is not helpful.
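For readers unfamiliar with what "SpinPause() in a loop" toggles, here is a generic sketch of a test-and-set spin lock whose retry loop optionally issues a CPU spin hint. It is an illustration under stated assumptions, not HotSpot's monitor list code, which is not shown in this report. `cpu_relax`, `SpinLock`, `pause_in_loop` and `spin_lock_smoke_check` are hypothetical names. The only real API used is `_mm_pause()` from `<immintrin.h>`, which emits the x86 PAUSE instruction.

```cpp
#include <atomic>
#include <cassert>

#if defined(__x86_64__) || defined(_M_X64)
#include <immintrin.h>  // _mm_pause
#endif

// Hypothetical stand-in for a CPU spin hint: emits PAUSE on x86_64 and
// nothing elsewhere. This is NOT HotSpot's SpinPause().
inline void cpu_relax() {
#if defined(__x86_64__) || defined(_M_X64)
    _mm_pause();
#endif
}

// Minimal test-and-set spin lock. `pause_in_loop` plays the role that a flag
// such as UseOMSpinPauseInListLoops plays in the experiment: it only decides
// whether the retry loop issues a spin hint between attempts.
class SpinLock {
public:
    explicit SpinLock(bool pause_in_loop) : pause_in_loop_(pause_in_loop) {}

    // Returns true if the lock was free and is now held by the caller.
    bool try_lock() {
        // exchange returns the previous value; false means we took the lock.
        return !locked_.exchange(true, std::memory_order_acquire);
    }

    // Spins until the lock is acquired.
    void lock() {
        while (!try_lock()) {
            if (pause_in_loop_) cpu_relax();
        }
    }

    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
    const bool pause_in_loop_;
};

// Single-threaded smoke check of the lock state transitions.
void spin_lock_smoke_check() {
    SpinLock lock(/*pause_in_loop=*/true);
    lock.lock();
    assert(!lock.try_lock());  // already held, so a second attempt fails
    lock.unlock();
    assert(lock.try_lock());   // free again after unlock
}
```

Whether the hint helps depends on the hardware and on how contended the loop is, which is what the runs above were probing. This sketch makes no performance claim of its own.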
24-03-2020

ILW = HLM = P3
10-02-2020