JDK-8194316 : GC induced regressions in 10-b34
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 10
  • Priority: P3
  • Status: Closed
  • Resolution: Won't Fix
  • OS: linux,os_x,solaris
  • CPU: sparc,x86_64
  • Submitted: 2018-01-02
  • Updated: 2018-01-26
  • Resolved: 2018-01-24
Description
Three benchmarks reproduced regressions in 10-b34. These benchmarks all use G1. Running the same benchmarks with 10-b34 while using ParallelGC shows no regression.

SPECjvm98-Server.jack    approx -15 to -18% on LXS  (very large)
                         approx -3% on STS
                         approx -8% on MXS

SPECjvm98-Server.jess    approx -4% on LXS  (large)

Tools-Javadoc-Steady     approx -1 to -2% on LXS

Note that the SPECjvm98-Server.jack regression on LXS is very large!
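For reference, the G1-vs-ParallelGC comparison described above uses the standard HotSpot collector-selection flags; a rough sketch (the benchmark invocation is a placeholder, not the actual perf-setup command line):

   # G1 run (G1 is the default collector in JDK 10)
   java -XX:+UseG1GC -Xlog:gc <benchmark launcher> _228_jack
   # ParallelGC baseline
   java -XX:+UseParallelGC -Xlog:gc <benchmark launcher> _228_jack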
Comments
Changing the priority of this regression with regard to the comments below. The regression is caused by a combination of how we measure the score and a change in heuristics.
26-01-2018

The new parallel Full GC causes different heap sizing and we will not do anything specific for this issue. We will, however, look into improving the way we choose the number of threads and how that affects the heap size in JDK-8196071.
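(For context, not part of the resolution: in the JEP 307 implementation the number of full GC workers follows -XX:ParallelGCThreads, which also sizes the workers used for young and mixed collections, so capping it, e.g. with

   java -XX:+UseG1GC -XX:ParallelGCThreads=1 ...

is not a full-GC-only knob; that may be part of why thread selection and its effect on heap sizing is tracked separately in JDK-8196071.)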
24-01-2018

I'm starting to get a better view on this, I think. I'm able to reproduce this when running exactly as the perf setup does, that is, 5 runs for each SPECjvm98 sub-benchmark, taking the best result out of those 5 runs. Just running jack in isolation doesn't show a regression, and doing more than 5 runs also reduces the regression a lot. It looks like the cause is that the new Full GC shrinks the heap a little less than the old one. So when jack starts it now has ~10 regions instead of 2, which in turn leads to more time between young collections and slower heap growth. If the benchmark setup is changed to do 10 runs, the heap growth has caught up with the old version and we get similar results.
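To make the interaction concrete, here is a minimal, hypothetical sketch of a best-of-N scoring loop of the kind described above (names and structure are illustrative, not the actual SPECjvm98 harness). With only 5 runs the best run can still fall in the heap ramp-up phase; with 10 runs the heap has reached its steady-state size before the best run is taken.

   // Hypothetical best-of-N scoring loop; runBenchmarkOnce() is a placeholder workload.
   public class BestOfNRuns {
       public static void main(String[] args) {
           int runs = args.length > 0 ? Integer.parseInt(args[0]) : 5;
           double bestMs = Double.MAX_VALUE;
           for (int i = 0; i < runs; i++) {
               System.gc();                     // full GC between runs decides the heap the next run starts with
               long start = System.nanoTime();
               runBenchmarkOnce();              // stand-in for e.g. the _228_jack workload
               double elapsedMs = (System.nanoTime() - start) / 1e6;
               bestMs = Math.min(bestMs, elapsedMs);  // score = best (fastest) run out of N
           }
           System.out.println("best of " + runs + " runs: " + bestMs + " ms");
       }

       private static void runBenchmarkOnce() { /* workload goes here */ }
   }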
17-01-2018

I've been looking at the regression from a Full GC point of view, but I haven't been able to verify whether it is the cause or not. What can be said is that no Full GCs occur during the _228_jack runs, so it is not caused by longer Full GC times. What could be the cause is that the Full GC (as Thomas mentioned) will create a different heap layout, and the heap sizing might be a bit different. I've created a build based on b34 with the Full GC capped to using only 1 thread. It would be nice to see SPECjvm98 run on this build to rule out or verify that the Full GC is the cause of the regression. Build id: 2018-01-16-1501561.stefan.johansson.hs MDash: http://java.se.oracle.com:10065/mdash/jobs/sjohanss-g1-serial-full-gc-b34-20180116-1503-8962
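For anyone rerunning this, a sketch of how the "no Full GCs during _228_jack" observation and the heap sizing could be checked with unified GC logging (exact tags and levels may need adjusting; gc+heap=debug is roughly the old PrintHeapAtGC output):

   java -XX:+UseG1GC -Xlog:gc,gc+heap=debug:file=gc.log <benchmark launcher> _228_jack
   grep 'Pause Full' gc.log   # expected: only the explicit System.gc() pauses between runs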
16-01-2018

Looked through JBS for potential candidates with the query: project = JDK AND status = Resolved AND fixVersion = "10" AND component = hotspot AND "Resolved In Build" in (b34, b34e) AND Subcomponent = gc ORDER BY createdDate DESC. My most likely guess, *if* these benchmarks do the usual System.gc() before testing, is JDK-8186571 (Implementation: JEP 307: Parallel Full GC for G1), i.e. the new full GC may have significantly changed object locations/caching compared to the previous build. Another candidate could be JDK-8189941 (Implementation: JEP 312: Thread-local handshakes); although others mentioned there were perf regressions due to that change, if it were the cause it would likely have affected all GCs.
11-01-2018

Yes ... the SPECjvm98 benchmarks do the usual System.gc() before each test. As for TLH, Eric Caspole has been investigating a Volano29 regression in the same build. He's created "a build with b33 + only the TLH change set". This could be run against the benchmarks reported here to determine whether TLH is the cause.
11-01-2018