JDK-8155917 : Memory access in free regions during G1 full gc causes regressions in SPECjvm2008 scimark.fft,lu,sor,sparse with 9+116 on Linux-x64
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 9
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • OS: linux
  • CPU: generic
  • Submitted: 2016-05-03
  • Updated: 2018-06-21
  • Resolved: 2016-08-30
JDK 9
9 b137: Fixed
Description
FFT in particular regresses in build 116 compared to build 115, with more marking rounds and GCs during the benchmark run. The regression shows up both with and without large pages enabled.

As this might be ergonomics-related, testing the impact of JDK-8077144 might be a good first step.
Comments
See comments from Thomas about performance testing.
26-07-2017

During Full GC, JDK-8073321 accesses memory in free regions that has not been paged in at startup. This, possibly in combination with the hardware used in our performance lab, seems to interfere with the memory manager of the OS (some RHEL 6), causing performance regressions. Locally, on RHEL 7, I cannot reproduce the issue (i.e. the impact on the following benchmark run; the full gc is still slow), but it is definitely a good idea not to have full gc touch all memory.
10-08-2016

On FFT (I am going to look at other benchmarks too, and verify this result a bit more), JDK-8073321 seems to be the cause of this problem. Apart from the score difference, the only observable effect is that the initial full gc, which should compact the heap after startup, takes ~12 times longer. JDK-8073321 should only affect full gc, so it is even more interesting that it has such a high impact on the remaining runtime (and only depending on some other settings).
08-08-2016

Further investigation showed that almost all of these regressions disappear with -XX:+UseCountedLoopSafepoints. However, there are many crashes on SPECjvm2008 benchmarks with that flag, so any further investigation before they have been fixed is not a good use of time. For this reason I am adding JDK-8161147 as a blocker for this issue.
13-07-2016

There is a very strong indication that the regressions are caused by not having JDK-5014723. At least there is no regression from b115 to b116 if -XX:+UseCountedLoopSafepoints is set.
12-07-2016
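
For readers unfamiliar with the flag: -XX:+UseCountedLoopSafepoints asks the C2 compiler to keep safepoint polls inside counted loops, which it otherwise tends to remove. The sketch below only illustrates the shape of loop this concerns (an int index, a loop-invariant bound, a constant stride). It is not taken from SPECjvm2008, and the class and method names are hypothetical.

```java
// Hypothetical illustration; not code from SPECjvm2008 or from HotSpot.
class CountedLoopSketch {

    // An int-indexed loop with a loop-invariant bound and stride 1:
    // the shape that C2 can recognize as a "counted loop".
    static double sumOfSquares(double[] data) {
        double sum = 0.0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i] * data[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] data = new double[1 << 20];
        java.util.Arrays.fill(data, 0.5);

        // Repeat the kernel so the JIT has a chance to compile it.
        double total = 0.0;
        for (int round = 0; round < 200; round++) {
            total += sumOfSquares(data);
        }
        System.out.println(total);
    }
}
```

Running such a program once with and once without -XX:+UseCountedLoopSafepoints, with GC logging enabled, is one way to compare pause behavior. Whether any difference is visible depends on the build, the JIT decisions, and the loop lengths involved.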

At least the crashes reproduce easily with SPECjvm2008 scimark.fft and sor when -XX:+UseCountedLoopSafepoints is set with something like b125.
12-07-2016

This issue may be related to JDK-8161147: G1 does a few more GCs with later versions, which increases the likelihood of these long waits and of the performance drops they cause. In particular, it explains the very high variation of results after some fixes (that still result in slightly more GC pauses than before).
12-07-2016

Some runs with gc+ergo*=debug indicate that the following is happening: the heap is so tight that G1 continuously requests a marking cycle (i.e. below threshold), but none of these markings reclaims anything. With adaptive IHOP, the situation looks as if the marking cycle is never completed (because not a single mixed gc happens), so the threshold is never adjusted (upwards) to break out of this loop. With b116 the marking cycle got significantly faster, so G1 can do more concurrent markings (that all do nothing) with everything that entails, most likely causing the regressions.
25-05-2016
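
The feedback loop described in the comment above can be sketched as a toy model. This is not G1 code: the class name, the method, the numbers, and the rule used to raise the threshold are all assumptions made for illustration. It also assumes that marking is requested when heap occupancy is at or above the threshold. The one property it mirrors from the comment is that the threshold is only adjusted after a marking that leads to reclamation.

```java
// Toy model of the marking loop; hypothetical, not G1's actual logic.
class IhopLoopToyModel {

    // Counts how many marking cycles are started over a number of GC pauses.
    // occupancy and threshold are fractions of the heap (0.0 to 1.0).
    static int countMarkings(double occupancy, double threshold,
                             double reclaimablePerMarking, int pauses) {
        int markings = 0;
        for (int pause = 0; pause < pauses; pause++) {
            if (occupancy < threshold) {
                continue; // enough headroom: no marking requested at this pause
            }
            markings++;
            if (reclaimablePerMarking > 0.0) {
                // Marking found garbage: mixed GCs reclaim it, the cycle
                // completes, and the (assumed) adaptive step raises the
                // threshold above the new occupancy.
                occupancy -= reclaimablePerMarking;
                threshold = Math.min(1.0, occupancy + 0.1);
            }
            // Otherwise nothing was reclaimed, no mixed GC happened, and the
            // threshold stays where it was, so the next pause marks again.
        }
        return markings;
    }

    public static void main(String[] args) {
        System.out.println(countMarkings(0.95, 0.45, 0.0, 20)); // tight heap
        System.out.println(countMarkings(0.95, 0.45, 0.2, 20)); // reclaimable heap
    }
}
```

In this model, a tight heap with nothing to reclaim starts a marking at every pause, while a heap with reclaimable space starts one and then stops. Faster marking would only let more of the fruitless cycles fit into the same run, which is the effect the comment attributes to b116.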