JDK-8138966 : Intermittent SEGV running ParallelGC
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 9
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: linux
  • CPU: aarch64
  • Submitted: 2015-10-06
  • Updated: 2016-04-27
  • Resolved: 2015-11-04
Fix versions:
  • JDK 8: 8u101 (Fixed)
  • JDK 9: 9 b94 (Fixed)
Description
We are seeing intermittent SEGVs running specjbb2013 with both the aarch64 jdk8 port (built from ssh://enevill@hg.openjdk.java.net/aarch64-port/jdk8u) and jdk9 (built from ssh://enevill@hg.openjdk.java.net/jdk9/hs-comp).

This crash only occurs on Partner X hardware.

Prebuilt binaries which exhibit the problem may be downloaded from

http://openjdk.linaro.org/releases/jdk8u-server-release-1509.tar.xz

http://openjdk.linaro.org/releases/jdk9-server-release-1508.tar.xz

A sample hs_err log is attached below.

This only occurs with UseParallelGC, so it happens by default with jdk8 but with jdk9 only when -XX:+UseParallelGC is specified.

The frequency of the crash is about one in 10 runs.

The command used to invoke specjbb2013 is

java -Xmx50g -Xms50g -Xmn40g -Dspecjbb.forkjoin.workers=48 -jar specjbb2013.jar -m COMPOSITE

(note: if using jdk9 you also need -XX:+UseParallelGC to observe the problem)
Comments
noreg-hard: this is an issue related to missing memory barriers. This is hard to reliably reproduce.
04-11-2015

If this is a GC problem, which it seems it is, you should move it to the right subcomponent.
03-11-2015

There is a race between threads when updating the block table in ParallelCompactData::calc_new_pointer:

    if (!region_ptr->blocks_filled()) {
      PSParallelCompact::fill_blocks(addr_to_region_idx(addr));
      region_ptr->set_blocks_filled();
    }

Neither blocks_filled() nor set_blocks_filled() has any memory fence, so it is possible for a thread to see the filled flag set while still observing a partially filled block table.
29-10-2015
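
For illustration only, the publication pattern behind the race described in the comment above can be sketched in standard C++. This is not the HotSpot fix and not HotSpot code: Region, kBlocks, fill_blocks and read_block are hypothetical stand-ins, and std::atomic with release/acquire ordering is assumed here in place of whatever barriers the real change uses. The idea is that the store that sets the flag must be ordered after the table writes, and the load that tests the flag must be ordered before the table reads.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a region and its block table (not HotSpot's type).
struct Region {
    static constexpr std::size_t kBlocks = 64;
    std::atomic<std::size_t> blocks[kBlocks];
    std::atomic<bool> filled{false};

    Region() {
        for (auto& b : blocks) b.store(0, std::memory_order_relaxed);
    }

    // Acquire load: a thread that sees 'true' also sees every table write
    // that happened before the matching release store.
    bool blocks_filled() const {
        return filled.load(std::memory_order_acquire);
    }

    // Release store: publishes all earlier table writes together with the flag.
    void set_blocks_filled() {
        filled.store(true, std::memory_order_release);
    }
};

// Fills the table with placeholder values. Several threads may do this
// concurrently; that is harmless because they all write identical values.
void fill_blocks(Region& r) {
    for (std::size_t i = 0; i < Region::kBlocks; ++i) {
        r.blocks[i].store(i + 1, std::memory_order_relaxed);
    }
}

// Same shape as the snippet quoted in the comment: fill lazily, then read.
std::size_t read_block(Region& r, std::size_t i) {
    assert(i < Region::kBlocks);
    if (!r.blocks_filled()) {
        fill_blocks(r);
        r.set_blocks_filled();
    }
    return r.blocks[i].load(std::memory_order_relaxed);
}
```

With relaxed ordering on the flag instead, a weakly ordered CPU such as aarch64 would be allowed to make the flag visible before the table entries, which matches the partially filled table described in the comment.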

It's similar to problems we've seen before in that an object in the old generation is being prematurely collected. I'm guessing that the cause of the problem is similar too: one of the collector's worker threads is missing a store fence.
09-10-2015

I've reproduced the problem and I'm working on it.
08-10-2015

I have replicated a failure on Partner Y hardware after 32 executions. This used the binary http://openjdk.linaro.org/releases/jdk8u-server-release-1509.tar.xz, which was built on Sep 30th from ssh://enevill@hg.openjdk.java.net/aarch64-port/jdk8u. The error log hs_err_pid12209.log is attached.
08-10-2015

Sometimes it is as frequent as once every 3 runs, sometimes once every 10 runs. Given that a single run takes ~2 hours, that is between 6 and 20 hours.
07-10-2015

How long does it take before you see this?
07-10-2015