JDK-8253902 : G1: Starting a new marking cycle before the conc mark thread fully completed causes assertion failure
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 16
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • OS: linux
  • CPU: x86_64
  • Submitted: 2020-10-01
  • Updated: 2021-01-19
  • Resolved: 2020-10-06
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 16
16 b19Fixed
Related Reports
Duplicate :  
Relates :  
Description
The following test failed in the JDK16 CI:

gc/stress/TestStressG1Humongous.java

Here's a snippet from the log file:

[342.714s][info ][gc] GC(5240) Concurrent Mark Cycle
[342.979s][info ][gc] GC(5240) Pause Remark 75M->75M(132M) 171.410ms
[343.059s][info ][gc] GC(5240) Pause Cleanup 122M->122M(132M) 0.174ms
[343.075s][info ][gc] GC(5241) Pause Young (Concurrent Start) (G1 Humongous Allocation) 123M->123M(132M) 14.023ms
[343.078s][info ][gc] GC(5240) Concurrent Mark Cycle 363.855ms
# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc:  SuppressErrorAt=/g1ConcurrentMarkThread.cpp:167
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S108696/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/783a4faa-6c31-4af4-9d3c-65ba47924b9c/runs/baf6de7f-a630-481f-a34d-f9799b9336ee/workspace/open/src/hotspot/share/gc/g1/g1ConcurrentMarkThread.cpp:167), pid=313789, tid=313794
#  assert(!in_progress()) failed: should have been cleared
#
# JRE version: Java(TM) SE Runtime Environment (16.0+18) (fastdebug build 16-ea+18-911)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 16-ea+18-911, compiled mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0xb4b504]  G1ConcurrentMarkThread::wait_for_next_cycle()+0xe4
#
# Core dump will be written. Default location: Core dumps may be processed with "/opt/core.sh %p" (or dumping to /opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S205149/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/20d4c14e-cdc2-4a73-a953-0ff2fde45c41/runs/dfa0bbd2-617b-4220-b173-e0dc67ee3432/testoutput/test-support/jtreg_open_test_hotspot_jtreg_hotspot_gc/scratch/1/core.313789)
#
# An error report file with more information is saved as:
# /opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S205149/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/20d4c14e-cdc2-4a73-a953-0ff2fde45c41/runs/dfa0bbd2-617b-4220-b173-e0dc67ee3432/testoutput/test-support/jtreg_open_test_hotspot_jtreg_hotspot_gc/scratch/1/hs_err_pid313789.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#
];
 stderr: []
 exitValue = 134

java.lang.RuntimeException: Expected to get exit value of [0]


Here's the crashing thread's stack:

---------------  T H R E A D  ---------------

Current thread (0x00007ff9dc066660):  ConcurrentGCThread "G1 Main Marker" [stack: 0x00007ff9c1c96000,0x00007ff9c1d96000] [id=313794]

Stack: [0x00007ff9c1c96000,0x00007ff9c1d96000],  sp=0x00007ff9c1d94d00,  free space=1019k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0xb4b504]  G1ConcurrentMarkThread::wait_for_next_cycle()+0xe4
V  [libjvm.so+0xb4dba0]  G1ConcurrentMarkThread::run_service()+0x1d0
V  [libjvm.so+0x97452d]  ConcurrentGCThread::run()+0x2d
V  [libjvm.so+0x175d380]  Thread::call_run()+0x100
V  [libjvm.so+0x1458186]  thread_native_entry(Thread*)+0x116
Comments
Verified by CI results.
19-01-2021

Changeset: c9d1dcc6 Author: Thomas Schatzl <tschatzl@openjdk.org> Date: 2020-10-06 10:16:19 +0000 URL: https://git.openjdk.java.net/jdk/commit/c9d1dcc6
06-10-2020

There is another oddity in the log output: [342.714s][info ][gc] GC(5240) Concurrent Mark Cycle [342.979s][info ][gc] GC(5240) Pause Remark 75M->75M(132M) 171.410ms [343.059s][info ][gc] GC(5240) Pause Cleanup 122M->122M(132M) 0.174ms [343.075s][info ][gc] GC(5241) Pause Young (Concurrent Start) (G1 Humongous Allocation) 123M->123M(132M) 14.023ms [343.078s][info ][gc] GC(5240) Concurrent Mark Cycle 363.855ms I.e. the concurrent start pause log message occurs before the Concurrent Mark Cycle one. Looking at the history, this has always been a possibility.
02-10-2020

The assert is wrong: with the changes from JDK-8240556 there is a small window where a) concurrent gc thread sets itself to idle b) in a gc pause we request/start a new gc, setting in-progress c) the concurrent gc thread goes into wait_for_next_cycle() where this assert fails Without that assert the concurrent cycle will just continue. Previously this situation worked because there has been another "pre-in-progress" state (implemented in a somewhat complicated way with nested states).
02-10-2020

Looks like there is a race in acknowledging the collection cycle state from the JDK-8240556 changes.
02-10-2020