Bug ID: JDK-8300915 G1: incomplete SATB because nmethod entry barriers don't get armed

Type: Bug
Component: hotspot
Sub-Component: gc

Priority: P4
Status: Resolved
Resolution: Fixed

Submitted: 2023-01-23
Updated: 2023-02-14
Resolved: 2023-01-30

JDK 21
21 b08 Fixed

Symptom:
--------

Crashes as described in JDK-8299956, caused by class/nmethod unloading even though an nmethod is on the stack.

The crashes are reproducible with a release build by running test/langtools:tier1 repeatedly with a concurrency of 6. A crash occurs within 15 to 180 minutes.

Analysis:
--------

Debugging code placed after G1ConcurrentMark::finalize_marking() shows that there are
nmethods with dead oops (mostly class loaders) on the stack if MarkingCodeBlobClosure
is changed not to mark oops during G1 remark.

The following steps lead to a G1 concurrent marking cycle without arming nmethod entry barriers.
This could cause the symptom because nmethod barriers should be armed to keep oop constants
of nmethods alive.

Step 1

CodeCache::on_gc_marking_cycle_start() is called and nmethods are armed in
G1CollectedHeap::start_codecache_marking_cycle_if_inactive() before the young GC.

  Stack:
    CodeCache::on_gc_marking_cycle_start() : void
    G1CollectedHeap::start_codecache_marking_cycle_if_inactive() : void
    G1ConcurrentMark::pre_concurrent_start(enum GCCause::Cause) : void
    G1YoungCollector::pre_evacuate_collection_set(G1EvacInfo *) : void
    G1YoungCollector::collect() : void
    G1CollectedHeap::do_collection_pause_at_safepoint_helper() : void

Step 2

The concurrent marking start is undone at the same safepoint.

  Stack:
    G1ConcurrentMarkThread::start_undo_mark() : void
    G1CollectedHeap::start_concurrent_cycle(bool) : void
    G1CollectedHeap::do_collection_pause_at_safepoint_helper() : void  

Step 3

Because of the undo, the call to CodeCache::on_gc_marking_cycle_finish() in G1ConcurrentMark::remark() is not reached.

Step 4

The next concurrent cycle starts, with the same stack as in Step 1. Nmethods are
not armed because CodeCache::is_gc_marking_cycle_active() still returns true in
G1CollectedHeap::start_codecache_marking_cycle_if_inactive().
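
The four steps can be replayed on a toy model. The sketch below is not HotSpot
code: ToyCodeCache, ReplayResult, replay_steps() and disarm_all() are
hypothetical names introduced for illustration, and the model assumes that entry
barriers end up disarmed between the two cycles. Only the guard in
start_codecache_marking_cycle_if_inactive() follows the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for the code cache state; NOT HotSpot code.
struct ToyCodeCache {
    bool cycle_active = false;   // what is_gc_marking_cycle_active() reports
    std::vector<bool> armed;     // one entry-barrier flag per toy nmethod

    explicit ToyCodeCache(std::size_t nmethods) : armed(nmethods, false) {}

    void on_gc_marking_cycle_start()  { cycle_active = true; }
    // Present for completeness; the replay below never reaches it (Step 3).
    void on_gc_marking_cycle_finish() { cycle_active = false; }
    bool is_gc_marking_cycle_active() const { return cycle_active; }

    void arm_all()    { armed.assign(armed.size(), true); }
    // Assumption: models barriers getting disarmed as nmethods are entered.
    void disarm_all() { armed.assign(armed.size(), false); }
};

// Guard as described in Step 4: arm only if no cycle is recorded as active.
// Returns true if this call armed the nmethods.
bool start_codecache_marking_cycle_if_inactive(ToyCodeCache& cc) {
    if (cc.is_gc_marking_cycle_active()) {
        return false;
    }
    cc.on_gc_marking_cycle_start();
    cc.arm_all();
    return true;
}

struct ReplayResult {
    bool armed_in_first_cycle;
    bool armed_in_second_cycle;
    bool cycle_still_marked_active;
};

// Replays Steps 1 to 4 on the toy model.
ReplayResult replay_steps() {
    ToyCodeCache cc(3);
    // Step 1: concurrent start arms the nmethods.
    bool first = start_codecache_marking_cycle_if_inactive(cc);
    // Steps 2 and 3: the start is undone, so the finish hook is never called.
    cc.disarm_all();
    // Step 4: the next concurrent start finds the stale "active" state.
    bool second = start_codecache_marking_cycle_if_inactive(cc);
    return ReplayResult{first, second, cc.is_gc_marking_cycle_active()};
}
```

In this model, replay_steps() reports that the first cycle armed the nmethods
and the second one did not, which is the state the analysis identifies as the
cause of the incomplete SATB.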

This can cause the issues described in JDK-8299956. The dead loaders are most
probably loaders of (possibly inlined) optimized virtual calls that aren't
reachable anymore. Nevertheless, the referencing nmethods must not be unloaded if
they are on the stack. The backout done with JDK-8299956 prevents this by iterating
over all frames and marking the oops of nmethods on the stack.

A better fix would be to make sure nmethod entry barriers are armed when G1 marking starts.
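
One conceivable way to restore that invariant is to reset the code cache cycle
state when a concurrent start is undone, so that the next start arms again. The
sketch below only illustrates this idea on a toy model. It is not claimed to be
what the changeset for this bug does, and ToyCycleState, toy_start_if_inactive(),
toy_undo_with_cleanup() and both arm_count_* helpers are hypothetical names.

```cpp
#include <cassert>

// Toy model; NOT HotSpot code.
struct ToyCycleState {
    bool cycle_active = false;  // stands in for is_gc_marking_cycle_active()
    int  arm_count    = 0;      // how often all nmethods were armed
};

// Same guard as in the analysis: arm only when no cycle is recorded as active.
void toy_start_if_inactive(ToyCycleState& s) {
    if (s.cycle_active) {
        return;
    }
    s.cycle_active = true;
    ++s.arm_count;
}

// Hypothetical cleanup on undo: mark the code cache cycle as finished.
void toy_undo_with_cleanup(ToyCycleState& s) {
    s.cycle_active = false;
}

// Two concurrent starts with an undo in between that leaves the state stale.
int arm_count_without_cleanup() {
    ToyCycleState s;
    toy_start_if_inactive(s);   // first start arms
    /* undo: nothing resets s.cycle_active */
    toy_start_if_inactive(s);   // second start skips arming
    return s.arm_count;
}

// Same sequence, but the undo resets the cycle state.
int arm_count_with_cleanup() {
    ToyCycleState s;
    toy_start_if_inactive(s);   // first start arms
    toy_undo_with_cleanup(s);   // undo resets the state
    toy_start_if_inactive(s);   // second start arms again
    return s.arm_count;
}
```

With the stale state, the toy model arms once across the two starts. With the
cleanup on undo, it arms on both.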

Changeset: 3db558b6
Author: Richard Reingruber <rrich@openjdk.org>
Date: 2023-01-30 08:43:15 +0000
URL: https://git.openjdk.org/jdk/commit/3db558b67bebfe559833331475f481c588147084

A pull request was submitted for review.
URL: https://git.openjdk.org/jdk/pull/12194
Date: 2023-01-25 13:32:27 +0000

Blocks :	JDK-8302462 - [REDO] 8297487: G1 Remark: no need to keep alive oop constants of nmethods on stack
Relates :	JDK-8299956 - [BACKOUT] 8297487: G1 Remark: no need to keep alive oop constants of nmethods on stack
Relates :	JDK-8288970 - G1 does not keep weak nmethod oops alive