JDK-8279957 : Main refinement thread tries to wake up itself when re-enqueuing causing assertion failure
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 18,19
  • Priority: P4
  • Status: Resolved
  • Resolution: Not an Issue
  • Submitted: 2022-01-13
  • Updated: 2022-06-08
  • Resolved: 2022-06-08
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 19
19Resolved
Related Reports
Relates :  
Relates :  
Relates :  
Description
Test case : vmTestbase/gc/gctests/StringInternSyncWithGC2/TestDescription.java
OS : Crash seen on MacOS-x64 fastdebug build
Is it a Regression: Intermittent failure (can't say) 
vm options: -Xcomp -XX:+CreateCoredumpOnCrash -XX:+TieredCompilation 

# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/System/Volumes/Data/mesos/work_dir/slaves/a2dc162d-743b-4800-9e92-31f85abb45b1-S70482/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/6ec75420-ee26-4d02-ad88-7c4317f1017e/runs/4e652870-0065-401b-a0c3-991c19213f45/workspace/open/src/hotspot/share/gc/g1/g1ConcurrentRefineThread.cpp:64), pid=32481, tid=14339
#  assert(this != Thread::current()) failed: precondition
#
# JRE version: Java(TM) SE Runtime Environment (18.0.1+1) (fastdebug build 18.0.1-ea+1-1)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 18.0.1-ea+1-1, compiled mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, bsd-amd64)
# Core dump will be written. Default location: core.32481
#
# An error report file with more information is saved as:
# /System/Volumes/Data/mesos/work_dir/slaves/a2dc162d-743b-4800-9e92-31f85abb45b1-S60693/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/76920fb7-b0d8-47dc-b4c9-bc578eeba950/runs/b4513c6c-5973-4306-814c-99b083118875/testoutput/test-support/jtreg_open_test_hotspot_jtreg_vmTestbase_vm_gc_misc/scratch/1/hs_err_pid32481.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#
Comments
Resolving as Not an Issue after JDK-8280917, and not a product issue regardless. Not worth "backporting" any change to JDK 18.
08-06-2022

This should no longer happen after JDK-8280917 (JDK 19 b09). Prior to that, this can happen when refine_card_concurrently fails to process the card because it encountered an unparsable part of the heap. That's a pretty rare case. It is handled by enqueuing the card for later reprocessing. After JDK-8221360 (JDK 18), enqueuing for reprocessing involves allocating a new dirty card buffer, adding the card to it, and immediately enqueuing the buffer. If this is done by the primary activation thread and the dirty card queue set has sufficient entries to trigger refinement thread notification, this could lead to an attempt at self-activation that would fail the assert. Before JDK-8221360, enqueuing for reprocessing involves accumulating such cards until a buffer is filled, and only then enqueuing the buffer. Since the reprocessing path is rare and the buffer typically holds many cards, the buffer is much less likely to get enqueued at all, so much less likely to be enqueued at just the wrong time. Before JDK-8280917, failing the assert is "harmless" in that nothing actively bad happens if we proceed. It is an "unexpected" situation, but we could remove the assert without loss of safety. There's no problem in product builds. So for JDK 18 we should consider removing that assert, just to reduce testing noise with debug builds. For earlier releases it's less clear whether that's worthwhile.
10-02-2022

Here's the crashing stack from the jdk-19+7-319-tier4 sighting: vmTestbase/gc/gctests/StringInternSyncWithGC2/TestDescription.java --------------- T H R E A D --------------- Current thread (0x00007f9d180eaf00): ConcurrentGCThread "G1 Refine#0" [stack: 0x0000700006d9e000,0x0000700006e9e000] [id=12291] Stack: [0x0000700006d9e000,0x0000700006e9e000], sp=0x0000700006e9d8e0, free space=1022k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.dylib+0x117b019] VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x6e9 V [libjvm.dylib+0x117b69b] VMError::report_and_die(Thread*, void*, char const*, int, char const*, char const*, __va_list_tag*)+0x3b V [libjvm.dylib+0x63af0d] report_vm_error(char const*, int, char const*, char const*, ...)+0xdd V [libjvm.dylib+0x78a289] G1ConcurrentRefineThread::activate()+0x79 V [libjvm.dylib+0x7dfb65] G1RemSet::refine_card_concurrently(unsigned char*, unsigned int)+0x205 V [libjvm.dylib+0x78c93a] G1DirtyCardQueueSet::refine_buffer(BufferNode*, unsigned int, G1ConcurrentRefineStats*)+0x15a V [libjvm.dylib+0x78ca8e] G1DirtyCardQueueSet::refine_completed_buffer_concurrently(unsigned int, unsigned long, G1ConcurrentRefineStats*)+0x3e V [libjvm.dylib+0x789be3] G1ConcurrentRefine::do_refinement_step(unsigned int, G1ConcurrentRefineStats*)+0xe3 V [libjvm.dylib+0x78a66e] G1ConcurrentRefineThread::run_service()+0x2ce V [libjvm.dylib+0x61853b] ConcurrentGCThread::run()+0x1b V [libjvm.dylib+0x10c9b47] Thread::call_run()+0x177 V [libjvm.dylib+0xe99240] thread_native_entry(Thread*)+0x150 C [libsystem_pthread.dylib+0x68fc] _pthread_start+0xe0 C [libsystem_pthread.dylib+0x2443] thread_start+0xf
24-01-2022

This is also extremely rare: you need an unparseable part of the heap, and the re-enqueuing must cross the processing threshold again. The relevant code has been added in JDK-8237143 in JDK15.
14-01-2022

Apart from re-notifying the semaphore and causing spurious wake-ups of the main refinement thread this does not seem to hurt (in the product).
14-01-2022

Looking at the crash cause, the stack trace and the code I think the following could have happened: - main refinement thread encounters an unparseable part of the heap, enqueues the buffer again (G1RemSet::refine_card_concurrently->G1RemSet::enqueue_for_reprocessing) - enqueuing notices that this re-enqueuing causes it to go over the card processsing threshold again - tries to activate the main refinement thread - since we are the main refinement thread, assert fails
14-01-2022