JDK-8218880 : G1 crashes when issuing a periodic GC while the GCLocker is held
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 12,13
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2019-02-12
  • Updated: 2019-04-12
  • Resolved: 2019-03-04
JDK 12: 12.0.2 (Fixed)
JDK 13: 13 b11 (Fixed)
Related Reports
Duplicate :  
Duplicate :  
Relates :  
Description
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/scratch/opt/mach5/mesos/work_dir/5fbae2b5-f76e-47c8-bb78-216e4a9d30fe/workspace/open/src/hotspot/share/runtime/thread.hpp:2096), pid=1312, tid=7
#  assert(thread->is_Java_thread()) failed: just checking
#
# JRE version: Java(TM) SE Runtime Environment (13.0) (fastdebug build 13-internal+0-jdk13-jdk.373)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 13-internal+0-jdk13-jdk.373, mixed mode, sharing, tiered, compressed oops, g1 gc, solaris-sparc)
# Core dump will be written. Default location: /opt/mach5/mesos/work_dir/df48cb68-2e89-40b2-ba97-a4768750775c/testOutput/test-support/jtreg_open_test_hotspot_jtreg_tier1_gc/scratch/3/core or core.1312
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#

---------------  S U M M A R Y ------------

Command Line: -XX:+UseG1GC -XX:G1PeriodicGCInterval=100 -Xlog:gc,gc+periodic=debug -Xmx10M gc.g1.TestPeriodicLogMessages$GCTest

Host: XXXXX, Sparcv9 64 bit 4133 MHz, 63 cores, 64G, Oracle Solaris 11.3 SPARC
Time: Tue Feb 12 13:51:55 2019 PST elapsed time: 0 seconds (0d 0h 0m 0s)

---------------  T H R E A D  ---------------

Current thread (0x00000001001d5000):  ConcurrentGCThread "G1 Young RemSet Sampling" [stack: 0xffffffff58600000,0xffffffff58700000] [id=7]

Stack: [0xffffffff58600000,0xffffffff58700000],  sp=0xffffffff586ff1f0,  free space=1020k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x2471f1c]  void VMError::report_and_die(int,const char*,const char*,void*,Thread*,unsigned char*,void*,void*,const char*,int,unsigned long)+0x92c
V  [libjvm.so+0x2471588]  void VMError::report_and_die(Thread*,void*,const char*,int,const char*,const char*,void*)+0x38
V  [libjvm.so+0x14925a0]  void report_vm_error(const char*,int,const char*,const char*,...)+0x80
V  [libjvm.so+0xe6f170]  JavaThread*JavaThread::current()+0xc0
V  [libjvm.so+0x16b8d08]  void GCLocker::stall_until_clear()+0x18
V  [libjvm.so+0x15aacc0]  void G1CollectedHeap::collect(GCCause::Cause)+0x840
V  [libjvm.so+0x16b36bc]  void G1YoungRemSetSamplingThread::run_service()+0x25c
V  [libjvm.so+0x13d2028]  void ConcurrentGCThread::run()+0x88
V  [libjvm.so+0x2388adc]  void Thread::call_run()+0x1ec
V  [libjvm.so+0x202f824]  thread_native_entry+0x3e4

Test: gc/g1/TestPeriodicLogMessages.java

Problematic code:

void GCLocker::stall_until_clear() {
  assert(!JavaThread::current()->in_critical(), "Would deadlock");
  MutexLocker   ml(JNICritical_lock);

  if (needs_gc()) {
    log_debug_jni("Allocation failed. Thread stalled by JNI critical section.");
  }

  // Wait for _needs_gc  to be cleared
  while (needs_gc()) {
    JNICritical_lock->wait();
  }
}

It looks like the assertion needs to check whether the current thread is a JavaThread before calling in_critical() on it.
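
A minimal, self-contained sketch of what such a guarded check could look like. The classes below are simplified stand-ins, not HotSpot's real Thread and JavaThread, and may_stall_for_gc_locker is a hypothetical helper name introduced only for illustration. Note that, according to the fix request in the comments, the change that was eventually made took a different approach.

```cpp
// Simplified stand-ins for HotSpot's thread hierarchy (assumption: these are
// NOT the real classes, only enough structure to show the guarded check).
class Thread {
public:
  virtual ~Thread() = default;
  virtual bool is_Java_thread() const { return false; }
};

class JavaThread : public Thread {
  bool _in_critical;
public:
  explicit JavaThread(bool in_critical) : _in_critical(in_critical) {}
  bool is_Java_thread() const override { return true; }
  bool in_critical() const { return _in_critical; }
};

// Hypothetical helper: the "would deadlock" precondition is only evaluated
// for Java threads; in this model a non-Java thread (such as a concurrent GC
// thread) has no in_critical() state to inspect.
inline bool may_stall_for_gc_locker(const Thread* thread) {
  if (!thread->is_Java_thread()) {
    return true;  // nothing to check for non-Java threads
  }
  return !static_cast<const JavaThread*>(thread)->in_critical();
}
```

In this model, an assertion written as assert(may_stall_for_gc_locker(thread)) would no longer fail merely because the caller is a non-Java thread, while still catching a Java thread that tries to stall from inside a critical section.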

It is unclear what may have changed, or whether this is just a path that is normally never taken (we failed to schedule a collection).
Comments
Fix Request
Justification: without this fix the feature implemented in JDK-8204089 (Timely Reduce Unused Committed Memory) is unusable due to somewhat frequent crashes caused by the error fixed here.
Description: The problem fixed is that we are not allowed to wait for the GC locker to be relinquished by JNI code running in a critical section for deadlock reasons when trying to issue a periodic GC (see JEP). The fix is to not insist on starting a GC when the GC locker is held (i.e. JNI code runs in a critical section).
Risk: low - the feature is already broken and already gated via a command line switch that is off by default.
Testing: new test case directly exercising the failure mode and existing periodic gc functionality, hs-tier 1-5, baked in nightlies in jdk/jdk for a bit
Review: Patch did not apply cleanly due to a merge error in a copyright notice. Review at http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2019-March/024983.html
05-03-2019
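
The idea in the fix request (do not insist on starting a periodic GC while the GC locker is held) can be illustrated with a toy model. This is only a sketch under stated assumptions: ToyGCLocker, PeriodicGCResult, and try_periodic_gc are hypothetical names, not HotSpot code, and the model ignores the race between checking the locker and starting the collection.

```cpp
#include <atomic>

// Toy model of the GC locker (assumption: not the real GCLocker class).
// It only counts how many threads are inside a JNI critical section.
class ToyGCLocker {
  std::atomic<int> _critical_count{0};
public:
  void enter_critical() { _critical_count.fetch_add(1); }
  void exit_critical()  { _critical_count.fetch_sub(1); }
  bool is_active() const { return _critical_count.load() > 0; }
};

enum class PeriodicGCResult { Started, SkippedLockerHeld };

// Hypothetical periodic GC attempt: if the locker is held, give up instead
// of stalling until it is released. The caller can simply try again at the
// next periodic interval.
inline PeriodicGCResult try_periodic_gc(const ToyGCLocker& locker,
                                        int& gcs_started) {
  if (locker.is_active()) {
    return PeriodicGCResult::SkippedLockerHeld;  // do not wait
  }
  ++gcs_started;  // stands in for actually running a collection
  return PeriodicGCResult::Started;
}
```

The design point is that the periodic request is best-effort: skipping one interval costs little, whereas waiting for the locker from the sampling thread is what led to the crash reported here.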

Removed OS = solaris. There's nothing solaris-specific about this problem.
19-02-2019

JDK-8212657 introduced a call to collect() from a non-Java thread for the first time. That might not have been a good idea, for more reasons than just this assertion not being written properly for such.
13-02-2019

Assert seems wrong after JDK-8212657 - we consciously allowed that thread to start collections there.
13-02-2019

Thanks Kim!
13-02-2019

JDK-8212657 introduced the call chain in the failure. And having the periodic check happen while some thread is in a JNI critical section is probably a pretty rare occurrence.
12-02-2019