JDK-8224875 : Shenandoah: ParallelCleaning code unloading should take lock to protect shared code roots array
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 11-shenandoah,12,13
  • Priority: P1
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2019-05-28
  • Updated: 2020-11-20
  • Resolved: 2019-05-29
Fix Versions
  • JDK 13: 13 b23 (Fixed)
  • JDK 14: 14 (Fixed)
Description
This was found in Shenandoah testing after JDK-8224115, which exposed it to the mercy of the CodeCache lock. There is apparently a path via ParallelCleaningTask that calls into nmethod::make_unloaded without the lock, which in turn calls CollectedHeap::unregister_nmethod. This definitely happens with Shenandoah, and might also happen with G1: I cannot see where the CodeCache lock is taken on the CH::unregister_nmethod path in G1.

# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/home/jenkins/workspace/nightly/jdk-jdk/src/hotspot/share/gc/shenandoah/shenandoahCodeRoots.cpp:159), pid=82156, tid=82172
#  assert(CodeCache_lock->owned_by_self()) failed: Must own CodeCache_lock
...

---------------  T H R E A D  ---------------

Current thread (0x00007fd43c007000):  GCTaskThread "Shenandoah GC Threads#4" [stack: 0x00007fd4440da000,0x00007fd4441da000] [id=82172]

Stack: [0x00007fd4440da000,0x00007fd4441da000],  sp=0x00007fd4441d8b30,  free space=1018k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x15c0d74]  ShenandoahCodeRoots::remove_nmethod(nmethod*)+0x474
V  [libjvm.so+0x136d5d0]  nmethod::make_unloaded()+0x3a0
V  [libjvm.so+0x1441419]  ParallelCleaningTask::work(unsigned int)+0xc9
V  [libjvm.so+0x18fc7e6]  GangWorker::run_task(WorkData)+0x66
V  [libjvm.so+0x18fc8d8]  GangWorker::loop()+0x48
V  [libjvm.so+0x17d756b]  Thread::call_run()+0xfb
V  [libjvm.so+0x13fdac9]  thread_native_entry(Thread*)+0x119
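
To make the failure mode concrete, here is a minimal standalone C++ sketch of the pattern involved; CodeRootsTable and worker_unload are invented names for illustration, not HotSpot code. Several parallel cleaning workers remove entries from one shared code-roots array, and that removal is only safe while the shared lock is held, which is the step the unloading path above skipped.

#include <algorithm>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the shared code roots array; not HotSpot code.
struct CodeRootsTable {
  std::vector<int> roots;  // stands in for the nmethod* entries
  std::mutex lock;         // stands in for CodeCache_lock

  // Mutates shared state; callers must hold 'lock'. The real code asserts
  // CodeCache_lock->owned_by_self() here, which is the assert that fired.
  void remove(int nm) {
    roots.erase(std::remove(roots.begin(), roots.end(), nm), roots.end());
  }
};

// One parallel cleaning worker unloading the entries it claimed.
// Taking the lock around the mutation is exactly what the bug path skipped.
void worker_unload(CodeRootsTable& table, const std::vector<int>& claimed) {
  std::lock_guard<std::mutex> g(table.lock);
  for (int nm : claimed) {
    table.remove(nm);
  }
}

int main() {
  CodeRootsTable table;
  for (int i = 0; i < 8; i++) table.roots.push_back(i);
  std::thread t1(worker_unload, std::ref(table), std::vector<int>{0, 2, 4});
  std::thread t2(worker_unload, std::ref(table), std::vector<int>{1, 3, 5});
  t1.join();
  t2.join();
  return table.roots.size() == 2 ? 0 : 1;  // entries 6 and 7 remain
}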

Comments
This patch is not applicable to sh-8u, as JDK-8224115 is not in sh-8u.
20-11-2020

...actually, the assert here might be too strong! G1 checks the safepoint, and thus does not fail the assert:

void ShenandoahCodeRoots::remove_nmethod(nmethod* nm) {
  assert(CodeCache_lock->owned_by_self(), "Must own CodeCache_lock");

...but Shenandoah does want a lock, because it modifies the shared data structure during parallel cleaning. See how G1 deals with it:

void HeapRegionRemSet::remove_strong_code_root(nmethod* nm) {
  assert(nm != NULL, "sanity");
  assert_locked_or_safepoint(CodeCache_lock);
  MutexLocker ml(CodeCache_lock->owned_by_self() ? NULL : &_m, Mutex::_no_safepoint_check_flag); // <---
  ...
  _code_roots.remove(nm);
28-05-2019
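
As an aside, the conditional locking that the G1 snippet in the comment above performs (take a finer per-structure mutex only when the coarser CodeCache_lock is not already held by the caller) can be sketched in standalone C++ roughly as follows; RemSetLike and the boolean parameter are invented for illustration, and this is not the real HeapRegionRemSet code.

#include <mutex>

// Hypothetical illustration (not HotSpot code) of "lock the local mutex only
// if the caller is not already serialized by the coarser lock".
struct RemSetLike {
  std::mutex local_lock;  // stands in for the per-remset Mutex _m

  // caller_owns_coarse_lock stands in for CodeCache_lock->owned_by_self().
  void remove_strong_code_root(bool caller_owns_coarse_lock) {
    std::unique_lock<std::mutex> ml(local_lock, std::defer_lock);
    if (!caller_owns_coarse_lock) {
      ml.lock();  // only lock when the coarser lock does not already protect us
    }
    // ... mutate the shared code-roots structure here ...
  }
};

The comment's point is that Shenandoah's shared code-roots array needs the same kind of protection during parallel cleaning, whether that comes from taking CodeCache_lock on the unloading path, as in the patch below, or from an internal lock.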

Should probably do this:

diff -r 749421a1c1a1 src/hotspot/share/gc/shared/parallelCleaning.cpp
--- a/src/hotspot/share/gc/shared/parallelCleaning.cpp  Tue May 28 11:23:59 2019 +0200
+++ b/src/hotspot/share/gc/shared/parallelCleaning.cpp  Tue May 28 11:56:20 2019 +0200
@@ -104,4 +104,5 @@
   // The first nmethods is claimed by the first worker.
   if (worker_id == 0 && _first_nmethod != NULL) {
+    MutexLocker mu(CodeCache_lock, Mutex::_no_safepoint_check_flag);
     _first_nmethod->do_unloading(_unloading_occurred);
     _first_nmethod = NULL;
@@ -118,6 +119,9 @@
     }
-    for (int i = 0; i < num_claimed_nmethods; i++) {
-      claimed_nmethods[i]->do_unloading(_unloading_occurred);
+    {
+      MutexLocker mu(CodeCache_lock, Mutex::_no_safepoint_check_flag);
+      for (int i = 0; i < num_claimed_nmethods; i++) {
+        claimed_nmethods[i]->do_unloading(_unloading_occurred);
+      }
     }
   }
28-05-2019