JDK-8147087 : Race when reusing PerRegionTable bitmaps may result in dropped remembered set entries
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 7u75,8,9
  • Priority: P1
  • Status: Closed
  • Resolution: Fixed
  • Submitted: 2016-01-14
  • Updated: 2020-07-28
  • Resolved: 2016-01-21
Fix Versions
  • JDK 7: 7u101 (Fixed)
  • JDK 8: 8u101 (Fixed)
  • JDK 9: 9 (Fixed)
  • Other: openjdk7u (Fixed)
Description
Running the remembered set stress test from JDK-8134963 causes infrequent assertion failures:

Crash: Internal Error ...heapRegionRemSet.cpp...assert(contains_reference(from)) failed: We just added it!

The same issue was already observed earlier in JDK-8017474, but that report was closed because the problem could not be reproduced. There is also a known customer issue that, after analysis, shows the same problem.

A new CR was created because JDK-8017474 is already old and has accumulated a lot of comments.
Comments
Not verified. There are no tests for the JDK 8 version of the fix.
02-02-2016

UR SQE is OK with taking the fix into PSU16_02 (not CPU).
25-01-2016

Please push the backport for 7u to 7u-cpu.
25-01-2016

Set the fix version to 8u76 because the fix is being pushed to 8u first.
21-01-2016

noreg-other: this issue can be reproduced using the test added in JDK-8134963.
19-01-2016

Ignore the above comments about another bug; this was due to a typo in the change.
18-01-2016

It seems that this change is not enough. G1 stress tests now sometimes fail during GC with:

  # A fatal error has been detected by the Java Runtime Environment:
  #
  # Internal Error (.../hotspot/src/share/vm/gc/g1/heapRegionRemSet.cpp:433), pid=26837, tid=26841
  # assert(prt != __null && prt->hr() == from_hr) failed: consequence
  # [...]

Stack trace:

  Current thread (0xf5e49c00): GCTaskThread [stack: 0xe5a82000,0xe5b03000] [id=26841]

  Stack: [0xe5a82000,0xe5b03000], sp=0xe5b01c30, free space=511k
  Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
  V [libjvm.so+0x11a1eca] VMError::report_and_die(int, char const*, char const*, char*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned int)+0x12a
  V [libjvm.so+0x11a2ba0] VMError::report_and_die(Thread*, char const*, int, char const*, char const*, char*)+0x30
  V [libjvm.so+0x7ee7e0] report_vm_error(char const*, int, char const*, char const*, ...)+0x60
  V [libjvm.so+0xa4ae1f] OtherRegionsTable::add_reference(void*, unsigned int)+0x40f
  V [libjvm.so+0x9848f5] G1UpdateRSOrPushRefOopClosure::do_oop(oop*)+0x245
  V [libjvm.so+0x972e37] void FilterOutOfRegionClosure::do_oop_nv<oop>(oop*)+0x77
  V [libjvm.so+0x9744f5] int InstanceKlass::oop_oop_iterate<true, FilterOutOfRegionClosure>(oop, FilterOutOfRegionClosure*)+0xc5
  V [libjvm.so+0x966e9a] InstanceKlass::oop_oop_iterate_nv(oop, FilterOutOfRegionClosure*)+0x2a
  V [libjvm.so+0xa40187] HeapRegion::oops_on_card_seq_iterate_careful(MemRegion, FilterOutOfRegionClosure*, bool, signed char*)+0x587
  V [libjvm.so+0x981dec] G1RemSet::refine_card(signed char*, unsigned int, bool)+0x60c
  V [libjvm.so+0x983c62] RefineRecordRefsIntoCSCardTableEntryClosure::do_card_ptr(signed char*, unsigned int)+0x52
  V [libjvm.so+0x8bddd5] DirtyCardQueueSet::apply_closure_to_completed_buffer(CardTableEntryClosure*, unsigned int, int, bool)+0x105
  V [libjvm.so+0x92eaac] G1CollectedHeap::iterate_dirty_card_closure(CardTableEntryClosure*, unsigned int)+0x2c
  V [libjvm.so+0x982c59] G1RemSet::oops_into_collection_set_do(G1ParPushHeapRSClosure*, CodeBlobClosure*, unsigned int)+0xf9
  V [libjvm.so+0x94a82c] G1ParTask::work(unsigned int)+0x14c
  V [libjvm.so+0x12012e7] GangWorker::loop()+0x27
15-01-2016

Just FYI: as the list of affected versions indicates, this is a day-one G1 bug.
15-01-2016

Additional note to the above analysis: this problem can only occur when a thread A reuses the PRT of a thread B and the from-regions of the cards being added hash to the same bucket of the hash set in which the PRTs are stored. Reuse of a PRT bitmap occurs when the total number of PRT bitmaps reaches an internal threshold.
14-01-2016
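
The reuse precondition described in the note above can be pictured with a small, single-threaded C++ sketch (not HotSpot code): a fixed-capacity chained hash table whose nodes are recycled once a threshold is reached. All names and sizes here (FineTableSketch, PrtNode, kMaxEntries, the modulo hash) are hypothetical stand-ins, not the actual G1 data structures.

// Minimal sketch of "evict and recycle" in a bounded chained hash table.
#include <cstddef>
#include <cstdio>

struct PrtNode {                 // stand-in for a PerRegionTable
  int from_region = -1;
  PrtNode* next = nullptr;       // chaining inside a bucket
};

struct FineTableSketch {         // stand-in for the fine-grain hash set
  static const std::size_t kNumBuckets = 4;
  static const std::size_t kMaxEntries = 2;    // "internal threshold" for reuse
  PrtNode* buckets[kNumBuckets] = {};
  PrtNode pool[kMaxEntries];
  std::size_t n_entries = 0;

  std::size_t bucket_of(int region) const { return (std::size_t)region % kNumBuckets; }

  PrtNode* get_or_recycle(int region) {
    std::size_t b = bucket_of(region);
    for (PrtNode* p = buckets[b]; p != nullptr; p = p->next) {
      if (p->from_region == region) return p;          // already present
    }
    PrtNode* node;
    if (n_entries < kMaxEntries) {
      node = &pool[n_entries++];                        // fresh node
    } else {
      // Threshold reached: evict an existing node and recycle it in place.
      std::size_t victim = 0;
      while (buckets[victim] == nullptr) victim++;
      node = buckets[victim];
      buckets[victim] = node->next;                     // unlink from its old bucket
    }
    node->from_region = region;                         // reinitialize for the new region
    node->next = buckets[b];                            // relink, possibly into the same bucket
    buckets[b] = node;
    return node;
  }
};

int main() {
  FineTableSketch t;
  t.get_or_recycle(1);                 // lands in bucket 1
  t.get_or_recycle(2);                 // lands in bucket 2; threshold now reached
  // Region 5 also hashes to bucket 1, so the node for region 1 is evicted and
  // recycled within the same bucket -- the constellation in which the race in
  // the real concurrent code can drop a just-added entry.
  PrtNode* reused = t.get_or_recycle(5);
  std::printf("recycled node now tracks region %d in bucket %zu\n",
              reused->from_region, t.bucket_of(reused->from_region));
  return 0;
}

The real code chooses its eviction victim differently and runs all of this concurrently; the sketch only shows why a recycled PRT can end up back in the very bucket where another thread is about to look for it.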

Suggested fix:

  void init(HeapRegion* hr, bool clear_links_to_all_list) {
    if (clear_links_to_all_list) {
      set_next(NULL);
      set_prev(NULL);
    }
    _collision_list_next = NULL;
    _occupied = 0;
    _bm.clear();
    OrderAccess::release_store_ptr(&_hr, hr);
  }

With this change applied, the stress tests from JDK-8134963 have so far not found this error. The situation described above cannot occur any more, because the bitmap is guaranteed to be cleared before another thread can try to write data into it.
14-01-2016
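
Here is a standalone sketch of the ordering idea behind the suggested fix, using C++11 std::atomic in place of HotSpot's OrderAccess; the names ReusablePrt, claim_for and add_card are made up for illustration. Clearing the bitmap is ordered before the release-store that publishes the new region, so a thread that first observes the new region with an acquire load also observes the cleared bitmap, and the card it then adds cannot be wiped out by that clear.

#include <atomic>
#include <bitset>
#include <cstddef>
#include <cstdio>
#include <thread>

struct ReusablePrt {
  std::bitset<64> bm;              // stand-in for the card bitmap
  std::atomic<int> hr{-1};         // stand-in for the _hr field (a region id here)

  // What init() does on reuse in the fixed version: clear first, then publish
  // the new owner with release semantics (the analogue of release_store_ptr).
  void claim_for(int region) {
    bm.reset();                                     // _bm.clear()
    hr.store(region, std::memory_order_release);    // release-store of _hr
  }

  // Another thread only adds a card after observing the matching region id
  // with an acquire load, so the preceding clear is visible to it as well.
  bool add_card(int region, std::size_t card) {
    if (hr.load(std::memory_order_acquire) != region) return false;
    bm.set(card);
    return true;
  }
};

int main() {
  ReusablePrt prt;
  std::thread a([&] { prt.claim_for(42); });
  std::thread b([&] {
    while (!prt.add_card(42, 7)) { /* retry until region 42 is published */ }
  });
  a.join();
  b.join();
  std::printf("card 7 present after reuse: %d\n", (int)prt.bm.test(7));  // prints 1
  return 0;
}

The sketch only demonstrates the release/acquire publication pattern; it does not model the actual PRT lookup path or the concurrent bitmap used by G1.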

Original analysis by Poonam P.: "What I am thinking is (let me know if I am thinking wrong): we have a memory barrier when we add the prt to the _fine_grain_regions to make sure that the addition is visible only after the PRT is initialized properly, but not when we remove it. Say Thread A removed prt(hr_old) from the _fine_grain_regions, but this entry is still visible to other threads. Then it initializes it with 'hr' and is just before clearing the bitmap.

  add_reference() {
    .....
    if (_n_fine_entries == _max_fine_entries) {
      prt = delete_region_table();
      // There is no need to clear the links to the 'all' list here:
      // prt will be reused immediately, i.e. remain in the 'all' list.
      prt->init(from_hr, false /* clear_links_to_all_list */);
      ...
  }

  void init(HeapRegion* hr, bool clear_links_to_all_list) {
    if (clear_links_to_all_list) {
      set_next(NULL);
      set_prev(NULL);
    }
    _hr = hr;
    _collision_list_next = NULL;   // <---- Thread A is here
    _occupied = 0;
    _bm.clear();
  }

Now Thread B finds this prt(hr) in _fine_grain_regions, which Thread A just initialized with 'hr', and adds its from_card to bm. Just after that, Thread A clears bm, Thread B's record is lost, and Thread B encounters the assertion failure."
14-01-2016
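
The broken interleaving in the analysis above can also be replayed deterministically. The following single-threaded C++ sketch (not HotSpot code; PrtSketch and its fields are made-up stand-ins) simply executes the steps of Thread A and Thread B in the problematic order and shows B's card disappearing, which is what the "We just added it!" assertion catches.

#include <bitset>
#include <cstdio>

struct PrtSketch {          // stand-in for a PerRegionTable
  int hr = -1;              // region this PRT currently belongs to
  std::bitset<64> bm;       // card bitmap
};

int main() {
  PrtSketch prt;
  prt.hr = 17;              // previously used for some old region...
  prt.bm.set(3);            // ...with an old card recorded

  // Thread A recycles the PRT for region 42 using the *old* init() ordering:
  prt.hr = 42;              // _hr = hr  -- the new owner is published too early
  // <-- Thread A is preempted here, before reaching _bm.clear()

  // Thread B looks up region 42, finds a PRT whose hr already matches,
  // and records its card, expecting contains_reference() to hold afterwards:
  prt.bm.set(7);

  // Thread A resumes and finishes init(); the clear wipes out B's card:
  prt.bm.reset();

  // B's later check corresponds to
  //   assert(contains_reference(from)) failed: We just added it!
  std::printf("card 7 still present: %d\n", (int)prt.bm.test(7));  // prints 0
  return 0;
}

The suggested fix removes exactly this window by clearing the bitmap before the region becomes visible to other threads.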