JDK-8222426 : vmTestbase/vm/gc/containers/Combination02 SIGSEGVs during optional evacuation
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 13
  • Priority: P2
  • Status: Closed
  • Resolution: Duplicate
  • Submitted: 2019-04-12
  • Updated: 2019-05-05
  • Resolved: 2019-04-25
Version table:
  • JDK 13: Resolved in 13
Related Reports
Duplicate :  
Duplicate :  
Relates :  
Description
The vmTestbase/vm/gc/containers/Combination02 test sometimes crashes while processing the optional remembered set, most of the time with the following stack trace:

V  [libjvm.so+0x764db8]  G1ParCopyClosure<(G1Barrier)0, (G1Mark)0>::do_oop(oopDesc**)+0x38
V  [libjvm.so+0x74c250]  G1OopStarChunkedList::oops_do(OopClosure*, OopClosure*)+0x60
V  [libjvm.so+0x75a2c4]  G1ScanRSForRegionClosure::scan_opt_rem_set_roots(HeapRegion*)+0xc4
V  [libjvm.so+0x75a3b7]  G1ScanRSForRegionClosure::do_heap_region(HeapRegion*)+0x47
V  [libjvm.so+0x7117e4]  G1CollectionSet::iterate_incremental_part_from(HeapRegionClosure*, unsigned int, unsigned int) const+0x84
V  [libjvm.so+0x758ec8]  G1RemSet::scan_rem_set(G1ParScanThreadState*, unsigned int, G1GCPhaseTimes::GCParPhases, G1GCPhaseTimes::GCParPhases, G1GCPhaseTimes::GCParPhases)+0xf8
V  [libjvm.so+0x70b978]  G1EvacuateRegionsBaseTask::work(unsigned int)+0x98

siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000

Reproduces in 1 out of 1000 runs of this stress test in the CI environment.

Sometimes the stack trace does not contain the topmost frame, indicating that the crash occurred in G1OopStarChunkedList::oops_do(), but is otherwise identical.
Comments
The fix for this issue is provided with JDK-8222492, as both are the same problem. When closing this issue as a duplicate, I upgraded JDK-8222492 to a bug of the same priority.
25-04-2019

After some thinking, I believe cases *1 and *2 are the same issue in slightly different forms: after an object that was wrongly determined to be live is copied from B into another region and then scanned for new references, it may well contain references into its old region, i.e. B, that were never in the remembered set. Additional logging confirms this. So JDK-8222492 fixes the issue completely.
25-04-2019

The crash is caused by stale index_in_opt_cset() entries in a HeapRegion. They cause an out-of-bounds index access in one of the (mixed) GCs following the next concurrent cycle if that region is added to the old part of the initial evacuation set. This is likely, because the region contains very few objects (see below).

The following must all be true for this to occur:
- there must be multiple optional increments
- there must be a remembered set entry for a region A of increment X containing references to a region B of increment X-n (n >= 1)
- that remembered set entry must contain a reference into B. Observed cases:
  *1) a remembered set entry from A in B having a reference to itself, i.e. within the same region
  *2) a reference from another region
- that referenced object must not already have been evacuated in some other way
- that evacuation must experience an evacuation failure

This results in region B experiencing an evacuation failure, which means that:
- the region will not be freed (freeing would have cleared B.index_in_opt_cset())
- the region will not go through the evacuation failure protocol, because JDK-8218668 introduced an optimization where only regions in the current evacuation increment are handled during evacuation failure. Before JDK-8218668, evacuation failure handling always iterated over all regions.

Possible fixes:
- revert evacuation failure handling to the previous behavior
- unconditionally clear index_in_opt_cset while freeing the collection set
- JDK-8222492

Notes:
(*1) This case is clear, and could (and should) be avoided by JDK-8222492, or simply by removing the evacuation failure optimization introduced by JDK-8218668.
(*2) This case looks like something that should not happen. So far, removing the JDK-8218668 optimization would catch this issue as well. Needs further investigation.
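To make the sequence above concrete, here is a toy C++ model of the bookkeeping. It is a sketch under stated assumptions, not HotSpot code: Region, kNoOptIndex, handle_evac_failure(), free_collection_set() and run_scenario() are hypothetical names, the per-GC optional table is reduced to its size, and the model assumes (as the analysis implies) that both freeing a region and running the evacuation failure protocol on it reset its optional index. run_scenario() reports whether a later GC with no optional regions would index that table out of bounds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, heavily simplified model of the bookkeeping described above.
// None of these names are HotSpot APIs.
constexpr uint32_t kNoOptIndex = UINT32_MAX;  // "not in the optional cset"

struct Region {
  uint32_t index_in_opt_cset = kNoOptIndex;  // slot in a per-GC optional table
  bool evac_failed = false;
};

// Evacuation failure protocol. Assumption: handling a failed region resets its
// optional index. With the JDK-8218668-style optimization, only the regions of
// the current increment are visited; otherwise all regions are.
void handle_evac_failure(std::vector<Region>& heap,
                         const std::vector<std::size_t>& current_increment,
                         bool only_current_increment) {
  auto handle = [](Region& r) {
    if (r.evac_failed) r.index_in_opt_cset = kNoOptIndex;
  };
  if (only_current_increment) {
    for (std::size_t i : current_increment) handle(heap[i]);
  } else {
    for (Region& r : heap) handle(r);
  }
}

// Freeing the collection set: successfully evacuated regions are freed, which
// resets their index. Failed regions stay in the heap and keep their index
// unless clear_unconditionally is set (one of the possible fixes above).
void free_collection_set(std::vector<Region>& heap,
                         const std::vector<std::size_t>& cset,
                         bool clear_unconditionally) {
  for (std::size_t i : cset) {
    Region& r = heap[i];
    if (!r.evac_failed || clear_unconditionally) r.index_in_opt_cset = kNoOptIndex;
    r.evac_failed = false;  // the failed region simply remains an old region
  }
}

// Returns true if a later GC would index its optional table out of bounds.
bool run_scenario(bool only_current_increment, bool clear_unconditionally) {
  const std::size_t B = 0, A = 1;
  std::vector<Region> heap(2);
  heap[B].index_in_opt_cset = 0;  // B belongs to an earlier optional increment
  heap[A].index_in_opt_cset = 1;  // A belongs to the current optional increment X

  // While evacuating increment X (region A), copying an object in B fails.
  heap[B].evac_failed = true;
  handle_evac_failure(heap, {A}, only_current_increment);
  free_collection_set(heap, {B, A}, clear_unconditionally);
  assert(heap[A].index_in_opt_cset == kNoOptIndex);  // A was freed normally

  // Later mixed GC: no optional regions (table size 0), B in the initial cset.
  const std::size_t opt_table_size = 0;
  const Region& r = heap[B];
  return r.index_in_opt_cset != kNoOptIndex && r.index_in_opt_cset >= opt_table_size;
}
```

In this model, run_scenario(true, false) (optimization enabled, no extra clearing) reports the stale index, while either visiting all regions on evacuation failure, run_scenario(false, false), or clearing unconditionally when freeing the collection set, run_scenario(true, true), avoids it. The model does not attempt to represent JDK-8222492.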
25-04-2019

jib > [16.615s][trace][gc,ergo,cset] GC(107) Start choosing CSet. pending cards: 1558 predicted base time: 2.75ms remaining time: 197.25ms target pause time: 200.00ms
jib > [16.615s][trace][gc,ergo,cset] GC(107) Add young regions to CSet. eden: 105 regions, survivors: 15 regions, predicted young region time: 88.35ms, target pause time: 200.00ms
jib > [16.695s][info ][gc ] GC(107) Pause Young (Normal) (G1 Evacuation Pause) 2252M->2221M(2416M) 80.062ms
jib > [16.710s][trace][gc,ergo,cset] GC(108) Start choosing CSet. pending cards: 1556 predicted base time: 2.67ms remaining time: 197.33ms target pause time: 200.00ms
jib > [16.710s][trace][gc,ergo,cset] GC(108) Add young regions to CSet. eden: 105 regions, survivors: 15 regions, predicted young region time: 88.20ms, target pause time: 200.00ms
jib > [16.781s][info ][gc ] GC(108) To-space exhausted
jib > [16.782s][info ][gc ] GC(108) Pause Young (Normal) (G1 Evacuation Pause) 2326M->2341M(2416M) 72.198ms
[...]
jib > [18.782s][trace][gc,ergo,cset] GC(121) Start choosing CSet. pending cards: 1564 predicted base time: 2.68ms remaining time: 197.32ms target pause time: 200.00ms
jib > [18.782s][trace][gc,ergo,cset] GC(121) Add young regions to CSet. eden: 105 regions, survivors: 15 regions, predicted young region time: 84.97ms, target pause time: 200.00ms
jib > [18.887s][info ][gc ] GC(121) To-space exhausted
jib > [18.888s][info ][gc ] GC(121) Pause Young (Prepare Mixed) (G1 Evacuation Pause) 2307M->2323M(2416M) 105.696ms
jib > [18.905s][trace][gc,ergo,cset] GC(122) Start choosing CSet. pending cards: 2860 predicted base time: 3.12ms remaining time: 196.88ms target pause time: 200.00ms
jib > [18.905s][trace][gc,ergo,cset] GC(122) Add young regions to CSet. eden: 85 regions, survivors: 15 regions, predicted young region time: 73.07ms, target pause time: 200.00ms
jib > [18.905s][debug][gc,ergo,cset] GC(122) Start adding old regions to collection set. Min 155 regions, max 242 regions, time remaining 123.81ms, optional threshold 24.76ms
jib > [18.905s][debug][gc,ergo,cset] GC(122) Finish adding old regions to collection set (Maximum number of regions). Initial 242 regions, optional 0 regions
jib > [18.905s][debug][gc,ergo,cset] GC(122) Finish choosing collection set old regions. Initial: 242, optional: 0, predicted old time: 0.00ms, predicted optional time: 0.00ms, time remaining: 81.83

Per the log messages, GC(122) has zero optional regions, so G1 should not enter the optional evacuation path; however, it crashes there. The most recent mixed GC did have optional regions, though. So maybe some clearing or setup of that state is missing?
16-04-2019

jdk-13+15 (a random recent build without JDK-8218668) passes 1000 repetitions.
16-04-2019