JDK-8202049 : G1: ReferenceProcessor doesn't handle mark stack overflow
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2018-04-19
  • Updated: 2023-07-30
  • Resolved: 2018-06-07
JDK 11
11 b18 Fixed
Description
If the mark stack overflows during reference processing, further calls to the is_alive closure may return the wrong value: false for an object that is actually live. (The overflow and the wrong answer can only happen during soft reference phase 1 or finalizer reference processing, as those are the only cases where previously unknown live objects can be found during reference processing.) There doesn't appear to be any mechanism for communicating to reference processing that the is_alive closure is no longer reliable.
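
The failure mode described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not G1's actual data structures: `BoundedMarkStack`, `mark_from`, and the object graph are hypothetical names invented for illustration, and the model assumes an object is marked before it is pushed, so an entry dropped on overflow is marked but its referents are never scanned.

```python
class BoundedMarkStack:
    """Toy fixed-capacity mark stack (hypothetical, not HotSpot code)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = []
        self.overflowed = False

    def push(self, obj):
        if len(self.entries) >= self.capacity:
            self.overflowed = True  # the entry is dropped
            return False
        self.entries.append(obj)
        return True


def mark_from(root, graph, stack, marked):
    """Mark objects reachable from root using the given mark stack."""
    marked.add(root)
    stack.push(root)
    while stack.entries:
        obj = stack.entries.pop()
        for child in graph.get(obj, ()):
            if child in marked:
                continue
            marked.add(child)
            # Assumption: on overflow the child stays marked, but its own
            # referents are never scanned because the entry was dropped.
            stack.push(child)


# F stands in for a finalizable object that reference processing keeps alive.
graph = {"F": ["A", "B"], "A": ["C"], "B": ["D"]}
stack = BoundedMarkStack(capacity=1)
marked = set()
mark_from("F", graph, stack, marked)


def is_alive(obj):
    return obj in marked


print(stack.overflowed, is_alive("D"))  # True False
```

In this model D is reachable from F through B, yet is_alive reports it as dead once the push of B overflows, and nothing tells the caller that the answer can no longer be trusted.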

Comments
URL: http://hg.openjdk.java.net/jdk/jdk/rev/ab967988f850 User: tschatzl Date: 2018-06-07 09:21:28 +0000
07-06-2018

Actually it looks like we already support restarting marking of the heap during reference processing. The only thing missing seems to be to bail out if an overflow occurred anyway.
26-04-2018

We decided that we will change the marking to simply expand the global mark stack when it runs out of memory, up to MarkStackSizeMax. We will exit the VM if there is still not enough global mark stack memory available for processing. We will remove the re-scan of the region, as it is already complicated enough and adding the condition that we rescan only outside of weak reference processing makes it even worse. A forced rescan of the entire heap is already a significant performance problem. Future changes to the marking algorithm may implement a guaranteed bounded-memory algorithm.
26-04-2018
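
The expand-until-MarkStackSizeMax decision above might look roughly like the following sketch. The class and exception names are hypothetical, `max_capacity` stands in for MarkStackSizeMax, and both the doubling growth policy and the use of an exception in place of a VM exit are assumptions made for illustration only.

```python
class MarkStackExhausted(Exception):
    """Stands in for exiting the VM (hypothetical name)."""


class ExpandingMarkStack:
    """Toy mark stack that grows on demand up to a hard limit."""

    def __init__(self, initial_capacity, max_capacity):
        self.capacity = initial_capacity
        self.max_capacity = max_capacity  # stands in for MarkStackSizeMax
        self.entries = []

    def _expand(self):
        if self.capacity >= self.max_capacity:
            # No overflow is ever reported to the caller: either the
            # stack grows, or processing stops entirely.
            raise MarkStackExhausted("mark stack limit reached")
        # Assumption: double the capacity, capped at the maximum.
        self.capacity = min(self.capacity * 2, self.max_capacity)

    def push(self, obj):
        if len(self.entries) == self.capacity:
            self._expand()
        self.entries.append(obj)


stack = ExpandingMarkStack(initial_capacity=2, max_capacity=8)
for obj in range(8):
    stack.push(obj)
print(stack.capacity)  # 8
```

Because push either succeeds or terminates processing, the is_alive closure never has to answer questions after a silently dropped entry.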

Another option would be to provide another closure that asks whether reference processing should continue or abort, to be called immediately after soft reference phase 1 and finalizer reference phase 3. I'm not sure that's actually better than the earlier suggestion of not allowing overflow during reference processing, which seems like it might be a simpler solution to this very rare issue that only affects some collectors. [Later: never mind. This doesn't work either.]
23-04-2018

So the only option would be to handle overflow "gracefully": either by expanding the mark stack (and risking a C-heap OOME), by just exiting the VM, or by making the marking algorithm cope with a fixed amount of mark stack memory.
23-04-2018

I think changing the is_alive closure to always return true if mark stack overflow occurred doesn't work. I think it can lead to a "should be impossible" state where a WeakReference referring to some object X is cleared and notified, but a PhantomReference also referring to X doesn't get cleared and notified.
23-04-2018
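
One way the inconsistency in the comment above could arise is sketched below. It assumes weak references are processed before the overflow happens (for example, during finalizer processing) and phantom references after it. `OverflowAwareIsAlive` and `cleared_referents` are hypothetical names for illustration, not HotSpot code.

```python
class OverflowAwareIsAlive:
    """Toy is_alive closure that answers True for everything after overflow."""

    def __init__(self, marked):
        self.marked = marked
        self.overflowed = False

    def __call__(self, obj):
        return self.overflowed or obj in self.marked


def cleared_referents(referents, is_alive):
    """Return the referents whose references would be cleared and notified."""
    return [obj for obj in referents if not is_alive(obj)]


marked = set()  # X is unreachable, so it is never marked
is_alive = OverflowAwareIsAlive(marked)

# Weak references are handled before the overflow: X looks dead.
weak_cleared = cleared_referents(["X"], is_alive)

# Assumption: the mark stack overflows between the two phases.
is_alive.overflowed = True

# Phantom references are handled after the overflow: X now looks alive.
phantom_cleared = cleared_referents(["X"], is_alive)

print(weak_cleared, phantom_cleared)  # ['X'] []
```

The WeakReference to X is cleared while the PhantomReference to the same X is kept, which is the "should be impossible" state.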

Another option would be to not allow mark stack overflow, by continuing to allocate (global) mark stack space until OOME.
23-04-2018

So one option would be to always start returning true if an overflow occurred? I.e. not communicate to the reference processor that an overflow occurred, but communicate to the is_alive closure that it has occurred.
23-04-2018