JDK-8055232 : (ref) Exceptions while processing Reference pending list
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.lang
  • Affected Version: 9
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2014-08-15
  • Updated: 2016-09-02
  • Resolved: 2016-09-01
Related Reports
Relates :  
Relates :  
Relates :  
The java.lang.ref.Reference class has a private Lock class and static lock instance of that class.  It is documented as providing synchronization with the garbage collector, with a warning that one must not allocate any new objects while holding this lock.

The Reference class has a tryHandlePending member function containing a synchronized block on that lock object.

Within that synchronized block is a comment indicating some of the code therein is written in a particular way because the instanceof operator might throw OutOfMemoryError.  That would seem to imply that instanceof can allocate new objects, which is contrary to the requirement on holders of the lock object.

The Reference class also contains a static initialization to ensure the relevant class for that instanceof (java.lang.misc.Cleaner) has been loaded, with the apparent intent of ensuring that instanceof does *not* need to allocate memory.  But if that's so, then it would seem that any code contortions to account for that possibility are unnecessary.

I'm not sure whether there is a bug in the code or not.  At the very least, there are problems with the comments.

The relevant code was substantially revised by JDK-8156500, resulting in a fix for this bug too.

Yes, the whole point of wait() call it to relinquish the lock so that GC can discover new references. So it doesn't matter when this happens - during wait() or immediately after the wait() when the lock is re-obtained and InterruptedException is allocated, which can trigger a new round of reference discovery within a nested lock.

Yes - JDK-7038914 dealt with the more likely OOME cause of having to load the InterruptedException class, but if the thread is interrupted then the actual InterruptedException instance must still be allocated. But in that case the thread has not already looked at the pending list so it should be safe - no state needs to be preserved across the wait() call.

Besides the possibility of allocation from instanceof, I think there's also a possibility of allocation in order to throw InterruptedException. That would have the same problems, and we need a wait/wakeup mechanism for when references are added to the list.

Thread.yield is a non-blocking operation that simply hints to the scheduler to give the processor to someone else - no blocking so no InterruptedException.

I think this is ready to be proposed as a patch on the core-libs-dev mailing list.

I suggested the other order in case the call to Thread.yield() throws. I later looked it up and was somewhat surprised that Thread.yield() doesn't have InterruptedException as a checked exception.

No, waiting to enter a synchronized block is not interruptible. At least by Java specification. If it was to throw InterruptedException it would be a bug. Regarding Thread.yield() - I think that doing it before re-entering the synchronized block is more productive, since we are in an OOME handler as a result of unsuccessful allocation and don't want to race with GC for obtaining the lock right away. GC has probably already finished it's round before OOME is even thrown, but that's implementation-dependent I think. What is your reasoning, Kim, for adding the reference back to the pending list before the Thread.yield()?

Either of those solutions seem plausible to me. Looking at the webrev, I might add the reference back to the pending list before the Thread.yield(). Hm, can a blocked synchronize statement throw InterruptedException? I would guess not, but would have to look it up to be certain. If interruption is possible there, then the reference notification can still be lost with that proposed new code. Avoiding that would be bit messy.

So, one possibility is to move instanceof check out of synchronized block to where it was before and act as if preloading of Cleaner class is the ultimate measure to prevent ReferenceHandler thread to die with OOME. The other possibility is to move instanceof check out of synchronized block and wrap it with try/catch and handle the OOME by re-linking the Reference back into the pending list: http://cr.openjdk.java.net/~plevart/jdk9-dev/ReferenceHandlerExceptions/webrev.01/ I think this is better than just dropping the reference on the floor.

The lock appears to be acquired as part of GC safepoint entry. And the GC needs to be holding the lock in order to transfer reference objects from the per-GC reference discovery lists to the pending reference list and then notifying the wait() in Reference.tryHandlePending(). I don't see a way to delay that transfer. It might be possible to redesign the interface around the pending list to eliminate the lock, if notification doesn't require holding the lock (as is permitted for pthread condition variables). I don't know if that's actually possible for Java notification; the documentation for Object.notify() says "should only be called by a thread that is the owner of this object's monitor." I don't know if that "should" is a recommendation or a requirement. And even if that's permitted, it introduces a race condition that could sometimes lead to a delay in the processing of some references until some later collection cycle [unless the wait is given a timeout to limit the delay, at the cost of periodically "polling" the list].

Kim - can you confirm if GC acquires this lock in order to proceed a collection? This lock is for reference processing. I wonder if the reference processing should only be impacted rather than the entire GC cycle if GC fails to acquire this lock. Such scenario would happen when the reference handler thread trying to handle pending references but object allocation fails (while holding the lock), it will trigger GC in which there may be unreachable objects that can be collected.

Note: Kalyan (Srikalyan Chandrashekar) is no longer at Oracle. I'm not sure how we would test this issue if he was the only one who could reproduce it.

I have run the following test (with -verbose:class): {code:java} public class Test { static class Lazy { } public static void main(String[] args) { System.out.println(new Object() instanceof Lazy ? "Lazy" : "not Lazy"); } } {code} ...that prints: ... [Loaded Test from file:/home/peter/work/local/jdk9-dev-test/out/test/jdk9-dev-test/] ... [Loaded Test$Lazy from file:/home/peter/work/local/jdk9-dev-test/out/test/jdk9-dev-test/] ... not Lazy ... Which means that instanceof operator *is* a class-loading trigger. Meaning it is very probable that Kaylan (srikalyan.chandrashekar@oracle.com) has been observing an OOME triggered from unsuccessful loading of the Cleaner class. In original code, "r instanceof Cleaner" was called out of synchronized (lock) {} block, but we moved it into it to arrange so that the OOME thrown from it happens before the reference is unlinked from the pending chain, so that it is not dropped on the floor in case we handle OOME by retrying. By preloading Cleaner class at Reference class initialization time, the possibility of "r instanceof Cleaner" throwing OOME is hopefully non-existent. We might as well move the instanceof check out of synchronized block to where it was before and see if Kaylan can reproduce the failure - many people tried it but were unsuccessful. Only Kaylan seemed to have such an environment capable of reproducing it. I can prepare the patch. Are you ready to put it to the test?

Actually, what was confusing me is the combination of lines 111-114 /* Object used to synchronize with the garbage collector. The collector * must acquire this lock at the beginning of each collection cycle. It is * therefore critical that any code holding this lock complete as quickly * as possible, allocate no new objects, and avoid calling user code. */ [especially the phrase "allocate no new objects"] in conjunction with lines 181-182 // Also prevent CPU intensive spinning in case 'r instanceof Cleaner' above // persistently throws OOME for some time... occurring within a synchronized block on the lock object being discussed in lines 111-114. If it is true that allocation of new objects while holding that lock is a problem, then either that instanceof must be moved out of the lock context, or it really doesn't allocate, in which case it shouldn't actually throw OOME, right? Alternatively, it might be OK to allocate while holding that lock, but I don't think that's actually true.

Doh, I see. The confusing comments are in lines 181, 182: // 'instanceof' might throw OutOfMemoryError sometimes // so do this before un-linking 'r' from the 'pending' chain... and 200, 201: // Also prevent CPU intensive spinning in case 'r instanceof Cleaner' above // persistently throws OOME for some time... ...we re-visited the OOME bug which resulted in further change to RefrecenceHandler before me adding the tryHandlePending() method. Issue is JDK-8022321 which I don't have access too. The even more interesting discussion starts here: https://www.mail-archive.com/core-libs-dev%40openjdk.java.net/msg23596.html which resulted in the following fix: http://hg.openjdk.java.net/jdk9/dev/jdk/rev/d04102f69d46 So these comments are a direct result of that discussion. We haven't concluded with 100% certainty what causes the OOME to be thrown from (c instanceof Cleaner) expression. David Holmes said that instanceof is not a class loading trigger, so it might be that it causes OOME for some other reason. We nevertheless kept preloading of Cleaner class in the ReferenceHandler's static initializer. It might be that either preloading of Cleaner class or the comments are superfluous but we don't know which. We would need to revisit this and determine what causes OOME in the instanceof operator to be sure.

The fix for JDK-6857566 did add tryHandlePending() method, but did not change the semantics nor added mentioned comments and static initializer. The comments, exception handling and static initializer were added by previous fixes for bug JDK-7038914. A lengthy but interesting discussion about that bug starts here: http://mail.openjdk.java.net/pipermail/core-libs-dev/2013-April/016510.html or use this link which keeps the thread together when it spans multiple months: https://www.mail-archive.com/core-libs-dev%40openjdk.java.net/msg16250.html We concluded that the OOME could be thrown from within lock.wait() method when the waiting thread was interrupted (that is what the failing test does - I haven't seen the test code, but that's what the discussion brought up). The interrupted thread was about to throw the InterruptedException, but allocating that exception object failed because of heap fullness and pre-allocated OOME was thrown instead. The static initializer is just making sure that no class loading happens when InterruptedException is attempted to be thrown, by pre-loading it's class. I don't think this situation happens in real-world code, since no one is interrupting the ReferenceHandler thread. The fix was made mainly to silence the test.

The comments do seem contradictory, so it looks like further investigation is warranted. Assigning to Peter Levart, author of the above-mentioned changeset.

The tryHandlePending() method was added by the fix for JDK-6857566, changeset 9934d34ed3c0.