JDK-7191630 : ReentrantReadWriteLock in inconsistent state
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util.concurrent
  • Affected Version: 6u25
  • Priority: P4
  • Status: Closed
  • Resolution: Cannot Reproduce
  • OS: linux_redhat_5.0
  • CPU: x86
  • Submitted: 2012-08-15
  • Updated: 2015-09-22
  • Resolved: 2015-09-22
Related Reports
Relates :  
Description
FULL PRODUCT VERSION :
1.6.0_25-b06

ADDITIONAL OS VERSION INFORMATION :
Linux x86-64
2.6.18-194.32.1.el5

A DESCRIPTION OF THE PROBLEM :
I am raising this bug as advised by an Oracle Support Engineer in SR 3-6014003521.

We have only seen one incidence of this problem, but based on our heap dump analysis it definitely looks like a bug. I accept that it would be hard to fix this problem based on the details below, but I think it is worth raising anyway for the sake of visibility.

We had a deadlock-like failure of our application recently. We are trying to find the root cause.

I initially reported it on the BDB JE forum (https://forums.oracle.com/forums/thread.jspa?messageID=10480988) but further analysis of the heap and thread dumps has exposed a problem that looks like a Java bug.

We're using Oracle JVM 1.6.0_25-b06, running on Linux version: 2.6.18-194.32.1.el5.

Thread t@41101 was blocked indefinitely in ReentrantReadWriteLock.writeLock().lock().

We know from code inspection that nothing ever takes a read lock on this ReentrantReadWriteLock, so we started trying to find out what holds its write lock.

The output of "jstack -l" should list which thread holds this exclusive lock in its "Locked ownable synchronizers" section, but it does not.
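The same information can be requested programmatically through `java.lang.management`. The sketch below (the class and method names are hypothetical) asks `ThreadMXBean` for the ownable synchronizers a thread holds, which is roughly what `jstack -l` prints. As far as we can tell from the HotSpot implementation, this list is built from each synchronizer's `exclusiveOwnerThread` field, so a lock whose field is null would be listed under no thread at all.

```java
import java.lang.management.LockInfo;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical helper: a programmatic view of the "Locked ownable synchronizers"
// section printed by `jstack -l`.
class OwnableSynchronizerDump {
    // Class names of the ownable synchronizers the JVM reports as held by `thread`.
    static List<String> lockedSynchronizersOf(Thread thread) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> names = new ArrayList<>();
        if (!mx.isSynchronizerUsageSupported()) {
            return names; // this JVM cannot report ownable synchronizers
        }
        ThreadInfo[] infos = mx.getThreadInfo(new long[] {thread.getId()}, false, true);
        if (infos.length == 0 || infos[0] == null) {
            return names; // the thread has terminated
        }
        for (LockInfo info : infos[0].getLockedSynchronizers()) {
            names.add(info.getClassName());
        }
        return names;
    }

    public static void main(String[] args) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.writeLock().lock();
        try {
            // Expected to include java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync
            System.out.println("while held: " + lockedSynchronizersOf(Thread.currentThread()));
        } finally {
            lock.writeLock().unlock();
        }
        System.out.println("after unlock: " + lockedSynchronizersOf(Thread.currentThread()));
    }
}
```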

Our first theory was that the owning thread might have terminated.

We wrote a simple test program to explore this. We found from heap dump analysis that even if the owning thread terminates, the lock itself still refers to it via the ReentrantReadWriteLock.WriteLock.sync.exclusiveOwnerThread field. Looking in the java.util.concurrent source code, it seems that this field is only set to null when the lock is released.
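The reporter's test program was never published. A minimal sketch of the same experiment, assuming only the public API: a short-lived thread takes the write lock and exits without unlocking, and the lock keeps pointing at the dead thread. `DeadOwnerDemo` and `InspectableLock` are hypothetical names; the subclass exists only to reach `ReentrantReadWriteLock`'s protected `getOwner()` method, which returns `exclusiveOwnerThread` while the write lock is held.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical reconstruction of the experiment; not the reporter's code.
class DeadOwnerDemo {
    // Subclass only to reach the protected getOwner() accessor.
    static final class InspectableLock extends ReentrantReadWriteLock {
        Thread owner() {
            return getOwner();
        }
    }

    // Lets a short-lived thread take the write lock and exit without unlocking.
    static InspectableLock lockHeldByDeadThread() {
        InspectableLock lock = new InspectableLock();
        Thread t = new Thread(() -> lock.writeLock().lock(), "short-lived-owner");
        t.start();
        try {
            t.join(); // after join() the thread has terminated, still holding the lock
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return lock;
    }

    public static void main(String[] args) {
        InspectableLock lock = lockHeldByDeadThread();
        Thread owner = lock.owner();
        System.out.println("write locked: " + lock.isWriteLocked()); // true
        System.out.println("owner: " + owner.getName() + ", alive: " + owner.isAlive()); // short-lived-owner, false
        System.out.println("tryLock from main: " + lock.writeLock().tryLock()); // false
    }
}
```

Printing `lock.writeLock()` is a quicker check: `WriteLock.toString()` appends either `[Unlocked]` or `[Locked by thread <name>]`.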

However, looking in the heap dump taken following our "deadlock", we were surprised to find that the lock in question has a null sync.exclusiveOwnerThread field.

Surely a write lock should be in one of two states (except possibly for a tiny instant when its state is being non-atomically switched):

1) The lock is available, and sync.exclusiveOwnerThread is null
2) The lock is unavailable, and sync.exclusiveOwnerThread is populated

But our lock was indefinitely in this state:

3) The lock is unavailable and sync.exclusiveOwnerThread is null
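The three states can be expressed as a small check over the lock's accessors. This is a hypothetical diagnostic helper, not part of the JDK: `isWriteLocked()` reports whether the write hold count is nonzero, and the protected `getOwner()` returns `exclusiveOwnerThread` only while that count is nonzero, so state 3 would show up as "write locked, owner null".

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical diagnostic helper, not part of the JDK.
class WriteLockStateCheck {
    // Subclass only to reach the protected getOwner() accessor.
    static final class InspectableLock extends ReentrantReadWriteLock {
        Thread owner() {
            return getOwner();
        }
    }

    // 1 = available, no owner; 2 = held, owner recorded;
    // 3 = held, no owner (the reported state); 0 = any other combination.
    static int classify(boolean writeLocked, Thread owner) {
        if (!writeLocked && owner == null) return 1;
        if (writeLocked && owner != null) return 2;
        if (writeLocked) return 3;
        return 0;
    }

    static int classify(InspectableLock lock) {
        // Two separate reads: a snapshot taken while the lock changes hands can be
        // torn, so only a state that persists across repeated checks is meaningful.
        return classify(lock.isWriteLocked(), lock.owner());
    }

    public static void main(String[] args) {
        InspectableLock lock = new InspectableLock();
        System.out.println("fresh lock: state " + classify(lock)); // 1
        lock.writeLock().lock();
        try {
            System.out.println("while held: state " + classify(lock)); // 2
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```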

Note that I also asked this question on the Concurrency Interest list, and got a response from Doug Lea confirming that it looks like a VM, OS, or processor bug:
http://cs.oswego.edu/pipermail/concurrency-interest/2012-August/009635.html


STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Not known

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
1) The lock is available, and sync.exclusiveOwnerThread is null
2) The lock is unavailable, and sync.exclusiveOwnerThread is populated
ACTUAL -
The lock is unavailable and sync.exclusiveOwnerThread is null

REPRODUCIBILITY :
This bug can be reproduced rarely.

---------- BEGIN SOURCE ----------
not available
---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
not known

Comments
Only ever seen once ...
22-09-2015

Problem was seen once and is not readily reproducible - lowering to P4. Might just need to close out as "not reproducible".
14-06-2013

Moved to correct category so that this shows up when searching for other RRWL issues.
13-06-2013

Further info from Phil Harvey on 4/01/2013:

I exhumed our heap dump and inspected the state of the lock in question. I'm not sure exactly which fields you're interested in, so I've included a few. Here is the ReentrantReadWriteLock$NonfairSync object:

{
  objectAddress: 0x6f498a698 (included for my own benefit),
  cachedHoldCounter: null,
  state: 0,
  exclusiveOwnerThread: null,
  head: <AbstractQueuedSynchronizer$Node @ 0x6fdbe8da0>,
  tail: <AbstractQueuedSynchronizer$Node @ 0x6fdbe8da0>,
  readHolds: {
    class: ReentrantReadWriteLock$Sync$ThreadLocalHoldCounter,
    threadLocalHashCode: 1855151196
  }
}

And then on 9/01/2013:

I checked the heap dump again and I don't think any threads have set a value on the readHolds ThreadLocal. My reasoning is as follows:
- I asked Eclipse Memory Analyzer Tool to show all incoming references to the ThreadLocalHoldCounter we're interested in (object address 0x6f4988210 for future reference).
- The only incoming reference is the one from the NonfairSync's "readHolds" field.
- If any thread had set a value on the ThreadLocalHoldCounter, an additional incoming reference would exist, namely that from a java.lang.ThreadLocal.ThreadLocalMap.Entry's "referent" field.
- QED.

This conclusion ties in with what the Oracle JE team told us (remember that the ReentrantReadWriteLock is within their code). Based on information from the heap dump about the object that manages this particular RRWL, they believe that nothing would ever take a read lock on it.
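For reading dumps like the one above: in the OpenJDK sources, ReentrantReadWriteLock packs both hold counts into the single AQS `state` int, with read holds in the upper 16 bits and write holds in the lower 16 bits. The decoder below mirrors those private constants (an implementation detail, not a public API); with `state: 0`, both counts decode to zero.

```java
// Sketch: decoding the `state` field of ReentrantReadWriteLock$Sync.
// The constants mirror private fields in the OpenJDK source; they are an
// implementation detail, not a public contract.
class RrwlStateDecoder {
    static final int SHARED_SHIFT = 16;
    static final int EXCLUSIVE_MASK = (1 << SHARED_SHIFT) - 1;

    // Total read holds across all threads: the upper 16 bits.
    static int sharedCount(int state) {
        return state >>> SHARED_SHIFT;
    }

    // Write hold count, including reentrant holds: the lower 16 bits.
    static int exclusiveCount(int state) {
        return state & EXCLUSIVE_MASK;
    }

    public static void main(String[] args) {
        int dumped = 0; // the value of `state` in the NonfairSync dump
        System.out.println("read holds=" + sharedCount(dumped) + ", write holds=" + exclusiveCount(dumped));
        int example = 3 << SHARED_SHIFT; // hypothetical: three read holds, no write hold
        System.out.println("read holds=" + sharedCount(example) + ", write holds=" + exclusiveCount(example));
    }
}
```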
30-01-2013

Response from Phil Harvey on 22/12/2012:

I will check the read counter. We still have the heap dump, so this should be possible.

We don't have a reproducible test case. In fact, we only saw this problem once. I realise there is probably not enough information for you to make progress on this issue, but thought it worth creating the bug in case other people have seen similar behaviour.

Phil

On Nov 26, 2012 5:02 AM, "David Holmes" <david.holmes@oracle.com> wrote:

Hi Phil,

I just wanted to follow up with you on this issue again. Reading through all the emails/forums, it wasn't clear whether, when you inspected the RRWL state, you verified that the read counter was zero. Was that verified? You also mention a simple reproducer program; is that available? Without being able to reproduce this there is very little for us to work on.
30-01-2013

Email sent - awaiting response.
28-11-2012

Please ask Phil Harvey if we could get his "simple test program". Otherwise, this is likely to be a duplicate of a known bug due to asynchronous issues such as any runtime exception or Thread.stop.
13-11-2012
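The asynchronous-exception theory raised here (and in David Holmes's mail quoted below) comes down to the order of two writes when the write lock is released. The toy model below is hypothetical and single-threaded, not the JDK's code: it keeps only that order, and a callback that throws between the two writes stands in for an asynchronous exception such as the `ThreadDeath` raised by `Thread.stop`, or a `StackOverflowError`. A real asynchronous exception arrives without any such callback; the sketch only shows the state it would leave behind, which is exactly state 3: the hold count stays at 1 while the owner is already null.

```java
// Hypothetical toy model, not JDK code. It keeps only the order of the two writes
// in the release path quoted by David Holmes:
//     if (free) setExclusiveOwnerThread(null);
//     setState(nextc);
class ToyReleaseModel {
    int state;    // stands in for the AQS state (write hold count)
    Thread owner; // stands in for exclusiveOwnerThread

    void acquire() {
        state = 1;
        owner = Thread.currentThread();
    }

    // `betweenWrites` runs in the window between the two writes; throwing from it
    // simulates an exception landing at exactly that point.
    void release(Runnable betweenWrites) {
        int nextc = state - 1;
        boolean free = nextc == 0;
        if (free) {
            owner = null;
        }
        betweenWrites.run();
        state = nextc;
    }

    static ToyReleaseModel interruptedRelease() {
        ToyReleaseModel m = new ToyReleaseModel();
        m.acquire();
        try {
            m.release(() -> {
                throw new StackOverflowError("simulated");
            });
        } catch (StackOverflowError e) {
            // The error escaped release(); the second write never happened.
        }
        return m;
    }

    public static void main(String[] args) {
        ToyReleaseModel m = interruptedRelease();
        System.out.println("state=" + m.state + ", owner=" + m.owner); // state=1, owner=null
    }
}
```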

PUBLIC COMMENTS

This is the referenced mailing on the c-i list. Doug Lea writes:

On 08/02/12 04:15, Phil Harvey wrote:
> Thanks for the advice guys.
>
> I've checked our code and can confirm we make no Thread.stop() or
> Thread.stop(Throwable) calls.
>
> Also, we would have seen the stack trace of a StackOverflowError in our logs.
>
> So I still have no idea what caused this problem. I can only assume it's a Java
> bug. Or am I jumping to conclusions prematurely?

Everyone (including me) who has looked at this agrees that the situation you describe "cannot" happen at the Java level. So it could be a VM, OS, or processor bug. But until there is a self-contained test case, I don't think much can be done to further diagnose.

-Doug

> Phil
>
> On Aug 2, 2012 5:14 AM, "Stanimir Simeonoff" <stanimir at riflexo.com> wrote:
>
> David,
> I am quite positive it's Thread.stop, as setState is inlined. I've seen that
> case due to Thread.stop quite a few times too.
> It is possible to avoid the disaster via some awkward steps, like:
> waiting for sleep mode/examining the stack trace, followed by
> Thread.suspend/checking again, then stop(). Alternatively, peppering the code w/
> stop points during class loading is an option, but a hard one.
>
> That has made me wonder if HotSpot can avoid adding safe points in
> java.util.concurrent.locks classes, or at least have the safe points skip
> checking for Thread.stop outside park(). That is, the only safe point would
> be park(); as a side effect it could have minor performance benefits.
>
> I know Thread.stop is deprecated, but there is still enough middleware that
> makes use of it.
>
> Stanimir
>
> On Thu, Aug 2, 2012 at 5:56 AM, <davidcholmes at aapt.net.au> wrote:
>
> Phil,
>
> A RRWL that has no owner but can not be locked is definitely a problem.
> If this is not 6822370 then the other possibilities are async-exceptions
> occurring in the release code:
>
>     if (free)
>         setExclusiveOwnerThread(null);
>     <=== async exception here
>     setState(nextc);
>
> Two possible sources of the async exception:
>
> a) Use of Thread.stop elsewhere
> b) StackOverflowError was triggered trying to call setState
>
> David Holmes
> ------------
>
> Quoting Phil Harvey <phil at philharveyonline.com>:
>
> Hi,
>
> Yes, we had looked at that bug but assumed we were not experiencing it here
> because we are using Java 1.6.0_25, and it was reported fixed in 1.6.0_18.
>
> Do you agree that the unusual state of the ReentrantReadWriteLock suggests
> we've hit a bug?
>
> Phil
>
> On Aug 1, 2012 3:05 PM, "Ariel Weisberg" <ariel at weisberg.ws> wrote:
>
> Hi,
>
> I remember that. That was fixed in Oracle JDK 1.6.0_18. It hasn't been
> reproducing for us since 1.6.0_18, but I am not sure if we are using
> ReentrantLock in the same way anymore.
>
> The reproducer we used was
> https://github.com/VoltDB/voltdb/tree/master/tools/lbd_lock_test
> If I remember correctly it prints '.' as it goes, and when it hangs it
> stops printing dots.
>
> Regards,
> Ariel
>
> On Wed, Aug 1, 2012, at 09:27 AM, Viktor Klang wrote:
>
> Hi Phil,
>
> Related to this?
> http://bugs.sun.com/view_bug.do?bug_id=6822370
>
> Cheers,
> √
>
> On Wed, Aug 1, 2012 at 3:20 PM, Phil Harvey <phil at philharveyonline.com> wrote:
>
> We had a deadlock-like failure of our application recently.
>
> I initially reported it on the BDB JE forum
> (https://forums.oracle.com/forums/thread.jspa?messageID=10480988), but
> further analysis of the heap and thread dumps has exposed a problem that
> looks like a Java locking bug. I'm hoping you can offer advice on whether
> this is the case.
>
> We're using Oracle JVM 1.6.0_25-b06, running on Linux version:
> 2.6.18-194.32.1.el5.
>
> We are launching Java as follows: java -server -XX:+UseConcMarkSweepGC
> -XX:+HeapDumpOnOutOfMemoryError -Xmx1024m ...
>
> Several consecutive thread dumps showed that Thread t@41101 was blocked
> indefinitely in ReentrantReadWriteLock.writeLock().lock().
>
> We know from code inspection that nothing ever takes a read lock on this
> ReentrantReadWriteLock, so we started trying to find out what holds its
> write lock.
>
> The output of "jstack -l" should list which thread holds this exclusive
> lock in the "Locked ownable synchronizers" section, but does not.
>
> Our first theory was that the owning thread might have terminated.
>
> We wrote a simple test program to explore this. We found from heap dump
> analysis that even if the owning thread terminates, the lock itself still
> refers to it via the ReentrantReadWriteLock.WriteLock.sync.exclusiveOwnerThread
> field. Looking in the java.util.concurrent source code, it seems that this
> field is only set to null when the lock is released.
>
> However, looking in the heap dump taken following our "deadlock", we were
> surprised to find that the lock in question has a null
> sync.exclusiveOwnerThread field.
>
> Surely a write lock should be in one of two states (except possibly for a
> tiny instant when its state is being non-atomically switched):
>
> 1) The lock is available, and sync.exclusiveOwnerThread is null
> 2) The lock is unavailable, and sync.exclusiveOwnerThread is populated
>
> But our lock was indefinitely in this state:
>
> 3) The lock is unavailable and sync.exclusiveOwnerThread is null
>
> Does anyone know whether this represents a bug? If not, can you explain
> what it means for a lock to be in this counterintuitive state?
>
> Thanks, Phil
>
> --
> Viktor Klang
>
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest at cs.oswego.edu
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
10-09-2012
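Doug Lea's point is that little more can be diagnosed without a self-contained test case. One hypothetical shape for such a case, loosely in the spirit of the dot-printing reproducer Ariel mentions (this is not that tool): several threads hammer a single write lock, and a timed `tryLock` flags a suspected hang instead of blocking forever, so thread and heap dumps can be taken while the lock is still in its bad state. There is no evidence that this harness would reproduce the reported state; it only shows the structure.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stress harness; not the VoltDB tool or the reporter's code.
class WriteLockStress {
    // Returns the number of completed critical sections, or -1 if any acquisition
    // timed out (a moment worth capturing with a thread and heap dump).
    static long run(int threads, int iterations, long timeoutMillis) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        AtomicBoolean suspectedHang = new AtomicBoolean(false);
        long[] completed = {0}; // guarded by the write lock
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iterations && !suspectedHang.get(); i++) {
                    try {
                        if (!lock.writeLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                            suspectedHang.set(true); // report instead of blocking forever
                            return;
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    try {
                        completed[0]++;
                    } finally {
                        lock.writeLock().unlock();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join(); // join() also makes completed[0] visible to this thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return -1;
            }
        }
        return suspectedHang.get() ? -1 : completed[0];
    }

    public static void main(String[] args) {
        long done = run(4, 100_000, 5_000);
        System.out.println(done < 0 ? "suspected hang" : "completed " + done);
    }
}
```

A timeout that fires under heavy load is not proof of a bug; it only marks a moment when the lock state is worth inspecting.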