JDK-8167108 : inconsistent handling of SR_lock can lead to crashes
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 9
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2016-10-04
  • Updated: 2023-08-28
  • Resolved: 2017-11-24
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 10
10 b36Fixed
Related Reports
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Description
While writing some g-tests to exercise the Monitor class,
I happened to notice some inconsistencies with how the
SR_lock is handled. More details to follow...
Comments
URL: http://hg.openjdk.java.net/jdk/jdk/rev/8d15b1369c7a User: jwilhelm Date: 2017-12-07 09:00:22 +0000
07-12-2017

URL: http://hg.openjdk.java.net/jdk/hs/rev/8d15b1369c7a User: dcubed Date: 2017-11-24 04:25:00 +0000
24-11-2017

We're investigating the possibility of using Safe Memory Reclamation (SMR) as a long term fix instead of doing yet another work around to close the timing window a little bit more. This is going to take some time to prototype, time to reason through, time to code review since SMR is a new concept in the HotSpot code base, and then time to test thoroughly. In order to reduce Tums consumption, I'm de-targeting this fix from JDK9 and re-targeting this fix to JDK10 (with an eye towards a backport to a JDK9 update release down the road after much bake time)...
02-12-2016

Will be lots of discussion about this bug so setting the date out by about a month.
14-10-2016

Strangely enough this bug hasn't taken a lot of time and is just a slight digression/diversion from what I'm currently working on which is some gtests for the Monitor/Mutex code. Until I get my latest test up and running, this bug is just a place to record what I _think_ I see in the code. Could be wrong, could be right, could be somewhere in between. We'll have to discuss the appropriateness of MutexLockerEx and what it brings to the thread race party during the code review. Of course, my testing might show that these ideas don't pan out and in that case this bug will get closed...
05-10-2016

Don't want to see you waste a lot of time and effort doing unnecessary things. I think we use MutexLockerEx for the no safepoint check, not because of its NULL tolerance! I think things would go badly wrong if the SR_lock were actually NULL.
05-10-2016

I think you might want to wait for the webrev before jumping in here. For example: > But hiding the fact SR_lock() may be NULL is a mistake I think because > we should be very well aware of when it can and can not be NULL! A majority of the helper usage for SR_lock uses MutexLockerEx. I'm only changing a few more so the tolerance of the system for SR_lock being NULL has been there since I added the SR_lock back in 2001.11... I was planning to have you pre-review this crazy idea anyway, but I want to develop the test first.
05-10-2016

Dan: not sure I agree with some of what you have written but it will take too many cycles to investigate in depth and I don't have those cycles right now. But hiding the fact SR_lock() may be NULL is a mistake I think because we should be very well aware of when it can and can not be NULL! Also acquire/release for _terminated field is only needed if there are other data fields involved. Ditto for _SR_Lock access - we set it to NULL so that the SR_handler executing in the current thread sees it. No need for any acquire/release there.
05-10-2016

Here's my current set of gory details: - MutexLockerEx, MonitorLockerEx, and MutexUnlockerEx are the right helper objects for SR_lock. MutexLocker, MonitorLocker and MutexUnlocker should not be used because the non-Ex helpers do not tolerate a NULL Mutex or Monitor parameter and will crash on the NULL pointer. - JavaThread::is_exiting() calls should be made as soon as possible after the SR_lock is lock()'ed. This reduces the size of the window before the thread is deleted and the Monitor destructor is called. - Direct uses of SR_lock()->XXX should be replaced with the appropriate helper or we can crash on the NULL pointer dereference. - The Thread::_SR_lock field should be volatile, the NULL value should be set with release semantics and its getter should use acquire semantics. - JavaThread should have its own SR_lock() getter that checks for is_exiting() before returning the SR_lock. - JavaThread is_exiting() and is_terminated() should use acquire semantics to fetch the _terminated field. - JavaThread set_terminated() and set_terminated_value() should use release semantics to set the _terminated field. Of course, I'm planning to update/rewrite a bunch of comments and some new ones to make the race very clear.
04-10-2016