JDK-8153107 : enabling ObjectSynchronizer::quick_enter() on ARM64 causes hangs
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 9
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: linux
  • CPU: arm
  • Submitted: 2016-03-30
  • Updated: 2018-06-26
  • Resolved: 2018-06-19
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 11
11 b19Fixed
Related Reports
Relates :  
Relates :  
Relates :  
Description
The fixes for the following bugs:

   JDK-8077392 Stream fork/join tasks occasionally fail to complete
   JDK-8131715 backout the fix for JDK-8079359 when JDK-8077392 is fixed

cause hangs on ARM64. There is something about the
ObjectSynchronizer::quick_enter() code path that is not
working right with the ARM64 platform dependent code.
Comments
Passed a Mach5 builds-tier1,jdk-tier1,jdk-tier2,hs-tier1,hs-tier2 run with a single unrelated failure due to JDK-8204842. Now we just need to hear back from the ARM folks about their testing.
15-06-2018

I'm planning to sponsor this fix for [~apetushkov].
15-06-2018

Webrev with alternative fix which only changes the logic in the Arm port specific code is here http://cr.openjdk.java.net/~apetushkov/8153107/
15-06-2018

On 6/13/18 1:26 PM, Andrey Petushkov wrote: > Ok, I shall I admit I did not dig deep enough. Yes, there is platform specifics > which plays a role here. That is: all platforms except Oracle ARM (all bitness) > and aarch32 fill-in stack lock���s DHW with result of same-stack-page-lock probe > result. Which is always !=0 when failed. Oracle���s ARM and aarch32 do that > conditionally, based on whether this test actually passes. Hence for the latter > platforms the DHW in the stack lock really contains random value which > occasionally can be 0, and being not rewritten (e.g. with markOopDesc::unused_mark(), > aka 3) will indicate recursive stack lock for ::fast_unlock(). Not sure about > those stack walkers mentioned in the comment in ::quick_enter(). So, right, > it���s possible to fix the problem in the platform-specific code only. Leaving up > to you to decide whether it���s proper design :) > > Regards, > Andrey
13-06-2018

Email from Derek White: Hi Andrey, Andrew, > -----Original Message----- > From: hotspot-runtime-dev [mailto:hotspot-runtime-dev- > bounces@openjdk.java.net] On Behalf Of Andrew Haley > On 06/13/2018 01:40 PM, Andrey Petushkov wrote: > >> It looks like the cause of the >> https://bugs.openjdk.java.net/browse/JDK-8153107 >> <https://bugs.openjdk.java.net/browse/JDK-8153107> problem is that >> ObjectSynchronizer::quick_enter implementation leaves the BasicLock >> marked as recursive (DHW==0) in case that the lock is inflated and >> already taken. As such the routines which first check the recursive >> stack locking (e.g. fast_exit) consider this lock as non-inflated >> recursive and hence do not decrement _recursions field of >> ObjectMonitor, leaving the monitor locked forever. In contrary the >> slow path does work the right way: ObjectSynchronizer::slow_enter >> first marks stack lock as non-locked and non-recursive (DHW==3) and >> then delegates to ObjectMonitor::enter to increment _recursions. > >> So from the source code prospective this bug looks like >> platform-independent, contrary to what���s was found under >> https://bugs.openjdk.java.net/browse/JDK-8131715 >> <https://bugs.openjdk.java.net/browse/JDK-8131715>. Unfortunately I >> cannot prove it with an example. The only case it fails for me is >> runtime/CreateMirror/ArraysNewInstanceBug.java hotspot jtreg test on >> aarch32 openjdk port. The following changes fix the problem: > > Ouch. The code is due to 8167501, which was a bug in the Oracle ARM64 > and ARM32 ports. This bug was never seen in the AARCH64 open port, so > the workaround shouldn't be enabled for it. Yes, I've been testing this last week myself (e.g. setting SyncFlags). It's responsible for about a 2% drop in SPECjbb on aarch64 when they added it. > 8167501: ARMv7 Linux C2 compiler crashes running jtreg harness on MP > systems > Reviewed-by: dcubed > diff -r afd6ae4fec81 -r 71d7ced6c439 > hotspot/src/share/vm/runtime/sharedRuntime.cpp > --- a/hotspot/src/share/vm/runtime/sharedRuntime.cpp Thu Nov 03 > 11:53:20 2016 +0530 > +++ b/hotspot/src/share/vm/runtime/sharedRuntime.cpp Thu Nov 03 > 10:44:17 2016 -0400 > @@ -1983,8 +1983,10 @@ > // Handles the uncommon case in locking, i.e., contention or an inflated > lock. > JRT_BLOCK_ENTRY(void, > SharedRuntime::complete_monitor_locking_C(oopDesc* _obj, BasicLock* > lock, JavaThread* thread)) > // Disable ObjectSynchronizer::quick_enter() in default config > - // on AARCH64 until JDK-8153107 is resolved. > - if (AARCH64_ONLY((SyncFlags & 256) != 0 &&) > !SafepointSynchronize::is_synchronizing()) { > + // on AARCH64 and ARM until JDK-8153107 is resolved. > + if (ARM_ONLY((SyncFlags & 256) != 0 &&) > + AARCH64_ONLY((SyncFlags & 256) != 0 &&) > + !SafepointSynchronize::is_synchronizing()) { > // Only try quick_enter() if we're not trying to reach a safepoint > > Why did you move the code which handles recursive locking counts? Do you > believe that it's incorrect where it is, and if so, why? I'm similarly suspicious of pointing the blame at the shared code. I'll look some more also. - Derek
13-06-2018

More email from aph@redhat.com: On 06/13/2018 03:54 PM, Daniel D. Daugherty wrote: > This particular hang was never seen on any of Oracle's other supported > platforms so I'm a bit hesitant on making a change to shared code > without a better understand of why. Yeah, my thoughts exactly. It doesn't help that the platform- specific variants of quick_enter() are all subtly different. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. <https://www.redhat.com> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
13-06-2018

Email from andrey.petushkov@gmail.com: > On 13 Jun 2018, at 16:37, Andrew Haley <aph@redhat.com> wrote: > > On 06/13/2018 01:40 PM, Andrey Petushkov wrote: > >> It looks like the cause of the >> https://bugs.openjdk.java.net/browse/JDK-8153107 >> <https://bugs.openjdk.java.net/browse/JDK-8153107> problem is that >> ObjectSynchronizer::quick_enter implementation leaves the BasicLock >> marked as recursive (DHW==0) in case that the lock is inflated and >> already taken. As such the routines which first check the recursive >> stack locking (e.g. fast_exit) consider this lock as non-inflated >> recursive and hence do not decrement _recursions field of >> ObjectMonitor, leaving the monitor locked forever. In contrary the >> slow path does work the right way: ObjectSynchronizer::slow_enter >> first marks stack lock as non-locked and non-recursive (DHW==3) and >> then delegates to ObjectMonitor::enter to increment _recursions. > >> So from the source code prospective this bug looks like >> platform-independent, contrary to what���s was found under >> https://bugs.openjdk.java.net/browse/JDK-8131715 >> <https://bugs.openjdk.java.net/browse/JDK-8131715>. Unfortunately I >> cannot prove it with an example. The only case it fails for me is >> runtime/CreateMirror/ArraysNewInstanceBug.java hotspot jtreg test on >> aarch32 openjdk port. The following changes fix the problem: > > Ouch. The code is due to 8167501, which was a bug in the Oracle ARM64 > and ARM32 ports. This bug was never seen in the AARCH64 open port, > so the workaround shouldn't be enabled for it. > > 8167501: ARMv7 Linux C2 compiler crashes running jtreg harness on MP systems > Reviewed-by: dcubed > diff -r afd6ae4fec81 -r 71d7ced6c439 hotspot/src/share/vm/runtime/sharedRuntime.cpp > --- a/hotspot/src/share/vm/runtime/sharedRuntime.cpp Thu Nov 03 11:53:20 2016 +0530 > +++ b/hotspot/src/share/vm/runtime/sharedRuntime.cpp Thu Nov 03 10:44:17 2016 -0400 > @@ -1983,8 +1983,10 @@ > // Handles the uncommon case in locking, i.e., contention or an inflated lock. > JRT_BLOCK_ENTRY(void, SharedRuntime::complete_monitor_locking_C(oopDesc* _obj, BasicLock* lock, JavaThread* thread)) > // Disable ObjectSynchronizer::quick_enter() in default config > - // on AARCH64 until JDK-8153107 is resolved. > - if (AARCH64_ONLY((SyncFlags & 256) != 0 &&) !SafepointSynchronize::is_synchronizing()) { > + // on AARCH64 and ARM until JDK-8153107 is resolved. > + if (ARM_ONLY((SyncFlags & 256) != 0 &&) > + AARCH64_ONLY((SyncFlags & 256) != 0 &&) > + !SafepointSynchronize::is_synchronizing()) { > // Only try quick_enter() if we're not trying to reach a safepoint > > Why did you move the code which handles recursive locking counts? Do > you believe that it's incorrect where it is, and if so, why? That���s exactly what I���ve been trying to describe. The displaced header shall be assigned 3 when recursive locking happens on inflated lock. At least with the current design of the exit code which uses it as an indicator that inflated lock is present. So it seems to me that currently the number of recursive locks on an object is the count of relevant stack lock structures with mark(DHW)!=3 (i.e. first lock + any stack locks prior to inflation) + _recursions field value of the inflated lock structure. The sequence of operations under review was breaking this assumption: any recursion via quick_enter was increasing both number of stack locks with DHW==0 and value of _recursions. While the slow_enter only increased _recursions The above is idea I got from reading the code, I hope someone more experienced (@Daniel Daugherty?) can chime in and prove me right or wrong Andrey > > -- > Andrew Haley > Java Platform Lead Engineer > Red Hat UK Ltd. <https://www.redhat.com> > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
13-06-2018

Email from aph@redhat.com: On 06/13/2018 01:40 PM, Andrey Petushkov wrote: > It looks like the cause of the > https://bugs.openjdk.java.net/browse/JDK-8153107 > <https://bugs.openjdk.java.net/browse/JDK-8153107> problem is that > ObjectSynchronizer::quick_enter implementation leaves the BasicLock > marked as recursive (DHW==0) in case that the lock is inflated and > already taken. As such the routines which first check the recursive > stack locking (e.g. fast_exit) consider this lock as non-inflated > recursive and hence do not decrement _recursions field of > ObjectMonitor, leaving the monitor locked forever. In contrary the > slow path does work the right way: ObjectSynchronizer::slow_enter > first marks stack lock as non-locked and non-recursive (DHW==3) and > then delegates to ObjectMonitor::enter to increment _recursions. > So from the source code prospective this bug looks like > platform-independent, contrary to what���s was found under > https://bugs.openjdk.java.net/browse/JDK-8131715 > <https://bugs.openjdk.java.net/browse/JDK-8131715>. Unfortunately I > cannot prove it with an example. The only case it fails for me is > runtime/CreateMirror/ArraysNewInstanceBug.java hotspot jtreg test on > aarch32 openjdk port. The following changes fix the problem: Ouch. The code is due to 8167501, which was a bug in the Oracle ARM64 and ARM32 ports. This bug was never seen in the AARCH64 open port, so the workaround shouldn't be enabled for it. 8167501: ARMv7 Linux C2 compiler crashes running jtreg harness on MP systems Reviewed-by: dcubed diff -r afd6ae4fec81 -r 71d7ced6c439 hotspot/src/share/vm/runtime/sharedRuntime.cpp --- a/hotspot/src/share/vm/runtime/sharedRuntime.cpp Thu Nov 03 11:53:20 2016 +0530 +++ b/hotspot/src/share/vm/runtime/sharedRuntime.cpp Thu Nov 03 10:44:17 2016 -0400 @@ -1983,8 +1983,10 @@ // Handles the uncommon case in locking, i.e., contention or an inflated lock. JRT_BLOCK_ENTRY(void, SharedRuntime::complete_monitor_locking_C(oopDesc* _obj, BasicLock* lock, JavaThread* thread)) // Disable ObjectSynchronizer::quick_enter() in default config - // on AARCH64 until JDK-8153107 is resolved. - if (AARCH64_ONLY((SyncFlags & 256) != 0 &&) !SafepointSynchronize::is_synchronizing()) { + // on AARCH64 and ARM until JDK-8153107 is resolved. + if (ARM_ONLY((SyncFlags & 256) != 0 &&) + AARCH64_ONLY((SyncFlags & 256) != 0 &&) + !SafepointSynchronize::is_synchronizing()) { // Only try quick_enter() if we're not trying to reach a safepoint Why did you move the code which handles recursive locking counts? Do you believe that it's incorrect where it is, and if so, why? -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. <https://www.redhat.com> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
13-06-2018

E-mail from andrey.petushkov@gmail.com: Dear Hotspot developers, It looks like the cause of the https://bugs.openjdk.java.net/browse/JDK-8153107 <https://bugs.openjdk.java.net/browse/JDK-8153107> problem is that ObjectSynchronizer::quick_enter implementation leaves the BasicLock marked as recursive (DHW==0) in case that the lock is inflated and already taken. As such the routines which first check the recursive stack locking (e.g. fast_exit) consider this lock as non-inflated recursive and hence do not decrement _recursions field of ObjectMonitor, leaving the monitor locked forever. In contrary the slow path does work the right way: ObjectSynchronizer::slow_enter first marks stack lock as non-locked and non-recursive (DHW==3) and then delegates to ObjectMonitor::enter to increment _recursions. So from the source code prospective this bug looks like platform-independent, contrary to what���s was found under https://bugs.openjdk.java.net/browse/JDK-8131715 <https://bugs.openjdk.java.net/browse/JDK-8131715>. Unfortunately I cannot prove it with an example. The only case it fails for me is runtime/CreateMirror/ArraysNewInstanceBug.java hotspot jtreg test on aarch32 openjdk port. The following changes fix the problem: diff --git a/src/hotspot/share/runtime/sharedRuntime.cpp b/src/hotspot/share/runtime/sharedRuntime.cpp --- a/src/hotspot/share/runtime/sharedRuntime.cpp +++ b/src/hotspot/share/runtime/sharedRuntime.cpp @@ -1985,9 +1985,7 @@ JRT_BLOCK_ENTRY(void, SharedRuntime::complete_monitor_locking_C(oopDesc* _obj, BasicLock* lock, JavaThread* thread)) // Disable ObjectSynchronizer::quick_enter() in default config // on AARCH64 and ARM until JDK-8153107 is resolved. - if (ARM_ONLY((SyncFlags & 256) != 0 &&) - AARCH64_ONLY((SyncFlags & 256) != 0 &&) - !SafepointSynchronize::is_synchronizing()) { + if (!SafepointSynchronize::is_synchronizing()) { // Only try quick_enter() if we're not trying to reach a safepoint // so that the calling thread reaches the safepoint more quickly. if (ObjectSynchronizer::quick_enter(_obj, thread, lock)) return; diff --git a/src/hotspot/share/runtime/synchronizer.cpp b/src/hotspot/share/runtime/synchronizer.cpp --- a/src/hotspot/share/runtime/synchronizer.cpp +++ b/src/hotspot/share/runtime/synchronizer.cpp @@ -220,11 +220,6 @@ // Case: light contention possibly amenable to TLE // Case: TLE inimical operations such as nested/recursive synchronization - if (owner == Self) { - m->_recursions++; - return true; - } - // This Java Monitor is inflated so obj's header will never be // displaced to this thread's BasicLock. Make the displaced header // non-NULL so this BasicLock is not seen as recursive nor as @@ -237,6 +232,11 @@ // and last are the inflated Java Monitor (ObjectMonitor) checks. lock->set_displaced_header(markOopDesc::unused_mark()); + if (owner == Self) { + m->_recursions++; + return true; + } + if (owner == NULL && Atomic::replace_if_null(Self, &(m->_owner))) { assert(m->_recursions == 0, "invariant"); assert(m->_owner == Self, "invariant"); Regards, Andrey
13-06-2018

Should be looked at in 11 if we are going to allocate resources for more "monitors" work; definitely don't need this for 10.
05-04-2017

I think there were some "memory sync" issues fixed for ARM64 in the last few months. I will try to find those.
05-04-2017

This issue also impacts armv7 binaries.
03-11-2016

ARM64 was never a target platform for the Contended Locking project. While this bug might get addressed in the JDK9 time frame, I currently have no plans to do so.
04-04-2016