JDK-8267042 : bug in monitor locking/unlocking on ARM32 C1 due to uninitialized BasicObjectLock::_displaced_header
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 17
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: aarch32
  • Submitted: 2021-05-12
  • Updated: 2021-07-20
  • Resolved: 2021-06-21
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 11 JDK 17 JDK 18
11.0.13Fixed 17Fixed 18 b03Fixed
Related Reports
Relates :  
Description
On 5/12/21 1:02 PM, Chris Cole wrote:
> Hi,
>
> Not sure if this is the appropriate place or method to report an OpenJDK
> bug, if not please advise.
>
> I have discovered a bug with ARM32 C1 monitor locking/unlocking that can
> result in deadlock. The bug was introduced with JCK-8241234 "Unify monitor
> enter/exit runtime entries" [1]. This change introduced a call to
> ObjectSynchronizer::quick_entry() within the logic for
> Runtime1::monitorenter().
> If the monitor is inflated and already owned by the current thread, then
> ObjectSynchronizer::quick_entry() simply increments
> ObjectMonitor::_recursions and returns. In this case
> Runtime1::monitorenter() returns to the JIT compiled code without calling
> "lock->set_displaced_header(markWord::unused_mark())" (see [2]). For ARM32
> the _displaced_header field is not always initialized in JIT code before
> calling Runtime1::monitorenter() helper. If the uninitialized value of
> _displaced_header field on stack happens to be NULL, this causes an issue
> because the JIT code to exit the monitor first checks for a NULL
> _displaced_header as an indication for non-inflated recursive locking which
> is a noop for exiting the monitor (see [3]). This means that the
> Runtime1::monitorexit() helper is not called as required to exit this
> inflated monitor, and the ObjectMonitor::_recursions is not decremented as
> required. This leads to thread not unlocking the monitor when required and
> deadlock when another thread tries to lock the monitor.
>
> This bug is not present on AArch64 and x86, because the displaced header is
> initialized in JIT code with the "unlocked object header" value (which is
> non-zero) before calling Runtime1::monitorenter() helper (see [4] and [5]).
> Note sure about other CPU architectures.
>
> I see two ways to fix this.
> 1) In ObjectSynchronizer::quick_entry() move the
> "lock->set_displaced_header(markWord::unused_mark())" statement to before
> the "if (owner == current)" at line 340 in share/runtime/synchronizer.cpp
> (see [6]), so that Runtime1::monitorenter() helper logic always initializes
> the displaced header field as was the case before JCK-8241234.
> 2) For ARM32 add JIT code to initialize the displaced header field before
> calling Runtime1::monitorenter() helper as done for AArch64 and x86.
>
> Not sure which is better (or maybe both are required for some reason I am
> not aware of). I believe this "displacted header" on the stack can be
> looked at by stack walkers but I am not familiar with the exact details and
> if there are implications on this fix.
>
> The bug is also present in OpenJDK 11.0.10 and later (introduced by the
> backport of JDK-8241234 [1]).
>
> I/my company (Sage Embedded Software) has signed the Oracle Contributor
> Agreement (OCA) and have been granted access to JCK.
>
> The bug can be reproduced in my environment with the OpenJDK Community TCK
> testing of java.io.PipedReader that deadlocks, but because reproduction of
> the issue requires uninitialized stack field to be zero, it might not
> happen in some environments. I have a Java test case that can reproduce
> this issue on ARM in 32 bit mode. It is pasted inline below at the end of
> the email. There is a "getZeroOnStack()" method that I think helps get a
> zero into the uninitialized _displaced_header field. The test case source
> code is copied from OpenJDK java.io.PipedReader source code and then
> modified. It needs to be run with only C1 enabled (I am using minimal
> variant to enforce this) and the following command line options
> (-XX:-BackgroundCompilation -XX:CompileThreshold=500
> -XX:CompileOnly="com.sageembedded.test.MonitorBugTest::receive"). The test
> case should run and then end with "Source thread done" and "Reading
> complete" output if the bug is not reproduced. If the monitor bug is
> reproduced the test case will not exit and the main thread will be
> deadlocked, with the main thread last printing "read() before wait" and
> missing "read() after wait" and "Reading complete". If useful I can provide
> the output of this test cause including -XX:PrintAssebly and logging that I
> added to ObjectSynchronizer::quick_entry() that shows uninitialized
> lock->_displaced_header and ObjectMonitor->_recursions continue to get
> incremented (into the 1000s) as the MonitorBugTest.receive() method is
> called in a loop.
>
> Please let me know if there is anything else that would be helpful. I hope
> to become active in the OpenJDK Community. My time is a little limited at
> the moment, so sometimes it might take a day to respond (I have 3 and 6
> year old kids). In the coming years I expect to have additional time to be
> more involved in the OpenJDK Community.
>
> Best regards,
> Chris Cole
> Sage Embedded Software LLC
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8241234
> [2]
> https://github.com/openjdk/jdk/blob/dfe8833f5d9a9ac59857143a86d07f85769b8eae/src/hotspot/share/runtime/synchronizer.cpp#L343
> [3]
> https://github.com/openjdk/jdk/blob/dfe8833f5d9a9ac59857143a86d07f85769b8eae/src/hotspot/cpu/x86/c1_MacroAssembler_x86.cpp#L130
> [4]
> https://github.com/openjdk/jdk/blob/71b8ad45b4de6836e3bb2716ebf136f3f8ea2198/src/hotspot/cpu/aarch64/c1_MacroAssembler_aarch64.cpp#L95
> [5]
> https://github.com/openjdk/jdk/blob/dfe8833f5d9a9ac59857143a86d07f85769b8eae/src/hotspot/cpu/x86/c1_MacroAssembler_x86.cpp#L74
> [6]
> https://github.com/openjdk/jdk/blob/dfe8833f5d9a9ac59857143a86d07f85769b8eae/src/hotspot/share/runtime/synchronizer.cpp#L340
Comments
Fix Request (11u): 11u is also affected. Fix is simple and applies cleanly: https://github.com/openjdk/jdk11u-dev/pull/129
13-07-2021

Changeset: c294ae4f Author: Boris Ulasevich <bulasevich@openjdk.org> Date: 2021-06-21 06:04:57 +0000 URL: https://git.openjdk.java.net/jdk/commit/c294ae4fed59c7e303416346cc4b189a48bc1ab1
21-06-2021

Ok. Thank you!
21-06-2021

[~bulasevich] - You did not have to reopen this main bug in order to integrate the JDK17 backport. When you integrated your backport (using the main bug ID), the system would then use the existing backport bug ID for adding the JDK17 integration notification.
21-06-2021

[~bulasevich] No you have to use the main bug id, JDK-8267042, for the integration, not the backport ID.
21-06-2021

Daniel, thank you. One more question. Can I change JDK-8269035 Type Backport->Bug to resolve integration issue?: > https://github.com/openjdk/jdk17/pull/102 > Issue of type Backport is not allowed for integrations
20-06-2021

We're in JDP1 for JDK17. See https://openjdk.java.net/jeps/3. This is a P3 bug so you can do a PR for this bug and integrate once you have reviewer approval. No Release Team permission is required.
18-06-2021

Daniel, what is the process to get this fix in jdk-17?
18-06-2021

Fix was pushed while main bug was targeted to '17'. Reset the main bug to fixed in '18' and copied the Robo Duke entry here.
18-06-2021

Robo Duke added a comment - 17 minutes ago Changeset: 8f2456e5 Author: Boris Ulasevich <bulasevich@openjdk.org> Date: 2021-06-18 16:25:25 +0000 URL: https://git.openjdk.java.net/jdk/commit/8f2456e5b058a88730ec383d88634737849afdfb
18-06-2021

Response from Chris: On 5/20/21 1:13 PM, Chris Cole wrote: > Hi Boris, > > Thanks for looking into this and your support. See comments below. > > On Wed, May 19, 2021 at 2:20 AM Boris Ulasevich > <boris.ulasevich@bell-sw.com> wrote: >> >> Hi Chris, >> >> I agree that the problem is in C1_MacroAssembler::lock_object. > > I believe that this issue can be fixed by initializing the displaced > header in either > C1_MacroAssembler::lock_object/SharedRuntime::generate_native_wrapper > or in ObjectSynchronizer::quick_enter with changes provided in > previous email. > > I would suggest that where the problem/fix is located depends on what > the "interface contract" is for initializing the displaced header > between JIT inlined code and the monitor enter runtime helper. This > "interface contract" is not documented anywhere so it is defined by > how things are coded. Before the introduction of > ObjectSynchronizer::quick_enter(), the runtime helper would always > initialize the displaced header as appropriate. When > ObjectSynchronizer::quick_enter() was introduced it changed the > "interface contract" of the monitor enter runtime helper because there > is one special case where it no longer always initializes the > displaced header as appropriate. > > I believe that the better design and "interface contract" is to have > the runtime helper always set the displaced header as appropriate > (apply change to ObjectSynchronizer::quick_enter and no other changes) > for the following reasons. > - Less coupling between JIT code and runtime helper, this coupling > helped lead to a number of time consuming bugs including 8153107, > 8077392, and this one. > - Every place that slow case runtime helper is called, special care is > required to initialize displaced header, rather then in just one place > in ObjectSynchronizer::quick_enter. > - This coupling is also exposed through the > JVMCIRuntime::monitorenter(JavaThread, oopDesc, BasicLock) interface > and without updates to ObjectSynchronizer::quick_enter(), anyone (like > Graal, etc) calling this interface needs to make sure > BasicLock->_displaced_header is initialized to a non-zero value. My > guess is that without removing this requirement there may be future > deadlock issues. Also current users use JVMCIRuntime::monitorenter() > should be reviewed to ensure that they are initializing > BasicLock->_displaced_header. > - JIT inline code and JVMCIRuntime::monitorenter users can possibly be > smaller and faster, if the JIT inline code determines that the runtime > helper is needed it can simply call this without first having to > initialize the displaced header. Also note that JIT code may not have > the information to know exactly what to set displaced herder to, so I > will set it to non-zero and the runtime helper may overwrite this > value with something different based on the additional logic in the > slow path. > - Supports fixing -XX:-UseFastLocking develop flag without additional changes > > But given that all code except ARM32 C1, assumes an "interface > contract" that has special case for when displaced header needs to be > initialized I am not sure that there will be much support for making > changes that impact all other architectures in the context of the > ARM32 bug fix (for other architectures there could be a redundant > setting displaced header to non-zero in > ObjectSynchronizer::quick_enter()). > > Anyway, just my thoughts... I am fine with either location > (C1_MacroAssembler::lock_object/SharedRuntime::generate_native_wrapper > or ObjectSynchronizer::quick_enter) to fix this ARM32 bug. > >> What I do not like in your fix is >> - an arbitrary non-zero value is put into disp_hdr address for 'ne' case. > > I somewhat agree that in general the aesthetics of using an arbitrary > non-zero value isn't ideal. But in this case I believe setting to a > specific non-zero value would require an additional instruction and I > don't think this is worth it. Note that this JIT code is inlined to > every compiled synchronized method and block, so that one extra > instruction gets multiplied by a large number and increases memory > usage, especially important for ARM32 with devices generally having > limited memory. > > The code change for C1_MacroAssembler::lock_object is taken from what > is done for ARM32 C2 code in C2_MacroAssembler::fast_lock(), > that is using an arbitrary non-zero value (see > https://github.com/openjdk/jdk/blob/b7b6acd9b1cafb791827e151712836c4e7140db5/src/hotspot/cpu/arm/c2_MacroAssembler_arm.cpp#L122). > I would suggest that all places should be consistent. Also when other > architectures call the runtime helper, the displaced header is > initialize with "unlocked object header" value due to failed attempt > to do a stack lock and this is somewhat arbitrary (and somewhat > misleading) (see > https://github.com/openjdk/jdk/blob/dfe8833f5d9a9ac59857143a86d07f85769b8eae/src/hotspot/cpu/x86/c1_MacroAssembler_x86.cpp#L74). > > Note that in this context the displaced header is acting like a > boolean, false = 0 and any other value is true. > > So to keep the code small and to be consistent with ARM32 C2, I would > suggest just using non-zero value. Maybe a better comment in the code > would be helpful, as follows. > > diff --git a/src/hotspot/cpu/arm/c1_MacroAssembler_arm.cpp > b/src/hotspot/cpu/arm/c1_MacroAssembler_arm.cpp > index 9d5c82dceb9..44ddc3a9da9 100644 > --- a/src/hotspot/cpu/arm/c1_MacroAssembler_arm.cpp > +++ b/src/hotspot/cpu/arm/c1_MacroAssembler_arm.cpp > @@ -235,7 +235,11 @@ int C1_MacroAssembler::lock_object(Register hdr, > Register obj, > sub(tmp2, hdr, SP, eq); > movs(tmp2, AsmOperand(tmp2, lsr, exact_log2(os::vm_page_size())), eq); > // If 'eq' then OK for recursive fast locking: store 0 into a lock record. > - str(tmp2, Address(disp_hdr, mark_offset), eq); > + // If 'ne' then initialize displaced header with this non-zero value > + // to make sure monitor exit is not treated as non-inflated recursive unlock, > + // the runtime helper used in slow case doesn't always do this for us > + // (see discussion in JDK-8267042) > + str(tmp2, Address(disp_hdr, mark_offset)); > b(fast_lock_done, eq); > // else need slow case > b(slow_case); > > >> - there is a similar code pattern in >> SharedRuntime::generate_native_wrapper - should not it be fixed too? > > Good catch, yes this needs to be fixed as well. I would suggest same > change as for C1_MacroAssembler::lock_object, as follows. > > diff --git a/src/hotspot/cpu/arm/sharedRuntime_arm.cpp > b/src/hotspot/cpu/arm/sharedRuntime_arm.cpp > index 341cf63c4c9..7de4306aedb 100644 > --- a/src/hotspot/cpu/arm/sharedRuntime_arm.cpp > +++ b/src/hotspot/cpu/arm/sharedRuntime_arm.cpp > @@ -1184,7 +1184,11 @@ nmethod* > SharedRuntime::generate_native_wrapper(MacroAssembler* masm, > __ sub(Rtemp, mark, SP, eq); > __ movs(Rtemp, AsmOperand(Rtemp, lsr, exact_log2(os::vm_page_size())), eq); > // If still 'eq' then recursive locking OK: set displaced header to 0 > - __ str(Rtemp, Address(disp_hdr, > BasicLock::displaced_header_offset_in_bytes()), eq); > + // If 'ne' then initialize displaced header with this non-zero value > + // to make sure monitor exit is not treated as non-inflated > recursive unlock, > + // the runtime helper used in slow case doesn't always do this for us > + // (see discussion in JDK-8267042) > + __ str(Rtemp, Address(disp_hdr, > BasicLock::displaced_header_offset_in_bytes())); > __ b(lock_done, eq); > __ b(slow_lock); > >> - the second comment in hdr bits manipulation code is wrong: "// -2- >> test (hdr - SP) if the low two bits are 0" > > What are you suggesting is wrong? If I read this as a comment about > the two instructions that follow (sub and movs) then seems to make > sense to me, but I might be missing something. Also I would think this > is not really related to this issue and addressed in a separate place > issue? > > Thanks again, > Chris >
20-05-2021

Response from Boris: On 5/19/21 5:20 AM, Boris Ulasevich wrote: > Hi Chris, > > I agree that the problem is in C1_MacroAssembler::lock_object. > > What I do not like in your fix is > - an arbitrary non-zero value is put into disp_hdr address for 'ne' case. > - there is a similar code pattern in SharedRuntime::generate_native_wrapper - should not it be fixed too? > - the second comment in hdr bits manipulation code is wrong: "// -2- test (hdr - SP) if the low two bits are 0" > > regards, > Boris
19-05-2021

Another email from Chris Cole: On 5/19/21 1:31 AM, Chris Cole wrote: > Hi All, > > I have continued to further investigate this bug that I reported and > have some additional information. > > I believe that this issue is similar to JDK-8153107 "enabling > ObjectSynchronizer::quick_enter() on ARM64 causes hangs" (see [1]), > this impacted C2 on both ARM32 and ARM64. The fix for that issue (see > [2]) fixed MacroAssembler::fast_lock() that is used by C2. When this > fix was made, I believe only C2 was using > ObjectSynchronizer::quick_enter(). Now (version 11.0.10+ and 15+) that > C1 is using ObjectSynchronizer::quick_enter() due to JDK-8241234, a > similar change to C1_MacroAssembler::lock_object() used by C1 is > required. The following change to unconditionally store to the > displaced header field should fix things for C1. > > diff --git a/src/hotspot/cpu/arm/c1_MacroAssembler_arm.cpp > b/src/hotspot/cpu/arm/c1_MacroAssembler_arm.cpp > index 9d5c82dceb9..9a628eb7de5 100644 > --- a/src/hotspot/cpu/arm/c1_MacroAssembler_arm.cpp > +++ b/src/hotspot/cpu/arm/c1_MacroAssembler_arm.cpp > @@ -235,7 +235,8 @@ int C1_MacroAssembler::lock_object(Register hdr, > Register obj, > sub(tmp2, hdr, SP, eq); > movs(tmp2, AsmOperand(tmp2, lsr, exact_log2(os::vm_page_size())), eq); > // If 'eq' then OK for recursive fast locking: store 0 into a lock record. > - str(tmp2, Address(disp_hdr, mark_offset), eq); > + // set to non zero otherwise (see discussion in JDK-8267042) > + str(tmp2, Address(disp_hdr, mark_offset)); > b(fast_lock_done, eq); > // else need slow case > b(slow_case); > > This should fix the specific issue that I was seeing and that is > reproduce with test case I provided. > > I also reviewed the code to see if there were any other possible issue > related to Runtime1::monitorenter() not always initializing displaced > header field due to ObjectSynchronizer::quick_enter(). I found that if > the -XX:-UseFastLocking develop flag is used (C1 doesn't use any > inline monitor lock/unlock code and just calls the > Runtime1::monitorenter() and Runtime1::monitorexit()), then the test > case will reproduce deadlock on any of the platforms (arm32, aarch64, > x86_64, etc) due to uninitialized displaced header field. Here is a > fix for that. (Note that this change below, makes the change above to > c1_MacroAssembler_arm.cpp unnecessary). > > diff --git a/src/hotspot/share/runtime/synchronizer.cpp > b/src/hotspot/share/runtime/synchronizer.cpp > index 98afe838c49..db3eb255778 100644 > --- a/src/hotspot/share/runtime/synchronizer.cpp > +++ b/src/hotspot/share/runtime/synchronizer.cpp > @@ -355,11 +355,6 @@ bool ObjectSynchronizer::quick_enter(oop obj, Thread* self, > // Case: light contention possibly amenable to TLE > // Case: TLE inimical operations such as nested/recursive synchronization > > - if (owner == self) { > - m->_recursions++; > - return true; > - } > - > // This Java Monitor is inflated so obj's header will never be > // displaced to this thread's BasicLock. Make the displaced header > // non-NULL so this BasicLock is not seen as recursive nor as > @@ -372,6 +367,11 @@ bool ObjectSynchronizer::quick_enter(oop obj, Thread* self, > // and last are the inflated Java Monitor (ObjectMonitor) checks. > lock->set_displaced_header(markWord::unused_mark()); > > + if (owner == self) { > + m->_recursions++; > + return true; > + } > + > if (owner == NULL && m->try_set_owner_from(NULL, self) == NULL) { > assert(m->_recursions == 0, "invariant"); > return true; > > I am also reviewing the biased_locking_enter() call from > C1_MacroAssembler::lock_object(), this code path can call > Runtime1::monitorenter() without initializing the displaced header > field. But in this code path it might not be possible for > ObjectSynchronizer::quick_enter() to return "true" without calling > lock->set_displaced_header(markWord::unused_mark()), this would > require current thread to own inflated monitor which seems to not be > possible in this case when biased_locking_enter() is calling > Runtime1::monitorenter(). > > Anyway, I will continue to review the code to make sure this aren't > any other issues related to this issue. > > Thanks, > Chris > > [1] https://bugs.openjdk.java.net/browse/JDK-8153107 > [2] http://hg.openjdk.java.net/jdk/jdk/rev/3e66d204af9b
19-05-2021

ILW = HLM = P3
18-05-2021

Another email from Chris Cole: On 5/16/21 7:50 PM, Chris Cole wrote: > Hi Dan, > > I had some additional information to share regarding JDK-8267042 > (after looking into possible fix I discovered that if > -XX:-UseFastLocking develop flag is used, this deadlock issue can be > reproduced on other architectures like aarch63, x86_64, etc). > > I see that Boris Ulasevich is the Assignee in JDK Bug System and > provided a comment there. He suggested "The displaced header should be > updated the same way with aarch64 and x86" which was one of two > possible solutions I suggested. But this leaves all platforms broken > when -XX:-UseFastLocking is enabled, so the other suggested fix I > suggested may also be required. > > Anyway, is the best way for me to communicate on this issue via the > hotspot-dev@openjdk.java.net mailing list? It seems that my initial > email was not actually sent out on the list, at least according to the > archive (https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-May/date.html)? > Does the list filter the messages based on subject or sender? > > I did review information at https://openjdk.java.net/contribute/ but > wasn't sure if the mailing list is still the place to communicate and > if my messages would get through. > > Thanks for your support and time! > > Chris Cole > Sage Embedded Software
17-05-2021

I reproduced the issue on the ARM32 client VM. It really goes to the deadlock. The displaced header should be updated the same way with aarch64 and x86.
13-05-2021

Attached Chris Cole's test as MonitorBugTest.java.
12-05-2021