JDK-8296101 : nmethod::is_unloading result unstable with concurrent unloading
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 20
  • Priority: P2
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2022-10-30
  • Updated: 2022-11-05
  • Resolved: 2022-11-03
  • Fix Version: JDK 20 (master)
Description
From JDK-8295969:
I have been assuming that the result of is_unloading() is stable, but that might not be the case. Consider 3 threads, where one thread Patch is calling is_unloading() before patching a compiled call site, thread Clear is calling is_unloading() to see if the call site should be cleared, and thread Compiler has just set the nmethod state to not_entrant. Now let's say the following interleaving happens:

1. The Patch thread sees the nmethod as in_use and not unloading, and decides to patch the call site.
2. The Clear thread sees the nmethod as unloading, and decides to clear the call site.
3. The Patch thread writes the cached is_unloading state first with "not unloading"; then the Clear thread overwrites it with "is unloading".
4. The call site is first cleared by thread Clear and then patched by thread Patch.
5. The GC unlinks the nmethod, and soon it gets released and the memory recycled.

Now we have a compiled call site calling into bad memory. That memory can get overwritten with a new nmethod, and then we have a call into the middle of an nmethod.

If the above scenario is indeed possible, then I think the is_cold/make_not_entrant race can be fixed with memory barriers. There may be a similar race with the has_dead_oop path of is_unloading, because C1 patching can add new nmethod oops. It might be better to fix the race in is_unloading. When is_unloading writes the cached result, it could use a compare-and-swap so that the first value written always wins. 
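
A minimal sketch of that compare-and-swap idea (not the actual HotSpot patch; the packed-word layout and all names below are invented for illustration). One word holds both the unloading cycle number and the cached result, so a single CAS publishes them atomically and the first value written for a cycle always wins:

    #include <atomic>
    #include <cstdint>

    // Sketch of a stand-in for nmethod's cached unloading state.
    struct CachedUnloadingState {
      std::atomic<uint32_t> _word{0};  // bits 1..31: cycle, bit 0: is_unloading

      static uint32_t encode(uint32_t cycle, bool is_unloading) {
        return (cycle << 1) | (is_unloading ? 1u : 0u);
      }

      // 'computed' is this thread's freshly computed answer; it only takes
      // effect if no thread has published a result for 'current_cycle' yet.
      bool is_unloading(uint32_t current_cycle, bool computed) {
        uint32_t cur = _word.load(std::memory_order_acquire);
        for (;;) {
          if ((cur >> 1) == current_cycle) {
            return (cur & 1) != 0;  // a result is already installed for this cycle
          }
          if (_word.compare_exchange_weak(cur, encode(current_cycle, computed),
                                          std::memory_order_acq_rel)) {
            return computed;        // we won the race; our value sticks
          }
          // CAS failed: 'cur' was reloaded; loop and re-check the cycle.
        }
      }
    };

With this shape, every thread, including the loser of the race, returns the same value for the whole unloading cycle even if the underlying inputs (such as not_entrant) change concurrently, which is exactly the stability the description asks for.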
Comments
OK, I missed that.
05-11-2022

I don’t understand how that would happen. Nobody makes any nmethod not_entrant during the GC safepoint, and the GC safepoint updates the is_unloading_state of all nmethods. So once the safepoint is released, nobody would calculate the value any more.
05-11-2022

I think this race could happen with two threads calling make_not_entrant() at the same time (after JDK-8290025: Remove the Sweeper), without the need for concurrent unloading. In other words, I think it could have happened with G1 too, possibly leaving the nmethod in the is_unloading state but with active frames on the stack.
05-11-2022

Changeset: cc3c5a18
Author: Erik Österlund <eosterlund@openjdk.org>
Date: 2022-11-03 13:31:07 +0000
URL: https://git.openjdk.org/jdk/commit/cc3c5a18ed4e52ea385ea0e8bedaf1b01f3c5e6e
03-11-2022

A pull request was submitted for review.
URL: https://git.openjdk.org/jdk/pull/10926
Date: 2022-11-01 09:46:33 +0000
01-11-2022

With the CAS we can detect the race/instability and decide a winner, which solves the problem especially if we don't know where all the races are or don't want to fix them. But if we wanted to, for example, avoid the CAS in release/product builds because it is more expensive, then we could try to remove the races instead. That's where memory ordering comes in. When I reproduce the race with delays, this is the sequence that happens:

    is_cold           READ  is_maybe_on_stack == false
    make_not_entrant  WRITE is_maybe_on_stack = true
    make_not_entrant  WRITE not_entrant = true
    is_cold           READ  not_entrant == true

(I have simplified is_maybe_on_stack and not_entrant to be flags for illustrative purposes.) If we change the order of the reads in is_cold:

    is_cold  READ not_entrant
    is_cold  READ is_maybe_on_stack

then it should be impossible for is_cold to see (not_entrant && !is_maybe_on_stack) as a result of the other thread calling make_not_entrant(), assuming the reads and writes are performed in order, which requires memory barriers.
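
To make that ordering argument concrete, here is a hedged sketch in portable C++ (the two flags are the same simplification used in the trace above, not the real nmethod fields): the writer publishes is_maybe_on_stack before not_entrant, and the reader checks not_entrant first, so observing not_entrant == true implies it will also observe is_maybe_on_stack == true:

    #include <atomic>

    // Simplified stand-ins for the two flags in the trace above.
    std::atomic<bool> is_maybe_on_stack{false};
    std::atomic<bool> not_entrant{false};

    // Writer: mark the nmethod as possibly on-stack *before* publishing
    // not_entrant (the release store orders the two writes).
    void make_not_entrant() {
      is_maybe_on_stack.store(true, std::memory_order_relaxed);
      not_entrant.store(true, std::memory_order_release);
    }

    // Reader: load not_entrant *first* (the acquire load orders the two
    // reads). If not_entrant is seen as true, the release/acquire pairing
    // guarantees is_maybe_on_stack is also seen as true, so the bad
    // observation (not_entrant && !is_maybe_on_stack) cannot result from
    // a concurrent make_not_entrant().
    bool is_cold() {
      if (!not_entrant.load(std::memory_order_acquire)) {
        return false;
      }
      return !is_maybe_on_stack.load(std::memory_order_relaxed);
    }

This is the same pairing as the OrderAccess::storestore()/loadload() (or Atomic::release_store()/Atomic::load_acquire()) variants suggested in the 30-10-2022 comment below.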
31-10-2022

If I modify is_cold to use (os::random() % 2) instead of is_not_entrant() to simulate the source of instability but with higher probability (50% chance true vs false), and run RunThese, we hit a crash very quickly. With my patch that installs the is_unloading_state with a CAS, it does not crash, because all threads observe consistent and stable is_unloading values. The first thread that installs the value decides whether or not we heuristically unload the nmethod.
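
For reference, the fault injection described above amounts to a one-line change of the following shape (hypothetical sketch; the real is_cold() is not literally this two-flag check, this just follows the simplification used in these comments):

    // Fault injection: replace the unstable not_entrant input with a coin
    // flip, so concurrent callers of is_cold() disagree about half the time
    // and the race window becomes easy to hit in a stress run.
    bool nmethod::is_cold() {
      // was (simplified): return is_not_entrant() && !is_maybe_on_stack();
      return (os::random() % 2) == 0 && !is_maybe_on_stack();
    }

Amplifying the instability like this is what makes the validation meaningful: with a 50% chance of disagreement, RunThese crashes quickly without the CAS and not with it.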
31-10-2022

I don't understand the connection to memory ordering. I do however understand that since make_not_entrant can happen while concurrent threads are computing is_unloading(), a concurrent observer, depending on whether it reads the state of the nmethod before or after make_not_entrant, can end up thinking the nmethod is either is_cold() or not is_cold(), which in turn will make it set the is_unloading_state for the current unloading cycle to either true or false, depending on who wins the race.

I'm currently trying out a patch (in the generational ZGC repo) for this: https://github.com/fisk/jdk/commit/8aaa8488110b1277de6dc466a9514dc34fbcf0e9

By guarding with a CAS, the thread that makes the nmethod not_entrant ensures that the is_unloading_state value is updated first, and racing threads observing stale values are forced to agree that the nmethod is or is not is_unloading().

But I don't see the connection to the memory ordering of mark_as_maybe_on_stack(). The source of the instability is IMO the not_entrant condition that we look at in is_cold. As for the counter used by mark_as_maybe_on_stack(), that counter is only updated after the is_unloading_state has been updated. If we do that with a CAS, then we know that the state is stable for the current unloading cycle. Moreover, we only ever call mark_as_maybe_on_stack() on nmethods that are !is_unloading(). Hope this makes sense.
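
Reusing the hypothetical CachedUnloadingState sketch from the description above, the guard described here would look roughly like this (again an illustration, not the linked patch; 'computed=false' stands for the value computed before the transition):

    // Sketch: the thread making the nmethod not_entrant installs this
    // cycle's is_unloading result *before* the state transition, so any
    // thread that later observes not_entrant finds the result already
    // published and cannot install a conflicting one.
    void make_not_entrant(CachedUnloadingState& state, uint32_t current_cycle) {
      state.is_unloading(current_cycle, /*computed=*/false);  // publish first
      // ... then transition the nmethod state to not_entrant ...
    }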
31-10-2022

I was thinking that this race would only happen with a weak memory model, like aarch64. But we could have:

    is_cold           READ  is_maybe_on_stack == false
    make_not_entrant  WRITE is_maybe_on_stack = true
    make_not_entrant  WRITE not_entrant = true
    is_cold           READ  not_entrant == true

and a memory barrier wouldn't help here. Instead we could do:

    if (is_not_entrant()) {
      OrderAccess::loadload();
      if (!is_maybe_on_stack()) {
        // ...
      }
    }

and:

    mark_as_maybe_on_stack();
    OrderAccess::storestore();
    try_transition(not_entrant);

Or use Atomic::load_acquire() and Atomic::release_store() for is_not_entrant() and try_transition().
30-10-2022