JDK-8153134 : Infinite loop in handle_wrong_method in jmod
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 8u102,9
  • Priority: P2
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2016-03-30
  • Updated: 2017-11-29
  • Resolved: 2016-10-18
Fix versions:
  • JDK 8: 8u152 (Fixed)
  • JDK 9: 9 b143 (Fixed)
Description
I had a local jdk build get stuck while running jmod.

This was on Linux x64, fastdebug, using a fairly up-to-date hs-rt clone plus my development changes. The jdk tree is at 13956 (tip) and hotspot is at 10649 (tip is 10660; according to the log, I'm missing only a couple of innocuous-looking changes). My local changes are G1-specific, so they don't seem relevant, as jmod is invoked with -XX:+UseSerialGC.

I used gdb to attach to the jmod process and looked around. Most threads were blocked in the usual wait states of one sort or another. One thread was actually doing work. Repeated backtrace/continue/stop of that thread looked something like:

... various things called from the helper ...
SharedRuntime::find_callee_info_helper
SharedRuntime::find_callee_method
SharedRuntime::reresolve_call_site
SharedRuntime::handle_wrong_method
... several unknown (???) frames ...

That thread appeared to be using 100% CPU, so it seems to be stuck in an unending loop in find_callee_info_helper. (It ran for close to an hour before I finally killed it.)

Comments
Okay, I moved the lock into set_code/clear_code (we already own it in nmethod::make_not_entrant_or_zombie() and don't need it in the Method constructor): http://cr.openjdk.java.net/~thartmann/8153134/webrev.01/ I'll do performance/correctness measurements as soon as Aurora is up again.
10-10-2016
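
The locking scheme described in this comment can be sketched as follows. This is a minimal, self-contained model and not the HotSpot patch from the webrev: std::mutex stands in for the Patching_lock, and Method, NMethod, the two entry enums, is_consistent(), and has_code() are simplified, hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical, simplified model; not HotSpot code.
enum class CompiledEntry { C2I, VerifiedEntry };
enum class InterpretedEntry { I2I, I2C };

struct NMethod {};

class Method {
public:
    // All three fields change under one lock, so another caller of
    // set_code()/clear_code() cannot interleave with a half-finished update.
    void set_code(NMethod* nm) {
        assert(nm != nullptr);
        std::lock_guard<std::mutex> guard(patching_lock_);
        code_ = nm;
        from_compiled_ = CompiledEntry::VerifiedEntry;
        from_interpreted_ = InterpretedEntry::I2C;
    }

    void clear_code() {
        std::lock_guard<std::mutex> guard(patching_lock_);
        from_compiled_ = CompiledEntry::C2I;
        from_interpreted_ = InterpretedEntry::I2I;
        code_ = nullptr;
    }

    bool has_code() {
        std::lock_guard<std::mutex> guard(patching_lock_);
        return code_ != nullptr;
    }

    // True when the entry points agree with the presence of compiled code.
    bool is_consistent() {
        std::lock_guard<std::mutex> guard(patching_lock_);
        if (code_ == nullptr) {
            return from_compiled_ == CompiledEntry::C2I &&
                   from_interpreted_ == InterpretedEntry::I2I;
        }
        return from_compiled_ == CompiledEntry::VerifiedEntry &&
               from_interpreted_ == InterpretedEntry::I2C;
    }

private:
    std::mutex patching_lock_;  // stand-in for the Patching_lock (assumption)
    NMethod* code_ = nullptr;
    CompiledEntry from_compiled_ = CompiledEntry::C2I;
    InterpretedEntry from_interpreted_ = InterpretedEntry::I2I;
};
```

The sketch does not model the case mentioned in the comment where the caller already owns the lock (as in nmethod::make_not_entrant_or_zombie()); std::mutex is not recursive, so a real implementation would need to handle that path separately.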

I was thinking we could put the locking inside set_code and clear_code. Also, please measure performance impact.
07-10-2016

[~dlong] wrote:
> I don't see how we could start executing the compiled code before set_code() has finished.

I think what happens is the following (correct me if I'm wrong). We have method A that calls method B and the following threads:
- Thread T1: CompilerThread that is compiling method B and currently registers the nmethod.
- Thread T2: Executes nmethodA.

Here is the execution sequence:
T1: Registers nmethodB in Method::set_code():
    _code = nmethodB
T2: Resolves call to nmethodB in CompiledIC::compute_monomorphic_entry:
    Since Method::_code is set, entry is set to nmethodB::_verified_entry_point
T2: nmethodB is converted to zombie in nmethodB::make_not_entrant_or_zombie():
    The verified entry is patched to jump to handle_wrong_method_stub() and Method::clear_code() is called:
    _from_compiled_entry = c2i
    _from_interpreted_entry = i2i
T1: Continues in Method::set_code():
    _from_compiled_entry = _verified_entry_point
    _from_interpreted_entry = i2c
T2: Continues in Method::clear_code():
    _code = NULL

As Dean suggested, we should use the PatchingLock to avoid concurrent updates of these fields: http://cr.openjdk.java.net/~thartmann/8153134/webrev.00/
What do you think?
07-10-2016
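
The interleaving described in this comment can be replayed deterministically on a single thread. The following is a rough sketch under stated assumptions: Method and NMethod are simplified, hypothetical models, and set_code()/clear_code() are each split into two step functions (names invented here) at the point where the comment says the other thread runs.

```cpp
#include <cassert>

// Hypothetical, simplified model; not HotSpot code.
enum class CompiledEntry { C2I, VerifiedEntry };
enum class InterpretedEntry { I2I, I2C };

struct NMethod {
    bool not_entrant = false;  // verified entry patched to handle_wrong_method
};

struct Method {
    NMethod* code = nullptr;                                    // _code
    CompiledEntry from_compiled = CompiledEntry::C2I;           // _from_compiled_entry
    InterpretedEntry from_interpreted = InterpretedEntry::I2I;  // _from_interpreted_entry
};

// set_code(), split where T2 is assumed to interleave.
void set_code_step1(Method& m, NMethod* nm) { m.code = nm; }
void set_code_step2(Method& m) {
    m.from_compiled = CompiledEntry::VerifiedEntry;
    m.from_interpreted = InterpretedEntry::I2C;
}

// clear_code(), split where T1 is assumed to interleave.
void clear_code_step1(Method& m) {
    m.from_compiled = CompiledEntry::C2I;
    m.from_interpreted = InterpretedEntry::I2I;
}
void clear_code_step2(Method& m) { m.code = nullptr; }

// Replays the T1/T2 sequence from the comment and returns the final state of B.
Method replay_race(NMethod& nmethod_b) {
    assert(!nmethod_b.not_entrant);
    Method b;
    set_code_step1(b, &nmethod_b);  // T1: _code = nmethodB
    nmethod_b.not_entrant = true;   // T2: make_not_entrant_or_zombie() patches the entry
    clear_code_step1(b);            // T2: entries reset to c2i / i2i
    set_code_step2(b);              // T1: entries set to verified entry / i2c
    clear_code_step2(b);            // T2: _code = NULL
    return b;
}
```

After the replay, the model ends up with no code attached but with its compiled entry still pointing at the verified entry of the non-entrant nmethod, which is the inconsistent state reported earlier in this thread.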

Yes, if we don't use a lock then we need to have all these workarounds to handle the inconsistent states. I think we can use Patching_lock without a safepoint check.
18-08-2016

And we need to make sure that these fields are updated only by these 2 methods: set_code() and clear_code().
18-08-2016

I think we need some kind of semaphore to synchronize concurrent execution of set_code() and clear_code(). Otherwise we will always be in trouble: several fields are updated, and a concurrent update can happen at any time, leaving the fields in an inconsistent state.
18-08-2016

[~kvn] I see that this field is used by the compiler VtableStubs, and also CDS, so removing it is not so easy. I think we can solve the infinite loop by using the new method above in just handle_wrong_method().
18-08-2016

Suggestion from [~kvn]: "One thing we can do is deoptimize this thing - don't cache from_compiled_entry in Method field but:
    Method::from_compiled_entry() const {
      return _code ? _code->verified_entry_point() : _adapter->c2i_entry();
    }"
17-08-2016
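
A self-contained sketch of the suggestion above: the compiled entry is derived from _code on every call instead of being cached in a separate field, so the two cannot disagree. The address alias and the NMethod, Adapter, and Method structs are hypothetical stand-ins, not the HotSpot classes, and the sketch is single-threaded, so it says nothing about memory ordering.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for the types named in the quote.
using address = const unsigned char*;

struct NMethod {
    address verified_entry;
    address verified_entry_point() const { return verified_entry; }
};

struct Adapter {
    address c2i;
    address c2i_entry() const { return c2i; }
};

struct Method {
    const NMethod* _code = nullptr;
    const Adapter* _adapter = nullptr;

    // No cached _from_compiled_entry: the answer always follows _code.
    address from_compiled_entry() const {
        assert(_adapter != nullptr);
        return _code ? _code->verified_entry_point() : _adapter->c2i_entry();
    }
};
```

With this shape, clearing _code is enough to redirect compiled callers to the c2i adapter; there is no second field that could be left stale.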

Related discussions for 8043070:
http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-May/014609.html
http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-May/014497.html
17-08-2016

Event log shows an uncommon trap and deoptimization, even before the compile log entry is written:
5.2144588360000004 Uncommon trap: trap_request=0xffffffe4 fr.pc=0x00002ac314ecbcb8
5.2145413889999999 DEOPT PACKING pc=0x00002ac314ecbcb8 sp=0x00002ac305338d00
5.2160623920000004 nmethod 412 0x00002ac314ec9690 code [0x00002ac314ec9ea0, 0x00002ac314ecdb50]
So it appears that this could be the result of a race between Method::set_code() and Method::clear_code(), however I don't see how we could start executing the compiled code before set_code() has finished.
17-08-2016

It appears that java.nio.file.FileTreeIterator.fetchNextIfNeeded() is trying to call java.nio.file.FileTreeWalker.next(). Unfortunately, both of these nmethods are not_entrant and their verified entries have been patched to jump to handle_wrong_method(). The Method* for the callee method looks weird: _code is 0, but _from_compiled_entry still points to the non-entrant nmethod verified entry, instead of the c2i adapter. This explains why we get into an infinite loop, but not how the Method* got into this state.
17-08-2016
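
Why this state never makes progress can be shown with a small model. This is a simplified sketch, not HotSpot code: Entry, NMethod, Method, entry_owner, and call_from_compiled() are hypothetical names, and the loop is bounded by max_attempts so the example terminates where the real thread did not.

```cpp
#include <cassert>

// Hypothetical, simplified model; not HotSpot code.
enum class Entry { C2IAdapter, VerifiedEntry };

struct NMethod {
    bool not_entrant = false;  // verified entry patched to jump to handle_wrong_method
};

struct Method {
    NMethod* code = nullptr;                        // models _code
    Entry from_compiled_entry = Entry::C2IAdapter;  // models _from_compiled_entry
    NMethod* entry_owner = nullptr;                 // nmethod the cached entry points into
};

// Returns how many re-resolutions were needed before the call reached
// something runnable, or -1 if max_attempts was exhausted (the hang).
int call_from_compiled(const Method& callee, int max_attempts) {
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        // Re-resolution picks up the callee's cached compiled entry.
        if (callee.from_compiled_entry == Entry::C2IAdapter) {
            return attempt;  // continues in the interpreter through c2i
        }
        assert(callee.entry_owner != nullptr);
        if (!callee.entry_owner->not_entrant) {
            return attempt;  // enters valid compiled code
        }
        // Patched entry: control returns to handle_wrong_method, which
        // re-resolves and finds the same stale entry again.
    }
    return -1;
}
```

A callee with _code cleared but the cached entry still pointing into a not-entrant nmethod exhausts any bound, while a callee whose entry was reset to the c2i adapter resolves on the first attempt.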

[~jcm] Jamsheed, I'll take this if you don't mind.
17-08-2016

Hi Jamsheed, could you please look into this issue? Thanks! Best regards, Zoltan
25-05-2016

ILW=jvm hangs;rarely;none=HLH=>P2
16-05-2016

These are all called from compiled code, so I'm giving this to the compiler team.
04-04-2016

I've built this same forest a number of times, including after the described failure. I've only had the one sighting of this failure so far.
30-03-2016