JDK-8028736 : Crash in G1 after making a method with an embedded oop at the start of the code not-entrant
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: hs25
  • Priority: P2
  • Status: Closed
  • Resolution: Won't Fix
  • Submitted: 2013-11-20
  • Updated: 2013-11-22
  • Resolved: 2013-11-22
Description
When making an nmethod non-entrant, the first few bytes of the code (e.g. 5 bytes on x86) are patched with a jump.

If this area contains an oop, or an oop overlaps into it, G1 may crash later because it remembers that nmethod in its code roots remembered set indefinitely.

In particular, when this nmethod is later zombified and removed, it is not removed from the code root set of the region it was originally registered with, because oop iteration skips the patched area (see nmethod.cpp line 1851ff). The code root set therefore contains a stale nmethod reference that is never cleared. When the GC later accesses that reference, after the nmethod has been removed, the VM most likely crashes.
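
The bookkeeping failure can be illustrated with a toy model (names and data layout are purely illustrative, not HotSpot's): registration scans every embedded oop, but unregistration, like oops_do() on a not-entrant method, skips oops in the patched first bytes, so the region's code root set keeps a dead entry.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class StaleCodeRootDemo {
    static final int PATCH_SIZE = 5; // bytes overwritten by the not-entrant jump

    // region id -> names of nmethods registered as code roots for that region
    static final Map<Integer, Set<String>> codeRootSets = new HashMap<>();

    // Registration scans every oop embedded in the code.
    static void register(String nmethod, int[] oopOffsets, int[] oopRegions) {
        for (int i = 0; i < oopOffsets.length; i++) {
            codeRootSets.computeIfAbsent(oopRegions[i], k -> new HashSet<>())
                        .add(nmethod);
        }
    }

    // Unregistration mimics oops_do(): oops inside the patched area are skipped.
    static void unregister(String nmethod, int[] oopOffsets, int[] oopRegions) {
        for (int i = 0; i < oopOffsets.length; i++) {
            if (oopOffsets[i] < PATCH_SIZE) continue; // skipped -> the bug
            Set<String> s = codeRootSets.get(oopRegions[i]);
            if (s != null) s.remove(nmethod);
        }
    }

    public static void main(String[] args) {
        int[] offsets = {2};   // one oop starting inside the first 5 bytes
        int[] regions = {7};   // it points into heap region 7
        register("getSomeObject", offsets, regions);
        unregister("getSomeObject", offsets, regions); // zombify + remove
        // Region 7's code root set still holds the dead nmethod:
        System.out.println(codeRootSets.get(7)); // prints [getSomeObject]
    }
}
```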

An example of a method that can cause this (the initializer value is taken from the disassembly in the comments below):

public class X {
  static final Object someobject = new Integer(0xdeadbabe);

  // the following method is compiled
  static Object getSomeObject() {
    return someobject;
  }

  // something else
}

A simple fix for this kind of method: before patching the nmethod (i.e. making it not-entrant), unregister it, and re-register it after the patch.

I.e. in pseudo-code, in nmethod::make_not_entrant_or_zombie, line 1338ff:

bool needs_reregister = false;
if (nmethod contains an oop in the patched area) {
  unregister nmethod with heap
  needs_reregister = true;
}
NativeJump::patch_verified_entry(entry_point(), verified_entry_point(),
                                 SharedRuntime::get_handle_wrong_method_stub());
if (needs_reregister) {
  register nmethod with heap
}

The problem is that this register/unregister must be done with the codecache_lock held, which is not the case here; moreover, we are already holding the patching_lock. Since these two mutexes have the same rank, we cannot simply take the codecache_lock as well.

An optimization of this would first check whether an oop is overwritten at all, and second whether that oop is the last reference to that region(!) within the nmethod, and only then unregister.
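
A minimal sketch of that optimized check, again as a toy model rather than HotSpot code (the overlap test is simplified to "oop starts inside the patched area"): re-registration is only needed when the patch overwrites the nmethod's sole reference into a region, because any surviving oop keeps the registration alive.

```java
import java.util.HashMap;
import java.util.Map;

public class ReregisterCheck {
    static final int PATCH_SIZE = 5; // bytes overwritten by the not-entrant jump

    // oopOffsets[i]/oopRegions[i]: code offset and target heap region of oop i
    static boolean needsReregister(int[] oopOffsets, int[] oopRegions) {
        // Count how many oops in this nmethod point into each region.
        Map<Integer, Integer> refsPerRegion = new HashMap<>();
        for (int r : oopRegions) refsPerRegion.merge(r, 1, Integer::sum);

        for (int i = 0; i < oopOffsets.length; i++) {
            boolean overwritten = oopOffsets[i] < PATCH_SIZE;
            // Only the last reference into a region forces unregister/re-register;
            // otherwise another oop still justifies the existing registration.
            if (overwritten && refsPerRegion.get(oopRegions[i]) == 1) return true;
        }
        return false;
    }
}
```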

Originally reported by C. Kotselidis.

ILW: Impact: high (crash), Likelihood: L (only ever reproduced on the reporter's setup), Workaround: H (none) -> P2
Comments
This situation cannot occur with the C1/C2 compilers, as they pad methods with enough filler instructions. Talked to the original reporter; they will fix it in the Graal compiler.
22-11-2013

This must be a Graal-emitted method that fails. C1 & C2 take special care to emit enough bytes in the prologue to make space for the jump instruction of the patch. The method in the example:

static final Object someobject = new Integer(0xdeadbabe);
final static Object getSomeObject() { return someobject; }

Looks like this with C2:

0x000000011185d440: sub    $0x18,%rsp
0x000000011185d447: mov    %rbp,0x10(%rsp)       ;*synchronization entry
                                                 ; - X::getSomeObject@-1 (line 6)
0x000000011185d44c: movabs $0x74006b240,%rax     ;   {oop(a 'java/lang/Integer' = -559039810)}
0x000000011185d456: add    $0x10,%rsp
0x000000011185d45a: pop    %rbp
0x000000011185d45b: test   %eax,-0x246a461(%rip) # 0x000000010f3f3000
                                                 ;   {poll_return}
0x000000011185d461: retq

Notice the long version of the sub allocating the frame; that's on purpose. See MacroAssembler::verified_entry() in macroAssembler_x86.cpp.

C1, respectively, always generates either a stack bang:

0x0000000110fa5e20: mov    %eax,-0x16000(%rsp)
0x0000000110fa5e27: push   %rbp
0x0000000110fa5e28: sub    $0x30,%rsp            ;*getstatic someobject
                                                 ; - X::getSomeObject@0 (line 6)
                                                 ;; block B0 [0, 3]
0x0000000110fa5e2c: movabs $0x74006b428,%rax     ;   {oop(a 'java/lang/Integer' = -559039810)}
0x0000000110fa5e36: add    $0x30,%rsp
0x0000000110fa5e3a: pop    %rbp
0x0000000110fa5e3b: test   %eax,-0x262fd41(%rip) # 0x000000010e976100
                                                 ;   {poll_return}
0x0000000110fa5e41: retq

Or, if you turn stack bangs off, it puts a 5-byte nop there (to accommodate the jump):

0x00000001080a1f20: nopl   0x0(%rax,%rax,1)
0x00000001080a1f25: push   %rbp
0x00000001080a1f26: sub    $0x30,%rsp            ;*getstatic someobject
                                                 ; - X::getSomeObject@0 (line 6)
                                                 ;; block B0 [0, 3]
0x00000001080a1f2a: movabs $0x74006b428,%rax     ;   {oop(a 'java/lang/Integer' = -559039810)}
0x00000001080a1f34: add    $0x30,%rsp
0x00000001080a1f38: pop    %rbp
0x00000001080a1f39: test   %eax,-0x24dde3f(%rip) # 0x0000000105bc4100
                                                 ;   {poll_return}
0x00000001080a1f3f: retq
22-11-2013

Note, usually such small methods are inlined (accessor methods) and big methods have stack bang code at the beginning. Here is the criterion used to generate a stack bang:

bool Compile::need_stack_bang(int frame_size_in_bytes) const {
  // Determine if we need to generate a stack overflow check.
  // Do it if the method is not a stub function and
  // has java calls or has frame size > vm_page_size/8.
  return (UseStackBanging && stub_function() == NULL &&
          (has_java_calls() || frame_size_in_bytes > os::vm_page_size()>>3));
}

In your case no stack bang is generated (small frame), which is why you have an embedded oop at the beginning of the code. We can mark such nmethods so that it will be easier to see them when we need to unregister them. And thank you for catching both problems.
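
The criterion above can be transcribed into Java for illustration (pageSize stands in for os::vm_page_size(); with 4K pages the frame-size threshold is 512 bytes). The accessor in the example is a leaf method with a tiny frame, so neither clause fires and no stack bang pads the prologue:

```java
public class StackBangCriterion {
    // Java transcription of the C2 criterion quoted above, for illustration only.
    static boolean needStackBang(boolean useStackBanging, boolean isStub,
                                 boolean hasJavaCalls, int frameSizeInBytes,
                                 int pageSize) {
        return useStackBanging && !isStub
            && (hasJavaCalls || frameSizeInBytes > (pageSize >> 3));
    }

    public static void main(String[] args) {
        // The example method: no Java calls, 0x18-byte frame, 4K pages ->
        // 24 <= 512, so no stack bang, and the oop sits right at the entry.
        System.out.println(needStackBang(true, false, false, 0x18, 4096)); // false
    }
}
```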
21-11-2013

Another problem: when iterating over the nmethod in oops_do() while the nmethod is a zombie, we need to skip the first few bytes too, as they are overwritten, just like in the not-entrant case. Otherwise we will crash there while unregistering the nmethod.

void nmethod::oops_do(OopClosure* f, bool allow_zombie) {
  // make sure the oops ready to receive visitors
  assert(allow_zombie || !is_zombie(), "should not call follow on zombie nmethod");
  assert(!is_unloaded(), "should not call follow on unloaded nmethod");

  // If the method is not entrant or zombie then a JMP is plastered over the
  // first few bytes.  If an oop in the old code was there, that oop
  // should not get GC'd.  Skip the first few bytes of oops on
  // not-entrant methods.
  address low_boundary = verified_entry_point();
  if (is_not_entrant()) {  /// <----------------
    low_boundary += NativeJump::instruction_size;
    // %%% Note:  On SPARC we patch only a 4-byte trap, not a full NativeJump.
    // (See comment above.)
  }
21-11-2013