JDK-8138922 : StubCodeDesc constructor publishes partially-constructed objects on StubCodeDesc::_list
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 8u172,9
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • OS: linux
  • CPU: x86_64
  • Submitted: 2015-10-05
  • Updated: 2024-12-04
  • Resolved: 2016-02-15
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 9
9 b108Fixed
Related Reports
Relates :  
Relates :  
Relates :  
Description
While chasing the following bug:

JDK-8047212 runtime/ParallelClassLoading/bootstrap/random/inner-complex assert(ObjectSynchronizer::verify_objmon_isinpool(inf)) failed: monitor is invalid

I ran into three SIGSEGV crashes in StubCodeDesc::desc_for()

My baseline is a clone of RT_Baseline at the following fix:

Changeset: 703df4322ebb
Author: dsamersoff
Date: 2015-10-01 10:33 +0300
URL: http://hg.openjdk.java.net/jdk9/hs-rt/jdk/rev/703df4322ebb

8133063: Remove BasicLauncherTest from the problem list
Summary: Remove BasicLauncherTest from the problem list
Reviewed-by: jbachorik

Here's the changeset info for the entire repo:

$ cat SourceTips.txt
 .:34280222936a jdk:703df4322ebb jaxp:497bc2654e11 pubs:618464525123 corba:ca8a17195884 jaxws:bdb954839363 closed:57176e80ab18 deploy:53398009c566 hotspot:983c56341c80 install:a2caf79947c6 nashorn:678db05f13ba sponsors:9e31857dd56d langtools:8e76163b3f3a jdk/src/closed:59bd18af2265 jdk/make/closed:54d0705354f2 jdk/test/closed:de2be51ab426 hotspot/src/closed:002bf5205dcd hotspot/make/closed:d70cd66cf2f4 hotspot/test/closed:5524c847f372

My testing config is running 4 parallel test runs using the following locally built bits:

$ ../build/linux-x86_64-normal-server-fastdebug/images/jdk/bin/java -version
java version "1.9.0-internal-fastdebug"
Java(TM) SE Runtime Environment (build 1.9.0-internal-fastdebug-ddaugher_2015_10_02_11_39-b00)
Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-internal-fastdebug-ddaugher_2015_10_02_11_39-b00, mixed mode)

$ ../build/linux-x86_64-normal-server-fastdebug/images/jdk/bin/java -Xinternalversion
Java HotSpot(TM) 64-Bit Server VM (1.9.0-internal-fastdebug-ddaugher_2015_10_02_11_39-b00) for linux-amd64 JRE (1.9.0-internal-ddaugher_2015_10_02_11_39-b00), built on Oct 2 2015 12:13:07 by "ddaugher" with gcc 4.8.2 


$ elapsed_times mark.start_test_run hs_err_pid*
mark.start_test_run                               0 seconds
hs_err_pid26813.log   1 days          18 minutes 22 seconds
hs_err_pid28187.log          17 hours 13 minutes 20 seconds
hs_err_pid8500.log            1 hours  9 minutes 11 seconds
hs_err_pid8630.log           11 hours 49 minutes 23 seconds

The first failure (hs_err_pid26813.log) matches the bug I'm
hunting and the remaining three match this bug. I'll attach
the hs_err_pid files.
Comments
This seems to affect 8u172. We've seen intermittent crashes on Aarch64 due to this. It's doubtful that the issue is CPU specific.
03-07-2018

verified by nightly testing
26-07-2017

URL: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/30b120bce29d User: lana Date: 2016-03-02 19:34:38 +0000
02-03-2016

URL: http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/30b120bce29d User: vlivanov Date: 2016-02-15 20:18:32 +0000
15-02-2016

diff --git a/src/share/vm/runtime/stubCodeGenerator.hpp b/src/share/vm/runtime/stubCodeGenerator.hpp --- a/src/share/vm/runtime/stubCodeGenerator.hpp +++ b/src/share/vm/runtime/stubCodeGenerator.hpp @@ -75,6 +75,7 @@ _index = ++_count; // (never zero) _begin = begin; _end = end; + OrderAccess::storestore(); // Readers should observe fully constructed objects. _list = this; }; <+89>: mov 0x933bb0(%rip),%rax # 0x7ffff7d71580 <_ZN12StubCodeDesc5_listE> <+96>: mov %r14,0x10(%rbx) <+100>: mov %r15,0x18(%rbx) <+104>: mov %r13,0x28(%rbx) <+108>: movq $0x0,0x30(%rbx) <+116>: mov %rax,0x8(%rbx) <+120>: mov 0x933b8a(%rip),%eax # 0x7ffff7d71578 <_ZN12StubCodeDesc6_countE> <+126>: add $0x1,%eax <+129>: mov %eax,0x933b81(%rip) # 0x7ffff7d71578 <_ZN12StubCodeDesc6_countE> <+135>: mov %eax,0x20(%rbx) <+138>: mov %rbx,0x933b7f(%rip) # 0x7ffff7d71580 <_ZN12StubCodeDesc5_listE>
10-02-2016

StubCodeMark ctor publishes partially-constructed object: a compiler reorders "_list = this;" with "_next = _list;", so readers can observe StubCodeDesc with not-yet-initialized _next. That's exactly what happens during the crash: _next == 0xf1f1f1f1f1f1f1f1. Dump of assembler code for function StubCodeMark::StubCodeMark(StubCodeGenerator*, char const*, char const*): ... <+57>: callq <CHeapObj<(MemoryType)4>::operator new(unsigned long)> <+62>: test %rax,%rax <+65>: mov %rax,%rbx <+68>: je <+145> ... <+89>: mov 0x939ab0(%rip),%rax # load StubCodeDesc::_list <+96>: mov %r14,0x10(%rbx) # _group = group; <+100>: mov %r15,0x18(%rbx) # _name = name; <+104>: mov %r13,0x28(%rbx) # _begin = begin; <+108>: movq $0x0,0x30(%rbx) # _end = NULL; <+116>: mov %rbx,0x939a95(%rip) # _list = this; <+123>: mov %rax,0x8(%rbx) # _next = _list; <+127>: mov 0x939a83(%rip),%eax # _index = ++_count; // (never zero) <+133>: add $0x1,%eax # <+136>: mov %eax,0x939a7a(%rip) # <+142>: mov %eax,0x20(%rbx) #
10-02-2016

The crash happens very early in the execution. The last class loading event is: Event: 0.182 loading class java/lang/invoke/MethodHandleStatics$1 done or Event: 0.360 loading class java/lang/invoke/MemberName$1 done Around the same time MethodHandles::generate_adapters() works: [Loaded java.lang.invoke.MethodHandleStatics from ... [Loaded java.lang.invoke.MethodHandleStatics$1 from ... [Loaded jdk.internal.misc.JavaLangInvokeAccess from ... [Loaded java.lang.invoke.MemberName$1 ... MethodHandle::interpreter_entry: _invokeGeneric MethodHandle::interpreter_entry: _invokeBasic MethodHandle::interpreter_entry: _linkToVirtual MethodHandle::interpreter_entry: _linkToStatic MethodHandle::interpreter_entry: _linkToSpecial MethodHandle::interpreter_entry: _linkToInterface So, it is very likely that StubCodeDesc::_list is being modified concurrently.
10-02-2016

I was able to traverse the whole StubCodeDesc::_list. static _list = 0x7f7f3038d420, static _count = 75 First entry: _next = 0x7f7f3038c6d0, _group = 0x7f7f38547518 "MethodHandle::interpreter_entry", _name = 0x7f7f385c2573 "_linkToInterface", _index = 75, _begin = 0x7f7f1d1a5975, _end = 0x7f7f1d1a5d88 Last entry: _next = 0x0, _group = 0x7f7f384f02fb "ICache", _name = 0x7f7f384f02e9 "flush_icache_stub", _index = 1, _begin = 0x7f7f1cd65060, _end = 0x7f7f1cd6507d
10-02-2016

Dump of assembler code for function StubCodeDesc::desc_for(unsigned char*): 0x00007f7f382579d0 <+0>: mov 0x939fe9(%rip),%rax # 0x7f7f38b919c0 <_ZN12StubCodeDesc5_listE> 0x00007f7f382579d7 <+7>: push %rbp 0x00007f7f382579d8 <+8>: mov %rsp,%rbp 0x00007f7f382579db <+11>: test %rax,%rax 0x00007f7f382579de <+14>: je 0x7f7f382579ec <StubCodeDesc::desc_for(unsigned char*)+28> => 0x00007f7f382579e0 <+16>: cmp %rdi,0x28(%rax) 0x00007f7f382579e4 <+20>: ja 0x7f7f382579f0 <StubCodeDesc::desc_for(unsigned char*)+32> 0x00007f7f382579e6 <+22>: cmp 0x30(%rax),%rdi 0x00007f7f382579ea <+26>: jae 0x7f7f382579f0 <StubCodeDesc::desc_for(unsigned char*)+32> 0x00007f7f382579ec <+28>: pop %rbp 0x00007f7f382579ed <+29>: retq 0x00007f7f382579ee <+30>: xchg %ax,%ax 0x00007f7f382579f0 <+32>: mov 0x8(%rax),%rax 0x00007f7f382579f4 <+36>: test %rax,%rax 0x00007f7f382579f7 <+39>: jne 0x7f7f382579e0 <StubCodeDesc::desc_for(unsigned char*)+16> 0x00007f7f382579f9 <+41>: pop %rbp 0x00007f7f382579fa <+42>: retq
10-02-2016

[~dcubed] Nevermind, I noticed you filed the bug long ago. I'll wait until Aurora is up.
09-02-2016

Looks like a not-yet-initialized StubCodeDesc in the list (_next == 0xf1f1f1f1f1f1f1f1). [~dcubed], do you have a core file left by any chance?
09-02-2016

class StubCodeDesc: public CHeapObj<mtCode> { protected: static StubCodeDesc* _list; // the list of all descriptors static int _count; // length of list ... StubCodeDesc(const char* group, const char* name, address begin, address end = NULL) { assert(name != NULL, "no name specified"); _next = _list; _group = group; _name = name; _index = ++_count; // (never zero) _begin = begin; _end = end; _list = this; };
09-02-2016

Crashes in: V [libjvm.so+0x101bf20] StubCodeDesc::desc_for(unsigned char*)+0x10 StubCodeDesc* StubCodeDesc::desc_for(address pc) { StubCodeDesc* p = _list; while (p != NULL && !p->contains(pc)) p = p->_next; // p == NULL || p->contains(pc) return p; } 02: 2e 00 31 add BYTE PTR cs:[rcx],dh 05: c0 41 29 d0 rol BYTE PTR [rcx+0x29],0xd0 09: e9 c2 8d e8 ff jmp 0xffffffffffe88dd0 0e: 66 90 xchg ax,ax 10: 48 8b 05 49 9c 8a 00 mov rax,QWORD PTR [rip+0x8a9c49] # 0x008a9c60 17: 55 push rbp 18: 48 89 e5 mov rbp,rsp 1b: 48 85 c0 test rax,rax 1e: 74 0c je 0x0000002c 20: 48 39 78 28 cmp QWORD PTR [rax+0x28],rdi <=== Crashes here 24: 77 0a ja 0x00000030 26: 48 3b 78 30 cmp rdi,QWORD PTR [rax+0x30] 2a: 73 04 jae 0x00000030 2c: 5d pop rbp 2d: c3 ret 2e: 66 90 xchg ax,ax 30: 48 8b 40 08 mov rax,QWORD PTR [rax+0x8] 34: 48 85 c0 test rax,rax 37: 75 e7 jne 0x00000020 39: 5d pop rbp 3a: c3 ret 3b: 90 nop 3c: 0f 1f 40 00 nop DWORD PTR [rax+0x0] RAX=0xf1f1f1f1f1f1f1f1 is an unknown value const int uninitBlockPad = 0xF1; // value used to zap newly malloc'd blocks.
09-02-2016

I did a very long stress run with the fix for JDK-8047212 in place on my Linux DevOps machine and on my local Solaris X64 server. This latest version was baselined on a JDK9-hs-rt repo as of this fix: Changeset: 9b74c5f1b10e Author: brutisso Date: 2015-10-20 14:00 +0200 URL: http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/rev/9b74c5f1b10e 8139868: CMSScavengeBeforeRemark broken after JDK-8134953 Reviewed-by: sjohanss, jwilhelm Here's all the repo tip info: $ cat 2015-10-20-121401.brutisso.8139868/SourceTips.txt .:cd061b69a817 jdk:d68de0bab8ee jaxp:91795d86744f corba:1ee087da34d5 jaxws:51729143f8fe closed:bb694776dce5 hotspot:9b74c5f1b10e nashorn:d8936a4a0186 langtools:e6fcc24b6d14 jdk/src/closed:1056f5b75a9e jdk/make/closed:8498209d9810 jdk/test/closed:e93dd07f3dee hotspot/src/closed:e1b24390d910 hotspot/make/closed:7131ce6f91de hotspot/test/closed:4913ed0cb0cd So the bits tested are based on the above repo info plus the fix for JDK-8047212; the builds were done locally on the test machines. Linux DevOps machine: $ elapsed_times mark.start_test_run doit_loop.fast_0.log mark.start_test_run 0 seconds doit_loop.fast_0.log 6 days 20 hours 54 seconds $ grep -v PASS doit_loop.fast_?.log doit_loop.fast_0.log:Copy fast_0: loop #1019206... doit_loop.fast_1.log:Copy fast_1: loop #1019583... doit_loop.fast_2.log:Copy fast_2: loop #1019596... doit_loop.fast_3.log:Copy fast_3: loop #1019529... Solaris X64 machine: $ elapsed_times mark.start_test_run doit_loop.fast_0.log mark.start_test_run 0 seconds doit_loop.fast_0.log 6 days 21 hours 38 minutes 46 seconds $ grep -v PASS doit_loop.fast_?.log doit_loop.fast_0.log:Copy fast_0: loop #715125... doit_loop.fast_1.log:Copy fast_1: loop #714984... doit_loop.fast_2.log:Copy fast_2: loop #715051... doit_loop.fast_3.log:Copy fast_3: loop #715111... So the failure mode for this bug (JDK-8138922) is no longer reproducible in my Linux DevOps env. It was never reproducible in my Solaris X64 server environment. I can't explain why my fix JDK-8047212 would cause this bug to go away so I'm not going to close this bug. However, if someone on the Compiler team wants to close this bug as "not reproducible", I would have no objections.
27-10-2015

Changing from the default P4 -> P2. The bug I'm hunting (JDK-8047212) is a P2 and this failure happens more frequently than my bug.
05-10-2015