JDK-8216314 : SIGILL in CodeHeapState::print_names()
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 11,12
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: os_x
  • CPU: x86_64
  • Submitted: 2019-01-07
  • Updated: 2022-12-08
  • Resolved: 2021-04-23
Fix Versions
  • JDK 11: 11.0.12 (Fixed)
  • JDK 12: 12 b28 (Fixed)
  • JDK 13: 13 (Fixed)
Description
Crashed with SIGILL while running stress test:

Stack: [0x000070000ed1b000,0x000070000ee1b000],  sp=0x000070000ee1a8a8,  free space=1022k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  0x0000000111cabf80
V  [libjvm.dylib+0x36d697]  CodeCache::print_names(outputStream*)+0x69
V  [libjvm.dylib+0x3b9051]  CompileBroker::print_heapinfo(outputStream*, char const*, char const*)+0x31b
V  [libjvm.dylib+0x4aaa13]  DCmd::parse_and_execute(DCmdSource, outputStream*, char const*, char, Thread*)+0x1d9
V  [libjvm.dylib+0x1faa6b]  jcmd(AttachOperation*, outputStream*)+0x7a
V  [libjvm.dylib+0x1fa419]  attach_listener_thread_entry(JavaThread*, Thread*)+0x2cd
V  [libjvm.dylib+0xb9221c]  JavaThread::thread_main_inner()+0x1fe
V  [libjvm.dylib+0xb91cf6]  JavaThread::run()+0x2ac
V  [libjvm.dylib+0xb8e65f]  Thread::call_run()+0x83
V  [libjvm.dylib+0x9f4d14]  thread_native_entry(Thread*)+0x149
C  [libsystem_pthread.dylib+0x3661]  _pthread_body+0x154
C  [libsystem_pthread.dylib+0x350d]  _pthread_body+0x0
C  [libsystem_pthread.dylib+0x2bf9]  thread_start+0xd
Comments
Fix request (11u): This is a prerequisite/predecessor for the downport of JDK-8219586. For review and testing see the discussion there.
22-04-2021

URL: http://hg.openjdk.java.net/jdk/jdk12/rev/a6620d37728b
User: lucy
Date: 2019-01-16 08:52:11 +0000
16-01-2019

OK. Misunderstanding on my side. Currently checking the comments. Then the RFR will go out.
15-01-2019

We have until Thursday (RDP-2) to get P3's in: https://openjdk.java.net/projects/jdk/12/
15-01-2019

I think it would be beneficial to have the fix in 12. Should we raise the priority to P2?
15-01-2019

Okay, sounds good!
15-01-2019

Talking to others helps. The issue I had in mind is long gone. There is no specific performance issue anymore with Thread::current(). Thomas Stuefe (~stuefe) was involved in those discussions years ago. I'll go ahead and reactivate the owned_by_self() check.
15-01-2019

I think whenever code implicitly relies on assumptions (and that's the case for get_cbType which may crash if we don't hold the code cache lock), there should be an assert that verifies that these assumptions still hold. In this case it would be a simple 'assert_locked_or_safepoint(CodeCache_lock)'. The alternative would be to enable the check that you've uncommented but as you said, it might affect performance (I don't know if that issue is still there on some platforms).
15-01-2019
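
For illustration, the kind of assert proposed here would make the locking precondition explicit at the top of the accessor. This is a sketch only; the actual get_cbType() in codeHeapState.cpp contains the full blob classification, which is omitted here, and the type/constant names are only indicative:

  static CodeHeapState::blobType get_cbType(CodeBlob* cb) {
    // Make the implicit assumption explicit: reading CodeBlob state is only
    // safe while the code cache cannot be modified underneath us.
    assert_locked_or_safepoint(CodeCache_lock);
    if (cb == NULL) {
      return CodeHeapState::noType;
    }
    // ... classify cb as in the existing code ...
    return CodeHeapState::noType;
  }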

Wasn't there an issue with Thread::current() being a performance hog on some platforms? My memory is not totally clear in that respect. I left the owned_by_self() check in as a reminder. If that performance problem is gone (or never existed), I could activate the check. Note: get_cbType() is called very often. The other calls to owned_by_self() are moved out of any loops, so should be negligible. On the other hand, there are just two call sites, both of them are safe. Yes, I know, future modifications might render this statement incorrect.
14-01-2019

I see that you added a "have_CodeCache_lock" check now and skip the unsafe operations if we don't hold the lock. Of course, in this case we don't need an assert. If we don't have an alternate path but implicitly rely on holding the CodeCache_lock, for example in codeHeapState.cpp:2420, I would suggest adding an assert.
14-01-2019

Ah, I see where the problem is. This re-check is meanwhile gone. Why not assert? Because I am sure I have an alternate path in place everywhere. If you really insist on an assert, I'll put it in. There were even more changes. I found another place (in print_usedSpace()) where the CodeHeap contents are accessed to retrieve blob/nmethod names. It is now protected the same way as in print_names(). I have tests running overnight. If all goes well, I'll put out an official RFR tomorrow with this webrev: http://cr.openjdk.java.net/~lucy/webrevs/8216314.01/
14-01-2019

Thanks for the clarifications. I agree, but I don't understand why you need to re-check blob consistency while owning the code cache lock (codeHeapState.cpp:2176). Isn't it the case that we now always hold the CodeCache_lock when calling CodeCache::print_names? So why not add an assert?
14-01-2019

Well, I believe we still need the safety checks.
- If the jcmd requests the "all" function (as in this stress test), everything is protected under one instance of the CodeCache_lock.
- If, however, a user requests specific functions interactively, one instance of the CodeCache_lock protects aggregation, and another one protects print_names().
- The safety checks are able to detect, and thus protect against, "static" changes. "Static" in this context means changes which happen in between, while the lock is not held.
- The CodeCache_lock protects against "dynamic" changes. "Dynamic" in this context means changes which happen while we are processing one specific CodeBlob. Those dynamic changes are basically impossible to protect against without holding the lock.
I do not like the idea of adding an assertion. I'd rather handle unsupported situations gracefully, e.g. by printing "name unavailable" if !CodeCache_lock->owned_by_self(). Isn't the final goal of this effort to avoid abnormal terminations, with dbg builds as well as with product builds? Yes, agreed. The comments need to be reworked. The preliminary webrev was meant to validate the basic idea with you.
14-01-2019
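
A rough illustration of the graceful-degradation idea mentioned above. The names ast and this_blob follow the style used in codeHeapState.cpp, but this is not the actual patch:

  // Only touch the blob for its name while we own the CodeCache_lock;
  // otherwise print a placeholder instead of risking a crash.
  const char* blob_name = "name unavailable";
  if (CodeCache_lock->owned_by_self() && (this_blob != NULL)) {
    blob_name = this_blob->name();
  }
  ast->print("%s", blob_name);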

Also, the comments need to be adapted (for example, "// All other functions operate on aggregated data - except MethodNames, but that should be safe.").
14-01-2019

But since CodeCache::print_names() is now always called with the CodeCache_lock held, we don't need the safety checks anymore, right? Also, it would be good to add an assert_locked_or_safepoint(CodeCache_lock).
14-01-2019

@all: I have updated the webrev in-place with add'l changes to compileBroker.cpp. If you want to have a look (your comments are welcome): http://cr.openjdk.java.net/~lucy/webrevs/8216314.00/
In case of the "all" function (aggregation and print in one call), the CodeCache_lock is now held from start to end. For all other cases (aggregation and selective print in separate calls), the CodeCache_lock is acquired for aggregation (as before) and for print_names() (new). The other print calls operate on internal tables filled during aggregation.
My private tests are still running. Hopefully they complete over the weekend. Only then will I be ready to initiate the official RFR process.
11-01-2019
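
Schematically, the "all" path described above corresponds to one critical section spanning aggregation and printing. This is a sketch under assumed helper names (e.g. the aggregation entry point); the actual change lives in compileBroker.cpp, and the JDK 12 code base uses MutexLockerEx for this lock:

  // Hypothetical wrapper illustrating the lock scope for the "all" case only.
  void print_all_heapinfo(outputStream* out, const char* granularity) {
    MutexLockerEx mu(CodeCache_lock, Mutex::_no_safepoint_check_flag);
    CodeCache::aggregate(out, granularity);  // aggregation step (assumed entry point)
    CodeCache::print_names(out);             // printing, still under the same lock
  }

For the selective case, aggregation and print_names() each take the lock separately, while the remaining print functions only read the tables filled during aggregation.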

Sorry for the delayed reaction - I was at an external event all day on Thursday. I'll try to come up with a CodeCache_lock solution. Not sure if I can make it by today, EOB. @Tobias: I'm not sure if the sweeper is the main culprit. Previous analysis pointed more in the direction of CodeBlobs becoming visible before being fully constructed.
11-01-2019

I believe this happened again in the same stress test, this time on Windows:

#
# A fatal error has been detected by the Java Runtime Environment:
#
# EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x000000a49e8c5d0f, pid=11820, tid=21656
#
# JRE version: Java(TM) SE Runtime Environment (13.0) (fastdebug build 13-internal+0-jdk13-jdk.106)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 13-internal+0-jdk13-jdk.106, compiled mode, sharing, compressed oops, g1 gc, windows-amd64)
# Problematic frame:
# C  0x000000a49e8c5d0f
# ...

Current thread (0x000000a4ec467800): JavaThread "Attach Listener" daemon [_thread_in_vm, id=21656, stack(0x000000a4ece30000,0x000000a4ecf30000)]

Stack: [0x000000a4ece30000,0x000000a4ecf30000], sp=0x000000a4ecf2f608, free space=1021k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  0x000000a49e8c5d0f
...

Stack slot to memory mapping:
stack at sp + 0 slots: 0x00007fff67876c02 jvm.dll::CodeHeapState::print_names + 0x662
stack at sp + 1 slots: 0x000000a4ecf2f720 is pointing into the stack for thread: 0x000000a4ec467800
stack at sp + 2 slots: 0x00007fff00000021 is an unknown value
stack at sp + 3 slots: 0x000000a4ecf2f710 is pointing into the stack for thread: 0x000000a4ec467800
stack at sp + 4 slots: 0x000000a4ecf2f720 is pointing into the stack for thread: 0x000000a4ec467800
stack at sp + 5 slots: 0x00007fff68215b18 jvm.dll::`string' + 0x0
stack at sp + 6 slots: 0x00007fff6828bd58 jvm.dll::`string' + 0x0
stack at sp + 7 slots: 0x00007fff6828bd4c jvm.dll::`string' + 0x0
11-01-2019

[~mikael] Yes, looks like the same issue. [~lucy] Right, disabling the sweeper is probably not good enough then.
11-01-2019

I think your latest webrev only narrows the window where interference by the sweeper can lead to crashes. So it's just a matter of time until our stress test that fires diagnostic commands, or a system that has -Xlog:codecache=Debug enabled, hits the issue again (and, for example, calling a random address can have all kinds of severe effects). I don't think we can have an unsupported diagnostic command or logging option, especially if "unsupported" means "may crash the VM or even worse". I agree with Claes that we should go with a safe option first and improve if really necessary for performance. For example, given that the code cache sweeper is the only component that is allowed to remove blobs from the code cache, it might be an option to temporarily disable the sweeper while iterating/printing blobs without holding the CodeCache_lock (but we need to be really careful because concurrent allocations may affect the blob we are currently processing).
10-01-2019

Running concurrently with the code cache sweeper seems precisely like one of the things you *don't* want to do, and even benign races between a CodeHeap_Analytics job and the sweeping actions might be more damning to performance than simply serializing on the lock. The sweeper thread triggers sweeps independently of other GC activities, so it wouldn't be impossible to coordinate with it specifically to make sure the operations are held apart completely, which might further minimize the risk of causing observable delays.
09-01-2019

Most prominently, the code cache sweeper will be blocked. Afaik, the sweeper is, among others, triggered by G1GC. The CodeCache_lock is used in heapRegionRemSet.cpp. I'm not a G1GC expert, though. Once again: CodeHeap_Analytics doesn't have any impact during normal execution. Only when triggered by the respective jcmd will there be a (possibly noticeable) effect. CodeHeap_Analytics is designed to gather information on a system where a CodeHeap issue is suspected. This stress test is kind of an "unsupported" use case. I agree, the issue can occur in a "supported" use case as well. But it is extremely unlikely to occur.
09-01-2019

What operations were blocked while holding the CodeCache_lock, apart from the obvious one that compiler threads are not able to install new code, etc.? Seems odd to me that holding this lock would be cause for any slowdown in normal code execution, so it seems appropriate to analyze a CodeCache_lock solution (which seems simpler and safer) in more detail and shed some light on why it caused a slowdown.
09-01-2019

Hi Tobias, how about this attempt for improvement? http://cr.openjdk.java.net/~lucy/webrevs/8216314.00/
09-01-2019

The reason was simply performance, more precisely the interference of jcmds with normal operation. On large systems, the code cache can be fairly big. Analyzing and printing its contents can take a while, certainly seconds. The CodeCache_lock would have to be held continuously, from the start of the analysis to the end of printing. This was considered too much impact. As of now, the CodeCache_lock is only held during the analysis step, which completes, even for large heaps, in sub-second time. What biased the decision towards taking some residual risk was the assumption that this jcmd would be used in problem situations only.
09-01-2019

I think the only way to make this safe is to acquire the CodeCache_lock. What was the reasoning for not doing this in the first place again?
09-01-2019

Interesting findings! gcc seems to inline more aggressively. Anyway, this_blob must become invalid at some point between the evaluation of blob_initialized and the call to as_nmethod_or_null(). That could be possible given the code that's executed in between (including stringStream I/O, most likely no "real" I/O). What could I do? Recalculating this_blob and blob_initialized right before the call to as_nmethod_or_null() should reduce the probability of using a stale this_blob pointer. There is no 100% safety, though. This was clear from the very beginning.
09-01-2019
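
A sketch of the mitigation described above. The lookup and the initialization check are simplified stand-ins for the real code in codeHeapState.cpp (which derives this_blob from the heap's segment map), and the helper name and this_addr parameter are hypothetical. As noted, this only shrinks the race window, it does not close it:

  // Re-derive the blob and its state immediately before the virtual call,
  // so the chance of using a stale this_blob pointer is reduced (not zero).
  static void print_one_blob(outputStream* ast, char* this_addr) {      // hypothetical helper
    CodeBlob* this_blob = CodeCache::find_blob_unsafe(this_addr);       // lookup shown for illustration only
    bool blob_initialized = (this_blob != NULL) && (this_blob->name() != NULL);  // simplified stand-in check
    nmethod* nm = blob_initialized ? this_blob->as_nmethod_or_null() : NULL;
    if ((nm != NULL) && CompiledMethod::nmethod_access_is_safe(nm)) {
      // Safe (enough) to read nmethod details for printing here.
    }
  }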

I think the problem is the virtual call to CodeBlob::is_nmethod() in CodeBlob::as_nmethod_or_null(). If I interpret the assembly code correctly, the following mov gets the pointer to the virtual method table from the 'this_blob' object:

0x107d75f4f <+1247>: movq (%r15), %rax

R15=0x0000000111c8ca90
(lldb) x/a 0x0000000111c8ca90
0x111c8ca90: 0x0000000111c92c80 <- %RAX

That value is obviously *not* pointing to the virtual method table but into the code cache. See https://en.wikipedia.org/wiki/Virtual_method_table

This does not happen on Linux because the code generated by gcc is different:

callq  0x7ffff63da3d0 <outputStream::fill_to(int)>
mov    (%rbx),%rax
lea    -0x42b955(%rip),%rcx        # 0x7ffff53b7bd0 <CodeBlob::is_nmethod() const>
mov    0x10(%rax),%rax
cmp    %rcx,%rax
jne    0x7ffff57e39f8 <CodeHeapState::print_names(outputStream*, CodeHeap*)+3064>
xor    %r13d,%r13d
mov    %r13,%rdi
callq  0x7ffff586b7e0 <CompiledMethod::nmethod_access_is_safe(nmethod*)>

Instead of calling CodeBlob::is_nmethod(), the vtable entry is compared.
09-01-2019
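
To make the failure mode concrete: a C++ virtual call first loads the vtable pointer from the object and then branches through one of its slots, so if the object's memory has been freed and reused (here: overwritten with code cache contents), that indirect branch goes to an arbitrary address. A minimal, generic illustration, not HotSpot code:

  #include <cstdio>

  struct CodeBlobLike {
    virtual ~CodeBlobLike() {}
    virtual bool is_nmethod() const { return false; }  // dispatched through the vtable
  };

  static bool classify(const CodeBlobLike* blob) {
    // Compiles to: load the vtable pointer from *blob, then call through a slot.
    // If *blob has been freed and overwritten, that load yields garbage and the
    // indirect call branches to a random address (SIGILL/SIGSEGV territory).
    return blob->is_nmethod();
  }

  int main() {
    CodeBlobLike b;
    std::printf("is_nmethod: %d\n", classify(&b));  // well-defined only while 'b' is alive
    return 0;
  }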

Looking at the code, I also have no idea how it can happen that we jump into the code heap from CodeCache::print_names. Maybe something went wrong with signal handling in the SafeFetch32 stub? The issue first showed up on September 9th, 2018 (not sure why no bug was filed back then). Since then, it reproduced 5 times (always on macosx-x64 but on different machines).
09-01-2019

I was finally able to load the huge core file with lldb but it doesn't help much:

(lldb) bt
* thread #14, stop reason = signal SIGSTOP
  * frame #0: 0x00007fff5d63fb6e libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff5d80a080 libsystem_pthread.dylib`pthread_kill + 333
    frame #2: 0x00007fff5d59b1ae libsystem_c.dylib`abort + 127
    frame #3: 0x00000001083f5617 libjvm.dylib`os::abort(bool, void*, void const*) + 141
    frame #4: 0x000000010861b18b libjvm.dylib`VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long) + 2759
    frame #5: 0x000000010861a69e libjvm.dylib`VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*, char const*, ...) + 152
    frame #6: 0x000000010861b1d9 libjvm.dylib`VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*) + 33
    frame #7: 0x00000001083f9bba libjvm.dylib`JVM_handle_bsd_signal + 519
    frame #8: 0x00000001083f73b8 libjvm.dylib`signalHandler(int, __siginfo*, void*) + 112
    frame #9: 0x00007fff5d7fdf5a libsystem_platform.dylib`_sigtramp + 26
    frame #10: 0x0000000111cabf81
    frame #11: 0x0000000107d6d697 libjvm.dylib`CodeCache::print_names(outputStream*) + 105
    frame #12: 0x0000000107db9051 libjvm.dylib`CompileBroker::print_heapinfo(outputStream*, char const*, char const*) + 795
    frame #13: 0x0000000107eaaa13 libjvm.dylib`DCmd::parse_and_execute(DCmdSource, outputStream*, char const*, char, Thread*) + 473
    frame #14: 0x0000000107bfaa6b libjvm.dylib`jcmd(AttachOperation*, outputStream*) + 122
    frame #15: 0x0000000107bfa419 libjvm.dylib`attach_listener_thread_entry(JavaThread*, Thread*) + 717
    frame #16: 0x000000010859221c libjvm.dylib`JavaThread::thread_main_inner() + 510
    frame #17: 0x0000000108591cf6 libjvm.dylib`JavaThread::run() + 684
    frame #18: 0x000000010858e65f libjvm.dylib`Thread::call_run() + 131
    frame #19: 0x00000001083f4d14 libjvm.dylib`thread_native_entry(Thread*) + 329
    frame #20: 0x00007fff5d807661 libsystem_pthread.dylib`_pthread_body + 340
    frame #21: 0x00007fff5d80750d libsystem_pthread.dylib`_pthread_start + 377
    frame #22: 0x00007fff5d806bf9 libsystem_pthread.dylib`thread_start + 13

(lldb) f 11
frame #11: 0x0000000107d6d697 libjvm.dylib`CodeCache::print_names(outputStream*) + 105
libjvm.dylib`CodeCache::print_names:
    0x107d6d697 <+105>: incl   -0x20(%rbp)
    0x107d6d69a <+108>: movq   0xcb2cf7(%rip), %rsi      ; CodeCache::_allocable_heaps
    0x107d6d6a1 <+115>: movl   0x18(%rsi), %edx
    0x107d6d6a4 <+118>: movq   %rbx, %rdi

(lldb) di -s 0x107d6d697-30 -c 20
libjvm.dylib`CodeCache::print_names:
    0x107d6d679 <+75>:  leal   -0x48(%rbp), %ebx
    0x107d6d67c <+78>:  leaq   -0x30(%rbp), %r15
    0x107d6d680 <+82>:  movq   -0x28(%rbp), %rdi
    0x107d6d684 <+86>:  movl   -0x20(%rbp), %esi
    0x107d6d687 <+89>:  callq  0x107d6dfd2               ; GrowableArray<CodeHeap*>::at(int) const
    0x107d6d68c <+94>:  movq   (%rax), %rsi
    0x107d6d68f <+97>:  movq   %r14, %rdi
    0x107d6d692 <+100>: callq  0x107d75a70               ; CodeHeapState::print_names(outputStream*, CodeHeap*)
    0x107d6d697 <+105>: incl   -0x20(%rbp)
    0x107d6d69a <+108>: movq   0xcb2cf7(%rip), %rsi      ; CodeCache::_allocable_heaps
    0x107d6d6a1 <+115>: movl   0x18(%rsi), %edx
    0x107d6d6a4 <+118>: movq   %rbx, %rdi
    0x107d6d6a7 <+121>: callq  0x1083bea60               ; GrowableArrayIterator<CodeHeap*>::GrowableArrayIterator(GrowableArray<CodeHeap*> const*, int)
    0x107d6d6ac <+126>: movq   %r15, %rdi
    0x107d6d6af <+129>: movq   %rbx, %rsi
    0x107d6d6b2 <+132>: callq  0x107d68e5c               ; GrowableArrayIterator<CodeHeap*>::operator!=(GrowableArrayIterator<CodeHeap*> const&)
    0x107d6d6b7 <+137>: testb  %al, %al
    0x107d6d6b9 <+139>: jne    0x107d6d680               ; <+82>
    0x107d6d6bb <+141>: addq   $0x38, %rsp
    0x107d6d6bf <+145>: popq   %rbx
09-01-2019

Working on the issue... I have no idea so far how it may happen that program control branches from print_names (C++ code) into the code heap.
09-01-2019

I've attached the hs_err file. The crash is very intermittent and only happens with an internal long running test suite that runs several different stress tests (JDK-8209950 was found by the same test). Also, it only ever showed up on Mac OSX. The test fires different jcmds while the VM is running some stress tests in more or less random order:

[JcmdPickerModule] Starting: /scratch/mesos/jib-master/install/jdk12-jdk.1183/macosx-x64-debug.jdk/jdk-12/fastdebug/bin/jcmd 20928 Compiler.codecache finished at 1546739225146 (Sun Jan 06 01:47:05 GMT 2019)
[JcmdPickerModule] Verified: /scratch/mesos/jib-master/install/jdk12-jdk.1183/macosx-x64-debug.jdk/jdk-12/fastdebug/bin/jcmd 20928 Compiler.codecache finished at 1546739229475 (Sun Jan 06 01:47:09 GMT 2019)
[JcmdPickerModule] Starting: /scratch/mesos/jib-master/install/jdk12-jdk.1183/macosx-x64-debug.jdk/jdk-12/fastdebug/bin/jcmd 20928 Compiler.CodeHeap_Analytics finished at 1546739259544 (Sun Jan 06 01:47:39 GMT 2019)
[JcmdPickerModule] Failed: /scratch/mesos/jib-master/install/jdk12-jdk.1183/macosx-x64-debug.jdk/jdk-12/fastdebug/bin/jcmd 20928 Compiler.CodeHeap_Analytics finished at 1546739390506 (Sun Jan 06 01:49:50 GMT 2019)
[JcmdPickerModule] The output is saved to file: Failed.Compiler.CodeHeap_Analytics.1546739259544.out

The above mentioned output file only contains:

20928:
09-01-2019

Unfortunately, it seems that the test suite did not log the output. Probably it's discarded for disk space reasons.
09-01-2019

I found this hint in the hs_err file:

Stack slot to memory mapping:
stack at sp + 0 slots: 0x0000000107d75f58: _ZN13CodeHeapState11print_namesEP12outputStreamP8CodeHeap+0x4e8 in /scratch/mesos/jib-master/install/jdk12-jdk.1183/macosx-x64-debug.jdk/jdk-12/fastdebug/lib/server/libjvm.dylib at 0x0000000107a00000
stack at sp + 1 slots: 0x000000010868b963: __cxx_global_var_init+0xcde5 in /scratch/mesos/jib-master/install/jdk12-jdk.1183/macosx-x64-debug.jdk/jdk-12/fastdebug/lib/server/libjvm.dylib at 0x0000000107a00000
stack at sp + 2 slots: 0x000000010898caa0: _ZTV12ResourceMark+0x10 in /scratch/mesos/jib-master/install/jdk12-jdk.1183/macosx-x64-debug.jdk/jdk-12/fastdebug/lib/server/libjvm.dylib at 0x0000000107a00000

So it appears we were at 0x0000000107d75f58 before "calling into death":

(lldb) di -s 0x0000000107d75f58-32 -c 20
libjvm.dylib`CodeHeapState::print_names:
    0x107d75f38 <+1224>: xorl   $0x97be3f, %eax           ; imm = 0x97BE3F
    0x107d75f3d <+1229>: callq  0x1083ff948               ; outputStream::print(char const*, ...)
    0x107d75f42 <+1234>: movl   $0x21, %esi
    0x107d75f47 <+1239>: movq   %rbx, %rdi
    0x107d75f4a <+1242>: callq  0x1083ffaa0               ; outputStream::fill_to(int)
    0x107d75f4f <+1247>: movq   (%r15), %rax
    0x107d75f52 <+1250>: movq   %r15, %rdi
    0x107d75f55 <+1253>: callq  *0x10(%rax)
    0x107d75f58 <+1256>: testb  %al, %al
    0x107d75f5a <+1258>: movq   %r15, %rbx
    0x107d75f5d <+1261>: jne    0x107d75f61               ; <+1265>
    0x107d75f5f <+1263>: xorl   %ebx, %ebx
    0x107d75f61 <+1265>: movq   %rbx, %rdi
    0x107d75f64 <+1268>: callq  0x107dc1464               ; CompiledMethod::nmethod_access_is_safe(nmethod*)
    0x107d75f69 <+1273>: testb  %al, %al
    0x107d75f6b <+1275>: je     0x107d760bc               ; <+1612>
    0x107d75f71 <+1281>: leaq   -0xf8(%rbp), %r15
    0x107d75f78 <+1288>: movq   0x80(%rbx), %rax
    0x107d75f7f <+1295>: movq   %rax, -0x80(%rbp)
    0x107d75f83 <+1299>: leaq   -0x138(%rbp), %rdi

Looks like we are right after the call to ast->fill_to(33): http://hg.openjdk.java.net/jdk/jdk/file/7d8676b2487f/src/hotspot/share/code/codeHeapState.cpp#l2167

0x107d75f55 <+1253>: callq *0x10(%rax)

Assuming that RAX=0x0000000111c92c80:

(lldb) x/a 0x0000000111c92c80+0x10
0x111c92c90: 0x0000000111cabd80

But for some reason we hit the SIGILL only at 0x0000000111cabf80.
09-01-2019

Didn't see the last comment while typing, looking into it...
09-01-2019

Thanks a lot for your efforts! Any idea where the output could have gone? There is a lot of output before print_names is even called. Could lldb give a hint on the source code line? Offset 105 (dec) into print_names isn't very far.
- CodeCache is not segmented (48M, too small).
- heap->low_boundary() can't cause problems. It is called later to print information into the hs_err file.
- The first theoretically possible failure location would be @ bool blob_initialized = ...
Not very enlightening so far, continuing to think...
09-01-2019

Is there some more information available? hs_err* file? SIGSEGV would be easier to understand than SIGILL. What does the stress test do? Just fire jcmds as fast as it can? Which jcmds? What is the target system doing? Heavy compilation? Thanks for the info. Need it to maybe build a repro case. No extreme rush required in providing this. I have to leave now. Will take care of the issue asap. Probably not before tomorrow morning.
08-01-2019

Very similar to JDK-8209950, probably the checks added to CompiledMethod::nmethod_access_is_safe are not sufficient. ILW = Crash while printing code heap statistics, intermittent with diagnostics (jcmd), no workaround (but can disable diagnostic option) = HLM = P3
08-01-2019