JDK-8218266 : G1 crash in AccessInternal::PostRuntimeDispatch
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 13
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • OS: windows
  • CPU: x86_64
  • Submitted: 2019-02-03
  • Updated: 2019-10-25
  • Resolved: 2019-02-28
JDK 13
13 b11: Fixed
Description
Test:

java/lang/ClassLoader/forNameLeak/ClassForNameLeak.java

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00007ff87544ac2a, pid=5556, tid=18440
#
# JRE version: Java(TM) SE Runtime Environment (13.0) (fastdebug build 13-internal+0-jdk13-jdk.303)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 13-internal+0-jdk13-jdk.303, compiled mode, sharing, tiered, compressed oops, g1 gc, windows-amd64)
# Problematic frame:
# V  [jvm.dll+0x2eac2a]  AccessInternal::PostRuntimeDispatch<G1BarrierSet::AccessBarrier<663668,G1BarrierSet>,2,663668>::oop_access_barrier+0xa
#
# Core dump will be written. Default location: T:\testOutput\test-support\jtreg_open_test_jdk_jdk_lang\scratch\0\hs_err_pid5556.mdmp
#
# An error report file with more information is saved as:
# T:\testOutput\test-support\jtreg_open_test_jdk_jdk_lang\scratch\0\hs_err_pid5556.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
Comments
This test runs into stale memory in the pd_set of the class. I didn't think that was possible when I removed the code that cleaned the pd sets for live dictionary entries, but this test does it. To further complicate things, the protection domain cache table is now cleaned concurrently, and the pd_set points at entries in that table.
23-02-2019

Looking at the crash from Feb 7, the caller of oop_access_barrier() appears to be DictionaryEntry::contains_protection_domain():
    Stack slot to memory mapping:
    stack at sp + 0 slots: 0x00007ffbbc5e2768 jvm.dll::DictionaryEntry::contains_protection_domain + 0xc8
which means that the bad value is a protection_domain oop.
23-02-2019

All of the failing CI tasks are with -Xcomp -XX:+TieredCompilation. Also, in every failing CI task, jvm.dll was loaded "far away", so that references to locations in that library from compiled Java code can't use %RIP+32b-offset addressing, and instead likely use Rscratch1 as an intermediate temporary.

The crash being in PostRuntimeDispatch is likely a red herring. I suspect we're clobbering a value that was stashed in Rscratch1, and then bad things happen. This looks very similar to JDK-8203466.

Because it looks like a clobbered-register-in-compiled-code kind of problem, I'm moving this to the compiler subcomponent. There doesn't seem to be anything really GC-related here. Also lowering the priority to P2.

I've not been able to reproduce the failure in 100 runs each with and without -XX:+ForceUnreachable. Since we've had 8 failures in roughly 150 CI executions of this test, this suggests being loaded "far away" has more impact than that option. (Changing Assembler::reachable to return false even for 32-bit displacements, under control of ForceUnreachable, fails to compile the test when that option is enabled.)

Despite the discussion in JDK-8203466, we're still using setup/restore_arg_regs, which uses Rscratch1 to preserve RSI. If anything in the scope of one of those regions is sensitive to jvm.dll being far away, that could cause something like these failures too. However, I *think* the remaining uses are okay. There are two that at first glance look suspicious:
  • generate_checkcast_copy -- saves R10 (Rscratch1) on the stack within the setup/restore pair
  • generate_generic_copy -- jumps into checkcast_copy after the setup but before the save of R10
So they seem to be okay. The other uses are:
  • generate_disjoint_byte_copy
  • generate_conjoint_byte_copy
  • generate_disjoint_short_copy
  • generate_conjoint_short_copy
  • generate_multiplyToLen
  • generate_squareToLen
  • generate_mulAdd
Also, none of this stub generation code seems to have changed in the right timeframe to be associated with these failures.
06-02-2019
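
To make the "far away" condition in the comment above concrete, here is a minimal sketch of the arithmetic, assuming only that %RIP-relative addressing carries a signed 32-bit displacement. It is not HotSpot's Assembler::reachable; the class and method names are hypothetical. The two addresses are copied from the hs_err excerpts quoted in this report (the pc in the header and the C1 frame in the stack trace) and may not come from the same run.

```java
final class Rel32Sketch {
    /** True if target can be reached from nextInsn with a signed 32-bit displacement. */
    static boolean fitsInRel32(long nextInsn, long target) {
        long disp = target - nextInsn;
        // The displacement fits if narrowing it to int and widening back loses nothing.
        return disp == (int) disp;
    }

    public static void main(String[] args) {
        long compiledCode = 0x000000a75b10bcacL; // compiled ClassForNameLeak.main frame
        long jvmDll = 0x00007ff87544ac2aL;       // pc inside jvm.dll from the hs_err header

        System.out.println("nearby target reachable: " + fitsInRel32(compiledCode, compiledCode + 0x1000));
        System.out.println("jvm.dll reachable:       " + fitsInRel32(compiledCode, jvmDll));
    }
}
```

With these two addresses the distance is far larger than 2 GB, so a direct %RIP-relative reference is not possible and the address has to be materialized in a register first.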

The test performs the following steps:
(1) In a separate thread for each, create a class loader, load a class from it, run some code from that class, and capture the class loader in a PhantomReference that is recorded.
(2) Do a bunch of full GCs.
(3) Verify that the PhantomReferences referring to the class loaders were cleared.
The crash seems to be near the beginning of that verification, since the expected 10 full GCs have completed, but no messages from the verification loop (successful or not) have been printed.
05-02-2019
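
The steps described above can be sketched roughly as follows. This is not the source of ClassForNameLeak.java; it is a simplified illustration, and LoaderLeakSketch, Payload, and IsolatingLoader are hypothetical names. It assumes the compiled class files are visible as resources to the application class loader, and it stops requesting GCs early once every reference has been enqueued, whereas the comment describes a fixed number of full GCs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class LoaderLeakSketch {
    /** Hypothetical stand-in for the class that the real test loads and runs. */
    public static class Payload implements Runnable {
        public Payload() { }
        @Override public void run() { }
    }

    /** Defines classes itself from this program's class files; its parent is the bootstrap loader. */
    static final class IsolatingLoader extends ClassLoader {
        IsolatingLoader() { super((ClassLoader) null); }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            String resource = name.replace('.', '/') + ".class";
            try (InputStream in = LoaderLeakSketch.class.getClassLoader().getResourceAsStream(resource)) {
                if (in == null) throw new ClassNotFoundException(name);
                byte[] bytes = in.readAllBytes();
                return defineClass(name, bytes, 0, bytes.length);
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Step 1 for one loader: load a class, run it, and track the loader with a PhantomReference. */
    static PhantomReference<ClassLoader> loadRunAndTrack(ReferenceQueue<ClassLoader> queue)
            throws ReflectiveOperationException {
        ClassLoader loader = new IsolatingLoader();
        Class<?> cls = Class.forName(Payload.class.getName(), true, loader);
        Runnable r = (Runnable) cls.getDeclaredConstructor().newInstance();
        r.run();
        return new PhantomReference<>(loader, queue);
    }

    /** Step 1 for all loaders: one thread per class loader. */
    static List<PhantomReference<ClassLoader>> createLoaders(int count, ReferenceQueue<ClassLoader> queue) {
        List<PhantomReference<ClassLoader>> refs = Collections.synchronizedList(new ArrayList<>());
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            Thread t = new Thread(() -> {
                try {
                    refs.add(loadRunAndTrack(queue));
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException(e);
                }
            });
            threads.add(t);
            t.start();
        }
        try {
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return refs;
    }

    /** Steps 2 and 3: request GCs and count how many tracked loaders were enqueued. */
    static int countCleared(int expected, ReferenceQueue<ClassLoader> queue) {
        int cleared = 0;
        try {
            for (int i = 0; i < 10 && cleared < expected; i++) {
                System.gc();
                while (cleared < expected && queue.remove(100) != null) cleared++;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return cleared;
    }

    public static void main(String[] args) {
        ReferenceQueue<ClassLoader> queue = new ReferenceQueue<>();
        List<PhantomReference<ClassLoader>> refs = createLoaders(5, queue);
        int cleared = countCleared(refs.size(), queue);
        System.out.println(cleared + " of " + refs.size() + " class loaders were collected");
    }
}
```

System.gc() is only a request, so the number of collected loaders is not guaranteed; the sketch reports the count rather than failing when a loader survives.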

Looking at the mach5 history, we have 8 failures like this (as of 02 Feb 2019), starting 16 Jan 2019, all on Windows, all mostly similar. All failures are with G1 as the (default) collector. As noted above, the Java part of the stack trace for all of these failures is the same. In each case, there is one native frame below that. That native frame varies in incomprehensible ways though. It's always a call to AccessInternal::PostRuntimeDispatch<>::oop_access_barrier, but the template parameters for PostRuntimeDispatch vary strangely.

2/03 (jdk-13.303)  V [jvm.dll+0x2eac2a] AccessInternal::PostRuntimeDispatch<G1BarrierSet::AccessBarrier<663668,G1BarrierSet>,2,663668>::oop_access_barrier+0xa (access.inline.hpp:83)
1/29 (jdk-13.246)  V [jvm.dll+0x2ed75a] AccessInternal::PostRuntimeDispatch<G1BarrierSet::AccessBarrier<598100,G1BarrierSet>,2,598100>::oop_access_barrier+0xa
1/28 (jdk-13.245)  V [jvm.dll+0x2ed75a] AccessInternal::PostRuntimeDispatch<G1BarrierSet::AccessBarrier<598100,G1BarrierSet>,2,598100>::oop_access_barrier+0xa
1/25 (jdk-13.231)  V [jvm.dll+0x2ed8ca] AccessInternal::PostRuntimeDispatch<CardTableBarrierSet::AccessBarrier<663636,CardTableBarrierSet>,2,663636>::oop_access_barrier+0xa
1/24 (jdk-13.220)  V [api-ms-win-crt-environment-l1-1-0.dll+0x2ee38a] AccessInternal::PostRuntimeDispatch<CardTableBarrierSet::AccessBarrier<598132,CardTableBarrierSet>,2,598132>::oop_access_barrier+0xa
1/19 (jdk-13.178)  V [jvm.dll+0x44769a] AccessInternal::PostRuntimeDispatch<CardTableBarrierSet::AccessBarrier<1196148,CardTableBarrierSet>,2,1196148>::oop_access_barrier+0xa
1/18 (jdk-13.173)  V [jvm.dll+0x2ed95a] AccessInternal::PostRuntimeDispatch<CardTableBarrierSet::AccessBarrier<663636,CardTableBarrierSet>,2,663636>::oop_access_barrier+0xa
1/16 (jdk-13.147)  V [jvm.dll+0x2ed93a] AccessInternal::PostRuntimeDispatch<EpsilonBarrierSet::AccessBarrier<663636,EpsilonBarrierSet>,2,663636>::oop_access_barrier+0xa

I don't have an explanation for the different barrier set types and decorators in those entries. Wrong mapping of debug info has been suggested, but it seems really odd that we would get something that looks "close" to right, but is so fundamentally wrong.
05-02-2019
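
As a reading aid for the decorator values listed above, the following sketch prints each distinct value in hexadecimal together with the bits in which it differs from the first one. It only does the arithmetic; it makes no claim about which HotSpot decorator each bit stands for, and the class and method names are hypothetical.

```java
final class DecoratorDiffSketch {
    /** Bits that are set in exactly one of the two values. */
    static int differingBits(int a, int b) {
        return a ^ b;
    }

    public static void main(String[] args) {
        // Distinct template parameter values copied from the failure list in this report.
        int[] decorators = { 663668, 598100, 663636, 598132, 1196148 };
        int reference = decorators[0];
        for (int d : decorators) {
            System.out.printf("%8d = 0x%06x, differs from %d in bits 0x%06x%n",
                    d, d, reference, differingBits(d, reference));
        }
    }
}
```

For example, 663668 and 663636 differ only in the single bit 0x20, which matches the observation that the values look "close" to one another.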

Stack trace from hs_err for all occurrences of this issue (8) in CI:

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
v  ~RuntimeStub::resolve_virtual_call
J 4343 c1 ClassForNameLeak.main([Ljava/lang/String;)V (138 bytes) @ 0x000000a75b10bcac [0x000000a75b10aea0+0x0000000000000e0c]
v  ~StubRoutines::call_stub
J 980  jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; java.base@13-internal (0 bytes) @ 0x000000a761cf441e [0x000000a761cf4300+0x000000000000011e]
J 979 c2 jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; java.base@13-internal (104 bytes) @ 0x000000a761cf3e38 [0x000000a761cf3d00+0x0000000000000138]
J 977 c2 jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; java.base@13-internal (10 bytes) @ 0x000000a761cf39c8 [0x000000a761cf38e0+0x00000000000000e8]
J 4189 c1 java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; java.base@13-internal (65 bytes) @ 0x000000a75b05ffb4 [0x000000a75b05fd40+0x0000000000000274]
J 4173 c1 com.sun.javatest.regtest.agent.MainWrapper$MainThread.run()V (476 bytes) @ 0x000000a75b052a24 [0x000000a75b050040+0x00000000000029e4]
J 4171 c1 java.lang.Thread.run()V java.base@13-internal (17 bytes) @ 0x000000a75b04ec24 [0x000000a75b04eb00+0x0000000000000124]
v  ~StubRoutines::call_stub

The memory address it SIGSEGVs on is always outside the Java heap, and outside of other large GC memory areas.
04-02-2019