JDK-8254252 : Generic arraycopy stub overwrites callee-save rdi register on 64-bit Windows
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 15,16
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • OS: windows
  • CPU: x86_64
  • Submitted: 2020-10-08
  • Updated: 2021-01-19
  • Resolved: 2020-10-14
Fixed in: JDK 16 (16 b20)
Description
The following test failed in the JDK16 CI:

compiler/c1/6551887/Test.java 

Here's a snippet from the log file:

----------stdout:(18/1094)*----------
# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc: SuppressErrorAt=\\oops/compressedOops.inline.hpp:100
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (t:\\workspace\\open\\src\\hotspot\\share\\oops/compressedOops.inline.hpp:100), pid=747596, tid=814580
# assert(!is_null(v)) failed: narrow klass value can never be zero
#
# JRE version: Java(TM) SE Runtime Environment (16.0+19) (fastdebug build 16-ea+19-971)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 16-ea+19-971, compiled mode, tiered, compressed oops, g1 gc, windows-amd64)
# Core dump will be written. Default location: T:\\testoutput\\test-support\\jtreg_closed_test_hotspot_jtreg_tier2_compiler\\scratch\\0\\hs_err_pid747596.mdmp
#
# An error report file with more information is saved as:
# T:\\testoutput\\test-support\\jtreg_closed_test_hotspot_jtreg_tier2_compiler\\scratch\\0\\hs_err_pid747596.log
#
# If you would like to submit a bug report, please visit:
# https://bugreport.java.com/bugreport/crash.jsp
#
result: Error. Agent communication error: java.net.SocketException: Connection reset; check console log for any additional details


Here's the crashing thread's stack:

--------------- T H R E A D ---------------

Current thread (0x000001a5cddd8bb0): JavaThread "AgentVMThread" [_thread_in_vm, id=814580, stack(0x0000000101a00000,0x0000000101b00000)]

Stack: [0x0000000101a00000,0x0000000101b00000]
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [jvm.dll+0xa53ea1] os::platform_print_native_stack+0xf1 (os_windows_x86.cpp:236)
V [jvm.dll+0xc6e627] VMError::report+0xf97 (vmError.cpp:731)
V [jvm.dll+0xc6ffee] VMError::report_and_die+0x7de (vmError.cpp:1547)
V [jvm.dll+0xc706a4] VMError::report_and_die+0x64 (vmError.cpp:1340)
V [jvm.dll+0x444c87] report_vm_error+0x117 (debug.cpp:267)
V [jvm.dll+0x1a774] oopDesc::klass+0x44 (oop.inline.hpp:81)
V [jvm.dll+0xb0df9d] SharedRuntime::slow_arraycopy_C+0x12d (sharedRuntime.cpp:1997)
C 0x000001a5eeb436d5

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
v ~RuntimeStub::_slow_arraycopy_Java
J 7360% c2 Test.testGeneric(Ljava/lang/Object;ILjava/lang/Object;II)V (26 bytes) @ 0x000001a5f670ac3c [0x000001a5f670a7e0+0x000000000000045c]
J 7357 c1 Test.main([Ljava/lang/String;)V (739 bytes) @ 0x000001a5f0655c84 [0x000001a5f0655680+0x0000000000000604]
v ~StubRoutines::call_stub
J 1527 jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; java.base@16-ea (0 bytes) @ 0x000001a5f65548f3 [0x000001a5f6554820+0x00000000000000d3]
J 1526 c2 jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; java.base@16-ea (117 bytes) @ 0x000001a5f654465c [0x000001a5f6544540+0x000000000000011c]
J 1524 c2 jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; java.base@16-ea (10 bytes) @ 0x000001a5f6565274 [0x000001a5f65651e0+0x0000000000000094]
J 7352 c1 com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run()V (170 bytes) @ 0x000001a5f0276adc [0x000001a5f0276100+0x00000000000009dc]
J 6535 c1 java.lang.Thread.run()V java.base@16-ea (17 bytes) @ 0x000001a5f03d7f9c [0x000001a5f03d7e80+0x000000000000011c]
v ~StubRoutines::call_stub

--------------- P R O C E S S ---------------

Here's the description for the test task:

Run test closed/test/hotspot/jtreg/:tier2_compiler with windows-x64-debug with -Xcomp -XX:+CreateCoredumpOnCrash -ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:+TieredCompilation -XX:+VerifyOops 

I'm starting this bug in hotspot/gc for initial triage.
Comments
Changeset: 31d9b7fe
Author: Tobias Hartmann <thartmann@openjdk.org>
Date: 2020-10-14 07:26:13 +0000
URL: https://git.openjdk.java.net/jdk/commit/31d9b7fe
14-10-2020

Summary: Since JDK-8241825, MacroAssembler::load_klass requires a temporary register to decode the klass pointer. In the generic arraycopy stub, rdi is used for that on 64-bit Windows because r9 is already used as an argument register:
https://hg.openjdk.java.net/jdk/jdk/rev/0bb101fbeb10#l17.32

The problem is that rdi is callee-save [1] but is not restored when returning from the stub. This leads to register corruption and more or less random crashes in the caller.

Although JDK-8241825 is part of JDK 15, this was never a problem because we did not set the _WIN64 macro in adlc and as a result accidentally treated rdi (and rsi) as caller-save:
https://github.com/openjdk/jdk/blob/b9873e18330b7e43ca47bc1c0655e7ab20828f7a/src/hotspot/cpu/x86/x86_64.ad#L89

Now that this got fixed as part of JDK-8248238 [2], we hit the bug.

[1] https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=vs-2019
[2] https://openjdk.github.io/cr/?repo=jdk&pr=212&range=11#sdiff-8
12-10-2020
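
The failure mode described in the summary above can be illustrated with a toy model. This is only a sketch, not HotSpot code: the `Regs` struct, the two stub functions, and the shift value are hypothetical names and numbers chosen for illustration. Saving and restoring rdi is shown as one way to honor the calling convention and is not necessarily how the changeset fixed the stub.

```cpp
#include <cassert>
#include <cstdint>

// Toy register file (hypothetical): only the registers needed here.
struct Regs {
    std::uint64_t rax = 0;  // result register
    std::uint64_t rdi = 0;  // callee-save under the Windows x64 convention
};

// Hypothetical stand-in for a klass decode that needs a temp register.
// The shift of 3 is an illustrative value, not taken from the bug report.
static std::uint64_t decode_klass(std::uint64_t narrow, std::uint64_t& tmp) {
    assert(narrow != 0 && "narrow klass value can never be zero");
    tmp = narrow;       // the temp register is overwritten
    return tmp << 3;
}

// Models the bug: rdi is used as the temp and never restored.
void stub_clobbering(Regs& r, std::uint64_t narrow) {
    r.rax = decode_klass(narrow, r.rdi);
}

// Models a convention-respecting stub: rdi is saved and restored.
void stub_preserving(Regs& r, std::uint64_t narrow) {
    const std::uint64_t saved_rdi = r.rdi;
    r.rax = decode_klass(narrow, r.rdi);
    r.rdi = saved_rdi;
}
```

A caller that keeps a live value in rdi across the call sees it intact only with `stub_preserving`. With `stub_clobbering` the caller continues with whatever the stub left behind, which matches the "more or less random crashes in the caller" described above.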

Found the offending change:
https://openjdk.github.io/cr/?repo=jdk&pr=212&range=11#sdiff-8

My current working hypothesis is that before JDK-8248238, we did not set _WIN64 in adlc and setting it now triggers an existing bug. This code in x86_64.ad is suspicious:

    #ifdef _WIN64
    reg_def RSI (SOC, SOE, Op_RegI, 6, rsi->as_VMReg());
    reg_def RSI_H(SOC, SOE, Op_RegI, 6, rsi->as_VMReg()->next());
    reg_def RDI (SOC, SOE, Op_RegI, 7, rdi->as_VMReg());
    reg_def RDI_H(SOC, SOE, Op_RegI, 7, rdi->as_VMReg()->next());
09-10-2020
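
To make the hypothesis above concrete, here is a minimal sketch of how a build macro can silently switch a register's save policy. It assumes that the first two reg_def fields are the save types for the Java and the C calling convention, and that the non-_WIN64 definition marks rdi as SOC (caller-save), as the 12-10-2020 summary implies. `Save`, `RegDef`, `rdi_def`, and `survives_c_call` are hypothetical names, not adlc constructs.

```cpp
// Save policies, named after the abbreviations used in x86_64.ad.
enum class Save {
    SOC,  // save-on-call: the caller must preserve the register
    SOE   // save-on-entry: the callee must preserve the register
};

// Hypothetical, simplified view of a reg_def entry.
struct RegDef {
    const char* name;
    Save java_conv;  // assumed meaning of the first reg_def field
    Save c_conv;     // assumed meaning of the second reg_def field
};

// Models the #ifdef: the policy depends on whether _WIN64 is visible to adlc.
constexpr RegDef rdi_def(bool win64_defined) {
    return win64_defined ? RegDef{"RDI", Save::SOC, Save::SOE}
                         : RegDef{"RDI", Save::SOC, Save::SOC};
}

// True if compiled code may assume the register survives a C-convention call.
constexpr bool survives_c_call(const RegDef& r) {
    return r.c_conv == Save::SOE;
}
```

Under these assumptions, `survives_c_call(rdi_def(false))` is false, so callers would spill rdi themselves and a clobbering stub goes unnoticed. Once the macro is set, callers rely on the callee to preserve rdi and the latent stub bug surfaces.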

ILW = Assert due to unexpected null narrow klass value, intermittent but reproducible, no known workaround but disable compilation = HMM = P2

I cannot reproduce this in mainline, but I can with a different test in the Valhalla CI. I've narrowed it down to the Windows AArch64 Support (JDK-8248238), but maybe that one just triggers it.

I have no idea how an aarch64-specific change can trigger this failure on Windows x86_64, but looking through the changes, this seems incorrect:

    -cp $DEVKIT_ROOT/VC/redist/x64/$MSVCP_DLL $DEVKIT_ROOT/VC/bin/x64
    +cp $DEVKIT_ROOT/VC/redist/x64/$MSVCR_DLL $DEVKIT_ROOT/VC/bin/x64
    +cp $DEVKIT_ROOT/VC/redist/arm64/$MSVCP_DLL $DEVKIT_ROOT/VC/bin/arm64
    +cp $DEVKIT_ROOT/VC/redist/arm64/$MSVCP_DLL $DEVKIT_ROOT/VC/bin/arm64

https://openjdk.github.io/cr/?repo=jdk&pr=212&range=11#udiff-6

I've filed JDK-8254311 for this independent issue.
09-10-2020

Forwarding to compiler team for further evaluation since it is a compiler test. No GC event happened yet according to hs_err log, so it is very unlikely that this is a GC issue.
08-10-2020

I originally matched this failure to the following bug:

    JDK-8253081 G1 fails on stale objects in archived module graph in Open Archive regions

However, Thomas pointed out that JDK-8253081 was caused by the following fix:

    JDK-8244778 Archive full module graph in CDS

and the following fix was used to disable that failure mode:

    JDK-8253261 Disable CDS full module graph until JDK-8253081 is fixed

The fix for JDK-8253261 was integrated on 2020-09-16. The jdk-16+19-971 build-ID was created on 2020-10-06 16:44, so it must include the fix for JDK-8253261.
08-10-2020