JDK-8299074 : [AArch64] compiler/codecache/stress/UnexpectedDeoptimizationAllTest.java fails with NullPointerException
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 19,20,21
  • Priority: P3
  • Status: In Progress
  • Resolution: Unresolved
  • CPU: aarch64
  • Submitted: 2022-12-20
  • Updated: 2023-01-13
JDK 21: Unresolved
Description
Seen only on linux-aarch64 so far; debug builds do not seem to be affected. The VM was run with these additional arguments:

-Xcomp -XX:+CreateCoredumpOnCrash -XX:TieredStopAtLevel=1

We've got a stack trace:

java.lang.Error: Exception occurred during test execution
	at compiler.codecache.stress.CodeCacheStressRunner.runTest(CodeCacheStressRunner.java:42)
	at compiler.codecache.stress.UnexpectedDeoptimizationAllTest.main(UnexpectedDeoptimizationAllTest.java:67)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
	at java.base/java.lang.reflect.Method.invoke(Method.java:578)
	at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:125)
	at java.base/java.lang.Thread.run(Thread.java:1623)
Caused by: java.lang.NullPointerException: Cannot invoke "[Ljava.lang.Class;.clone()" because "this.parameterTypes" is null
	at java.base/java.lang.reflect.Method.getParameterTypes(Method.java:316)
	at java.base/jdk.internal.reflect.MethodHandleAccessorFactory.slotCount(MethodHandleAccessorFactory.java:348)
	at java.base/jdk.internal.reflect.MethodHandleAccessorFactory.useNativeAccessor(MethodHandleAccessorFactory.java:332)
	at java.base/jdk.internal.reflect.MethodHandleAccessorFactory.newConstructorAccessor(MethodHandleAccessorFactory.java:96)
	at java.base/jdk.internal.reflect.ReflectionFactory.newConstructorAccessor(ReflectionFactory.java:200)
	at java.base/java.lang.reflect.Constructor.acquireConstructorAccessor(Constructor.java:547)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:497)
	at java.base/java.lang.reflect.ReflectAccess.newInstance(ReflectAccess.java:128)
	at java.base/jdk.internal.reflect.ReflectionFactory.newInstance(ReflectionFactory.java:304)
	at java.base/java.lang.Class.newInstance(Class.java:685)
	at compiler.codecache.stress.Helper$TestCase.get(Helper.java:124)
	at compiler.codecache.stress.CodeCacheStressRunner.test(CodeCacheStressRunner.java:47)
	at jdk.test.lib.TimeLimitedRunner.call(TimeLimitedRunner.java:71)
	at compiler.codecache.stress.CodeCacheStressRunner.runTest(CodeCacheStressRunner.java:40)
	... 5 more

Different failure mode:

java.lang.Error: Exception occurred during test execution
	at compiler.codecache.stress.CodeCacheStressRunner.runTest(CodeCacheStressRunner.java:42)
	at compiler.codecache.stress.UnexpectedDeoptimizationAllTest.main(UnexpectedDeoptimizationAllTest.java:67)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
	at java.base/java.lang.reflect.Method.invoke(Method.java:578)
	at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:125)
	at java.base/java.lang.Thread.run(Thread.java:1623)
Caused by: java.lang.NullPointerException: Cannot read the array length because "this.parameterTypes" is null
	at java.base/java.lang.reflect.Method.getParameterCount(Method.java:323)
	at java.base/jdk.internal.reflect.MethodHandleAccessorFactory.useNativeAccessor(MethodHandleAccessorFactory.java:324)
	at java.base/jdk.internal.reflect.MethodHandleAccessorFactory.newConstructorAccessor(MethodHandleAccessorFactory.java:96)
	at java.base/jdk.internal.reflect.ReflectionFactory.newConstructorAccessor(ReflectionFactory.java:200)
	at java.base/java.lang.reflect.Constructor.acquireConstructorAccessor(Constructor.java:547)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:497)
	at java.base/java.lang.reflect.ReflectAccess.newInstance(ReflectAccess.java:128)
	at java.base/jdk.internal.reflect.ReflectionFactory.newInstance(ReflectionFactory.java:304)
	at java.base/java.lang.Class.newInstance(Class.java:685)
	at compiler.codecache.stress.Helper$TestCase.get(Helper.java:124)
	at compiler.codecache.stress.CodeCacheStressRunner.test(CodeCacheStressRunner.java:47)
	at jdk.test.lib.TimeLimitedRunner.call(TimeLimitedRunner.java:71)
	at compiler.codecache.stress.CodeCacheStressRunner.runTest(CodeCacheStressRunner.java:40)
Comments
As far as I can see, only DependencyContext considers whether a method is already marked. It can therefore return 0 if the method is already marked, and the caller then skips calling Deoptimization::deoptimize_all_marked(). If the caller assumes at that point that the methods have already been deoptimized, we have a bug.

I find these places suspicious:
  • MethodHandles::flush_dependent_nmethods()
  • MHN_clearCallSiteContext
  • CodeCache::flush_dependents_on

Non-problematic use case, but one that could turn into a bug if someone adds an already-marked check:
  • CodeCache::flush_dependents_on_method
13-01-2023

It seems like a logical error to assume that something marked for deopt has also *been* deoptimized. (I probably introduced this error myself, since we used to guard this with a safepoint.) I think your assessment of the bug in question is correct, and your suggested fix would fix this case. But any time the code makes the call to Deoptimization::deoptimize_all_marked() depend on whether we marked anything, without considering whether the nmethod was already marked, we could have a bug.

So for a more general solution, one option is to always call Deoptimization::deoptimize_all_marked(); I don't think that would be noticeable performance-wise in a general workload (maybe in some tests). A better solution might be to make the call to Deoptimization::deoptimize_all_marked() depend on whether we, or anyone else, marked the nmethod we are concerned about. We could also use a lock and serialize the mark+deopt stage, but I'm a bit worried that thread-local deopts would then need to grab the global lock.
13-01-2023

Thanks Robbin, I think I'll go with the solution of calling Deoptimization::deoptimize_all_marked() whenever we observe a marked nmethod (no matter whether we marked it ourselves or another thread did). I'll look at the other methods as well.
13-01-2023
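
The approach chosen above can be sketched as a small single-threaded Java model. This is only an illustration under stated assumptions, not the HotSpot change itself: `NMethod`, `markDependents`, `flushDependents` and `deoptimizeAllMarked` are hypothetical stand-ins for the C++ code, and the counting rule (count every dependent that still needs deoptimization, whoever marked it) is one possible reading of "observe a marked nmethod".

```java
import java.util.List;

public class ObservedMarkSketch {
    /** Hypothetical stand-in for an nmethod; not HotSpot's real data structure. */
    static final class NMethod {
        boolean markedForDeopt;  // set by a marking pass (ours or another thread's)
        boolean deoptimized;     // set only once deoptimization has actually run
    }

    /**
     * Assumed counting rule: every dependent that is not yet deoptimized is
     * counted, even if another thread had already marked it.
     */
    static int markDependents(List<NMethod> dependents) {
        int needDeopt = 0;
        for (NMethod nm : dependents) {
            if (nm.deoptimized) {
                continue;
            }
            nm.markedForDeopt = true;
            needDeopt++;
        }
        return needDeopt;
    }

    /** Model of the deoptimization pass over everything that is marked. */
    static void deoptimizeAllMarked(List<NMethod> cache) {
        for (NMethod nm : cache) {
            if (nm.markedForDeopt) {
                nm.deoptimized = true;
            }
        }
    }

    /** Model of a flush on class loading: deoptimize if any marked dependent was observed. */
    static void flushDependents(List<NMethod> cache, List<NMethod> dependents) {
        if (markDependents(dependents) > 0) {
            deoptimizeAllMarked(cache);
        }
    }

    /** Replays the case where another thread marked the dependent first. */
    static boolean dependentDeoptimizedAfterFlush() {
        NMethod nm = new NMethod();
        nm.markedForDeopt = true;          // marked earlier by another thread
        List<NMethod> cache = List.of(nm);
        flushDependents(cache, cache);
        return nm.deoptimized;
    }

    public static void main(String[] args) {
        System.out.println("dependent deoptimized: " + dependentDeoptimizedAfterFlush());
    }
}
```

In this model the dependent is deoptimized even though the flushing thread did not set the mark itself.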

The problem is a race condition between one thread repeatedly calling WB_DeoptimizeAll and the main thread checking nmethod dependencies on class loading and also attempting marking/deoptimization of nmethods due to dependency violations. Details below.

Thread1: useNativeAccessor is compiled under the assumption that java.lang.reflect.Executable has only one implementer java.lang.reflect.Method. A corresponding dependency is registered in the nmethod.

Thread2: Calls Whitebox API method WB_DeoptimizeAll -> CodeCache::mark_all_nmethods_for_deoptimization() that marks useNativeAccessor for deoptimization.

Thread1: Triggers class loading of java.lang.reflect.Constructor and CodeCache::flush_dependents_on -> CodeCache::mark_for_deoptimization -> ... -> DependencyContext::mark_dependent_nmethods detects that useNativeAccessor needs to be deoptimized now that java.lang.reflect.Executable has more than one implementer. However, the nmethod is already marked for deoptimization (most nmethods are) and therefore ignored. The marked counter is 0 and therefore Deoptimization::deoptimize_all_marked() is not executed either. The thread continues execution and ends up crashing because a java.lang.reflect.Constructor object is passed to compiled useNativeAccessor which can not handle it.

Thread2: Is still in WB_DeoptimizeAll, marking nmethods for deoptimization but didn't get a chance to call Deoptimization::deoptimize_all_marked() yet.

Before JDK-8221734 in JDK 13, WB_DeoptimizeAll acquired the Compile_lock but it got removed:
http://hg.openjdk.java.net/jdk/jdk/rev/9b70ebd131b4#l15.7
I think it should be restored. [~rehn] what do you think?
12-01-2023
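
The interleaving described above can be replayed deterministically in a small Java model. This is a sketch under stated assumptions, not HotSpot code: `NMethod`, `markAll`, `markDependents` and `deoptimizeAllMarked` are hypothetical names that only mirror the roles of the C++ functions named in the comment, and the two threads are modeled as sequential steps rather than real threads.

```java
import java.util.ArrayList;
import java.util.List;

public class DeoptRaceSketch {
    /** Hypothetical stand-in for an nmethod; not HotSpot's real data structure. */
    static final class NMethod {
        final String name;
        boolean markedForDeopt;  // set by a marking pass
        boolean deoptimized;     // set only once deoptimization has actually run

        NMethod(String name) {
            this.name = name;
        }
    }

    /** Model of the marking phase of WB_DeoptimizeAll: mark everything, deoptimize nothing yet. */
    static void markAll(List<NMethod> cache) {
        for (NMethod nm : cache) {
            nm.markedForDeopt = true;
        }
    }

    /** Model of the counting described above: already marked nmethods are ignored. */
    static int markDependents(List<NMethod> dependents) {
        int newlyMarked = 0;
        for (NMethod nm : dependents) {
            if (!nm.markedForDeopt) {
                nm.markedForDeopt = true;
                newlyMarked++;
            }
        }
        return newlyMarked;
    }

    /** Model of the deoptimization pass over everything that is marked. */
    static void deoptimizeAllMarked(List<NMethod> cache) {
        for (NMethod nm : cache) {
            if (nm.markedForDeopt) {
                nm.deoptimized = true;
            }
        }
    }

    /** Replays the interleaving; returns true if Thread1 can still enter the stale code. */
    static boolean staleCodeStillRunnable() {
        NMethod useNativeAccessor = new NMethod("useNativeAccessor");
        List<NMethod> cache = new ArrayList<>(List.of(useNativeAccessor));

        markAll(cache);                      // Thread2: marking phase only
        int marked = markDependents(cache);  // Thread1: class loading finds the dependent
        if (marked > 0) {
            deoptimizeAllMarked(cache);      // skipped because marked == 0
        }
        return !useNativeAccessor.deoptimized;  // Thread1 continues before Thread2 deoptimizes
    }

    public static void main(String[] args) {
        System.out.println("stale code still runnable: " + staleCodeStillRunnable());
    }
}
```

The model returns true because the mark set by the other thread hides the dependent from the counter, so the deoptimization pass is never triggered on the class-loading path.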

We SIGSEGV at 0x0000fffd19c45a68 in C1 compiled method 'useNativeAccessor(Executable member)' (see attached hs_err_pid985220.log) because the field 'Method::parameterTypes' appears to be NULL when calling 'getParameterCount()' on the 'member' argument:

  0x0000fffd19c45a5c: ldr x1, [sp,#96] // Load 'member' argument from stack
  0x0000fffd19c45a60: ldr w0, [x1,#48] // Load 'parameterTypes' field from 'member' which is a 'Method'
  0x0000fffd19c45a64: lsl x0, x0, #3 // Load parameterTypes.length
  0x0000fffd19c45a68: ldr w2, [x0,#12] ; implicit exception: dispatches to 0x0000fffd19c45bfc
  [...]
  0x0000fffd19c45bfc: bl 0x0000fffd190cfc80 ; ImmutableOopMap {c_rarg1=Oop c_rarg0=Oop [96]=Oop }
                      ;*arraylength {reexecute=0 rethrow=0 return_oop=0}
                      ; - java.lang.reflect.Method::getParameterCount@4 (line 323)
                      ; - jdk.internal.reflect.MethodHandleAccessorFactory::useNativeAccessor@29 (line 324)
                      ; {runtime_call throw_null_pointer_exception Runtime1 stub}

Looking at the stack trace, 'useNativeAccessor' was called with a 'java/lang/reflect/Constructor' argument but we are in inlined 'java.lang.reflect.Method::getParameterCount' instead of 'java.lang.reflect.Constructor::getParameterCount' and therefore load garbage:

  J 510 c1 jdk.internal.reflect.MethodHandleAccessorFactory.useNativeAccessor(Ljava/lang/reflect/Executable;)Z java.base@20-ea (74 bytes)
  J 2909 c1 jdk.internal.reflect.MethodHandleAccessorFactory.newConstructorAccessor(Ljava/lang/reflect/Constructor;)Ljdk/internal/reflect/ConstructorAccessorImpl; java.base@20-ea (84 bytes)

Looks like at compile time, 'Method' was the only implementer of 'Executable' and we therefore speculatively inlined 'java.lang.reflect.Method::getParameterCount' at the virtual call site. A dependency should ensure that the nmethod is deoptimized when class loading invalidates that assumption. For some reason, that didn't work in this case.

UPDATE: I verified that the dependencies are there in the nmethod:

Dependencies:
Dependency of type unique_concrete_method_4
  context = *java.lang.reflect.Executable
  method = {method} {0x0000000800043198} 'getModifiers' '()I' in 'java/lang/reflect/Method'
  class = java.lang.reflect.Executable
  method = *{method} {0x0000000800433620} 'getModifiers' '()I' in 'java/lang/reflect/Executable'
Dependency of type unique_concrete_method_4
  context = *java.lang.reflect.Executable
  method = {method} {0x00000008000434b0} 'getParameterCount' '()I' in 'java/lang/reflect/Method'
  class = java.lang.reflect.Executable
  method = *{method} {0x0000000800433990} 'getParameterCount' '()I' in 'java/lang/reflect/Executable'
Dependency of type unique_concrete_method_4
  context = *java.lang.reflect.Executable
  method = {method} {0x0000000800043400} 'isVarArgs' '()Z' in 'java/lang/reflect/Method'
  class = java.lang.reflect.Executable
  method = {method} {0x00000008004338e0} 'isVarArgs' '()Z' in 'java/lang/reflect/Executable'
Dependency of type unique_concrete_method_4
  context = *java.lang.reflect.Executable
  method = {method} {0x00000008000432a0} 'getParameterTypes' '()[Ljava/lang/Class;' in 'java/lang/reflect/Method'
  class = java.lang.reflect.Executable
  method = *{method} {0x0000000800433728} 'getParameterTypes' '()[Ljava/lang/Class;' in 'java/lang/reflect/Executable'
12-01-2023
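
The speculative-inlining assumption discussed above can be pictured with a tiny Java model of a dependency check. This is a deliberately simplified sketch: `UniqueImplementerDependency` and `holds` are hypothetical names, and the real unique_concrete_method dependency in HotSpot is checked against the class hierarchy rather than a list of names.

```java
import java.util.ArrayList;
import java.util.List;

public class ChaDependencySketch {
    /** Hypothetical record of one assumption: 'context' has exactly one loaded implementer. */
    static final class UniqueImplementerDependency {
        final String context;
        final String assumedImplementer;

        UniqueImplementerDependency(String context, String assumedImplementer) {
            this.context = context;
            this.assumedImplementer = assumedImplementer;
        }

        /** The assumption holds only while the assumed class is the sole loaded implementer. */
        boolean holds(List<String> loadedImplementers) {
            return loadedImplementers.size() == 1
                    && loadedImplementers.get(0).equals(assumedImplementer);
        }
    }

    /** Returns { validity before, validity after } loading a second implementer. */
    static boolean[] validityBeforeAndAfterLoadingConstructor() {
        UniqueImplementerDependency dep = new UniqueImplementerDependency(
                "java.lang.reflect.Executable", "java.lang.reflect.Method");
        List<String> loaded = new ArrayList<>(List.of("java.lang.reflect.Method"));

        boolean before = dep.holds(loaded);
        loaded.add("java.lang.reflect.Constructor");  // class loading adds a second implementer
        boolean after = dep.holds(loaded);
        return new boolean[] { before, after };
    }

    public static void main(String[] args) {
        boolean[] r = validityBeforeAndAfterLoadingConstructor();
        System.out.println("holds before: " + r[0] + ", holds after: " + r[1]);
    }
}
```

Once the check fails, code compiled under the assumption must not be entered again, which is what deoptimizing the dependent nmethod is meant to guarantee.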

Updated ILW = Incorrect execution of C1 compiled code, reproducible with test that stresses deoptimization, disable CHA = HLM = P3
11-01-2023

This also reproduces with JDK 19, but only with --enable-preview -DhelperVirtualThread=true, which makes the test use virtual threads (Virtual Threads were added in JDK 19 b21 with JDK-8284161). This might therefore well be a Loom-related issue. The attached 8299074.diff modifies the test to reproduce the issue more reliably (it still needs to be executed in a loop with "-Xcomp -XX:TieredStopAtLevel=1").
09-01-2023

I can reproduce this, and it first fails with JDK 20 b3 (jdk-20+3-101). EDIT: It also reproduces with earlier versions (1/100) but triggers much more often after JDK-8288425 in JDK 20 b3 (3/10).
06-01-2023

ILW = Test failure with C1, only on linux-aarch64 product seen so far and single test, no workaround = MLH = P4
20-12-2022