JDK-8228758 : assert(_no_handle_mark_nesting == 0) failed: allocating handle inside NoHandleMark
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 14
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2019-07-29
  • Updated: 2021-01-08
  • Resolved: 2019-09-09
JDK 14
14 b14 : Fixed
Related Reports
Relates :  
Relates :  
Description
A stress test crashes with:
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/scratch/mesos/slaves/00f4d7f9-7805-4b6a-aef8-9bb130db2435-S438/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/0308a46f-9717-4049-a800-d91b0568f68b/runs/8b166c2e-b35b-4aa5-a171-bc93872413d3/workspace/open/src/hotspot/share/runtime/handles.cpp:35), pid=36670, tid=39683
#  assert(_no_handle_mark_nesting == 0) failed: allocating handle inside NoHandleMark
#
# JRE version: Java(TM) SE Runtime Environment (14.0+7) (fastdebug build 14-ea+7-204)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 14-ea+7-204, mixed mode, sharing, tiered, compressed oops, g1 gc, bsd-amd64)
# Core dump will be written. Default location: /cores/core.36670
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#

---------------  S U M M A R Y ------------

Command Line: -Xbootclasspath/a:. -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI -XX:MaxRAMPercentage=6 -XX:MaxRAMPercentage=50 -XX:+HeapDumpOnOutOfMemoryError -XX:+CrashOnOutOfMemoryError -Djava.net.preferIPv6Addresses=false -XX:+DisplayVMOutputToStderr -XX:+UsePerfData -Xlog:gc*,gc+heap=debug:gc.log:uptime,timemillis,level,tags -XX:+DisableExplicitGC -XX:+StartAttachListener -Djava.io.tmpdir=/Volumes/Mesos/mesos/work_dir/slaves/00f4d7f9-7805-4b6a-aef8-9bb130db2435-S30/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/44b90806-9674-4f0e-be9b-9281d4a6f059/runs/19a22e3d-464f-422c-86d3-058d17d50012/testoutput/test-support/jtreg_closed_test_hotspot_jtreg_applications_sparkexamples_SparkExamples24H_java/scratch/0/java.io.tmpdir -Duser.home=/Volumes/Mesos/mesos/work_dir/slaves/00f4d7f9-7805-4b6a-aef8-9bb130db2435-S30/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/44b90806-9674-4f0e-be9b-9281d4a6f059/runs/19a22e3d-464f-422c-86d3-058d17d50012/testoutput/test-support/jtreg_closed_test_hotspot_jtreg_applications_sparkexamples_SparkExamples24H_java/scratch/0/user.home -Dhadoop.root.logger=WARN,console -DSEED=1000 --add-exports=java.base/java.lang=ALL-UNNAMED --add-exports=java.base/java.util=ALL-UNNAMED --add-exports=java.base/java.util.concurrent=ALL-UNNAMED --add-exports=java.base/java.nio=ALL-UNNAMED -Dhadoop.home.dir=/Volumes/Mesos/mesos/work_dir/jib-master/install/org/apache/hadoop/common/hadoop/3.1.1/hadoop-3.1.1.tar.gz/hadoop-3.1.1 -Dspark.master=local[9] -Dspark.sql.warehouse.dir=spark_tmp -Duser.country=US -Duser.language=en applications.kitchensink.process.stress.Main /Volumes/Mesos/mesos/work_dir/slaves/00f4d7f9-7805-4b6a-aef8-9bb130db2435-S30/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/44b90806-9674-4f0e-be9b-9281d4a6f059/runs/19a22e3d-464f-422c-86d3-058d17d50012/testoutput/test-support/jtreg_closed_test_hotspot_jtreg_applications_sparkexamples_SparkExamples24H_java/scratch/0/kitchensink.final.properties

Host: MacPro6,1 x86_64 3700 MHz, 8 cores, 16G, Darwin 18.2.0
Time: Sun Jul 28 02:18:37 2019 GMT elapsed time: 71002 seconds (0d 19h 43m 22s)

---------------  T H R E A D  ---------------

Current thread (0x00007fbb050d4800):  JavaThread "SparkStressModule" [_thread_in_vm, id=39683, stack(0x000070000721a000,0x000070000731a000)] _threads_hazard_ptr=0x00007fbad40e0ae0, _nested_threads_hazard_ptr_cnt=0

Stack: [0x000070000721a000,0x000070000731a000],  sp=0x0000700007314aa0,  free space=1002k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.dylib+0xc11f81]  VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x6e5
V  [libjvm.dylib+0xc1269d]  VMError::report_and_die(Thread*, void*, char const*, int, char const*, char const*, __va_list_tag*)+0x47
V  [libjvm.dylib+0x41ab68]  report_vm_error(char const*, int, char const*, char const*, ...)+0x145
V  [libjvm.dylib+0x59f816]  HandleArea::allocate_handle(oop)+0x6a
V  [libjvm.dylib+0x41d041]  ConstantOopReadValue::ConstantOopReadValue(DebugInfoReadStream*)+0x10d
V  [libjvm.dylib+0x41c4b7]  ScopeValue::read_from(DebugInfoReadStream*)+0x139
V  [libjvm.dylib+0x41c221]  ObjectValue::read_object(DebugInfoReadStream*)+0x1d
V  [libjvm.dylib+0x41c1ae]  DebugInfoReadStream::read_object_value(bool)+0x196
V  [libjvm.dylib+0xabc1ae]  ScopeDesc::decode_object_values(int)+0xca
V  [libjvm.dylib+0xabc0b8]  ScopeDesc::ScopeDesc(CompiledMethod const*, int, int, bool, bool, bool)+0x48
V  [libjvm.dylib+0x3ad9d1]  CompiledMethod::scope_desc_at(unsigned char*)+0x9d
V  [libjvm.dylib+0xc0476a]  compiledVFrame::compiledVFrame(frame const*, RegisterMap const*, JavaThread*, CompiledMethod*)+0x64
V  [libjvm.dylib+0xbfe67f]  vframe::new_vframe(frame const*, RegisterMap const*, JavaThread*)+0xc7
V  [libjvm.dylib+0xbfe815]  vframe::sender() const+0xd5
V  [libjvm.dylib+0xc04b80]  compiledVFrame::sender() const+0x10a
V  [libjvm.dylib+0xbfe875]  vframe::java_sender() const+0x21
V  [libjvm.dylib+0x2080dd]  get_or_compute_monitor_info(JavaThread*)+0x110
V  [libjvm.dylib+0x209967]  BiasedLocking::walk_stack_and_revoke(oop, JavaThread*)+0x44f
V  [libjvm.dylib+0x20b6d6]  RevokeOneBias::do_thread(Thread*)+0x212
V  [libjvm.dylib+0x5a0754]  HandshakeThreadsOperation::do_handshake(JavaThread*)+0x112
V  [libjvm.dylib+0x5a0b87]  HandshakeState::process_self_inner(JavaThread*)+0x12f
V  [libjvm.dylib+0xab9075]  SafepointMechanism::block_or_handshake(JavaThread*)+0x4f
V  [libjvm.dylib+0xab9092]  SafepointMechanism::block_if_requested_slow(JavaThread*)+0xe
V  [libjvm.dylib+0x9b58a3]  ThreadBlockInVMWithDeadlockCheck::~ThreadBlockInVMWithDeadlockCheck()+0x83
V  [libjvm.dylib+0x9b4468]  Monitor::lock(Thread*)+0x1e8
V  [libjvm.dylib+0xb8d837]  Thread::check_for_dangling_thread_pointer(Thread*)+0x9f
V  [libjvm.dylib+0xb8e03a]  Thread::is_interrupted(Thread*, bool)+0x12
V  [libjvm.dylib+0x716fd9]  JVM_IsInterrupted+0x115
J 1039  java.lang.Thread.isInterrupted(Z)Z java.base@14-ea (0 bytes) @ 0x000000011e83dc02 [0x000000011e83dac0+0x0000000000000142]
J 51640 c2 java.util.concurrent.ThreadPoolExecutor.interruptIdleWorkers(Z)V java.base@14-ea (120 bytes) @ 0x00000001223ef228 [0x00000001223ee220+0x0000000000001008]
J 54599 c2 java.util.concurrent.ThreadPoolExecutor.shutdown()V java.base@14-ea (45 bytes) @ 0x00000001229d6614 [0x00000001229d5ec0+0x0000000000000754]
J 93267 c2 org.apache.spark.SparkContext.stop()V (321 bytes) @ 0x00000001225fa4d8 [0x00000001225f81c0+0x0000000000002318]
j  org.apache.spark.examples.graphx.SynthBenchmark$.main([Ljava/lang/String;)V+949
j  org.apache.spark.examples.graphx.SynthBenchmark.main([Ljava/lang/String;)V+4
v  ~StubRoutines::call_stub
V  [libjvm.dylib+0x60f819]  JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x3c5
V  [libjvm.dylib+0xa9861b]  invoke(InstanceKlass*, methodHandle const&, Handle, bool, objArrayHandle, BasicType, objArrayHandle, bool, Thread*)+0xc9b
V  [libjvm.dylib+0xa9772c]  Reflection::invoke_method(oop, Handle, objArrayHandle, Thread*)+0x288
V  [libjvm.dylib+0x71c6d4]  JVM_InvokeMethod+0x422
J 5889  jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; java.base@14-ea (0 bytes) @ 0x000000011ec0dbe7 [0x000000011ec0da60+0x0000000000000187]
J 19940 c2 jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; java.base@14-ea (104 bytes) @ 0x000000011f76bca0 [0x000000011f76b440+0x0000000000000860]
J 96989 c2 applications.kitchensink.process.stress.modules.SparkStressModule.runExample(Ljava/lang/String;)V (181 bytes) @ 0x0000000121453b8c [0x0000000121450da0+0x0000000000002dec]
j  applications.kitchensink.process.stress.modules.SparkStressModule.execute()V+58
j  applications.kitchensink.process.stress.modules.StressModule.run()V+109
v  ~StubRoutines::call_stub
V  [libjvm.dylib+0x60f819]  JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x3c5
V  [libjvm.dylib+0x60e309]  JavaCalls::call_virtual(JavaValue*, Klass*, Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x1f1
V  [libjvm.dylib+0x60e424]  JavaCalls::call_virtual(JavaValue*, Handle, Klass*, Symbol*, Symbol*, Thread*)+0x58
V  [libjvm.dylib+0x714e61]  thread_entry(JavaThread*, Thread*)+0x13f
V  [libjvm.dylib+0xb90cf5]  JavaThread::thread_main_inner()+0x1a1
V  [libjvm.dylib+0xb908a8]  JavaThread::run()+0x2aa
V  [libjvm.dylib+0xb8d35f]  Thread::call_run()+0x11b
V  [libjvm.dylib+0x9f80dc]  thread_native_entry(Thread*)+0x13a
C  [libsystem_pthread.dylib+0x3305]  _pthread_body+0x7e
C  [libsystem_pthread.dylib+0x626f]  _pthread_start+0x46
C  [libsystem_pthread.dylib+0x2415]  thread_start+0xd

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
J 1039  java.lang.Thread.isInterrupted(Z)Z java.base@14-ea (0 bytes) @ 0x000000011e83db8f [0x000000011e83dac0+0x00000000000000cf]
J 51640 c2 java.util.concurrent.ThreadPoolExecutor.interruptIdleWorkers(Z)V java.base@14-ea (120 bytes) @ 0x00000001223ef228 [0x00000001223ee220+0x0000000000001008]
J 54599 c2 java.util.concurrent.ThreadPoolExecutor.shutdown()V java.base@14-ea (45 bytes) @ 0x00000001229d6614 [0x00000001229d5ec0+0x0000000000000754]
J 93267 c2 org.apache.spark.SparkContext.stop()V (321 bytes) @ 0x00000001225fa4d8 [0x00000001225f81c0+0x0000000000002318]
j  org.apache.spark.examples.graphx.SynthBenchmark$.main([Ljava/lang/String;)V+949
j  org.apache.spark.examples.graphx.SynthBenchmark.main([Ljava/lang/String;)V+4
v  ~StubRoutines::call_stub
J 5889  jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; java.base@14-ea (0 bytes) @ 0x000000011ec0db6e [0x000000011ec0da60+0x000000000000010e]
J 19940 c2 jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; java.base@14-ea (104 bytes) @ 0x000000011f76bca0 [0x000000011f76b440+0x0000000000000860]
J 96989 c2 applications.kitchensink.process.stress.modules.SparkStressModule.runExample(Ljava/lang/String;)V (181 bytes) @ 0x0000000121453b8c [0x0000000121450da0+0x0000000000002dec]
j  applications.kitchensink.process.stress.modules.SparkStressModule.execute()V+58
j  applications.kitchensink.process.stress.modules.StressModule.run()V+109
v  ~StubRoutines::call_stub

Comments
URL: https://hg.openjdk.java.net/jdk/jdk/rev/647d623650d3 User: rehn Date: 2019-09-09 07:32:46 +0000
09-09-2019

I don't think we can generalize the rules as it will depend on exactly what the nature of the capability is. For example, HandleMark does actually seem reasonable for the handshake processing code because we're effectively switching to a new execution context and the HandleMark is saying "nothing allocated in this context should leak out into the caller". But yes we can take this up in another issue.
22-08-2019
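The point about HandleMark in the comment above ("nothing allocated in this context should leak out into the caller") can be illustrated with a toy model. This is only a sketch, not HotSpot code: ToyArea, ToyHandleMark, toy_handshake_operation and toy_live_handles_after_handshake are hypothetical names, and the real HandleArea is a chunked arena rather than a vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy handle area: a growable stack of slots standing in for handles.
// Hypothetical stand-in, not HotSpot's HandleArea.
struct ToyArea {
    std::vector<int> slots;
    void allocate(int value) { slots.push_back(value); }
};

// RAII guard modelling a HandleMark: it remembers how many slots were live
// on entry and releases everything allocated after that point on exit.
class ToyHandleMark {
public:
    explicit ToyHandleMark(ToyArea& area)
        : area_(area), saved_top_(area.slots.size()) {}
    ~ToyHandleMark() {
        assert(area_.slots.size() >= saved_top_);
        area_.slots.resize(saved_top_);
    }
    ToyHandleMark(const ToyHandleMark&) = delete;
    ToyHandleMark& operator=(const ToyHandleMark&) = delete;

private:
    ToyArea& area_;
    std::size_t saved_top_;
};

// Hypothetical handshake body: it opens its own mark, so the handles it
// allocates are released before control returns to the interrupted code.
void toy_handshake_operation(ToyArea& area) {
    ToyHandleMark hm(area);
    area.allocate(1);
    area.allocate(2);
}

// Returns the number of live slots seen by the caller after the handshake.
std::size_t toy_live_handles_after_handshake() {
    ToyArea area;
    area.allocate(0);               // handle owned by the calling context
    toy_handshake_operation(area);  // nested context with its own mark
    return area.slots.size();
}
```

In this model the caller still sees exactly its own single slot after the nested operation returns, which is the isolation property the comment describes.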

I meant what we do in e.g. JRT_ENTRY: we set things up with no clue whether they are going to be used or not. In HandshakeState::process_self_inner we already set up a HandleMark. Following your line of thought, arguably that should be removed. So we have an inconsistency here; I'd rather have it set up everything or nothing (that doesn't solve the negative-capabilities issue we have here). But that is not in scope for this bug; here I'll just remove the quick entry. Let's create an issue to think about this and continue the discussion there.
22-08-2019

> I don't see this as a handshake problem directly, but doing more VM work with JavaThread's.
But it is handshakes that are causing that "VM work" to be done in JavaThreads. This can open up nested execution of code sequences that was never expected.
> Another option is to create things we need for VM work in HandshakeThreadsOperation::do_handshake, e.g. adding a HandleMark, etc...
I'm not sure what you mean exactly. do_handshake doesn't know either the state of the current thread or the needs of the code to be executed. Code is responsible for setting up needed capabilities to execute, that isn't something that should be deferred to do_handshake. Negative capabilities, like having a NoHandleMark, in the calling thread can't be anticipated or overridden by do_handshake.
22-08-2019

In the old case we would have safepointed and the VM thread would have done the work, therefore the current thread didn't need a handle mark. Now that we execute the code with the thread itself, we need it. If we had borrowed a JavaThread to do safepoint work instead of having a special thread, we would have had this issue. And with handshakes we could do a hand-over to another thread doing the handshake work. So I don't see this as a handshake problem directly, but doing more VM work with JavaThread's. Another option is to create things we need for VM work in HandshakeThreadsOperation::do_handshake, e.g. adding a HandleMark, etc... But I still think we should get rid of the quick entry, since it seems to have outlived its usefulness.
21-08-2019

I can't see any difference here: http://aurora.se.oracle.com/performance/reporting/report/robbin.ehn.no-quick.entry?mode=first I'll do a microbench just to see if it's at all visible.
21-08-2019

Just to be clear, removing QUICK_ENTRY is a point fix and does not address the general problem that has been introduced by using handshakes. We also need to understand what performance concerns drove this lack of HandleMarks.
11-08-2019

ILW = HLM = P3
06-08-2019

Thanks. Handles and HandleMarks are not expensive, so I vote to remove QUICK_ENTRY, provided there is no reason for it other than speed. We have recently optimized the inlining of them, so they are now even cheaper.
06-08-2019

QUICK_ENTRY points allow safepoints/handshakes.
// LEAF routines do not lock, GC or throw exceptions
// ENTRY routines may lock, GC and throw exceptions <= this all implies safepoints
// QUICK_ENTRY routines behave like ENTRY but without a handle mark
I don't know the significance of the "quick" and its relation to "handle mark", but regardless safepoints are allowed.
06-08-2019

It should be. I'm still not sure if we are allowed to safepoint (check poll) inside a JVM_QUICK_ENTRY or what the expected semantics are. Either we should add NoSafepointVerifier or allow safepoints (polls, which includes handshakes).
06-08-2019
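The first alternative mentioned above, adding a NoSafepointVerifier, amounts to making any poll inside the guarded scope a detectable error. The following is a toy sketch of that idea only; ToyThreadState, ToyNoSafepointVerifier, toy_safepoint_poll, toy_lock_with_poll and toy_violations_in_verified_entry are hypothetical names, and the model records violations in a counter where a debug VM would assert.

```cpp
#include <cassert>

// Toy per-thread state; a stand-in, not HotSpot's Thread class.
struct ToyThreadState {
    int no_safepoint_count = 0;  // > 0 while a verifier scope is active
    int polls_taken = 0;         // polls that were allowed to proceed
    int violations = 0;          // polls attempted inside a verifier scope
};

// RAII guard modelling a NoSafepointVerifier scope.
class ToyNoSafepointVerifier {
public:
    explicit ToyNoSafepointVerifier(ToyThreadState& t) : t_(t) {
        ++t_.no_safepoint_count;
    }
    ~ToyNoSafepointVerifier() {
        assert(t_.no_safepoint_count > 0);
        --t_.no_safepoint_count;
    }
    ToyNoSafepointVerifier(const ToyNoSafepointVerifier&) = delete;
    ToyNoSafepointVerifier& operator=(const ToyNoSafepointVerifier&) = delete;

private:
    ToyThreadState& t_;
};

// Models a safepoint/handshake poll. Inside a verifier scope the poll is
// counted as a violation instead of being taken.
void toy_safepoint_poll(ToyThreadState& t) {
    if (t.no_safepoint_count > 0) {
        ++t.violations;
        return;
    }
    ++t.polls_taken;
}

// Models a lock acquisition that polls once it stops blocking, similar in
// spirit to the Monitor::lock frame in the crash stack.
void toy_lock_with_poll(ToyThreadState& t) {
    toy_safepoint_poll(t);
}

// Takes the lock once outside and once inside a verifier scope and returns
// the number of violations recorded.
int toy_violations_in_verified_entry() {
    ToyThreadState t;
    toy_lock_with_poll(t);  // outside the scope: the poll is allowed
    {
        ToyNoSafepointVerifier nsv(t);
        toy_lock_with_poll(t);  // inside the scope: recorded as a violation
    }
    return t.violations;
}
```

Under this model a quick entry that takes a polling lock would be flagged at the poll itself, rather than later when handshake code tries to allocate a handle.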

But are we guaranteed there is no other path in that code that can lead to a safepoint check?
JVM_QUICK_ENTRY(jboolean, JVM_IsInterrupted(JNIEnv* env, jobject jthread, jboolean clear_interrupted))
  JVMWrapper("JVM_IsInterrupted");
  ThreadsListHandle tlh(thread);
  JavaThread* receiver = NULL;
  bool is_alive = tlh.cv_internal_thread_to_JavaThread(jthread, &receiver, NULL);
  if (is_alive) {
    // jthread refers to a live JavaThread.
    return (jboolean) Thread::is_interrupted(receiver, clear_interrupted != 0);
  } else {
    return JNI_FALSE;
  }
JVM_END
It's far from obvious to me that the TLH code is safepoint-check-free.
Aside: Both Thread::is_interrupted and os::interrupted call check_for_dangling_thread_pointer! The Thread one should be deleted.
06-08-2019

Thread::check_for_dangling_thread_pointer is debug only, so as long as this is contained to that, no users are affected. ILW=LMH, which is P5. I = for users it's L; for us it's H? This should be investigated and fixed, so I set it to P3.
06-08-2019

I observed the same crash building the Lucene test suite. We might soon have a report from the Apache Lucene team. I am increasing the priority of this bug since it very likely affects actual users, so ILW=HMM
05-08-2019

That is just the failure mode in this particular case. These "quick" entries can encounter safepoint checks and these can now lead to execution of code previously only executed by the VMThread. So far it is probably only biased-locking revocation but this is a general problem in replacing a safepoint VM operation with a handshake. You must know that the code executed at the handshake is safe to be called no matter what "state" the thread was in at the time of the safepoint/handshake poll.
05-08-2019

I disagree, the problem is calling the very complex check_for_dangling_thread_pointer from a 'quick entry'. This comment:
// Ensure that the VMNativeEntryWrapper constructor, which can cause
// a GC, is called outside the NoHandleMark (set via VM_QUICK_ENTRY_BASE).
suggests that we should not safepoint (check polls) ('cause GC'), since then we might need a handle mark.
05-08-2019

Yes, I see your point. But that comment and NoHandleMark seem wrong. Why do we want to safepoint before we create a NoHandleMark but later safepoint with it? I don't see any reason to have the quick entry at all. I'll look a bit more, maybe there is something to it....
05-08-2019

This is a runtime bug - it is a fundamental problem with using handshakes to execute code that used to be executed by the VMThread.
30-07-2019

It is unclear to me what the significance of using the NoHandleMark is with this code - why do we need to use it? Is it just a reflection that we expect this code to be very simple?
29-07-2019

We're executing the debug code:
void Thread::check_for_dangling_thread_pointer(Thread *thread) {
  assert(!thread->is_Java_thread() || Thread::current() == thread ||
         !((JavaThread *) thread)->on_thread_list() ||
         SafepointSynchronize::is_at_safepoint() ||
         ThreadsSMRSupport::is_a_protected_JavaThread_with_lock((JavaThread *) thread),
         "possibility of dangling Thread pointer");
}
and have entered the:
ThreadsSMRSupport::is_a_protected_JavaThread_with_lock
where we hit a safepoint/handshake check which appears to indicate there is a biased-locking revocation against the current thread. Looking back up the stack we are executing JVM_IsInterrupted which is defined as a JVM_QUICK_ENTRY
#define JVM_QUICK_ENTRY(result_type, header)                         \
extern "C" {                                                         \
  result_type JNICALL header {                                       \
    JavaThread* thread=JavaThread::thread_from_jni_environment(env); \
    ThreadInVMfromNative __tiv(thread);                              \
    debug_only(VMNativeEntryWrapper __vew;)                          \
    VM_QUICK_ENTRY_BASE(result_type, header, thread)
and then we have:
// QUICK_ENTRY routines behave like ENTRY but without a handle mark
#define VM_QUICK_ENTRY_BASE(result_type, header, thread)             \
  TRACE_CALL(result_type, header)                                    \
  debug_only(NoHandleMark __hm;)                                     \
  Thread* THREAD = thread;                                           \
  os::verify_stack_alignment();                                      \
  /* begin of body */
and there is our NoHandleMark. So the problem is, I think, that we can now execute code in the handshake in the current thread that would previously have been executed by the VMThread. So the execution of that code now invalidates the use of NoHandleMark. This could potentially invalidate all "quick entries" that could hit a safepoint/handshake check.
29-07-2019
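The failure sequence described above can be reproduced in miniature with a toy model. This is a sketch under stated assumptions, not the HotSpot implementation: ToyHandleArea, ToyNoHandleMark, toy_poll and toy_quick_entry are hypothetical names, and a thrown exception stands in for the fastdebug assert so that the outcome can be observed without aborting.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Toy stand-in for a thread's handle area; not the HotSpot class.
struct ToyHandleArea {
    int no_handle_mark_nesting = 0;
    int handles_allocated = 0;

    // Mirrors the failing check: allocating is illegal inside a NoHandleMark.
    void allocate_handle() {
        if (no_handle_mark_nesting != 0) {
            throw std::logic_error("allocating handle inside NoHandleMark");
        }
        ++handles_allocated;
    }
};

// RAII guard modelling NoHandleMark: bumps the nesting counter for its scope.
class ToyNoHandleMark {
public:
    explicit ToyNoHandleMark(ToyHandleArea& area) : area_(area) {
        ++area_.no_handle_mark_nesting;
    }
    ~ToyNoHandleMark() {
        assert(area_.no_handle_mark_nesting > 0);
        --area_.no_handle_mark_nesting;
    }
    ToyNoHandleMark(const ToyNoHandleMark&) = delete;
    ToyNoHandleMark& operator=(const ToyNoHandleMark&) = delete;

private:
    ToyHandleArea& area_;
};

// Models a safepoint/handshake poll: a pending operation, if any, runs in
// the polling thread itself rather than in a separate VM thread.
void toy_poll(const std::function<void()>& pending_handshake) {
    if (pending_handshake) {
        pending_handshake();
    }
}

// Models a "quick entry": the body runs under a NoHandleMark but can still
// reach a poll. Returns false if the handle-allocation check fired.
bool toy_quick_entry(ToyHandleArea& area,
                     const std::function<void()>& pending_handshake) {
    ToyNoHandleMark nhm(area);
    try {
        toy_poll(pending_handshake);
    } catch (const std::logic_error&) {
        return false;
    }
    return true;
}
```

With no pending operation the quick entry completes normally; with a pending operation that allocates a handle, the check fires inside the entry, which corresponds to the ConstantOopReadValue frame reaching HandleArea::allocate_handle in the crash stack.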