JDK-8275723 : Crash on macOS 12 in GlassRunnable::dealloc
  • Type: Bug
  • Component: javafx
  • Sub-Component: window-toolkit
  • Affected Version: 8,openjfx11,openjfx17
  • Priority: P2
  • Status: Resolved
  • Resolution: Fixed
  • OS: os_x
  • Submitted: 2021-10-21
  • Updated: 2022-04-26
  • Resolved: 2021-11-05
Fixed versions
  • JDK 8: 8u311 (Fixed)
  • Other: openjfx11.0.14 (Fixed)
Description
On macOS 12 beta (Monterey), crashes are reported that originate from glass, using JavaFX 17 and Java 11.0.12.

The relevant stacktrace for this is:

V  [libjvm.dylib]  _Z12report_fatalPKciS0_z+0xb5
V  [libjvm.dylib]  _ZN20SafepointSynchronize5blockEP10JavaThread+0x1d1
V  [libjvm.dylib]  _ZN10JavaThread44check_safepoint_and_suspend_for_native_transEPS_+0xab
V  [libjvm.dylib]  _ZN20ThreadInVMfromNativeC1EP10JavaThread+0x6d
V  [libjvm.dylib]  jni_DeleteGlobalRef+0x3e
C  [libglass.dylib]  -[GlassRunnable dealloc]+0x28
C  [Foundation]  -[_NSThreadPerformInfo dealloc]+0x2a

The final crash is triggered by an assert in safepoint.cpp, in SafepointSynchronize::block(JavaThread *thread): the passed thread is in state 4 (_thread_in_native), whereas it should probably be in _thread_in_native_trans.

The trace originates from glass, where the `dealloc` invocation on GlassRunnable calls `jni_DeleteGlobalRef` and enters the VM.

At this point, I'm not sure if this is an issue in JavaFX (glass on Mac) or in hotspot.
Comments
I see, I guess this is the best option then. Thanks for the quick reply!
08-11-2021

I wouldn't recommend an EA build for production. For that you will likely need to wait until 17.0.2 is released. Having said that, the 17.0.2-ea+1 build only has a couple other fixes in addition to this fix, so if you test it with your application, it might be a stop-gap solution for you.
08-11-2021

Developers can use the latest JavaFX 18 early access build. It is already available in an 18-ea+6 build. I should add that there is also a JavaFX 17.0.2 early access build, 17.0.2+1, that contains this fix.
08-11-2021

So are EA builds intended for production?
08-11-2021

Contrary to what the title of this issue may suggest, macOS 12 is no longer in beta. As this fix is currently scheduled for OpenJFX 17.0.2, which is scheduled for early 2022, I wonder if there shouldn't be some kind of hotfix for 17.0.1. After all, this bug makes even the simplest Hello World app crash on macOS 12. Or is there a workaround that I missed?
08-11-2021

Changeset: 4d8e12d2
Author: Andrew Brygin <bae@openjdk.org>
Committer: Johan Vos <jvos@openjdk.org>
Date: 2021-11-05 08:02:42 +0000
URL: https://git.openjdk.java.net/jfx/commit/4d8e12d231476fe72742cf10c223d8baf5028677
05-11-2021

I falsely assumed that the performSelectorOnMainThread would lead to both the selector (run) and dealloc (if applicable) being executed on the main thread. Since we never saw a report like this one before macOS 12, I guess this was the behaviour until now. I believe the proposed patch is indeed the appropriate way to fix this, and indeed, we probably need a follow-up issue to check for other dealloc's.
02-11-2021

Yeah, this looks like a case where some change in behavior in a newer version of macOS has exposed a latent bug in the JavaFX glass code. And while it would be nice to know what the change in behavior was, the fix looks correct, since it is incorrect to assume NSObject::dealloc is called on any particular thread unless you know all calls to retain and release are done on the same thread. It seems worth a follow-up effort to look at other classes to see if any of them have the same problem: namely, an object that is created or retained by a thread whose dealloc method assumes it is called on the main thread.
02-11-2021

Johan, the API doesn't guarantee that dealloc will be called on the same thread as our selector (via performSelectorOnMainThread). dealloc is called when an object's retain count drops to 0, on whichever thread performs that last release. See this comment: https://bugs.openjdk.java.net/browse/JDK-8275723?focusedCommentId=14455546&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14455546
02-11-2021

[~bae] that patch should work indeed (cleaning up at the end of the run) hence I think it's the correct approach. It is still unclear to me how we end up with not being on the main thread, even though performSelectorOnMainThread is used. That worries me a bit, especially since this didn't happen before macOS 12 -- did the contract of performSelectorOnMainThread change somehow?
02-11-2021

[~kcr] thanks for verification, I will create a PR for this change shortly.
02-11-2021

[~bae] I was just starting down the rabbit hole of trying to call the JNIDeleteGlobalRef from dealloc using performSelectorOnMainThread, but that is overly complex. I much prefer your proposed solution of calling it from the run method. I'll do some testing, including running a full set of unit tests, but I suspect it will work just fine. Do you want to create a PR for this? Otherwise, I will create a PR based on your proposed solution, and add you as a contributor of this PR.
02-11-2021

The following change makes the problem go away (tested on both aarch64 and x64 with a DebugNative build):
```
diff --git a/modules/javafx.graphics/src/main/native-glass/mac/GlassApplication.m b/modules/javafx.graphics/src/main/native-glass/mac/GlassApplication.m
index 686893aad9..7f73ac3e30 100644
--- a/modules/javafx.graphics/src/main/native-glass/mac/GlassApplication.m
+++ b/modules/javafx.graphics/src/main/native-glass/mac/GlassApplication.m
@@ -98,30 +98,21 @@ - (void)run
     {
         assert(pthread_main_np() == 1);
         JNIEnv *env = jEnv;
-        if (env != NULL)
+        if (env != NULL && self->jRunnable != NULL)
         {
             (*env)->CallVoidMethod(env, self->jRunnable, jRunnableRun);
             GLASS_CHECK_EXCEPTION(env);
+
+            (*env)->DeleteGlobalRef(env, self->jRunnable);
         }
+        self->jRunnable = NULL;
+
         [self release];
     }
     [pool drain];
 }
 
-- (void)dealloc
-{
-    assert(pthread_main_np() == 1);
-    JNIEnv *env = jEnv;
-    if (env != NULL)
-    {
-        (*env)->DeleteGlobalRef(env, self->jRunnable);
-    }
-    self->jRunnable = NULL;
-
-    [super dealloc];
-}
-
 @end
 
 #pragma mark --- GlassApplication
```
With this change we do all JNI work in the run() method, and no longer need to control which thread the dealloc() method is executed on.
02-11-2021
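
The idea behind the patch above (drop the reference inside run(), on the thread that executes it, so that dealloc has nothing thread-sensitive left to do) can be modelled outside of glass. What follows is only a toy Java sketch, not the glass code: `OwnedRunnable`, `CleanupInRunDemo` and the `ui-main` thread name are hypothetical, and recording the thread name stands in for the `DeleteGlobalRef` call.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical stand-in for GlassRunnable: owns a reference and frees it in run(). */
final class OwnedRunnable {
    private Runnable target;    // plays the role of the global reference jRunnable
    private String releasedOn;  // name of the thread on which the reference was dropped

    OwnedRunnable(Runnable target) {
        this.target = target;
    }

    /** Runs the task, then drops the reference on the same (calling) thread. */
    void run() {
        if (target != null) {
            target.run();
            releasedOn = Thread.currentThread().getName(); // analogue of DeleteGlobalRef
        }
        target = null;
    }

    String releasedOn() {
        return releasedOn;
    }
}

final class CleanupInRunDemo {
    /** Dispatches run() to a single "ui-main" thread and reports where cleanup happened. */
    static String run() {
        ExecutorService uiThread =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "ui-main"));
        try {
            OwnedRunnable wrapper = new OwnedRunnable(() -> { });
            Runnable task = wrapper::run;
            uiThread.submit(task).get(); // wait until run() has completed
            return wrapper.releasedOn();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            uiThread.shutdown();
        }
    }
}
```

Because the cleanup is tied to the thread that executes run(), it no longer matters which thread later drops the last reference to the wrapper.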

I can confirm that in almost all cases, the assert in -[GlassRunnable dealloc] is satisfied:
```
assert(pthread_main_np() == 1);
```
However, in some cases `pthread_main_np()` returns 0, meaning we are not on the main thread, and then the crash can happen. The strange thing is that, according to the stack trace posted above, the function is called via `performSelectorOnMainThread:withObject:waitUntilDone:`. I would expect that after `performSelectorOnMainThread` is called, a call to `pthread_main_np()` always returns `1`, so that assert should always hold.
02-11-2021

We have seen many reports of this issue, mostly on M1 (aarch64) systems, but a few reports on an Intel x64 system. Raising to P2 as this issue has no known workaround and is very likely to occur.
02-11-2021

>So we are either getting lucky on x64, or there is some difference in the sequence of operations when running on M1 that is triggering this failure.

We are lucky in one respect: the additional W^X logic on macos-aarch64 allowed us to catch it earlier. However, the bug is the same on Intel and aarch64 Macs. You just need a fastdebug build of jfx (with asserts enabled) to easily catch the issue on an Intel Mac as well.
01-11-2021

I can reproduce this quite easily on an M1 system (aarch64) running macOS 12.0.1 beta. I cannot reproduce it on an Intel x64 system. Both systems are running the same version of macOS:

    $ sw_vers
    ProductName: macOS
    ProductVersion: 12.0.1
    BuildVersion: 21A559

So we are either getting lucky on x64, or there is some difference in the sequence of operations when running on M1 that is triggering this failure.
01-11-2021

https://developer.apple.com/library/archive/technotes/tn2109/_index.html

>When a secondary thread retains the target object, you have to ensure that the thread releases that reference before the main thread releases its last reference to the object. If you don't do this, the last reference to the object is released by the secondary thread, which means that the object's -dealloc method runs on that secondary thread. This is problematic if the object's -dealloc method does things that are not safe to do on a secondary thread, something that's common for UIKit objects like a view controller.

So maybe the body of this if block:

    if (jEnv != NULL)
    {
        GlassRunnable *runnable = [[GlassRunnable alloc] initWithRunnable:(*env)->NewGlobalRef(env, jRunnable)];
        [runnable performSelectorOnMainThread:@selector(run) withObject:nil waitUntilDone:NO];
    }

should be put in an Obj-C block that is scheduled to run on the main thread, or the unsafe dealloc should simply be eliminated.
29-10-2021
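
The TN2109 passage quoted above comes down to one rule: whichever thread drops the last reference is the thread that runs the cleanup. A toy Java model of that rule, as a sketch only: `RefCounted` and `LastReleaseDemo` are hypothetical names, a plain atomic counter stands in for Objective-C retain/release, and the thread names are made up for the demonstration.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical manually reference-counted object; not an Objective-C or glass API. */
final class RefCounted {
    private final AtomicInteger retainCount = new AtomicInteger(1); // creator holds one reference
    private volatile String deallocThread; // thread on which the simulated dealloc ran

    void retain() {
        retainCount.incrementAndGet();
    }

    void release() {
        // The thread that brings the count to zero runs the cleanup, whoever it is.
        if (retainCount.decrementAndGet() == 0) {
            deallocThread = Thread.currentThread().getName();
        }
    }

    String deallocThread() {
        return deallocThread;
    }
}

final class LastReleaseDemo {
    /** Returns the name of the thread on which the simulated dealloc ran. */
    static String run(boolean secondaryReleasesLast) {
        RefCounted obj = new RefCounted(); // count = 1
        obj.retain();                      // count = 2, second reference handed to another thread

        Thread uiMain = new Thread(obj::release, "ui-main");
        Thread secondary = new Thread(obj::release, "secondary");
        Thread first = secondaryReleasesLast ? uiMain : secondary;
        Thread last = secondaryReleasesLast ? secondary : uiMain;

        try {
            first.start();
            first.join(); // force a deterministic release order for the demonstration
            last.start();
            last.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return obj.deallocThread();
    }
}
```

With `run(true)` the cleanup is attributed to `secondary`, which mirrors the situation in this report; with `run(false)` it lands on `ui-main`. In the real code the order is not forced, which is why the failure is intermittent.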

A slowdebug build with option -XX:+VerifyJNIEnvThread gives a more precise picture:

    Current thread (0x00007faf0d035c20): JavaThread "InvokeLaterDispatcher" daemon [_thread_in_native, id=36367, stack(0x000070000d6d7000,0x000070000d7d7000)]

    Stack: [0x000070000d6d7000,0x000070000d7d7000], sp=0x000070000d7d5f30, free space=1019k
    Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
    V [libjvm.dylib+0x11e871b] VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x80b
    V [libjvm.dylib+0x11e8d69] VMError::report_and_die(Thread*, void*, char const*, int, char const*, char const*, __va_list_tag*)+0x89
    V [libjvm.dylib+0x5d2591] report_vm_error(char const*, int, char const*, char const*, ...)+0x1e1
    V [libjvm.dylib+0x9b4c38] jni_DeleteGlobalRef+0x68
    C [libglass.dylib+0x8288] -[GlassRunnable dealloc]+0x28
    C [Foundation+0x81b7b] -[_NSThreadPerformInfo dealloc]+0x2a
    C [Foundation+0x4ea47] -[NSObject(NSThreadPerformAdditions) performSelectorOnMainThread:withObject:waitUntilDone:]+0x7c
    j com.sun.glass.ui.mac.MacApplication._submitForLaterInvocation(Ljava/lang/Runnable;)V+0 javafx.graphics@17.0.1-internal
    j com.sun.glass.ui.mac.MacApplication.submitForLaterInvocation(Ljava/lang/Runnable;)V+2 javafx.graphics@17.0.1-internal

We can see that we are trying to execute GlassRunnable::dealloc() on the InvokeLaterDispatcher thread, whereas it is supposed to run on the main application thread:

    - (void)dealloc
    {
        assert(pthread_main_np() == 1);
        JNIEnv *env = jEnv;
        if (env != NULL)
        {
            (*env)->DeleteGlobalRef(env, self->jRunnable);
        }
        self->jRunnable = NULL;

        [super dealloc];
    }

The jEnv is created for the main thread, so the option -XX:+VerifyJNIEnvThread triggers the following assert:

    # Internal Error (/Users/home/ws/17/jdk17/src/hotspot/share/prims/jni.cpp:686), pid=5730, tid=36367
    # assert(!VerifyJNIEnvThread || (thread == Thread::current())) failed: JNIEnv is only valid in same thread

With a debugNative jfx build we can see the following failure:

    Assertion failed: (pthread_main_np() == 1), function -[GlassRunnable dealloc], file /Users/home/ws/jfx/modules/javafx.graphics/src/main/native-glass/mac/GlassApplication.m, line 114

A straightforward solution seems to be to eliminate dealloc() and destroy the jRunnable reference inside the run() method. However, there could be an elegant way to schedule dealloc() execution on the main thread.
29-10-2021
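
The jni.cpp assert quoted above enforces that a JNIEnv is only used on the thread it belongs to. The same kind of thread-affinity check can be sketched in plain Java. This is an illustration under stated assumptions, not HotSpot or JNI code: `ThreadBoundEnv` and `EnvAffinityDemo` are hypothetical names, and the second thread is named after the one in the crash log only for readability.

```java
/** Hypothetical handle that, like a JNIEnv, is only valid on the thread that created it. */
final class ThreadBoundEnv {
    private final Thread owner = Thread.currentThread(); // thread the handle was created on

    /** Analogue of a call made through the env; rejects calls from any other thread. */
    void deleteGlobalRef() {
        if (Thread.currentThread() != owner) {
            throw new IllegalStateException("env is only valid in the thread that created it");
        }
    }
}

final class EnvAffinityDemo {
    /** Uses the handle from a second thread and reports whether the check fired. */
    static boolean rejectedOnOtherThread() {
        ThreadBoundEnv env = new ThreadBoundEnv(); // bound to the calling thread
        env.deleteGlobalRef();                     // same thread: the check passes

        boolean[] rejected = {false};
        Thread other = new Thread(() -> {
            try {
                env.deleteGlobalRef();             // wrong thread: the check fires
            } catch (IllegalStateException e) {
                rejected[0] = true;
            }
        }, "InvokeLaterDispatcher");

        other.start();
        try {
            other.join(); // join makes the write to rejected[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return rejected[0];
    }
}
```

A check of this kind turns silent misuse from the wrong thread into an immediate, reproducible failure, which is what the debug-only VM option did for this bug.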

Version added.
21-10-2021

Can you run "sw_vers" and add the output to the above "Environment" field? (so we can see the build number you are using in case it is relevant)
21-10-2021