Bug ID: JDK-8058715 stability issues when being launched as an embedded JVM via JNI

Versions (Unresolved/Resolved/Fixed)

The Version table provides details about the release in which this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.


JDK 7	JDK 8	JDK 9
7u75 (Fixed)	8u31 (Fixed)	9 b40 (Fixed)

This issue can manifest in many different ways. The common pattern appears to be stability issues with a JVM launched via JNI from a native host application. In the case brought to us by another product team here at Oracle, HotSpot would consistently crash almost immediately after being launched, with the following assertion error:

===
Internal Error (threadLocalStorage.cpp:56), pid=14368, tid=3072534208
guarantee(get_thread() == thread) failed: must be the same thread, quickly
===

This issue started when they upgraded from 7u25 to 7u67.

SQE OK to take fix for customer's escalation to CPU15_01
06-11-2014
This is a regression caused by jdk8023956
05-11-2014
My proposed solution is to use a mmap call that will gracefully give up if the assumptions we make about our address space (namely, that the page right below the main thread's stack is available for us to monopolize) are in fact not true. As the workaround is a best-effort attempt to work around an issue that is fairly rare to begin with, this is acceptable. Looking at the code, it is clear that this behavior was the original intention of the author of the workaround. The Oracle product team that is able to reproduce this issue at will tried but was unable to create a stand-alone reproducer. I have also not been able to write a reproducible testcase for this issue. But the root cause of the issue is well understood (it is obvious that what our implementation is doing is wrong), and the fix is straightforward and safe. We have tested this fix and have confirmed that it does fix the crashes seen by the Oracle product team. I have also set up an environment where I can easily reproduce the NX kernel bug at will and have confirmed that the workaround still works exactly as expected.
05-11-2014
This is a regression resulting from jdk8023956, a best-effort workaround for the EXEC SHIELD NX kernel bug found on various ia32 Linux distributions. In jdk8023956, we allocate a page of memory several pages below the main thread's stack and run some code in it to trick the kernel into marking almost the entire user portion of the address space as "executable". Unfortunately, jdk8023956 ends up mapping this page via mmap with the MAP_FIXED flag, so if the page in question is already in use (for example, as part of another thread's stack), the existing mapping is silently replaced and we may corrupt whatever data is there.
05-11-2014
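
The difference between the two mmap strategies described in the comments above can be sketched in C. This is a minimal illustration under stated assumptions, not the actual HotSpot patch: the helper names map_page_if_free and occupied_page_is_preserved are hypothetical, and the "victim" page merely stands in for memory that another thread might already be using. Without MAP_FIXED, Linux treats the requested address as a hint and never replaces an existing mapping, so comparing the returned address with the hint is enough to detect that the page was not free and to give up.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: try to map one anonymous page exactly at `hint`.
 * MAP_FIXED is deliberately NOT passed, so the kernel treats `hint` as a
 * suggestion and never replaces an existing mapping.
 * Returns `hint` on success, or NULL if the page could not be placed there. */
static void *map_page_if_free(void *hint, size_t page_size)
{
    void *p = mmap(hint, page_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    if (p != hint) {
        /* The kernel picked a different address, so the hinted page is
         * not available. Release the stray mapping and give up. */
        munmap(p, page_size);
        return NULL;
    }
    return p;
}

/* Hypothetical demo: returns 1 if a page that is already in use survives
 * an attempt to map over it, 0 otherwise. */
static int occupied_page_is_preserved(void)
{
    size_t page_size = (size_t)sysconf(_SC_PAGESIZE);

    /* Stand-in for memory owned by someone else (e.g. a thread stack). */
    unsigned char *victim = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (victim == MAP_FAILED)
        return 0;
    victim[0] = 0x5A;

    /* Ask for the very same address; with MAP_FIXED this would have
     * discarded the victim page and its contents. */
    void *p = map_page_if_free(victim, page_size);
    int ok = (p == NULL) && (victim[0] == 0x5A);

    munmap(victim, page_size);
    return ok;
}
```

Calling occupied_page_is_preserved() from a main function on Linux exercises the occupied-page case: the helper declines to map, and the marker byte in the existing page is left intact.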