At the beginning of JNIHandleBlock::allocate_handle(...), whenever the first block's _top is 0, the following code iterates over all thread-local JNIHandleBlocks chained after it:
  if (_top == 0) {
    for (JNIHandleBlock* current = _next; current != NULL; current = current->_next) {
      assert(current->_last == NULL, "only first block should have _last set");
      assert(current->_free_list == NULL, "only first block should have _free_list set");
      current->_top = 0;
      if (ZapJNIHandleArea) current->zap();
    }
    ...
  }
If a thread has many chained JNIHandleBlocks, this iteration can take a significant amount of time, because its cost grows linearly with the length of the chain. We observed this issue in particular for the JVMCI method "installCode", which creates a large number of local JNI references while installing the code in HotSpot. With the proposed fix (see attachments) in place, certain parts of Graal run nearly 3x faster than before.
Another way to resolve the issue, specifically for JVMCI/Graal, would be to wrap the JVMCI method "installCode" in something similar to JNI's PushLocalFrame/PopLocalFrame. This could also slightly reduce the memory footprint.
To reproduce this issue more easily, a simple example application is attached as well.