JDK-8191360 : Lookup of critical JNI method causes duplicate library loading with leaking handler
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 8u144,9,10
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2017-11-15
  • Updated: 2019-09-13
  • Resolved: 2017-12-05
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 10 JDK 11
10 b36Fixed 11Fixed
Related Reports
Relates :  
Relates :  
Relates :  
Relates :  
Description
java/lang/ClassLoader/nativeLibrary/NativeLibraryTest.java failed on all platforms in hs PIT:

#section:main
----------messages:(4/196)----------
command: main -Xcheck:jni NativeLibraryTest
reason: User specified action: run main/othervm/native -Xcheck:jni NativeLibraryTest 
Mode: othervm [/othervm specified]
elapsed time (seconds): 19.326
----------configuration:(0/0)----------
----------System.out:(0/0)----------
----------System.err:(15/932)----------
java.lang.RuntimeException: Expected count = 1 but got 2
	at testloader//p.Test.run(Test.java:34)
	at NativeLibraryTest.runTest(NativeLibraryTest.java:81)
	at NativeLibraryTest.main(NativeLibraryTest.java:56)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:564)
	at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:115)
	at java.base/java.lang.Thread.run(Thread.java:844)

JavaTest Message: Test threw exception: java.lang.RuntimeException: Expected count = 1 but got 2
JavaTest Message: shutting down test

STATUS:Failed.`main' threw exception: java.lang.RuntimeException: Expected count = 1 but got 2
 
Comments
I agree with David's analysis. This failure is unrelated to the problem I've fixed. Please file a new (runtime) bug.
13-12-2017

Fix summary: When creating a native wrapper, the compiler checks the shared library for a critical version of the native method (-XX:+CriticalJNINatives, see comments in JDK-7013347) that can be called with reduced overhead. This is done in NativeLookup::lookup_critical_style() with a call to dll_load() to get a handle to the (already loaded) library and a subsequent call to dll_lookup() to get the entry address of the critical version. The problem is that without a call to dll_unload(), this handle to the library will stay live and internal reference counting will prevent the library from ever being unloaded (even if the native method holder is unloaded). As a result, static fields in the library code are not reset, causing the NativeLibraryTest to fail due to unexpected field values once the library is loaded again. I've fixed this by closing the handle immediately after retrieving the critical entry address. This is fine because the library is still kept alive by JNI (see JVM_LoadLibrary). As soon as the holder class and the library are unloaded (see JVM_UnloadLibrary), the native wrapper that calls 'critical_entry' becomes unreachable and is unloaded as well. http://cr.openjdk.java.net/~thartmann/8191360/webrev.00/ I've also thought about passing 'RTLD_NOLOAD' to dlopen [1] but although that avoids loading the library, it still creates a handle (that needs to be closed) if the library is already loaded. [1] https://linux.die.net/man/3/dlopen
04-12-2017

I agree with David that this is not specific to -Xcomp but could happen with any compiled code that creates a native wrapper. This is a very old bug introduced by JDK-7013347. ILW = Duplicate loading of the same library with leaking handler, when creating native wrapper during JIT compilation, -XX:-CriticalJNINatives = MMM = P3
17-11-2017

Workaround: -XX:-CriticalJNINatives
17-11-2017

I've found out what is happening. The first call to dlopen is for the expected JVM_LoadLibrary call: V [libjvm.so+0x148a43a] os::Linux::dlopen_helper(char const*, char*, int)+0x9a V [libjvm.so+0x148a4af] os::dll_load(char const*, char*, int)+0x2f V [libjvm.so+0x1051532] JVM_LoadLibrary+0x172 C [libjava.so+0xd668] Java_java_lang_ClassLoader_00024NativeLibrary_load0+0x138 the second call happens as the JIT compiles the code that calls the native count() method. It tries to see if count() is a critical JNI method: V [libjvm.so+0x148a43a] os::Linux::dlopen_helper(char const*, char*, int)+0x9a V [libjvm.so+0x148a4af] os::dll_load(char const*, char*, int)+0x2f V [libjvm.so+0x141ee62] NativeLookup::lookup_critical_style(methodHandle const&, char*, char const*, int, bool)+0x92 V [libjvm.so+0x141f3de] NativeLookup::lookup_critical_entry(methodHandle const&)+0x38e V [libjvm.so+0x13a5a92] Method::critical_native_function()+0x22 V [libjvm.so+0x1641e3c] SharedRuntime::generate_native_wrapper(MacroAssembler*, methodHandle const&, int, BasicType*, VMRe gPair*, BasicType)+0x7c V [libjvm.so+0x162420f] AdapterHandlerLibrary::create_native_wrapper(methodHandle const&)+0x6ff V [libjvm.so+0xa94836] CompileBroker::compile_method(methodHandle const&, int, int, methodHandle const&, int, CompileTask: :CompileReason, DirectiveSet*, Thread*)+0x346 V [libjvm.so+0xa9509a] CompileBroker::compile_method(methodHandle const&, int, int, methodHandle const&, int, CompileTask: :CompileReason, Thread*)+0x7a V [libjvm.so+0xa6f38a] CompilationPolicy::compile_if_required(methodHandle const&, Thread*)+0x17a V [libjvm.so+0x121adf3] LinkResolver::resolve_static_call(CallInfo&, LinkInfo const&, bool, Thread*)+0x93 V [libjvm.so+0x1220b05] LinkResolver::resolve_invoke(CallInfo&, Handle, constantPoolHandle const&, int, Bytecodes::Code, T hread*)+0x135 V [libjvm.so+0x161f28c] SharedRuntime::find_callee_info_helper(JavaThread*, vframeStream&, Bytecodes::Code&, CallInfo&, Th read*)+0x6dc V [libjvm.so+0x161fd89] SharedRuntime::resolve_sub_helper(JavaThread*, bool, bool, Thread*)+0x1a9 V [libjvm.so+0x162068e] SharedRuntime::resolve_helper(JavaThread*, bool, bool, Thread*)+0x4e V [libjvm.so+0x1629447] SharedRuntime::resolve_static_call_C(JavaThread*)+0x117 v ~RuntimeStub::resolve_static_call J 2626 c1 p.Test.run()V (79 bytes) @ 0x00007f66f756b23c [0x00007f66f756af80+0x00000000000002bc] The handle created on this code path is never closed, and so the library is never truly unloaded at the OS level. (The four calls are an artifact of changes I made to the test - in the real case there are only two calls, but that is sufficient to break the unloading logic.) I think the compiler folk need to examine the way JNI critical lookup is performed to see if there is a way to avoid the dlopen of a library that has already been opened. And whether the leaking of the hamdle can be resolved. While this failure was clearly demonstrated by -Xcomp mode it seems to me that this problem will hit any code that gets JIT compiled and which calls a native method - yes? Which means that in general we will not be unloading native library when we should be. A short-term fix for the test is to exclude Xcomp mode.
17-11-2017

CountJVMCalls shows the same counts of JVM_LoadLibrary regardless -Xcomp flag. running a simple app which loads a library in main and exit shows 2 for both Xcomp and Xint
16-11-2017

I added printf to where dlopen and dlclosed get called in the VM. The result was very interesting. For a good run we see the expected pattern: dlopen of 0x7f0eb004f460 JNI_OnLoad: 0 Returning count = 1 Returning count = 1 JNI_OnUnload: 1 dlclose of 0x7f0eb004f460 dlopen of 0x7f0eb004f460 JNI_OnLoad: 0 Returning count = 1 Returning count = 1 JNI_OnUnload: 1 dlclose of 0x7f0eb004f460 ... But for the Xcomp run we get: dlopen of 0x7ffd34049d40 JNI_OnLoad: 0 dlopen of 0x7ffd34049d40 dlopen of 0x7ffd34049d40 dlopen of 0x7ffd34049d40 dlopen of 0x7ffd34049d40 Returning count = 1 Returning count = 1 JNI_OnUnload: 1 dlclose of 0x7ffd34049d40 dlopen of 0x7ffd34049d40 JNI_OnLoad: 1 dlopen of 0x7ffd34049d40 dlopen of 0x7ffd34049d40 dlopen of 0x7ffd34049d40 dlopen of 0x7ffd34049d40 Returning count = 2 Returning count = 2 JNI_OnUnload: 2 dlclose of 0x7ffd34049d40 This happens not only with the test's native library but all of them. I see multiple blocks of dlopen calls for the same library. /export/users/dh198349/jdk10-hs-dev/open/test/jdk > ../../../build/linux-x64-debug/images/jdk/bin/java -version dlopen of 0x7fd8e8001ab0 dlopen of 0x7fd8e8002150 dlopen of 0x7fd8e8029a50 dlopen of 0x7fd8e802a1a0 java version "10-internal" Java(TM) SE Runtime Environment (fastdebug build 10-internal+0-2017-11-07-2118353.daholme.jdk10-hs-dev) Java HotSpot(TM) 64-Bit Server VM (fastdebug build 10-internal+0-2017-11-07-2118353.daholme.jdk10-hs-dev, mixed mode) /export/users/dh198349/jdk10-hs-dev/open/test/jdk > ../../../build/linux-x64-debug/images/jdk/bin/java -Xcomp -version dlopen of 0x7f013c001ab0 dlopen of 0x7f013c002150 dlopen of 0x7f013c029ac0 dlopen of 0x7f013c02a210 dlopen of 0x1758510 dlopen of 0x1758510 dlopen of 0x1758510 dlopen of 0x1758510 dlopen of 0x7f013c002150 dlopen of 0x7f013c002150 dlopen of 0x7f013c002150 dlopen of 0x7f013c002150 java version "10-internal" Java(TM) SE Runtime Environment (fastdebug build 10-internal+0-2017-11-07-2118353.daholme.jdk10-hs-dev) Java HotSpot(TM) 64-Bit Server VM (fastdebug build 10-internal+0-2017-11-07-2118353.daholme.jdk10-hs-dev, compiled mode)
16-11-2017

Yes we seem to not actually unload the library. Trying to see how that can be ... perhaps we just don't wait long enough and Xcomp speeds something up ...
16-11-2017

here is a small test which illustrates that if dlclose isn't called for all handlers, static variables won't be reset: test.c #include <dlfcn.h> #include <stdio.h> int main() { void* so; int (*fptr)(); so = dlopen("liba.so", RTLD_LAZY | RTLD_LOCAL); fptr = (int (*)())dlsym(so, "count"); printf("%d\n", fptr()); printf("%d\n",fptr()); dlclose(so); so = dlopen("liba.so", RTLD_LAZY | RTLD_LOCAL); fptr = (int (*)())dlsym(so, "count"); printf("%d\n",fptr()); printf("%d\n",fptr()); // dlclose(so); so = dlopen("liba.so", RTLD_LAZY | RTLD_LOCAL); fptr = (int (*)())dlsym(so, "count"); printf("%d\n",fptr()); printf("%d\n",fptr()); dlclose(so); } a.c: static int _count; int count() { return _count++; }
16-11-2017

it seems we have called JNI_OnUnload, but haven't call dlclose, i.e. having a leak of library handlers.
16-11-2017

I printed out the value of count from the three native methods. Here's a good run: JNI_OnLoad: 0 Returning count = 1 Returning count = 1 JNI_OnUnload: 1 JNI_OnLoad: 0 Returning count = 1 Returning count = 1 JNI_OnUnload: 1 JNI_OnLoad: 0 Returning count = 1 Returning count = 1 JNI_OnUnload: 1 JNI_OnLoad: 0 Returning count = 1 Returning count = 1 JNI_OnUnload: 1 JNI_OnLoad: 0 Returning count = 1 Returning count = 1 JNI_OnUnload: 1 Here's the failing Xcomp run: JNI_OnLoad: 0 Returning count = 1 Returning count = 1 JNI_OnUnload: 1 JNI_OnLoad: 1 Returning count = 2 Returning count = 2 So we unloaded the library, yet on next load count is not reset to zero.
16-11-2017

The test doesn't report the unload count at the time it finds an unexpected load count. Given the C code: static jint count = 0; does this get reset to zero on each load of the native library? Answer: yes
16-11-2017

This test was added for JDK-8164512. Since this test has been passing except -Xcomp, I agree this is not a hs integration blocker. It looks like the native library is not unloaded before the second time this native library is loaded.
16-11-2017

It will be called every time the library is loaded. A library can only be loaded by one classloader at a time. If OnLoad is called a second time then the library must have been unloaded in between. The library may be getting unloaded unexpectedly in Xcomp mode.
16-11-2017

does the spec say that JNI_OnLoad will be called only once? update: it does not, at least this is how I read "The VM calls JNI_OnLoad when the native library is loaded" https://docs.oracle.com/javase/9/docs/specs/jni/invocation.html#jni_onload
16-11-2017

it seems we have this before JDK-8164512. I run a bit modified version of NativeLibraryTest.java[*] w/ 9.0.1+11. it fails java.lang.RuntimeException: Expected count = 1 but got 2 at testloader//p.Test.run(Test.java:34) w/ Xcomp and passes w/ Xint [*] diff -r f7e430cbfe34 test/jdk/java/lang/ClassLoader/nativeLibrary/libnativeLibraryTest.c --- a/test/jdk/java/lang/ClassLoader/nativeLibrary/libnativeLibraryTest.c Wed Nov 15 11:50:55 2017 -0800 +++ b/test/jdk/java/lang/ClassLoader/nativeLibrary/libnativeLibraryTest.c Wed Nov 15 17:46:28 2017 -0800 @@ -29,7 +29,7 @@ static jint count = 0; static jclass test_class; -static jint current_jni_version = JNI_VERSION_10; +static jint current_jni_version = JNI_VERSION_9; JNIEXPORT jint JNICALL JNI_OnLoad(JavaVM *vm, void *reserved) { @@ -61,10 +61,6 @@ (*env)->FatalError(env, "Exception thrown"); } - cl = (*env)->FindClass(env, "p/Test"); - if (cl != NULL) { - (*env)->FatalError(env, "p/Test class should not be found"); - } } JNIEXPORT jint JNICALL
16-11-2017

moved to hotspot/compiler as it fails only w/ Xcomp
16-11-2017

This is either a compiler bug (due to Xcomp), a bug with the recent native library handling changes in the JDK, or a test bug. None of which are hotspot->runtime issues AFAICS.
16-11-2017

w/ "-Xcomp -XX:-TieredCompilation -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler -Djvmci.Compiler=graal" the test fails w/ a different error message, which suggests that JNI_OnUnload from libnativeLibraryTest wasn't called at all. java.lang.RuntimeException: Expected unloaded=1 but got=0 at NativeLibraryTest.main(NativeLibraryTest.java:64) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:564) at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:115) at java.base/java.lang.Thread.run(Thread.java:844) JavaTest Message: Test threw exception: java.lang.RuntimeException: Expected unloaded=1 but got=0
15-11-2017

it looks like that Test::count returns 2 in 1st loop iteration, which means that JNI_OnLoad from libnativeLibraryTest called twice.
15-11-2017

the test fails in jdk/jdk w/ -Xcomp and passed w/ -Xmixed => not an integration blocker.
15-11-2017

the test fails in jdk/hs w/ -Xcomp -Xcomp -XX:+TieredCompilation -XX:TieredStopAtLevel=1 -Xcomp -XX:-TieredCompilation", "-Xcomp -XX:-TieredCompilation -Xcomp -XX:-TieredCompilation -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler -Djvmci.Compiler=graal and passes w/ -Xmixed and -Xint
15-11-2017

[~dholmes], I'm checking right now if it fails in jdk/jdk and fails only w/ Xcomp.
15-11-2017

I think this only fails in -Xcomp mode. Can you confirm that? I would also expect it to fail in jdk/jdk, so not an integration blocker.
15-11-2017

might be caused by JDK-8164512
15-11-2017

marking as intergration_blocker as we haven't seen it in prev. PIT.
15-11-2017