JDK-8017498 : JVM crashes when native code calls sigaction(sig) where sig>=0x20
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 7u21
  • Priority: P3
  • Status: Closed
  • Resolution: Fixed
  • Submitted: 2013-06-03
  • Updated: 2014-11-17
  • Resolved: 2013-07-18
The Version table provides details about the release in which this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

JDK 6         JDK 7         JDK 8      Other
6u81 Fixed    7u60 Fixed    8 Fixed    hs25 Fixed
Description
FULL PRODUCT VERSION :
java version "1.7.0_21"
Java(TM) SE Runtime Environment (build 1.7.0_21-b11)
Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)


FULL OS VERSION :
Linux xxx.xxx.xxx 2.6.18-128.1.6.el5 #1 SMP Tue Mar 24 12:05:57 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux


A DESCRIPTION OF THE PROBLEM :
Our application loads native code into the JVM. The native code uses signals and calls sigaction() for signal numbers >= 0x20 (32).

We LD_PRELOAD the libjsig.so library as documented.

When the native code calls sigaction() for signal 0x2B (43), the JVM crashes.

The problem is with these macros in the OpenJDK source file jsig.c:

#define MAXSIGNUM 32
#define MASK(sig) ((unsigned int)1 << sig)

MASK() appears to assume that the result will be zero when sig >= 0x20. That is not correct. In C, shifting a 32-bit value by 32 or more bits is undefined behavior, and on x86 the shift instruction uses only the low 5 bits of a variable shift count (count & 0x1F).

So on x86, ((unsigned int)1 << 0x2B) gives the same result as ((unsigned int)1 << 0x0B), the bit for signal 11 (SIGSEGV), and this leads to an access beyond the end of the array sact[].


THE PROBLEM WAS REPRODUCIBLE WITH -Xint FLAG: Yes

THE PROBLEM WAS REPRODUCIBLE WITH -server FLAG: Yes

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Write a JNI function that calls sigaction(0x2B) and run it while libjsig.so is LD_PRELOADed.

EXPECTED VERSUS ACTUAL BEHAVIOR :
The sigaction() interposer in libjsig.so should pass the sigaction(0x2B) call straight through to the OS sigaction() function.

Instead, the JVM crashes.

ERROR MESSAGES/STACK TRACES THAT OCCUR :
This is the gdb stack trace:

(gdb) whe
#0  0x0000003d0960ce74 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003d09608874 in _L_lock_106 () from /lib64/libpthread.so.0
#2  0x0000003d096082e0 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00002b30d520ea44 in signal_lock () from /home/murrap/jdk1.7.0_21/jre/lib/amd64/libjsig.so
#4  0x00002b30d520ee40 in sigaction () from /home/murrap/jdk1.7.0_21/jre/lib/amd64/libjsig.so
#5  0x00002b30d5ee606e in VMError::reset_signal_handlers ()
   from /home/murrap/jdk1.7.0_21/jre/lib/amd64/server/libjvm.so
#6  0x00002b30d5ee5b46 in VMError::report_and_die () from /home/murrap/jdk1.7.0_21/jre/lib/amd64/server/libjvm.so
#7  0x00002b30d5d89370 in JVM_handle_linux_signal () from /home/murrap/jdk1.7.0_21/jre/lib/amd64/server/libjvm.so
#8  <signal handler called>
#9  0x00002b30d520eed7 in sigaction () from /home/murrap/jdk1.7.0_21/jre/lib/amd64/libjsig.so
#10 0x00002aaab82eb6ee in Java_TestJNI_doSomething () from /home/murrap/jni/libTestJNI.so

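The thread hangs: frame #9 is libjsig's sigaction(), called from the application's native code, and it already holds libjsig's signal lock when it faults. The JVM's crash handling then calls sigaction() again (VMError::reset_signal_handlers, frame #5), which tries to take the same lock in signal_lock() (frame #3) and waits forever.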


REPRODUCIBILITY :
This bug can always be reproduced.

---------- BEGIN SOURCE ----------
This is the native code; you need to write a Java caller for it:

#define _GNU_SOURCE        /* must come first: exposes REG_RIP in <sys/ucontext.h> */
#include <stdio.h>
#include <jni.h>
#include <signal.h>
#include <sys/ucontext.h>

/* SA_SIGINFO handler: the third argument is really a ucontext_t *. */
void sig_handler(int sig, siginfo_t *info, void *ucontext) {
        ucontext_t *context = (ucontext_t *)ucontext;

        printf("HANDLER (1)\n");
        // Move forward RIP to skip failing instruction
        context->uc_mcontext.gregs[REG_RIP] += 6;
}

JNIEXPORT void JNICALL Java_TestJNI_doSomething(JNIEnv *env, jclass klass, jint val) {
        struct sigaction act;
        struct sigaction oact;

        act.sa_flags = SA_ONSTACK|SA_RESTART|SA_SIGINFO;
        sigfillset(&act.sa_mask);
        /* sa_handler and sa_sigaction may share storage (they do on Linux), so set only one */
        act.sa_sigaction = sig_handler;
        // 0x20+SIGSEGV == 43 (0x2B), which is >= MAXSIGNUM (32) in jsig.c
        sigaction(0x20+SIGSEGV, &act, &oact);

        printf("doSomething(%d)\n", val);
        printf("old handler = %p\n", (void *)oact.sa_handler);
}

---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
There is no solution for this problem except a modification of the behaviour of jsig.c.
Comments
7u60 request : Request to backport as per OpenJDK process : http://mail.openjdk.java.net/pipermail/jdk7u-dev/2014-January/008232.html
07-01-2014

If I pass a larger number (100) to the native function, I see a hang and the test program eventually times out, but I still don't see the crash. The fix suggested in the 02-07-2013 comment also needs to be applied to the sigaction() function, because the test case doesn't appear to call set_signal().
03-07-2013

Updated test case: TestJNI.java now accepts an optional signal number, which is passed to the native function.
03-07-2013

The report states that we access outside sact[], and looking at the code there is only one place where this can occur:

    static sa_handler_t set_signal(int sig, sa_handler_t disp, bool is_sigset) {
      sa_handler_t oldhandler;
      bool sigused;

      signal_lock();

      sigused = (MASK(sig) & jvmsigs) != 0;  <=== here sig was masked with 0x1F before the shift, constraining it to a value less than 32
      if (jvm_signal_installed && sigused) {
        /* jvm has installed its signal handler for this signal. */
        /* Save the handler. Don't really install it. */
        oldhandler = sact[sig].sa_handler;  <=== here we use the raw sig value, so we access outside sact[] at sact[43]
        save_signal_handler(sig, disp);

        signal_unlock();
        return oldhandler;
      } else if (jvm_signal_installing) {
        /* jvm is installing its signal handlers. Install the new
         * handlers and save the old ones. jvm uses sigaction().
         * Leave the piece here just in case. */
        oldhandler = call_os_signal(sig, disp, is_sigset);
        save_signal_handler(sig, oldhandler);

        /* Record the signals used by jvm */
        jvmsigs |= MASK(sig);

        signal_unlock();
        return oldhandler;
      } else {
        /* jvm has no relation with this signal (yet). Install the
         * handler. */
        oldhandler = call_os_signal(sig, disp, is_sigset);

        signal_unlock();
        return oldhandler;
      }
    }

Whether or not we crash depends on what sact[43].sa_handler points to. It is evident from the code that we don't handle sig values >= MAXSIGNUM correctly. We should simply check for that and install the user handler directly. A simple fix might be just:

    sigused = (sig < MAXSIGNUM) && ((MASK(sig) & jvmsigs) != 0);
02-07-2013

Test case source and binaries.
01-07-2013

Unable to reproduce the crash so far with hs24 (7u21) and hs25 (8).

7u21 java version:

    java version "1.7.0_21"
    Java(TM) SE Runtime Environment (build 1.7.0_21-b11)
    Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)

Java wrapper test program (TestJNI.java):

    public class TestJNI {
        static { System.loadLibrary("TestJNI"); }
        public static native void doSomething(int val);
        public static void main(String[] args) {
            TestJNI.doSomething(43);
        }
    }

Note: the number 43 passed into the native function doSomething() isn't used; the native function always calls sigaction() with signal 0x20+SIGSEGV (the same as 0x2B). Attaching the test case containing both the Java and C code.

The C code was compiled as follows:

    gcc -fPIC -shared -o ./libTestJNI.so -I${JAVA_HOME}/include -I${JAVA_HOME}/include/linux ./TestJNI.c

The Java code was compiled as follows:

    ${JAVA_HOME}/bin/javac TestJNI.java

Set LD_PRELOAD to point to libjsig.so, e.g.:

    export LD_PRELOAD=${JAVA_HOME}/jre/lib/amd64/libjsig.so

The test program was run as follows (replace the library path with the directory containing libTestJNI.so):

    ${JAVA_HOME}/bin/java -Djava.library.path=/scratch/cccheung/8017498 -server TestJNI

One can use strace to confirm that libjsig.so was loaded. For example, the strace output showed:

    open("/net/koori/onestop/jdk/7u21/promoted/latest/binaries/linux-x64/jre/lib/amd64/libjsig.so", O_RDONLY) = 3
    read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`\t\0\0\0\0\0\0"..., 832) = 832

The output from running the test:

    doSomething(43)
    old handler = (nil)
01-07-2013

We could look at modifying the Linux/BSD code to use the same approach as Solaris, which is not constrained to the lower 32 signal numbers, or we could simply ignore signal numbers >= 32. As a workaround on Linux/BSD, I don't think the signal interposition library needs to be used for signals >= 32, as these are not used by the JVM. Note for BSD: os/bsd/vm/jvm_bsd.h contains very suspicious definitions for SIGRTMIN/SIGRTMAX on OpenBSD.
27-06-2013