JDK-8017498 : JVM crashes when native code calls sigaction(sig) where sig>=0x20

Details
Type:
Bug
Submit Date:
2013-06-03
Status:
Closed
Updated Date:
2014-06-26
Project Name:
JDK
Resolved Date:
2013-07-18
Component:
hotspot
OS:
Sub-Component:
runtime
CPU:
Priority:
P3
Resolution:
Fixed
Affected Versions:
7u21
Fixed Versions:
hs25 (b43)

Related Reports
Backport:
Backport:
Backport:
Backport:

Sub Tasks

Description
FULL PRODUCT VERSION :
java version "1.7.0_21"
Java(TM) SE Runtime Environment (build 1.7.0_21-b11)
Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)


FULL OS VERSION :
Linux xxx.xxx.xxx 2.6.18-128.1.6.el5 #1 SMP Tue Mar 24 12:05:57 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux


A DESCRIPTION OF THE PROBLEM :
We have native code in our JVM. The native code uses signals. It attempts to call sigaction() where the signal >= 0x20.

We LD_PRELOAD the libjsig.so library as documented.

When the native code attempts to sigaction(0x2B) i.e. signal 0x2B, the JVM crashes.

The problem is with the macro

#define MAXSIGNUM 32
#define MASK(sig) ((unsigned int)1 << sig)

In OpenJDK source file jsig.c.

MASK() appears to assume that if sig>=0x20 then the result will be zero. But this is not correct. In C, shifting a 32-bit integer by 32 or more bits is undefined behavior, and on x86 the shift amount is masked by 0x1F before the shift happens.

So ((unsigned int)1 << 0x2B) is the same as ((unsigned int)1 << 0x0B) and this results in an attempt to access beyond the end of the array sact[].
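
The aliasing described above can be sketched in a few lines of C. This is an illustration only, not code from jsig.c: the helper names are hypothetical, and the explicit "& 0x1F" models the x86 behavior so that the sketch itself stays within defined C.

```c
#include <stdbool.h>

#define MAXSIGNUM 32

/* Hypothetical model of a 32-bit variable shift on x86:
 * only the low 5 bits of the shift count are used. */
static unsigned int x86_style_mask(unsigned int sig) {
    return 1u << (sig & 0x1Fu);
}

/* True when the mask test claims the JVM owns the signal even though
 * sig is too large to index an array of MAXSIGNUM entries. */
static bool mask_hits_but_index_overflows(unsigned int sig, unsigned int jvmsigs) {
    return (x86_style_mask(sig) & jvmsigs) != 0 && sig >= MAXSIGNUM;
}
```

With bit 11 (SIGSEGV on Linux) set in jvmsigs, signal 0x2B produces the same mask as signal 0x0B, so the check passes while 0x2B is past the end of a 32-entry array.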


THE PROBLEM WAS REPRODUCIBLE WITH -Xint FLAG: Yes

THE PROBLEM WAS REPRODUCIBLE WITH -server FLAG: Yes

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Write a JNI function that calls sigaction(0x2B) and run it while libjsig.so is LD_PRELOADed.

EXPECTED VERSUS ACTUAL BEHAVIOR :
The libjsig.so library should pass through the sigaction(0x2B) from its interceptor to the OS function.

In fact, the JVM crashes.
ERROR MESSAGES/STACK TRACES THAT OCCUR :
This is the gdb stack trace:

(gdb) whe
#0  0x0000003d0960ce74 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003d09608874 in _L_lock_106 () from /lib64/libpthread.so.0
#2  0x0000003d096082e0 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00002b30d520ea44 in signal_lock () from /home/murrap/jdk1.7.0_21/jre/lib/amd64/libjsig.so
#4  0x00002b30d520ee40 in sigaction () from /home/murrap/jdk1.7.0_21/jre/lib/amd64/libjsig.so
#5  0x00002b30d5ee606e in VMError::reset_signal_handlers ()
   from /home/murrap/jdk1.7.0_21/jre/lib/amd64/server/libjvm.so
#6  0x00002b30d5ee5b46 in VMError::report_and_die () from /home/murrap/jdk1.7.0_21/jre/lib/amd64/server/libjvm.so
#7  0x00002b30d5d89370 in JVM_handle_linux_signal () from /home/murrap/jdk1.7.0_21/jre/lib/amd64/server/libjvm.so
#8  <signal handler called>
#9  0x00002b30d520eed7 in sigaction () from /home/murrap/jdk1.7.0_21/jre/lib/amd64/libjsig.so
#10 0x00002aaab82eb6ee in Java_TestJNI_doSomething () from /home/murrap/jni/libTestJNI.so

The thread hangs in sigaction() while attempting to acquire a lock that is already held at frame #9 by the application native code.


REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
This is the native code. You'd need to construct a caller in Java:

#include <stdio.h>
#include <jni.h>
#define __USE_GNU
#include <signal.h>
#include <sys/ucontext.h>

void sig_handler(int sig, siginfo_t *info, ucontext_t *context) {
        int thrNum;

        printf("HANDLER (1)\n");
        // Move forward RIP to skip failing instruction
        context->uc_mcontext.gregs[REG_RIP] += 6;
}

JNIEXPORT void JNICALL Java_TestJNI_doSomething(JNIEnv *env, jclass klass, jint val) {
        struct sigaction act;
        struct sigaction oact;
        pthread_attr_t attr;
        stack_t stack;

        act.sa_flags = SA_ONSTACK|SA_RESTART|SA_SIGINFO;
        sigfillset(&act.sa_mask);
        act.sa_handler = SIG_DFL;
        act.sa_sigaction = (void (*)())sig_handler;
        sigaction(0x20+SIGSEGV, &act, &oact);

        printf("doSomething(%d)\n", val);
        printf("old handler = %p\n", oact.sa_handler);
}

---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
There is no solution for this problem except a modification of the behaviour of jsig.c.
                                    

Comments
7u60 request : Request to backport as per OpenJDK process : http://mail.openjdk.java.net/pipermail/jdk7u-dev/2014-January/008232.html
                                     
2014-01-07
URL:   http://hg.openjdk.java.net/hsx/hsx25/hotspot/rev/732af649bc3a
User:  amurillo
Date:  2013-07-26 14:00:17 +0000

                                     
2013-07-26
URL:   http://hg.openjdk.java.net/hsx/hotspot-rt/hotspot/rev/732af649bc3a
User:  minqi
Date:  2013-07-18 00:10:32 +0000

                                     
2013-07-18
Updated testcase: the TestJNI.java accepts an optional sig value which will be passed to the native function.
                                     
2013-07-03
If I pass in a larger number (100) to the native function, I could see a hang and the test program eventually timed out but I still didn't see the crash.

The suggested fix needs to be applied to the sigaction() function as well, since the test case doesn't seem to call set_signal().
                                     
2013-07-03
The report states that we access outside the sact[] and looking at the code there is only one case where this can occur:

static sa_handler_t set_signal(int sig, sa_handler_t disp, bool is_sigset) {
  sa_handler_t oldhandler;
  bool sigused;

  signal_lock();

  sigused = (MASK(sig) & jvmsigs) != 0;  <=== here sig was masked with 0x1F before shift to constrain it to a value less than 32
  if (jvm_signal_installed && sigused) {
    /* jvm has installed its signal handler for this signal. */
    /* Save the handler. Don't really install it. */
    oldhandler = sact[sig].sa_handler;    <=== here we use raw sig value so try to access outside of sact[] at sact[43]
    save_signal_handler(sig, disp);

    signal_unlock();
    return oldhandler;
  } else if (jvm_signal_installing) {
    /* jvm is installing its signal handlers. Install the new
     * handlers and save the old ones. jvm uses sigaction().
     * Leave the piece here just in case. */
    oldhandler = call_os_signal(sig, disp, is_sigset);
    save_signal_handler(sig, oldhandler);
   /* Record the signals used by jvm */
    jvmsigs |= MASK(sig);

    signal_unlock();
    return oldhandler;
  } else {
    /* jvm has no relation with this signal (yet). Install
     * the handler. */
    oldhandler = call_os_signal(sig, disp, is_sigset);

    signal_unlock();
    return oldhandler;
  }
}

Whether or not we crash depends on where sact[43].sa_handler points to.

It is evident from the code that we don't handle sig values >= MAXSIGNUM correctly. We should simply check for that and install the user handler directly. Simple fix might be just:

sigused =  (sig < MAXSIGNUM) && ((MASK(sig) & jvmsigs) != 0);
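
As a minimal, self-contained sketch of that guard (the helper name jvm_uses_signal is hypothetical, and the lower bound check is an extra assumption not present in the one-line fix):

```c
#include <stdbool.h>

#define MAXSIGNUM 32
#define MASK(sig) ((unsigned int)1 << (sig))

/* Bit i set means the JVM has installed a handler for signal i. */
static unsigned int jvmsigs = 0;

/* Range check first: && short-circuits, so MASK() is never
 * evaluated with a shift count outside 0..31. */
static bool jvm_uses_signal(int sig) {
    return sig >= 0 && sig < MAXSIGNUM && (MASK(sig) & jvmsigs) != 0;
}
```

Because the range test comes first, a signal such as 43 is reported as unused by the JVM and would take the branch that installs the user handler directly, without touching sact[].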

                                     
2013-07-02
Unable to reproduce the crash so far with hs24 (7u21) and hs25 (8).

7u21 java version:
java version "1.7.0_21"
Java(TM) SE Runtime Environment (build 1.7.0_21-b11)
Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)

java wrapper test program (TestJNI.java):
public class TestJNI {
    static {
        System.loadLibrary("TestJNI");
    }
    public static native void doSomething(int val);
    public static void main(String[] args) {
        TestJNI.doSomething(43);
    }
}

Note: the number 43 passed into the native function doSomething() isn't used; the native function always performs a sigaction with signal 0x20+SIGSEGV (same as 0x2b). Attaching the testcase containing both java and c code.

The C code was compiled as follows:
gcc -fPIC -shared -o ./libTestJNI.so -I${JAVA_HOME}/include -I${JAVA_HOME}/include/linux ./TestJNI.c

The Java code was compiled as follows:
${JAVA_HOME}/bin/javac TestJNI.java

set LD_PRELOAD to point to libjsig.so
e.g. export LD_PRELOAD=${JAVA_HOME}/jre/lib/amd64/libjsig.so

The test program was run as follows:
${JAVA_HOME}/bin/java -Djava.library.path=/scratch/cccheung/8017498 -server TestJNI

(replace the above /scratch/cccheung/8017498 with your path containing libTestJNI.so)

One can use strace to confirm that libjsig.so was loaded.
For example, the strace output showed:
open("/net/koori/onestop/jdk/7u21/promoted/latest/binaries/linux-x64/jre/lib/amd64/libjsig.so", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`\t\0\0\0\0\0\0"..., 832) = 832

The output from running the test:
doSomething(43)
old handler = (nil)



                                     
2013-07-01
Test case source and binaries.
                                     
2013-07-01
We could look at modifying the linux/bsd code to use the same approach as Solaris, which is not constrained to the lower 32 signal types. Or we simply ignore signal numbers >=32.

As a workaround on linux/bsd I don't think the signal interposition library needs to be used for signals >= 32 as these are not used by the JVM.

Note for bsd: os/bsd/vm/jvm_bsd.h contains very suspicious definitions for SIGRTMIN/MAX on OpenBSD.
                                     
2013-06-27


