Bug ID: JDK-8279124 VM does not handle SIGQUIT during initialization

JDK 17	JDK 19
17.0.3 Fixed	19 b07 Fixed

A pull request was submitted for review. URL: https://git.openjdk.java.net/jdk17u-dev/pull/137 Date: 2022-01-31 22:59:47 +0000
31-01-2022
Xin Liu, please make a PR for this against jdk17u-dev.
31-01-2022
Fix Request(17u): It reduces the risk that JVMs which are still launching are terminated by a SIGQUIT sent by attaching clients (such as jcmd) on POSIX systems. The patch applies cleanly to jdk17u. The risk is low because it only changes the handling of SIGQUIT during the early initialization phase.
28-01-2022
Changeset: 9bf6ffa1 Author: Xin Liu <xliu@openjdk.org> Date: 2022-01-24 05:05:07 +0000 URL: https://git.openjdk.java.net/jdk/commit/9bf6ffa19f1ea9efcadb3396d921305c9ec0b1d1
24-01-2022
Here is a reproducer. We have to specify the pid directly to jcmd. HotSpot does not crash; it quits because of the asynchronous SIGQUIT signal.

    ➜ jdk git:(master) ✗ java -Xms64g -XX:+AlwaysPreTouch -XX:ParallelGCThreads=1 &
    [1] 23264
    ➜ jdk git:(master) ✗ jcmd 23264 VM.flags
    23264:
    [1] + 23264 quit java -Xms64g -XX:+AlwaysPreTouch -XX:ParallelGCThreads=1
    java.io.IOException: No such process
        at jdk.attach/sun.tools.attach.VirtualMachineImpl.sendQuitTo(Native Method)
        at jdk.attach/sun.tools.attach.VirtualMachineImpl.<init>(VirtualMachineImpl.java:100)
        at jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
        at jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
        at jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:113)
        at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:97)

jcmd without any parameter probes the available JVMs first, before sending SIGQUIT.
15-01-2022
Hi - I was looking at the signal handler startup, but I can't actually reproduce the problem...

I can cause java startup to take 18 seconds, with java -Xms100g -XX:+AlwaysPreTouch ...but my attempts to attach to it so far only fail with the 5 second timeout from PerfDataBuffer.java (I've tried various different Java versions...):

    sun.jvmstat.monitor.MonitorException: Could not synchronize with target
        at jdk.internal.jvmstat/sun.jvmstat.perfdata.monitor.v2_0.PerfDataBuffer.synchWithTarget(PerfDataBuffer.java:274)
        at jdk.internal.jvmstat/sun.jvmstat.perfdata.monitor.v2_0.PerfDataBuffer.buildMonitorMap(PerfDataBuffer.java:132)
        at jdk.internal.jvmstat/sun.jvmstat.perfdata.monitor.PerfDataBufferImpl.findByName(PerfDataBufferImpl.java:242)
        at jdk.internal.jvmstat/sun.jvmstat.perfdata.monitor.AbstractPerfDataBuffer.findByName(AbstractPerfDataBuffer.java:99)
        at jdk.internal.jvmstat/sun.jvmstat.perfdata.monitor.AbstractMonitoredVm.findByName(AbstractMonitoredVm.java:82)
        at jdk.jcmd/sun.tools.jstat.ExpressionResolver.evaluate(ExpressionResolver.java:70)
        at jdk.jcmd/sun.tools.jstat.SymbolResolutionClosure.visit(SymbolResolutionClosure.java:56)
        at jdk.jcmd/sun.tools.jstat.OptionFormat.apply(OptionFormat.java:82)
        at jdk.jcmd/sun.tools.jstat.OptionOutputFormatter.resolve(OptionOutputFormatter.java:52)
        at jdk.jcmd/sun.tools.jstat.OptionOutputFormatter.<init>(OptionOutputFormatter.java:46)
        at jdk.jcmd/sun.tools.jstat.Jstat.logSamples(Jstat.java:113)
        at jdk.jcmd/sun.tools.jstat.Jstat.main(Jstat.java:70)

(which we expect - the AlwaysPreTouch delay is before the AttachListener starts)

Could you say exactly what build you reproduce this with, and can we get a backtrace of the JVM process that you crash? Thanks!
14-01-2022
Hi, Kevins,

Thank you for writing down your thoughts on it.

> It seems we cannot reproduce killing a JVM by attaching, unless we make the heap initialization very slow, or pause during startup.
> Is hotdog the project where this problem was noticed?

We get reports from a systemd service that does something similar to hotdog, but yes, I think we can say that it's hard to trigger this issue unless heap initialization is slow. All the issues we have seen so far involve AlwaysPreTouch. "java -version" is quick. Its purpose is to probe the version number of the java in use. The real targets are the Java applications. Users can provide any combination of JVM options.

> I thought this was encouraging, that an earlier signal handling change would also solve the problem.

You are right. I am just scared to mess with os::init_2() after realizing that even some Java code depends on its behavior. On the other hand, SIGQUIT (Ctrl-\) itself is also a kind of "user interface", right? Maybe users expect to quit and generate a core dump in the early stage of java. The issue we are trying to solve here is to prevent attach from quitting HotSpot prematurely. That's why I prefer a change in attach.
13-01-2022
> What I'm trying to ask is, can you reproduce killing a JVM by attaching, when the target JVM is executing anything earlier than the heap pretouch work?

Yes, I can. -XX:+PauseAtStartup pauses JVM startup before heap init.

    ➜ jdk git:(master) ✗ ./build/linux-x86_64-server-release/jdk/bin/java -XX:+UnlockDiagnosticVMOptions -XX:+PauseAtStartup -version
    [1] 6356 quit ./build/linux-x86_64-server-release/jdk/bin/java -XX:+PauseAtStartup -versio

OTOH, jcmd will force 6356 to quit.

    ➜ jdk git:(master) ✗ pidof java
    6356
    ➜ jdk git:(master) ✗ jcmd 6356 VM.flags
    6356:
    com.sun.tools.attach.AttachNotSupportedException: Unable to open socket file: target process not responding or HotSpot VM not loaded
        at sun.tools.attach.LinuxVirtualMachine.<init>(LinuxVirtualMachine.java:106)
        at sun.tools.attach.LinuxAttachProvider.attachVirtualMachine(LinuxAttachProvider.java:63)
        at com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:208)
        at sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:147)
        at sun.tools.jcmd.JCmd.main(JCmd.java:131)

I also tried to block SIGQUIT in the first place, but I ran into an obstacle. On Linux, the first line of HotSpot code does not run on the primordial thread. I found that sigprocmask doesn't work in a thread. Of course, I can achieve that in the java launcher, but I think the launcher should remain platform-independent.
12-01-2022
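One plausible reading of the sigprocmask obstacle in the comment above is that a signal mask belongs to a single thread: blocking SIGQUIT in a thread created later leaves it unblocked in the thread that created it. The sketch below illustrates that per-thread behaviour in Python on a POSIX system. It is not HotSpot or launcher code, and the names `current_mask` and `mask_demo` are hypothetical.

```python
import signal
import threading


def current_mask():
    """Return the calling thread's blocked-signal set without changing it."""
    return signal.pthread_sigmask(signal.SIG_BLOCK, [])


def mask_demo():
    """Block SIGQUIT in a worker thread and report who ends up blocking it."""
    seen = {}

    def worker():
        # This changes only the worker thread's mask.
        signal.pthread_sigmask(signal.SIG_BLOCK, [signal.SIGQUIT])
        seen["worker"] = signal.SIGQUIT in current_mask()

    before = current_mask()
    # Start from a known state: SIGQUIT unblocked in the creating thread.
    signal.pthread_sigmask(signal.SIG_UNBLOCK, [signal.SIGQUIT])
    try:
        thread = threading.Thread(target=worker)
        thread.start()
        thread.join()
        seen["creator"] = signal.SIGQUIT in current_mask()
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, before)  # restore the mask
    return seen


if __name__ == "__main__":
    print(mask_demo())
```

Because the creating thread still has SIGQUIT unblocked, a process-directed SIGQUIT can still be delivered there, which is why blocking would have to happen before any other thread exists.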
It seems we cannot reproduce killing a JVM by attaching, unless we make the heap initialization very slow, or pause during startup.

There's still a case for making the JVM more robust, I was just trying to talk about whether the /proc signal info parsing on every attach attempt is the right thing, or whether there is something simpler we can do on JVM startup (yes, it would be every JVM startup, but maybe a smaller change).

Is hotdog the project where this problem was noticed? hotdog searches /proc for processes called java. If I read it correctly, it then runs java -version of the same binary as the process it found. That must take some time, so it does not seem realistic that there is an attach attempt anywhere in the very early stage of the process startup, so it is realistically going to be in this long heap pretouch delay.

I thought this was encouraging, that an earlier signal handling change would also solve the problem.

In Threads::create_vm(), we do some signal work very early, i.e. os::init_2(void) calls PosixSignals::init(), which calls signal_sets_init() and then install_signal_handlers().

But we don't handle SIGQUIT/BREAK until later, when create_vm calls os::initialize_jdk_signal_support(). Maybe that is a bug. Maybe we need to ignore or handle break in install_signal_handlers() with the other signals (if ReduceSignalUsage is not true). Would need to check we don't break how we save previous signal handlers for later checking.
12-01-2022
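The idea raised in the comment above, ignoring the break signal early and installing the real handler later, can be illustrated outside HotSpot. The sketch below is Python on a POSIX system, not HotSpot code. `early_init` and `late_init` are hypothetical stand-ins for the early signal setup and for os::initialize_jdk_signal_support(), and the saved previous disposition echoes the concern about preserving earlier handlers. It must run on the main thread, because Python only allows handler changes there.

```python
import os
import signal
import time


def early_init():
    """Ignore SIGQUIT as early as possible and return the previous disposition."""
    return signal.signal(signal.SIGQUIT, signal.SIG_IGN)


def late_init(received):
    """Install the real handler once the runtime is able to service the signal."""
    signal.signal(signal.SIGQUIT, lambda signum, frame: received.append(signum))


def ignore_then_handle():
    received = []
    previous = early_init()
    try:
        os.kill(os.getpid(), signal.SIGQUIT)  # ignored, so the process survives
        survived_early_signal = True
        late_init(received)
        os.kill(os.getpid(), signal.SIGQUIT)  # now routed to the handler
        deadline = time.monotonic() + 1.0
        while not received and time.monotonic() < deadline:
            time.sleep(0.01)  # let the interpreter run the Python-level handler
        return survived_early_signal, received
    finally:
        # Restore the earlier disposition (None means it was not set from Python).
        restore = previous if previous is not None else signal.SIG_DFL
        signal.signal(signal.SIGQUIT, restore)


if __name__ == "__main__":
    print(ignore_then_handle())
```

A SIGQUIT that arrives during the ignore window is dropped rather than queued, so in this model an attach request sent that early gets no response from the target.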
What I'm trying to ask is, can you reproduce killing a JVM by attaching, when the target JVM is executing anything earlier than the heap pretouch work?
11-01-2022
Yes. The latest jdk can utilize multiple cores to accelerate AlwaysPreTouch, but this does not change the time complexity. I show the case in the PR. '-XX:ParallelGCThreads=1' uses only 1 parallel worker, so it is easy to trigger this problem.

> And does this have to be a JVM that you find by scanning for processes named "java", or can you ever get it to fail by launching a java process, getting the PID and trying to attach?

I found that it is possible for a systemd/init service or an OCI hook (https://github.com/bottlerocket-os/hotdog/pull/6) in containers to try to attach to whatever Java pids it finds.
11-01-2022
A pull request was submitted for review. URL: https://git.openjdk.java.net/jdk/pull/7003 Date: 2022-01-10 05:19:26 +0000
11-01-2022
OK thanks - if an early sigaction to ignore SIGQUIT during the heap/pretouch work is in place, are you saying you can actually get the timing right to kill a JVM by attaching? And does this have to be a JVM that you find by scanning for processes named "java", or can you ever get it to fail by launching a java process, getting the PID and trying to attach?
10-01-2022
> If this is a real problem, possibly an early signal action change to ignore signals/SIGQUIT until the point where it can be handled, could avoid introducing a new platform-specific mechanism in all client tools.

I thought of this approach in the first place. However, it's not reliable. The process can show up in ps -aux at the very beginning of its life, and no mechanism can prevent it from receiving SIGQUIT at that point. If we can check that SIGQUIT will be caught before calling VirtualMachineImpl.sendQuitTo, at least it's reliable on Linux.
07-01-2022
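The check proposed in the comment above, confirming that the target already catches SIGQUIT before VirtualMachineImpl.sendQuitTo is called, can be sketched on Linux by reading the SigCgt bitmask from /proc/<pid>/status. This is a minimal illustration in Python, not the code from the pull request. The helper names `mask_catches` and `catches_sigquit` are hypothetical.

```python
import os
import signal


def mask_catches(sigcgt_hex: str, signum: int) -> bool:
    """Return True if bit (signum - 1) is set in a SigCgt-style hex mask."""
    return bool(int(sigcgt_hex, 16) & (1 << (signum - 1)))


def catches_sigquit(pid: int) -> bool:
    """Read /proc/<pid>/status (Linux only) and test the SigCgt mask for SIGQUIT."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("SigCgt:"):
                mask = line.split(":", 1)[1].strip()
                return mask_catches(mask, signal.SIGQUIT)
    return False  # no SigCgt line found: treat the signal as not caught


if __name__ == "__main__":
    # Hypothetical attach-side guard: only signal a target that will handle it.
    print("this process catches SIGQUIT:", catches_sigquit(os.getpid()))
```

A check like this is inherently racy in one direction only: once a handler is installed it normally stays installed, so a positive answer remains valid when the signal is sent.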
A simpler reproducer:

    $ java -Xms128g -XX:+AlwaysPreTouch -version &
    [1] 108658
    $ kill -3 108658
    $ gdb ./build/linux-x86_64-server-fastdebug/jdk/bin/java /tmp/core.108658.108658
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib64/libthread_db.so.1".
    Core was generated by `java -Xms128g -XX:+AlwaysPreTouch -version'.
    Program terminated with signal SIGQUIT, Quit.
    #0 0x00007f394bb7a70a in __pthread_join (threadid=139884066653952, thread_return=thread_return@entry=0x7ffe1c1b7ca8) at pthread_join.c:90
    90 lll_wait_tid (pd->tid);
    [Current thread is 1 (Thread 0x7f394c1a5fc0 (LWP 108658))]

Yes, '-Xms128g -XX:+AlwaysPreTouch' is placed there on purpose. With AlwaysPreTouch enabled, HotSpot heap initialization time is linear in the heap size. This happens before os::initialize_jdk_signal_support().
07-01-2022
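The reproducer above relies on the default disposition of SIGQUIT: until a handler is installed, the signal terminates the process. A minimal Python sketch of the same effect on a POSIX system follows. The child is a plain sleeping process standing in for a JVM that is still pretouching its heap, which is an assumption made purely for illustration, and the names `_child_setup` and `sigquit_before_handler` are hypothetical.

```python
import resource
import signal
import subprocess
import sys


def _child_setup():
    # Runs in the child before exec: force the default SIGQUIT disposition
    # (the "no handler installed yet" state) and suppress core files.
    signal.signal(signal.SIGQUIT, signal.SIG_DFL)
    resource.setrlimit(resource.RLIMIT_CORE, (0, 0))


def sigquit_before_handler() -> int:
    """Send SIGQUIT to a child that never installs a handler; return its status."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"],
        preexec_fn=_child_setup,
    )
    child.send_signal(signal.SIGQUIT)  # equivalent to `kill -3 <pid>`
    return child.wait(timeout=30)


if __name__ == "__main__":
    status = sigquit_before_handler()
    # subprocess reports death by signal N as the negative value -N.
    print("child status:", status)
```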
When you attach to a process and crash it, do you get a core dump, and can we get a stacktrace -- that would be interesting. Is it even in create_vm? I think we need to show that stacktrace of where we are in the VM when it dies.

There is a window between process startup and getting the JVM signal handlers installed where a signal is not handled, and a signal can cause a default action e.g. termination.

I didn't yet reproduce it myself. The monitoring process has to find the (very new) JVM's pid, and the JVM for a tool has itself got to start, so it is hard to make happen. I can't do it at the command-line or a simple script.

I see in your script a slowdown comes from -Xms128g -XX:+AlwaysPreTouch which causes seconds of delay. Not sure if that is what triggers it for you. But also in the script there's the "attacher" loop function checking for a java process every 5 seconds. How often does it kill a newly launched VM? (Such a script can notice a new process by chance at a very early stage before it does very much at all...)

If this is a real problem, possibly an early signal action change to ignore signals/SIGQUIT until the point where it can be handled, could avoid introducing a new platform-specific mechanism in all client tools. (But that happens relatively early in Threads::create_vm() already, so curious about where the VM is when it is killed...)
05-01-2022
test.sh is a test that shows that even jcmd $pid may force an initializing HotSpot to quit.

    $ sh ./test.sh
    Testing default java...
    111013:
    ./test.sh: line 40: 111013 Quit $JAVA_OPTS -version 2> /dev/null
    FAILED: java process quit
    java.io.IOException: No such process
        at jdk.attach/sun.tools.attach.VirtualMachineImpl.sendQuitTo(Native Method)
        at jdk.attach/sun.tools.attach.VirtualMachineImpl.<init>(VirtualMachineImpl.java:100)
        at jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
        at jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
        at jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:113)
        at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:97)
22-12-2021

Relates :	JDK-8284331 - Add sanity check for signal handler modification warning.
Relates :	JDK-8283337 - Posix signal handler modification warning triggering incorrectly
Relates :	JDK-8292695 - SIGQUIT and jcmd attaching mechanism does not work with signal chaining library