JDK-8308297 : Release Note: Fixed Indefinite `jspawnhelper` Hangs
  • Type: Sub-task
  • Component: core-libs
  • Sub-Component: java.lang
  • Affected Version: 21
  • Priority: P4
  • Status: Resolved
  • Resolution: Delivered
  • OS: linux
  • Submitted: 2023-05-17
  • Updated: 2023-12-07
  • Resolved: 2023-12-05
JDK 21: 21 (Resolved)
Description
Since JDK 13, the JDK launches sub-processes on Linux with the `POSIX_SPAWN` launching mechanism by default (equivalent to `-Djdk.lang.Process.launchMechanism=POSIX_SPAWN`). In JDK 13 to JDK 20, `jspawnhelper` could hang indefinitely if the parent JVM process terminated abnormally before the handshake between the JVM and the newly created `jspawnhelper` process had completed. This issue is fixed in JDK 21. The hang was especially harmful if the parent process had open sockets: the forked `jspawnhelper` process inherited them and kept the corresponding ports open, which prevented other processes from binding to those ports.

This misbehavior has been observed with applications that frequently fork child processes in environments with tight memory constraints. In such cases, the OS can kill the JVM in the middle of forking, which leads to the issue described above. Restarting the JVM after such a crash fails if the new process tries to bind to the same ports as the original application, because the hanging `jspawnhelper` child process still holds them.

The root cause was that `jspawnhelper` did not close its own writing end of the pipe used for the handshake with the parent JVM. As long as that descriptor stayed open, a read from the pipe could never return EOF, even after the parent had died. The fix closes the writing end of the communication pipe before `jspawnhelper` attempts to read data from the parent process. This way, `jspawnhelper` reliably reads EOF from the communication pipe and terminates if the parent process dies prematurely.
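The pipe behavior behind this fix can be illustrated with a minimal sketch. This is not the JDK's code: the function name `child_sees_eof_after_closing_write_end` is hypothetical, the child stands in for `jspawnhelper`, and the parent simulates a premature death by closing both of its descriptors without sending anything.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical illustration, not the JDK implementation.
 * Returns 0 if the child observed EOF on the pipe, 1 if it did not,
 * and -1 if setting up the experiment failed.
 */
static int child_sees_eof_after_closing_write_end(void) {
    int fds[2];
    if (pipe(fds) != 0) {
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child (stands in for jspawnhelper): fork() gave it copies of
         * BOTH pipe ends. Closing the inherited write end first is the
         * step that makes EOF observable. */
        close(fds[1]);

        char buf[16];
        ssize_t n = read(fds[0], buf, sizeof buf);

        /* read() returns 0 (EOF) only once no process holds a write end. */
        _exit(n == 0 ? 0 : 1);
    }

    /* Parent (stands in for the JVM): closing both ends without writing
     * mimics what the kernel does when the process is killed. */
    close(fds[0]);
    close(fds[1]);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;
}
```

If the `close(fds[1])` call in the child were removed, the child itself would still hold a write end, so its `read` would block forever. That corresponds to the hang described above.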

A second variant of this issue could occur because the handshaking code in the JDK did not handle interrupted `write(2)` calls correctly, so incomplete messages could be sent to the `jspawnhelper` child process. The result was a deadlock between the parent thread and the child process. It manifested as a `jspawnhelper` process blocked while reading from a pipe, together with the following stack trace in the corresponding parent Java process:
```
java.lang.Thread.State: RUNNABLE
  at java.lang.ProcessImpl.forkAndExec(java.base@17.0.7/Native Method)
  at java.lang.ProcessImpl.<init>(java.base@17.0.7/ProcessImpl.java:314)
  at java.lang.ProcessImpl.start(java.base@17.0.7/ProcessImpl.java:244)
  at java.lang.ProcessBuilder.start(java.base@17.0.7/ProcessBuilder.java:1110)
  at java.lang.ProcessBuilder.start(java.base@17.0.7/ProcessBuilder.java:1073)
```
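One common way to make a handshake write robust against interrupts is to retry on `EINTR` and to continue after short writes until the whole message has been sent. The sketch below shows that pattern. The helper name `write_fully` is an assumption made for illustration and is not claimed to match the JDK's actual fix.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not the JDK implementation.
 * Writes exactly `count` bytes from `buf` to `fd`.
 * Returns `count` on success, or -1 on an error other than EINTR.
 */
static ssize_t write_fully(int fd, const void *buf, size_t count) {
    const char *p = buf;
    size_t remaining = count;

    while (remaining > 0) {
        ssize_t n = write(fd, p, remaining);
        if (n < 0) {
            if (errno == EINTR) {
                /* Interrupted by a signal before any data was written:
                 * retry instead of giving up. */
                continue;
            }
            return -1;
        }
        /* write() may transfer fewer bytes than requested; advance past
         * what was sent and loop for the remainder. */
        p += n;
        remaining -= (size_t)n;
    }
    return (ssize_t)count;
}
```

A caller that issues a single `write` and ignores a short count or an `EINTR` failure can leave the reader waiting for bytes that never arrive, which is the kind of deadlock described above.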
    
Comments
Changed the title as discussed. As for why this change requires a RN, I'm on your side. But initially, [~darcy] even requested a CSR for the change (see https://github.com/openjdk/jdk/pull/13956#issuecomment-1551331092) until [~rriggs] proposed to do at least a RN instead (see https://github.com/openjdk/jdk/pull/13956#issuecomment-1551531952). And now that we've put so much effort and love into it, it would be sad to drop it :)
30-06-2023

If you want to change the title then go ahead, except "Fixing" is present tense and past tense makes it clear that the bug has been fixed. To be honest, I'm surprised there is a RN for this change. I can't think of an observable change that someone upgrading to JDK 21 would need to know about.
29-06-2023

[~simonis] The original title on the RN was "Jspawnhelper Can Hang Indefinitely", which I think is confusing because this is about a bug that is fixed in JDK 21, not a bug that exists in JDK 21 (if you see what I mean). I hope this is okay with you. If you want to change it again then okay, but I think the title should make it clear that this is documenting a bug fix, not documenting a known issue in JDK 21.
29-06-2023

[~alanb] thanks for your update, I think it makes sense. To make it even clearer, maybe we could use just "Fixing indefinite jspawnhelper hangs" as title? This would also account for the fact that the fix in the end solves two reasons for hangs, 1/ when the parent terminates before the child is fully started and 2/ interrupted writes (which are both described in the note). I'm OK either way.
29-06-2023

The release note should say something about the nature or implications of the fix in addition to describing the problem.
23-05-2023