JDK-8223777 : In posix_spawn mode, failing to exec() jspawnhelper may not result in an error
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.lang
  • Affected Version: 11,13
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • Submitted: 2019-05-13
  • Updated: 2020-07-23
  • Resolved: 2019-06-05
Fix versions:
  • JDK 11: 11.0.9 (Fixed)
  • JDK 13: 13 b24 (Fixed)
  • JDK 14: 14 (Fixed)
Description
java.lang.Process now uses posix_spawn() by default on Linux, too. That exposes a quirk (or bug, depending on how you see it) in glibc.

posix_spawn() starts a new child process with whatever method it likes (in the case of glibc: fork(), vfork() or clone(), depending on the glibc version) and then, in the child process, proceeds to exec() the target binary. In our case the target binary is always the jspawnhelper binary from the JDK.

The problem is that the glibc version of posix_spawn() does not report an error back to the caller if the exec() step fails. Instead, the child process simply exits with exit code 127.

Strictly speaking, that is not a bug, since POSIX allows it. But for us this situation is indistinguishable from a user program exiting with 127. If we fail to exec() the jspawnhelper, we need to raise an IOException. If a user program exits with 127, we do not care; the user should handle that.

In our case this only affects the first exec() the child process performs, the one that execs the jspawnhelper. The jspawnhelper then proceeds to exec() the target binary, and we have error reporting in place for that step.

But if the jspawnhelper cannot be exec()'ed, e.g. because its permissions forbid execution, we currently get no error on glibc. From the outside it looks as if the child process quit immediately.
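
The ambiguity can be illustrated from the Java side. The following is a minimal sketch, not part of the fix: the class name ExitCode127Demo, the helper run(), and the path /nonexistent/hypothetical-binary are made up for illustration, and a POSIX /bin/sh is assumed. It shows the two cases that must stay distinguishable: a user program that legitimately exits with 127 (no exception), and a target that cannot be exec()'ed (IOException from start()).

```java
import java.io.IOException;
import java.util.List;

class ExitCode127Demo {
    /** Runs a command, discards its output, and returns its exit status. */
    static int run(List<String> command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command)
                .redirectErrorStream(true)
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .start();
        return p.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Case 1: a user program that exits with 127 on its own.
        // This is not a launch failure; the caller just sees the exit status.
        int status = run(List.of("/bin/sh", "-c", "exit 127"));
        System.out.println("user program exit status: " + status);

        // Case 2: the target binary cannot be exec()'ed (hypothetical path).
        // Error reporting for this step is in place, so start() should throw.
        try {
            run(List.of("/nonexistent/hypothetical-binary"));
            System.out.println("no exception thrown");
        } catch (IOException e) {
            System.out.println("IOException: " + e.getMessage());
        }
    }
}
```

The bug described here is that a failure to exec() the jspawnhelper itself ends up looking like case 1 instead of case 2.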

See also the associated glibc discussion:

https://sourceware.org/bugzilla/show_bug.cgi?id=18433

Comments
Fix Request: I'd like to fix this bug in 11u. It fixes a bug in subprocess spawning: if the spawn method is "posix_spawn" and the jspawnhelper cannot be executed (e.g. because someone copied the JDK somewhere without restoring the execute permissions), no exception is thrown and it looks as if the child process terminated immediately. On 11u the default spawn method is still vfork; it was switched to posix_spawn in later releases. So arguably this fix is not that urgent for 11u. However, the bug can hit if someone manually switches the launch mechanism to posix_spawn, and if it does, it causes confusion and a lot of unnecessary error analysis (fork errors are hard to analyse). The patch applies cleanly. The risk is small: we gave the fix ample time to stew in head. It was fixed in head in June 2019 and delivered with JDK 14 and JDK 15. Tests: nightlies ran at SAP without issues for several nights in a row.
22-07-2020

Note: I plan to backport this to 11, but will wait for JDK-8226192 to be resolved. This patch is also needed in 11, albeit less urgently, since posix_spawn is not the default launch mechanism there.
27-06-2019

Patch posted for review. No reviews yet. https://mail.openjdk.java.net/pipermail/core-libs-dev/2019-May/060267.html
04-06-2019

Solaris is similar: it returns errno=2 if the jspawnhelper is not found or does not have execute permission.
13-05-2019

Well, at least they report something back; on Ubuntu I get nothing :( This is surprisingly difficult to fix. It touches on the problem described in Martin's original "Johnny" comment about the difficulty of predicting when exec() will fail. However, our case is a bit easier, since we do not want to spawn arbitrary programs but just the jspawnhelper, which is under our control and should have been set up in a canonical way. I propose to stat() the binary before the spawn. This will at least catch permission errors, though not other errors that could happen. The stat() can happen either in Java or in C (the latter is probably faster). I will prepare a patch.
13-05-2019
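
For illustration, the pre-spawn check proposed in the comment above could look roughly like this on the Java side. This is a sketch under assumptions, not the patch that was posted for review: the class name SpawnHelperPreflight and the method checkExecutable() are hypothetical, and the helper location <java.home>/lib/jspawnhelper is an assumption about the JDK image layout. As noted in the comment, such a check only catches missing files and permission problems, not every reason exec() can fail.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class SpawnHelperPreflight {
    /** Throws IOException if the helper is missing, not a regular file, or not executable. */
    static void checkExecutable(Path helper) throws IOException {
        if (!Files.isRegularFile(helper)) {
            throw new IOException(helper + ": not found or not a regular file");
        }
        if (!Files.isExecutable(helper)) {
            throw new IOException(helper + ": missing execute permission");
        }
    }

    public static void main(String[] args) {
        // Assumption: the helper lives at <java.home>/lib/jspawnhelper.
        Path helper = Paths.get(System.getProperty("java.home"), "lib", "jspawnhelper");
        try {
            checkExecutable(helper);
            System.out.println(helper + " looks executable");
        } catch (IOException e) {
            System.out.println("pre-flight check failed: " + e.getMessage());
        }
    }
}
```

Doing the check before the spawn turns the silent "child exited with 127" case into an explicit IOException, at the cost of one extra file system lookup per check.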

On macOS the error is reported, although when execute permission is missing it shows up as: "build/macosx-x86_64-server-release/jdk/bin/java": error=2, No such file or directory
13-05-2019