JDK-8352489 : Relax jspawnhelper version checks to informative
  • Type: Enhancement
  • Component: core-libs
  • Sub-Component: java.lang
  • Affected Version: 17.0.13,21.0.4,23
  • Priority: P4
  • Status: New
  • Resolution: Unresolved
  • Submitted: 2025-03-20
  • Updated: 2025-03-25
Related Reports
Causes :  
Description
We introduced strong jspawnhelper version checks with JDK-8325621. The original intent was to capture possible incompatibilities in the jspawnhelper protocol that would silently break process invocation.

It is now apparent that there is a visible drawback to this: some Java services using distro-installed JDKs are running into version incompatibility checks after an in-place (often unattended) update. We have now seen multiple reports from multiple customers who ran into jspawnhelper version checks. It does not help that jspawnhelper warnings do not bubble up to the user-visible IOExceptions. I explored that part a bit (JDK-8352533), but it requires a much more comprehensive fix to cover all possible jspawnhelper failure modes.

Note that this highlights the larger problem: overwriting the JDK while some application is using it can break the application in mysterious ways. jspawnhelper version checks detect this problem explicitly. Therefore, vendors still have to figure out how to do JDK upgrades safely: either putting updates in full-versioned paths to avoid overwrites, or prompting Java-based services to restart after overwriting the JDK installation, or restarting dependent services automatically, or blocking unattended upgrades while services are running.

But at this point, strong jspawnhelper version checks do more harm than good. Therefore, I believe the pragmatic approach is to demote jspawnhelper version checks from fatal to informative: the version mismatch would be reported only when the protocol detectably breaks, e.g. on an incorrect number of arguments and/or incorrect FDs passed. This would still allow better debugging of jspawnhelper failures, and provide a smoother update experience while distro vendors catch up.
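
To make the proposed behavior concrete, here is a minimal Java model of the decision described above. It is only a sketch: the real jspawnhelper is native code, and the class name, the `EXPECTED_ARGS` value, and the message wording are assumptions made for illustration, not taken from the JDK sources or the pull request.

```java
import java.util.Optional;

/** Hypothetical Java model of the proposed check; the real jspawnhelper is native code. */
public class HelperCheckSketch {
    /** Illustrative value only; not taken from the real protocol. */
    static final int EXPECTED_ARGS = 3;

    /**
     * Returns an empty Optional when the helper may proceed, or a diagnostic
     * message when the protocol is detectably broken.
     */
    static Optional<String> check(String jdkVersion, String helperVersion,
                                  int argCount, boolean fdsValid) {
        boolean protocolBroken = argCount != EXPECTED_ARGS || !fdsValid;
        if (!protocolBroken) {
            // A version mismatch alone is not fatal in this model.
            return Optional.empty();
        }
        StringBuilder msg = new StringBuilder("jspawnhelper protocol error: ");
        msg.append(argCount != EXPECTED_ARGS
                ? "incorrect number of arguments"
                : "incorrect file descriptors");
        if (!jdkVersion.equals(helperVersion)) {
            // The version information is added as a hint, not as the failure cause.
            msg.append(" (JDK version ").append(jdkVersion)
               .append(", helper version ").append(helperVersion)
               .append("; was the JDK updated in place?)");
        }
        return Optional.of(msg.toString());
    }
}
```

In this model, a helper from 21.0.5 launched by a 21.0.4 JVM proceeds as long as the argument count and FDs look right; the version difference shows up only in the diagnostic when something else has already gone wrong.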

To emphasize, this is not a bugfix, as version checks are still a valid thing to do. This is merely a pragmatic concession to match the inconvenient, but hopefully only temporary, broken state of the world.

This improvement is also targeted to be easily backportable, so we can catch up with update releases.
Comments
[~simonis] The jspawn protocol is not the real issue. The issue is the in-use replacement of one JDK by another.
25-03-2025

Have we thought about defining a "JSPAWNHELPER_VERSION" which only changes if the protocol changes (instead of using the normal Java version string for comparison)?
24-03-2025
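
The difference between the two checking strategies mentioned in the comment above can be sketched as follows. This is a hypothetical illustration: no separate protocol version exists in the JDK according to this report, and the method names and the protocol number used in the usage note are assumptions.

```java
/** Hypothetical comparison of the two checking strategies; names are illustrative. */
public class ProtocolVersionSketch {
    /** Strict check: any difference in the JDK version string is a mismatch. */
    static boolean sameJdkVersion(String launcherJdk, String helperJdk) {
        return launcherJdk.equals(helperJdk);
    }

    /** Protocol check: only a change in the protocol number is a mismatch. */
    static boolean sameProtocol(int launcherProtocol, int helperProtocol) {
        return launcherProtocol == helperProtocol;
    }
}
```

For an in-place update from 21.0.4 to 21.0.5 where both builds speak an assumed protocol version 1, `sameJdkVersion("21.0.4", "21.0.5")` is false while `sameProtocol(1, 1)` is true, so only the strict check would reject the helper.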

> So while it is indeed "undefined territory", it is more or less defined in a practical sense.
I don't agree, because you are only focusing on jspawnhelper, where in reality if a JDK has been updated in place whilst applications are running the old version, then there can be a range of things that could go wrong if an updated file is loaded by the old JDK. That is why it is "undefined territory". In "the old days" if you replaced an in-use memory-mapped rt.jar then you crashed - end of story. Don't do that.
> Unfortunately, it is currently the norm in distros to replace the JDKs in-place, sometimes through the unattended upgrades.
Wow! If that is really the case then there is some serious re-education needed! When did it become okay to replace software in-place without checking if it is in-use? If I update on Android it kills the apps that are running before doing the update. If I update software on Windows it tells me I have to kill running apps before it can do the install, or else it tells me I will have to reboot to complete the install. So what do these JDK "packaging" tools actually do? Do they not have safeguards in them to avoid replacing software that is in-use? If not then surely they should! If they do then is it a case of the "sys admin" forcing the update to occur in-place regardless? If so that is their problem (or rather their users as they are the ones whose applications may be corrupted and/or crash).
> Maybe we should really relax this in 17u and 21u, though, and keep the check strong in mainline and future 25u, as Alan suggested in PR
That is a compromise I could live with - though ultimately it is up to the project leads if they are okay with it. It is less likely there is some significant "protocol"/format change in an update release.
21-03-2025

> So what changed to allow for this undefined state?
Nothing changed to allow this undefined state. But I think reports from the field show us the impact is larger than we initially predicted.
We did JDK-8325621 because the changes in jspawnhelper protocol lead to cryptic issues. We have seen it when JDK-8310265 landed in update releases, for example. jspawnhelper then started to fail after in-place update with e.g.:
java.io.IOException: Cannot run program "pwd": error=0, Failed to exec spawn helper: pid: 92322, exit value: 1
    at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1110)
...because the protocol changed. With JDK-8325621, we implemented version checks, which were supposed to avoid running into this again without us or users knowing. We argued it is a good thing to check, because we are in uncharted territory if jspawnhelper gets replaced in-flight. It is still a good idea, if not for the following:
1. Check pessimism. jspawnhelper checks just blindly match versions, without knowing if a protocol is actually compatible. This was the tradeoff we made for simplicity of the checks, as we very specifically did not want to standardize the protocol. We only wanted to prevent _future_ protocol updates from failing cryptically. But as a side effect of this, jspawnhelper checks fail on in-place JDK update every time a version is bumped. So we are having the behavior like in JDK-8310265 _on every update_, even though jspawnhelper would technically still work fine. So while it is indeed "undefined territory", it is more or less defined in a practical sense.
2. Logging. All right, but at least we fail with the correct message, so it provides good diagnostics in cases that do get broken? Well, yes, but not completely: jspawnhelper would print the errors on stdout, not to the IOException. So if the logging is configured incompletely -- i.e. we are not catching the stdout from JVM -- then the failure would look exactly like JDK-8310265, without any debugging clues. It is apparently the norm for services to capture some sort of java.util.logging stream, which only sees the IOExceptions. We need to fix that, and I have a prototype that bubbles up some errors to IOExceptions, but it is going to take a while (JDK-8352533), and it might not be as safe to backport.
3. Widespread in-place updates. OK, fine, but then maybe in-place updates do not happen often enough for us to care? After all, if you are doing bad things, expect bad results? Unfortunately, it is currently the norm in distros to replace the JDKs in-place, sometimes through the unattended upgrades. This exposes a relatively large group of users to the issue, some of whom are running large services. I am seeing more of these, as people upgrade and catch up with JDK-8325621.
So the thing we introduced to check for exceptional corner cases turns out to reveal the exceptionally bad state of the world. We can continue to insist that exposing this is the right thing to do. But I am now leaning towards retracting a bit and limiting the checks' impact while we repair things. Maybe we should really relax this in 17u and 21u, though, and keep the check strong in mainline and future 25u, as Alan suggested in PR. Plus, make sure JDK-8352533 is done, so that diagnostics actually work out of the box for all users.
20-03-2025
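
The logging point in the comment above can be illustrated with a small sketch of what a service typically records when a process fails to start. The class and method names are hypothetical; only `ProcessBuilder`, `IOException`, and `java.util.logging` are standard APIs. The point is that only the `IOException` reaches the logger, while anything the helper prints to the JVM's own output streams is lost unless those streams are captured separately.

```java
import java.io.IOException;
import java.util.List;
import java.util.Optional;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical service-side wrapper; shows what ends up in a java.util.logging stream. */
public class SpawnDiagnosticsSketch {
    private static final Logger LOG = Logger.getLogger("spawn");

    /** Tries to start the command; returns the IOException message if startup fails. */
    static Optional<String> tryStart(List<String> command) {
        try {
            Process p = new ProcessBuilder(command).start();
            p.destroy(); // only the startup path matters for this sketch
            return Optional.empty();
        } catch (IOException e) {
            // This is the only diagnostic the logger sees; output printed by a
            // helper process itself does not appear here.
            LOG.log(Level.WARNING, "Could not start " + command, e);
            return Optional.of(String.valueOf(e.getMessage()));
        }
    }
}
```

A service that logs this way would record the "Cannot run program ..." message from the exception, which is why the report asks for helper errors to be bubbled up into the exception itself (JDK-8352533).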

Thanks for this pragmatic approach! Could you provide some pointers to customer reports, so we have some data points about which vendors are upgrading JDK unsafely? (I suppose most Linux distros are using the unsafe way today.)
20-03-2025

> What do you mean by "fix on the JDK side"?
After the proposal we'd have:
Time 0: JVM of version X runs from JAVA_HOME in /foo/bar, call it service T
Time 1: /foo/bar gets updated to version X' (service T keeps running).
Time 2: service T uses ProcessBuilder (jspawnhelper) with now version X' - instead of X
At Time 2, one is very much in undefined territory. While this was possible before JDK-8325621 it's not clear why we'd want to get back to "undefined territory". Current state is that the user gets "a signal" something is amiss. After this the user would not. Remember, when arguing for JDK-8325621 this was one reason why it should get backported (the protocol change is another, but is there a way to detect the protocol changes other than using the version?). So what changed to allow for this undefined state?
20-03-2025
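
The Time 0 to Time 2 scenario in the comment above could, in principle, also be noticed from inside the running service. The following is a heuristic sketch only, not something proposed in this report: it assumes the JDK image contains a `release` file with a `JAVA_VERSION="..."` line, and the class and method names are hypothetical.

```java
import java.util.Optional;

/** Hypothetical heuristic for spotting an in-place JDK update from a running service. */
public class InstalledVersionSketch {
    /** Extracts the JAVA_VERSION value from the text of a JDK "release" file, if present. */
    static Optional<String> parseJavaVersion(String releaseText) {
        for (String line : releaseText.split("\\R")) {
            if (line.startsWith("JAVA_VERSION=")) {
                String value = line.substring("JAVA_VERSION=".length());
                return Optional.of(value.replace("\"", "").trim());
            }
        }
        return Optional.empty();
    }

    /** True when the on-disk version is known and differs from the running version. */
    static boolean installChanged(String runningVersion, String releaseText) {
        return parseJavaVersion(releaseText)
                .map(onDisk -> !onDisk.equals(runningVersion))
                .orElse(false);
    }
}
```

A service could pass `System.getProperty("java.version")` and the contents of the `release` file under `System.getProperty("java.home")`. Whether the two strings are directly comparable for every build (for example early-access builds) is an assumption that would need checking.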

What do you mean by "fix on the JDK side"? It is useful to think about this change as restoring the old behavior before JDK-8325621, which accepted jspawnhelper overwrites when it did not actually break the protocol.
20-03-2025

Is there a way to "fix" this on the JDK side, though? It seems like a workaround that could potentially do more harm than good.
20-03-2025

The alternative is to do this only for 17u and 21u, leaving prospective 25u with strong checks. I do not like it much, because I think vendors would not be able to catch up until 25u releases this year, and users would still run into these problems. But it is a viable alternative nevertheless.
20-03-2025

[~sgehwolf], [~mdoerr], [~clanger] -- I would like to get this to 17u and 21u sooner, possibly as a critical patch for the April 2025 release; please take a look? It would make the April 2025 update smoother. We are planning to pick it up in Corretto for the April 2025 release either way.
20-03-2025

A pull request was submitted for review.
Branch: master
URL: https://git.openjdk.org/jdk/pull/24127
Date: 2025-03-20 09:52:02 +0000
20-03-2025