JDK-8268605 : [warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 11.0.10-oracle
  • Priority: P4
  • Status: Closed
  • Resolution: Won't Fix
  • OS: generic
  • CPU: generic
  • Submitted: 2021-06-11
  • Updated: 2022-02-01
  • Resolved: 2021-06-23
Description
Sometimes when running JDK11+ in CI I see:

[11.028s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 2048k, guardsize: 0k, detached.

What does this warning mean, and why is it shown by default (on JDK11+, not on JDK8)?
The application does continue to run after the warning, so I think pthread_create() was retried and succeeded.

It would be useful if this warning also showed the thread name.

If it's literally pthread_create() returning EAGAIN, is there any point to show this warning by default if retrying pthread_create() works? I would think not.

To make it worse, this is output on STDOUT, and so messes up the application output, and causes transient CI failures, while there seems to be no semantic problem.
Comments
[~chaeubl] (Christian Haeubl) found that a likely cause is one thread calling execve() while another thread calls pthread_create() at about the same time. In that case, pthread_create() can return EAGAIN "just because there is a concurrent execve()": https://twitter.com/dvyukov/status/1090893278579486721
The retry introduced in JDK-8268773 (https://github.com/openjdk/jdk/commit/e35005d5ce383ddd108096a3079b17cb0bcf76f1) might help, although I wonder whether the limit of 3 is enough in practice.
It is arguably also a bug in execve() that it does not "hard kill" every other thread before doing anything, and even the notion of a "concurrent execve" seems pretty broken. But I don't know the details; maybe the kernel can't just do the equivalent of SIGKILL on threads?
01-02-2022

The issues that we plan to fix are listed in the related enhancement. Closing as will not fix.
23-06-2021

> One more thing, do you think it would be reasonable to retry (e.g. 1-2 times) pthread_create() if it returns EAGAIN?
That is not unreasonable, but it may not help, depending on exactly why the failure is occurring. If we have hit the ulimit process/thread limit, for example, then it won't self-correct unless a thread/process has terminated since the first call. So in a tight loop it may just fail continually. But I'll look into that and repurpose JDK-8268773 (thanks for filing!) to do general improvements in this area.
> From https://openjdk.java.net/jeps/158 it seems warnings should default to stderr, not stdout.
> But it is clearly emitted on stdout in this case. Maybe that's a bug?
Maybe, but not something we are likely to change. I suspect the JEP text was the original intent, but for compatibility reasons we kept VM warnings going to stdout. I'll see if I can track down any discussion on that.
Update: it was deliberately changed by JDK-8153723 "Change the default logging output for errors and warnings from stderr to stdout". The rationale may be a bit surprising, and it is unclear from the comments how it was settled, but it was probably a combination of:
a) it fixes the duplication problem
b) historically Hotspot used stdout
17-06-2021

ILW = MLM = P4
15-06-2021

Another question, should this warning be emitted on stderr by default and not stdout? From https://openjdk.java.net/jeps/158 it seems warnings should default to stderr, not stdout. But it is clearly emitted on stdout in this case. Maybe that's a bug?
15-06-2021

> That should be possible.
OK, that sounds like a useful improvement, I'll file a feature request for that: JDK-8268773
> No. There are two environment variables that can be used to pass flags (JAVA_TOOL_OPTIONS and _JAVA_OPTIONS) but when they are used their use must be reported as a security precaution.
Thanks for confirming, I wasn't sure.
> Which test suite is it?
It is the test suite of TruffleRuby, a Ruby implementation. The test suite creates many subprocesses. I might be able to pass the flag via RUBYOPT or so.
One more thing, do you think it would be reasonable to retry (e.g. 1-2 times) pthread_create() if it returns EAGAIN? I think the general meaning of EAGAIN is "feel free to try again, resource was not available at the time of the call", e.g. read(2) -> EAGAIN.
15-06-2021
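For illustration, the bounded retry proposed above could be sketched in C roughly as follows. This is a minimal sketch, not the HotSpot change made under JDK-8268773: the helper name `create_thread_with_retry`, the `max_attempts` parameter, the millisecond back-off, and `noop_start` are all hypothetical. It relies only on the documented behavior that pthread_create() returns the error number directly rather than setting errno.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper (not HotSpot code): call pthread_create() up to
 * max_attempts times, retrying only when it returns EAGAIN.
 * Returns 0 on success, otherwise the last error number. */
static int create_thread_with_retry(pthread_t *tid, const pthread_attr_t *attr,
                                    void *(*start)(void *), void *arg,
                                    int max_attempts)
{
    int rc = EAGAIN;
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        /* pthread_create() returns the error code; it does not set errno. */
        rc = pthread_create(tid, attr, start, arg);
        if (rc != EAGAIN) {
            break; /* success, or an error that retrying will not fix */
        }
        if (attempt + 1 < max_attempts) {
            /* Assumed back-off: 1 ms, 2 ms, ... so a transient condition
             * (such as a concurrent execve()) has time to clear. */
            struct timespec pause = { 0, 1000000L * (attempt + 1) };
            nanosleep(&pause, NULL);
        }
    }
    return rc;
}

/* Trivial thread body, used only to exercise the helper. */
static void *noop_start(void *arg)
{
    return arg;
}
```

A caller would use it as `create_thread_with_retry(&tid, NULL, noop_start, NULL, 3)`. If the process has hit a hard thread limit, every attempt returns EAGAIN, so the caller still has to handle failure.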

> Would it be possible to show the name of the thread which failed to be created in the log?
That should be possible.
> Is there a way I can set -Xlog:os+thread=off or change log output to go to stderr with an environment variable? (and without producing extra output due to that variable being set)
Based on the phrasing of the question I think you already know the answer to that. No. There are two environment variables that can be used to pass flags (JAVA_TOOL_OPTIONS and _JAVA_OPTIONS) but when they are used their use must be reported as a security precaution.
> The reason I ask is because many subprocesses are involved in that test suite, so it is impractical to pass the flag down to every subprocess.
Which test suite is it? Depending on how the subprocesses are created, tests can be configured to pass flags from jtreg to immediate subprocesses.
13-06-2021

Thank you for the reply. I did not see any OutOfMemoryError, and the process later exited with an exit code of 0 (success), so it is likely an internal VM thread. Would it be possible to show the name of the thread which failed to be created in the log? Is there a way I can set -Xlog:os+thread=off or change log output to go to stderr with an environment variable? (and without producing extra output due to that variable being set) The reason I ask is because many subprocesses are involved in that test suite, so it is impractical to pass the flag down to every subprocess.
12-06-2021

> What does this warning mean
It means that the VM was unable to call pthread_create successfully when the application/library called start() on a java.lang.Thread (or the VM was unable to start an internal thread).
> and why is it shown by default (on JDK11+, not on JDK8)?
Warnings are enabled by default in unified logging in JDK 11, and this situation is considered worthy of a warning as it can indicate a misconfigured ulimit setting that constrains thread creation. In JDK 8 we don't have unified logging, and in this situation perror() is called only if PrintMiscellaneous && (Verbose || WizardMode) are set.
> The application does continue to run after the warning, so I think pthread_create() was retried and succeeded.
No, it wasn't retried and did not succeed. The application should have received an OutOfMemoryError from the call to start() if it was an application/library thread. If it was an internal VM thread then it depends on which thread: the GC appears to just ignore it (which means you'll have fewer GC threads than expected).
The warning can be disabled with: -Xlog:os+thread=off
However, it is indicating an underlying problem. Without knowing what code this originated from, I can't say exactly what.
12-06-2021
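To check whether a ulimit setting is constraining thread creation, as the reply above suggests, the soft limit can be read from inside a process. The following is a minimal sketch assuming Linux, where RLIMIT_NPROC counts threads for the real user ID. The helper name `format_nproc_limit` is hypothetical, and other limits (for example cgroup or kernel-wide thread limits) may also apply and are not covered here.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: write the soft RLIMIT_NPROC value (or "unlimited")
 * into buf. Returns 0 on success, -1 if getrlimit() fails. */
static int format_nproc_limit(char *buf, size_t len)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NPROC, &rl) != 0) {
        return -1;
    }
    if (rl.rlim_cur == RLIM_INFINITY) {
        snprintf(buf, len, "unlimited");
    } else {
        /* rlim_t is an unsigned integer type; widen it for printing. */
        snprintf(buf, len, "%llu", (unsigned long long)rl.rlim_cur);
    }
    return 0;
}
```

The value corresponds to what `ulimit -u` reports in a shell started under the same limits.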