JDK-7182040 : volano29 limited by os resource on Linux - need better diagnostic message
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: hs24,hs25
  • Priority: P4
  • Status: Closed
  • Resolution: Fixed
  • OS: linux_2.6
  • CPU: x86
  • Submitted: 2012-07-05
  • Updated: 2014-06-26
  • Resolved: 2014-02-11
Fix Version: JDK 9
9 b04: Fixed
Description
When running volano29 on Linux, the server dies after client starts. 

The error message in the log is
[Tue Jul 03 15:40:27 PDT 2012] Unexpected error in MainServer thread. (java.lang.OutOfMemoryError: unable to create new native thread)

Server log has
Creating room number 12 ...
Exception in thread "main" java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:691)
at COM.volano.e.b(DashoA5383)
at COM.volano.am.<init>(DashoA5383)
at COM.volano.ak.a(DashoA5383)
at COM.volano.ak.a(DashoA5383)
at COM.volano.Mark.main(DashoA5383) 

This workload needs two system limits to be increased for users on Linux: the number of open files (max 10240) and the number of processes (about 1714 for the client and 868 for the server). The test script benchscript/volano29/run checks 'ulimit -Hn' and 'ulimit -Hs' for the number of open files, but it does not check the number of processes. The nproc limit prevents the JVM from creating more threads, yet the error message is confusing. Suggest adding an nproc check for Linux to the run script and giving a better diagnostic message.
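A pre-flight check along these lines could be sketched as follows. This is only an illustration in Python, not the actual benchscript/volano29/run script; the helper names are hypothetical, the thresholds are the approximate figures quoted above, and RLIMIT_NPROC is assumed to be available (it is on Linux).

```python
import resource

# Approximate figures quoted in this report; treat them as assumptions.
REQUIRED = {
    "open files": 10240,
    "processes": 1714,  # client side; the server needs about 868
}


def find_shortfalls(limits, required):
    """Return one message per limit that is below its required value."""
    problems = []
    for name, needed in required.items():
        current = limits[name]
        # RLIM_INFINITY means "unlimited", so it always passes.
        if current != resource.RLIM_INFINITY and current < needed:
            problems.append(f"{name} limit is {current}, need at least {needed}")
    return problems


def current_soft_limits():
    """Read the soft limits that apply to this process."""
    return {
        "open files": resource.getrlimit(resource.RLIMIT_NOFILE)[0],
        "processes": resource.getrlimit(resource.RLIMIT_NPROC)[0],
    }


if __name__ == "__main__":
    for problem in find_shortfalls(current_soft_limits(), REQUIRED):
        print("WARNING:", problem)
```

Keeping the comparison in a function that takes plain dictionaries makes it easy to exercise without changing the real limits of the machine.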

Meanwhile, the error message thrown by JVM 
Exception in thread "main" java.lang.OutOfMemoryError: unable to
create new native thread

is confusing. We should have a better diagnostic message.
It is true that on Linux pthread_create() returns 11 (EAGAIN) both when memory is exhausted and when the nproc limit is reached.

I agree that we cannot tell exactly what the cause is...

Consulting Dave Keenan for his input...
Comments from David.Keenan:

"Perhaps a OS neutral message isn't possible.  If we are not able to determine the cause on linux, provide the information available to us while on linux.

For example:
Exception in thread "main" java.lang.OutOfMemoryError: unable to
create new native thread.  Potential causes are out of memory or nproc limitation is reached."
I think this is a better description as it covers the two reasons reported by Linux.
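The idea of attaching the possible causes to the message can be sketched as below. This is an illustration in Python, not the HotSpot change itself; describe_thread_failure is a hypothetical helper, and the hint wording is taken from the suggestions in this report.

```python
import errno
import os

BASE_MESSAGE = "unable to create new native thread"


def describe_thread_failure(err):
    """Build a one-line message for a failed thread creation (hypothetical helper)."""
    if err == errno.EAGAIN:
        # EAGAIN does not say which resource ran out, so list the likely causes.
        hint = "possibly out of memory or process/resource limits reached"
    else:
        # For any other code, fall back to the platform's own description.
        hint = os.strerror(err)
    return f"{BASE_MESSAGE}: {hint}"


if __name__ == "__main__":
    print(describe_thread_failure(errno.EAGAIN))
```

The message stays on one line, and the EAGAIN branch names both candidates because the error code alone cannot distinguish them.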

Comments
For more info about default limit of 'ulimit -u 1024' in redhat, see https://bugzilla.redhat.com/show_bug.cgi?id=432903
04-09-2013

A workaround is to run ulimit -u 10000 before running volano.
04-09-2013
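For a launcher written in Python, a rough equivalent of the ulimit -u workaround might look like this. It is a sketch under assumptions: the function names are hypothetical, 10000 is the value from the comment above, and an unprivileged process can raise its soft limit only up to the hard limit.

```python
import resource


def new_soft_limit(target, hard):
    """Clamp the requested soft limit to the hard limit (RLIM_INFINITY = no cap)."""
    if hard == resource.RLIM_INFINITY:
        return target
    return min(target, hard)


def raise_nproc_soft_limit(target=10000):
    """Raise the soft nproc limit towards target and return the resulting value."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    if soft != resource.RLIM_INFINITY and soft < target:
        resource.setrlimit(
            resource.RLIMIT_NPROC, (new_soft_limit(target, hard), hard)
        )
    return resource.getrlimit(resource.RLIMIT_NPROC)[0]
```

Child processes started afterwards inherit the raised limit, which mirrors running ulimit in the shell before starting the benchmark.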

Feel free to pick this one up. Not sure why it was "deferred" to 9, as it seems a simple and useful enhancement once we have settled on the new message. As we are using OEL 6 more and more, more and more people will hit this.
08-07-2013

Actually, "Unable to create native thread: possibly out of memory or process/resource limits reached", or anything that says "process" would have saved me a few hours of tracing volano and hotspot. I really think we should add that to the exception message.
08-07-2013

PUBLIC COMMENTS
On 17/07/2012 12:13 AM, David D. Keenan wrote:
> Perhaps a OS neutral message isn't possible. If we are not able to determine
> the cause on linux, provide the information available to us while on linux.
>
> For example:
> Exception in thread "main" java.lang.OutOfMemoryError: unable to
> create new native thread. Potential causes are out of memory or nproc
> limitation is reached.
>
> If there other errors caught under (EAGAIN) we should list them as well.
>
> The bottom line is that we shouldn't have to troll JDK or linux sources to
> determine the cause of an error, and, our error messages shouldn't exclude
> potential causes.

I don't think this is practical in general. The potential number of things that might go wrong is huge. We provide a coarser categorization of error messages and exception types to group these. And in many/most cases the place where the exception message is put together is a long way from the OS call that failed.

I can imagine in a debug build that we print information messages any time an OS call fails (one that doesn't lead to an abort of the VM). Or perhaps even add a Verbose mode to a product build.
16-07-2012

PUBLIC COMMENTS
Exception messages are meant to be simple one-line descriptions, not paragraphs of descriptive text. If it is useful to say something like: "Unable to create native thread: possibly out of memory or process/resource limits reached" then we can say that. But it isn't obvious that this in itself provides enough information to diagnose the particular problem in this case.
16-07-2012

PUBLIC COMMENTS
Please comment on what you think the diagnostic message should say.
13-07-2012

PUBLIC COMMENTS
The VM will throw "java.lang.OutOfMemoryError: unable to create new native thread" if attempts to create the OS level thread fail. Basically, for linux, if pthread_create fails. From the POSIX spec:

The pthread_create() function shall fail if:
[EAGAIN] The system lacked the necessary resources to create another thread, or the system-imposed limit on the total number of threads in a process {PTHREAD_THREADS_MAX} would be exceeded.
[EPERM] The caller does not have appropriate privileges to set the required scheduling parameters or scheduling policy.

This is ignorant of the Linux imposed a-thread-is-a-process limit but EAGAIN is still the error code. So there is no way to discern what the actual "resource limitation" was.

The "cause" message for the OutOfMemoryError should be clear and succinct so I'm unsure how to expand on this to give more detail. Suggestions would be welcome, bearing in mind the message is generic, not OS specific.
06-07-2012