JDK-8225035 : Thread stack size issue caused by large TLS size
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 8,9,10,11,12,13
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2019-05-30
  • Updated: 2022-01-17
  • Resolved: 2019-07-09
JDK 14: fixed in build 14 b05
Sub Tasks
JDK-8227417
Description
There is a well-known glibc issue [1] that causes programs with large TLS segments to fail. This issue has been observed and reported for Java/JVM [2]. The original bug id is JDK-8130425, which manifested as a StackOverflowError in the reported failure instance. The issue can also cause other symptoms that may be difficult to diagnose. Please see more details in the related CSR, JDK-8225498.

Based on the glibc discussion thread [1], Rust implemented a fix by taking the TLS size into account. This bug is created with the intent to address the TLS issue with a similar solution in the Java/JVM layer (as opposed to the -Djdk.lang.processReaperUseDefaultStackSize workaround introduced by JDK-8130425).

[1] glibc discussion archive:
http://sourceware.org/bugzilla/show_bug.cgi?id=11787
[2] OpenJDK discussion archive:
http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-December/037558.html
[3] http://cr.openjdk.java.net/~jiangli/tls_size/webrev/
(contributed by Jeremy Manson)

Comments
URL: http://hg.openjdk.java.net/jdk/jdk/rev/cb90a20eb99a User: jiangli Date: 2019-07-09 17:28:44 +0000
09-07-2019

The following approaches have been brought up and discussed on the mailing list:

1) Adjust the stack size by adding the value obtained from _dl_get_tls_static_info (original patch).
   Pros: only adds stack space for the TLS that is actually needed.
   Cons: _dl_get_tls_static_info is not a stable interface. All threads are affected by default.

2) Adjust the stack size by adding the pthread minstack value returned by __pthread_get_minstack when enabled from the command line. By default, no adjustment is done.
   Pros: no side effects by default.
   Cons: users need to explicitly enable the command-line option when they run into the TLS issue, which may be difficult to diagnose.

3) Adjust the stack size by adding the pthread minstack value returned by __pthread_get_minstack if the ratio of minstack to requested stack size is > 10%. The ratio can be changed from the command line.
   Pros: the adjustment is done by default only for smaller requested sizes (relative to pthread minstack).
   Cons: the threshold ratio may not cover all cases.

4) Adjust the stack size by adding a value specified by the user in a command-line option.
   Pros: the user has more control.
   Cons: usability issues. It may be difficult for the user to identify the proper size, and the option may cause confusion with the existing -Xss option.

Option (2) appears to be the cleanest solution to address this issue.
25-06-2019
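
Option (2) above can be sketched roughly as follows. This is a minimal illustration, not the HotSpot change itself: the helper names lookup_minstack and adjusted_stack_size and the adjust_enabled parameter are hypothetical, and the lookup assumes a glibc that still exports the private symbol __pthread_get_minstack with the signature shown. Because the symbol is private, the sketch resolves it at run time and falls back to 0 when it is missing.

```c
#include <dlfcn.h>
#include <pthread.h>
#include <stddef.h>

/* Assumed signature of glibc's private __pthread_get_minstack. */
typedef size_t (*get_minstack_fn)(const pthread_attr_t *attr);

/* Returns the pthread minstack value reported by glibc, or 0 when the
   private symbol cannot be resolved (for example on a non-glibc libc). */
size_t lookup_minstack(void) {
    get_minstack_fn fn = NULL;
    size_t minstack = 0;
    pthread_attr_t attr;

    void *self = dlopen(NULL, RTLD_LAZY);  /* handle for the running program */
    if (self == NULL) return 0;

    /* POSIX idiom for turning the dlsym result into a function pointer. */
    *(void **)(&fn) = dlsym(self, "__pthread_get_minstack");
    if (fn != NULL && pthread_attr_init(&attr) == 0) {
        minstack = fn(&attr);
        pthread_attr_destroy(&attr);
    }
    dlclose(self);
    return minstack;
}

/* Option (2): add minstack to the requested size only when the
   (hypothetical) command-line switch is on; otherwise change nothing. */
size_t adjusted_stack_size(size_t requested, size_t minstack, int adjust_enabled) {
    return adjust_enabled ? requested + minstack : requested;
}
```

With the switch off, the requested size is returned untouched, which is the "no side effects by default" property listed under Pros. On older glibc versions the program may need to be linked with -ldl and -lpthread.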

ILW = LHM = P4
04-06-2019

Here is the update on top of Jeremy's original patch: http://cr.openjdk.java.net/~jiangli/8225035/webrev.00/

(1) Replaced _dl_get_tls_static_info usage with __pthread_get_minstack.
(2) Changed to only adjust the stack_size if it's less than 10 * min_stack_size (value obtained by __pthread_get_minstack). (I mentioned 25% in an earlier email, but decided to go with 10%.)

By default, the value returned by __pthread_get_minstack is 7 pages (28K) with the glibc that I'm currently working with (2.24). The 7 pages include the extra guard page size. The guard page size is not included in newer versions of the function. With the update (2) above, by default threads with stack sizes less than 280K are adjusted with the additional space. All other threads are unaffected.

David, please let me know your thoughts about the above. I haven't incorporated your other suggestion for using a runtime option to enable the thread size adjustment. My main concern is the same as the one stated in the mailing list discussion thread: users may not know there is an existing flag when they run into the TLS issue. With the update (2), a runtime option may not be needed since by default most of the threads are not affected. Please let me know if those sound reasonable to you. Thanks!
30-05-2019
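
The threshold rule in update (2) comes down to a single comparison. The sketch below is only an illustration of that arithmetic; the function name adjust_below_threshold is hypothetical, and min_stack_size stands for whatever __pthread_get_minstack reports on the system in question.

```c
#include <stddef.h>

/* Adds min_stack_size to the requested stack size only when the request
   is smaller than 10 * min_stack_size, i.e. when minstack is more than
   10% of the requested size. Larger requests are returned unchanged. */
size_t adjust_below_threshold(size_t stack_size, size_t min_stack_size) {
    if (stack_size < 10 * min_stack_size) {
        return stack_size + min_stack_size;
    }
    return stack_size;
}
```

With the 28K minstack value quoted above, the cut-off is 280K: a 256K request becomes 284K, while requests of 280K or more are left as they are.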

The recent discussion on this: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2019-May/034459.html seemed to immediately conclude that implementing the Rust fix was not a good idea.

Further discussion suggests one possibility is to just add a flag to specify "extra stack" that the user can apply if they encounter this kind of TLS problem. This is easy to do and I'd support it. It's not ideal of course, but there is no ideal solution here at the VM level.

The other point of discussion is whether we can use something less intrusive than the Rust solution and query some other glibc values to try to determine how much extra stack to add (see __pthread_get_minstack), though I'm unclear exactly how that would be used. Presumably if a large TLS were in play then __pthread_get_minstack would be slightly larger than that, and so we would expand our requested stack by a sufficient amount. But it is unclear how much additional stack would be added in that case.

I'm reluctant to consider any change that increases the stack usage for every single Java thread across every single Java application. We just do not know what impact that will have on those applications - especially ones that have been carefully tuned to balance thread count and memory.
30-05-2019
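
The "extra stack" flag idea can be pictured at the pthread level as follows. This is a sketch under assumptions, not VM code: run_with_extra_stack and its extra parameter are hypothetical stand-ins for a user-supplied value, and only standard POSIX thread attribute calls are used.

```c
#include <pthread.h>
#include <stddef.h>

/* Trivial thread body; a real thread would touch its TLS and stack here. */
static void *worker(void *arg) {
    (void)arg;
    return NULL;
}

/* Creates and joins one thread whose stack is the requested size plus a
   user-supplied extra amount. Returns the stack size recorded in the
   thread attributes, or 0 if any step fails. */
size_t run_with_extra_stack(size_t requested, size_t extra) {
    pthread_attr_t attr;
    pthread_t tid;
    size_t actual = 0;

    if (pthread_attr_init(&attr) != 0) return 0;

    if (pthread_attr_setstacksize(&attr, requested + extra) == 0 &&
        pthread_attr_getstacksize(&attr, &actual) == 0 &&
        pthread_create(&tid, &attr, worker, NULL) == 0) {
        pthread_join(tid, NULL);
    } else {
        actual = 0;  /* size rejected or thread creation failed */
    }
    pthread_attr_destroy(&attr);
    return actual;
}
```

The extra amount is applied only where the caller asks for it, so threads created without it keep their tuned sizes, which matches the concern about not growing every Java thread's stack.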