JDK-8173817 : StackOverflowError in "process reaper" thread
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.lang
  • Affected Version: 9,10
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2017-02-02
  • Updated: 2019-04-15
  • Resolved: 2017-08-22
Fix Version: JDK 10 b21 (Fixed)
Description
In JDK 9 HotSpot PIT testing, two different tests failed with a "process reaper" java.lang.StackOverflowError:
java/util/zip/EntryCount64k.java: on Intel Xeon 2901 MHz, 32 cores, 252G, Linux / Oracle Linux 7.0, x86_64

runtime/SharedArchiveFile/SharedBaseAddress.java: on Intel Xeon 2901 MHz, 32 cores, 220G, Linux / Oracle Linux 7.0, x86_64 

Excerpt from the .jtr file:

Exception in thread "process reaper" java.lang.StackOverflowError
	at java.base/java.util.concurrent.ConcurrentHashMap.fullAddCount(ConcurrentHashMap.java:2591)
	at java.base/java.util.concurrent.ConcurrentHashMap.addCount(ConcurrentHashMap.java:2340)
	at java.base/java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1085)
	at java.base/java.util.concurrent.ConcurrentHashMap.putIfAbsent(ConcurrentHashMap.java:1555)
	at java.base/java.lang.invoke.MethodType$ConcurrentWeakInternSet.add(MethodType.java:1304)
	at java.base/java.lang.invoke.MethodType.makeImpl(MethodType.java:314)
	at java.base/java.lang.invoke.MethodType.insertParameterTypes(MethodType.java:403)
	at java.base/java.lang.invoke.MethodHandleNatives.varHandleOperationLinkerMethod(MethodHandleNatives.java:448)
	at java.base/java.lang.invoke.MethodHandleNatives.linkMethodImpl(MethodHandleNatives.java:378)
	at java.base/java.lang.invoke.MethodHandleNatives.linkMethod(MethodHandleNatives.java:366)
	at java.base/java.util.concurrent.CompletableFuture.completeValue(CompletableFuture.java:305)
	at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2072)
	at java.base/java.lang.ProcessHandleImpl.lambda$completion$2(ProcessHandleImpl.java:134)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1161)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:844)
Comments
> There is absolutely no expectation that setting stack sizes is intended to be portable!

It's true that the spec makes this "merely a hint". But consider the extreme case: Linux always crashes with StackOverflowError if the stack size is a prime number, while Windows always crashes if it's NOT.

> Should it have been implemented differently to take it and then add the guards ... perhaps - but that ship has sailed.

I continue to think it's a quality-of-implementation BUG, and bugs can be fixed. (But I also think it's a quality-of-implementation BUG that StackOverflowError ever happens at all in non-native code!)
22-08-2017

This improvement is valid and may be useful for JDK 9 also.
22-08-2017

There is absolutely no expectation that setting stack sizes is intended to be portable! From the Thread constructor docs:

    The stack size is the approximate number of bytes of address space that the virtual machine is to allocate for this thread's stack. The effect of the stackSize parameter, if any, is highly platform dependent.

    On some platforms, specifying a higher value for the stackSize parameter may allow a thread to achieve greater recursion depth before throwing a StackOverflowError. Similarly, specifying a lower value may allow a greater number of threads to exist concurrently without throwing an OutOfMemoryError (or other internal error). The details of the relationship between the value of the stackSize parameter and the maximum recursion depth and concurrency level are platform-dependent. On some platforms, the value of the stackSize parameter may have no effect whatsoever.

    The virtual machine is free to treat the stackSize parameter as a suggestion. If the specified value is unreasonably low for the platform, the virtual machine may instead use some platform-specific minimum value; if the specified value is unreasonably high, the virtual machine may instead use some platform-specific maximum. Likewise, the virtual machine is free to round the specified value up or down as it sees fit (or to ignore it completely).

    Specifying a value of zero for the stackSize parameter will cause this constructor to behave exactly like the Thread(ThreadGroup, Runnable, String) constructor.

    Due to the platform-dependent nature of the behavior of this constructor, extreme care should be exercised in its use. The thread stack size necessary to perform a given computation will likely vary from one JRE implementation to another. In light of this variation, careful tuning of the stack size parameter may be required, and the tuning may need to be repeated for each JRE implementation on which an application is to run.

This API exists for very rare use-cases and the expectation is that it is tuned in place as needed. Should it have been implemented differently to take it and then add the guards ... perhaps, but that ship has sailed. For the case in hand, removing the use of lambdas seems most reasonable, as their implementation seems to be constantly changing and so does their stack usage.
18-08-2017

How is one supposed to know the guard page sizes? That is an implementation-dependent value. I can understand the -Xss value being the total amount. But the Java API for new Thread(threadGroup, runnable, name, stackSize, inheritThreadLocals) should not have to include VM-specific adjustments that are inherently non-portable. In this particular case, I'm inclined to switch from a lambda to an anonymous class to avoid the extra stack requirement. The stack is limited to avoid wasting stack space for what may be many threads waiting for processes.
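For illustration, a minimal sketch of using that constructor with a constrained stack (the group, task body, daemon flag, and size shown here are illustrative only, not the actual ProcessHandleImpl code):

    ThreadGroup group = Thread.currentThread().getThreadGroup();
    Runnable reaperTask = () -> { /* wait for the child process to exit */ };
    long stackSize = 128 * 1024;          // requested stack; only a hint to the VM
    boolean inheritThreadLocals = false;  // do not inherit inheritable thread-locals
    Thread reaper = new Thread(group, reaperTask, "process reaper",
                               stackSize, inheritThreadLocals);
    reaper.setDaemon(true);
    reaper.start();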
17-08-2017

The stack size for Java threads has to accommodate the guard pages as well. -Xss specifies the actual stack size to give the thread, not the stack size available to application code. The log shows the additional native frames related to the method handle use. So it seems that the stack size for the reaper thread is simply too small for some circumstances.
17-08-2017

There's a min stack size check that adds in all the extra zones to the actual allowed minimum specified stack size, and makes sure that the specified stack size is at least this big. So you must have passed this test. However, if 128k was specified, I'm not sure why you got 132k. Maybe there's some rounding going on there.
16-08-2017

Thanks for the explanation. The requested stack size is currently 128*1024. I would have assumed that the extra stack zones would be outside the requested size.
16-08-2017

It says your thread stack is 136k with 92k remaining. There are 22 shadow pages on linux-x64, which would be 88k, so you are right near 92k. I'm a bit rusty on all the stack overflow limit checks, but probably another stack zone size (like the red zone) is being added to the shadow zone size as part of the limit check, and that puts you over 92k. Martin mentioned the requested stack size for this thread being 32k, so that means another 104k was added for all the stack zones, and the actual usage at this point is 44k, exceeding the 32k requested (although some of that is for error reporting).
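For reference, the arithmetic implied above (assuming 4 KiB pages on linux-x64):

    22 shadow pages * 4 KiB                 =  88 KiB shadow zone
    136 KiB total stack - 92 KiB remaining  =  44 KiB currently in use
    136 KiB total stack - 32 KiB requested  = 104 KiB added for the stack zones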
16-08-2017

Attached hs_err file for StackOverflowError.
15-08-2017

Internal diagnostic option: -XX:+UnlockDiagnosticVMOptions -XX:AbortVMOnException=java.lang.StackOverflowError
15-08-2017

I spent some time trying to reproduce this and failed. Hotspot/jl.invoke folks will need to debug. It's amusing that I've been doing Java long enough that I've touched all of EntryCount64k, "process reaper", CompletableFuture and ConcurrentHashMap.
12-08-2017

[~martin] Adding the pre-invocations of CompletableFuture.completedStage/failedFuture in a static initializer does not seem to be sufficiently effective. Running the zip/EntryCount64k test repeatedly still fails (this time after 53 iterations).
11-08-2017

I would expect VH/MH one-time initialization code to appear in the stacktrace if it is the cause. Such frames should not be hidden. I have no idea what ShowHiddenFrames affects. To see the true stack usage we would need to abort at the point where we want to throw the StackOverflowError.
10-08-2017

I propose we adopt my patch if we have evidence it fixes the problem.
10-08-2017

It may be useful to update the tests to see the hidden frames with: -XX:+UnlockDiagnosticVMOptions -XX:+ShowHiddenFrames
10-08-2017

Sorry ... I confess to having my own preconceived notions about the 2006 era implementation. Taking another look at the stack trace with open eyes, if a process reaper thread causes a VarHandle-using method to be invoked for the first time, then the java.lang.invoke machinery is invoked, causing a rare one-time stack growth. Maybe Paul Sandoz has advice to give here? Should we abandon the idea of small stack threadpools? I'm tempted to add "VarHandle-prelinking code" before creating the threadpool. Something like the below should do?! (Analogous to how we pre-load LockSupport.class)

# HG changeset patch
# User martin
# Date 1502382936 25200
#      Thu Aug 10 09:35:36 2017 -0700
# Node ID 8932a16e086c9d7550865ac7a6bd24c75289b881
# Parent  187af163c4044f49a469ed402ffcee0ea60365d4
[mq]: Process-VarHandle-pre-linking

diff --git a/src/java.base/share/classes/java/lang/ProcessHandleImpl.java b/src/java.base/share/classes/java/lang/ProcessHandleImpl.java
--- a/src/java.base/share/classes/java/lang/ProcessHandleImpl.java
+++ b/src/java.base/share/classes/java/lang/ProcessHandleImpl.java
@@ -78,6 +78,14 @@
 
     private static native void initNative();
 
+    // VarHandle pre-linking to avoid StackOverflowError in
+    // stack-constrained process reaper thread
+    static {
+        CompletableFuture.completedStage(null);
+        CompletableFuture.completedStage(Boolean.TRUE);
+        CompletableFuture.failedFuture(new NullPointerException());
+    }
+
     /**
      * The thread pool of "process reaper" daemon threads.
      */
10-08-2017

The stack as shown doesn't hint at any problems. How many hidden stack frames are there likely to be in that stack? Is there a way to show them? Between the lambda and the method handles is there likely to be more?
10-08-2017

Yes (in some cases) but this is not a case where we have any unexpected injected TLS usage.
08-08-2017

>> Not sure what TLS has to do with this case Martin.

I believe that Linux processes with many thread-local variables can be observed to have those variables eat into the stack (they occupy storage in every thread stack) and hence cause SOE in Java.
08-08-2017

Not sure what TLS has to do with this case Martin. Roger: my assumption is that -Xcomp is consuming large amounts of stack in a way that is not directly observable, nor accounted for in minimum-stack-size calculations. But the intermittent nature suggests more is at play here than just simple use of Xcomp - perhaps Xcomp has a pathological case in terms of stack. There have been issues with INDY compilation in other areas and we do see MH code on the stack. But I would not expect Xcomp to consume stack between Java frames - it should dive off to do its compilation then return and continue.
08-08-2017

JDK-8184178 appears to be private. Like I always say, the specified stack size should be completely available for use by java frames, and should never be eaten up by native overhead like TLS. Currently, no specified stack size is safe if there is enough TLS in the process. Hotspot should be fixed. (But I also always say we should work on eliminating stack overflow completely for non-native code ...)
08-08-2017

Have the stack requirements changed for a minimal stack? The stack size had previously been raised to 128k. It appears only to be an issue with -Xcomp execution.
08-08-2017

The same problem has been reported in JDK-8184178.
08-08-2017

Another occurrence (JDK-10): http://aurora.us.oracle.com/functional/faces/RunDetails.xhtml?names=2358300.rbt-harold.seigel-bug_8185103-20170805-1318-42875-80&show-limit=0&filter= Options: -Xcomp -Xcomp -XX:MaxRAMFraction=8 -XX:+CreateCoredumpOnCrash -ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:+TieredCompilation
07-08-2017

With VM flags: Java HotSpot(TM) 64-Bit Server VM 9 b0 (9-internal+0-2017-04-22-101337.edvbld.8179013) Options -Xcomp -Xcomp -XX:MaxRAMFraction=8 -XX:+CreateCoredumpOnCrash -ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:+TieredCompilation
26-04-2017

Not reproducible
09-03-2017

Added a teststabilization label since it is intermittent and not reproducible and there is some doubt as to the exact cause.
10-02-2017

At Google we have noticed that the stack size specified via -Xss or Thread constructor is *not* entirely available for Java frames; e.g. native thread local storage, which is in principle unbounded, eats into the stack space of every thread. Every thread is thus at risk of running into SOE, though of course "good citizen threads" like the Grim Reaper are the most likely to.
07-02-2017

I suspect -Xcomp causes native stack consumption and it is that which results in the SOE.
07-02-2017

I cannot reproduce locally, nor have I observed this on mach5. A 16-frame stack trace causing a SOE seems suspicious to me. AFAICT it's hard to reduce the stack size to anything less than 140K (e.g. set a breakpoint on os::Posix::get_initial_stack_size and step through), and that should be more than enough to cope with 16 frames. (Note that linkage results in an up-call from the VM to Java; I don't know if that can affect the stack size.)
07-02-2017

Can I again flag that this only seems to happen with -Xcomp runs! Has anyone seen it with normal runs?
04-02-2017

A one-time initialization (even if brittle) seems better than speculatively wasting 32k for each Reaper.
03-02-2017

I suggest we go with Roger's approach for now. We might be able to reduce it down again in 10. I presume one has to guesstimate the stack size? Is it possible to monitor the actual max stack size used by a thread?

There are various code paths that CF.complete can take; I am not confident explicit calls in <clinit> will cover all relevant call sites. It's as if we require a "please pre-link all sig-poly call sites" HotSpot annotation after <clinit> (it might be possible to rearrange CF code to make this reliable with shared-secret calls).

I don't know if there is anything in HotSpot we can do to further reduce memory requirements. I am about to commit a patch in 10 that should reduce the MethodType churn and therefore the CHM access, but it will not reduce it to zero. We do plan to replace CHM with a more focused class that may further reduce the stack size.
03-02-2017

It's terrible to have a reaper thread per sub-process, but there is no known way to avoid that (we've been thinking about it for a decade). If you give each reaper thread a typical Linux stack of 8MB, then 1000 subprocesses will require more address space than a 32-bit system has. By comparison, many C++ programs at Google try to live within a 16kb stack size limit. And we've already quadrupled the reaper thread stack size to 128kb. Maybe all we need in the <clinit> is new CompletableFuture().complete(null) ? Or maybe also completeExceptionally(someException) ? (But I agree it's brittle.)
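A minimal sketch of that <clinit> idea (illustrative only; the class name and the particular exception are made up, and as noted elsewhere in this thread such explicit calls may not cover every CompletableFuture code path):

    final class CompletableFutureWarmup {
        static {
            // Touch the completion paths once, on an ordinary thread, so the
            // VarHandle call sites in CompletableFuture are linked before any
            // small-stack reaper thread uses them.
            new java.util.concurrent.CompletableFuture<Integer>().complete(null);
            new java.util.concurrent.CompletableFuture<Integer>()
                    .completeExceptionally(new RuntimeException("pre-link only"));
        }
    }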
03-02-2017

Propose to increase the reaper stack size by 32k (now 160k).
03-02-2017

Ah, so the SOE occurs because VarHandle linkage increases the stack usage. Interesting! I think we should explore increasing the stack size a little bit. Ensuring the right call site is pre-linked somewhere else is hacky. Both are fragile. The code in question is:

    if (completion == null) {
        // newCompletion has just been installed successfully
        completion = newCompletion;
        // spawn a thread to wait for and deliver the exit value
        processReaperExecutor.execute(() -> {
            int exitValue = waitForProcessExit0(pid, shouldReap);
            newCompletion.complete(exitValue);
            // remove from cache afterwards
            completions.remove(pid, newCompletion);
        });
    }

There may be a more general issue here with the completion newCompletion.complete(exitValue): it might result in the triggering of dependents in the CF chain, which could result in an SOE. What if some user action is called from the reaper thread?
03-02-2017

[~rriggs] ok, phew :-)
03-02-2017

'User' actions are always handled using handleAsync(), so they get a real thread, not the reaper thread. See ProcessImpl.onExit().
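A hedged sketch of that separation (the class and method names here are illustrative, not the actual ProcessImpl/ProcessHandleImpl code): the internal future is completed on the reaper thread, while handleAsync shifts user-supplied dependents onto the default async pool.

    import java.util.concurrent.CompletableFuture;

    final class OnExitSketch {
        // Completed by the stack-constrained reaper thread.
        private final CompletableFuture<Integer> internalExit = new CompletableFuture<>();

        void reaperSawExit(int exitValue) {
            internalExit.complete(exitValue);   // runs on the reaper thread
        }

        // User-facing stage: handleAsync runs user callbacks on the default
        // async pool (normally the common ForkJoinPool), not the reaper thread.
        CompletableFuture<Integer> onExit() {
            return internalExit.handleAsync((exit, ex) -> exit);
        }
    }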
03-02-2017

I don't know of any good way to do this apart from actually using them at least once; e.g., adding something like this to ProcessHandleImpl might be a hacky workaround to this particular issue:

    static { new ExitCompletion().complete(0); }

[~psandoz] might have some insight into a more succinct way?
03-02-2017

Is there a recommended way to force the VarHandle initialization? That could be done in an ordinary thread before the daemon threads were needed, avoiding the race.
03-02-2017

CompletableFuture hasn't been changed since 2016-07-15, when it was changed to use VarHandles (as evident in the stack trace), and the code for the reaper thread in ProcessHandleImpl hasn't changed since 2016-04-13. So I think we're dealing with a somewhat rare race to be the first one to actually use one of the VarHandles declared in CompletableFuture (triggering the actual linking), and if one of the reaper threads draws the shortest stick, this is apparently what happens.
03-02-2017

So is this related to the j.u.c updates to use MethodHandles?
02-02-2017

For many years, the reaper thread was happy with a stack size of 32k. Having a small stack size was my idea, reducing the per-subprocess overhead and avoiding address space exhaustion. In the new world of lambdas, we should ensure that no class loading or lambda initialization happens in the reaper thread. Can we push initialization eagerly into a clinit for some class not run in the reaper thread?
02-02-2017

The ProcessReaper stack has a default size of 128*1024, on the expectation that very little Java stack will be used. The stack size can be increased, but how far?
02-02-2017

Not sure why this was filed against java.util.concurrent. The stack trace should not be inducing a StackOverflowError unless the process reaper thread has an exceedingly small stack! It is interesting that these are -Xcomp runs that are failing. I would have to suspect there is native stack space being consumed here.
02-02-2017