JDK-8012941 : JSR 292: too deep inlining might crash compiler because of stack overflow
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: hs24, hs25
  • Priority: P2
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2013-04-22
  • Updated: 2015-02-02
  • Resolved: 2013-10-24
  • JDK 7: 7u80 (Fixed)
  • JDK 8: 8 (Fixed)
  • Other: hs25 (Fixed)
Test vm/mlvm/meth/func/sajdi/compiledTargetInStackTrace crashes VM with no hs_err generated

RULE vm/mlvm/meth/func/sajdi/compiledTargetInStackTrace Crash Core w/o hs_err found

Regression test: http://cr.openjdk.java.net/~vlivanov/8012941/webrev.00/raw_files/new/test/compiler/jsr292/DeepInliningTest.java

I'd rate it as ILW = HLL = P4:
  • I = H: VM crash.
  • L = L: MH chains of such depth (thousands of nested MHs) are very unlikely in real applications (we haven't seen any external reports about such issues).
  • W = L: increase the compiler stack size.

ILW = LMM = P5 8-defer-request: We run out of stack space when running JSR 292 code. LambdaForm call chains can become very deep. LF invokers have a ForceInline annotation, so the usual inlining heuristics don't trigger. It is not yet clear what a good fix for this problem would be. The workaround is to increase the compiler stack size, but this is not something we can fix reliably for all platforms. Request deferral.
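For illustration, a deep LambdaForm call chain of the kind described above can be built by composing a MethodHandle with itself many times. This is only a sketch in the spirit of the linked DeepInliningTest, not the actual regression test; the class name and depth constant are made up here.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class DeepMHChain {
    static int inc(int x) { return x + 1; }

    public static void main(String[] args) throws Throwable {
        MethodHandle inc = MethodHandles.lookup().findStatic(
                DeepMHChain.class, "inc",
                MethodType.methodType(int.class, int.class));
        // Each filterReturnValue adds another LambdaForm invoker to the
        // chain; with @ForceInline on LF invokers, the JIT will try to
        // inline the whole chain when compiling the caller.
        MethodHandle chain = inc;
        int depth = 300; // arbitrary illustrative depth
        for (int i = 1; i < depth; i++) {
            chain = MethodHandles.filterReturnValue(chain, inc);
        }
        int result = (int) chain.invokeExact(0); // each link adds 1
        System.out.println(result);
    }
}
```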

7u40-deferral-request justification: The test used to reproduce this bug is not working properly; I get the same error as in bug JDK-8019389.

Hard to fix, I agree. Customers can set different compiler thread stack sizes for various reasons (perhaps they have too many threads), and then our finely tuned heuristic fails. If only C++ had a StackOverflowError...

For a semi-robust fix, I suggest having the compiler (specifically the recursive entry to the bytecode parser) make an approximate measurement of its stack usage and compare it with the actual compiler thread stack size (as dictated by system configuration). A possible measure would be the difference between two addresses: one in the stack frame of the root parser and one in the current parser invocation. Independently, we could also increase the default compiler thread stack size. Independently, we could also have a parser recursion depth limit that is always enforced. This sort of thing is hard to fix portably!
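The "always enforced recursion depth limit" idea can be sketched as follows. This is written in Java for illustration only (the real C1 GraphBuilder is C++, and addresses rather than a counter would be used for the stack-usage variant); all names and the budget constant are hypothetical.

```java
public class DepthLimitedParser {
    private static final int MAX_INLINE_DEPTH = 500; // assumed recursion budget
    private int depth = 0;

    // Returns false when the budget is exhausted, so the caller can
    // bail out and emit a real call instead of inlining deeper.
    boolean tryInline(Runnable parseBody) {
        if (depth >= MAX_INLINE_DEPTH) {
            return false;
        }
        depth++;
        try {
            parseBody.run(); // may recurse back into tryInline
        } finally {
            depth--;
        }
        return true;
    }

    public static void main(String[] args) {
        DepthLimitedParser p = new DepthLimitedParser();
        int[] reached = {0};
        Runnable[] body = new Runnable[1];
        // A "parse" that keeps asking to inline ever deeper.
        body[0] = () -> { reached[0]++; p.tryInline(body[0]); };
        p.tryInline(body[0]);
        System.out.println(reached[0]); // stops at the depth budget
    }
}
```

The key property is that the limit is enforced unconditionally, so it protects even configurations where a heuristic based on the configured stack size would be wrong.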


When it fails there are about 1400 invocations:

[1375] GraphBuilder::invoke(0xe776f320, 0xb6, 0x1, 0x8), at 0xfe1ba9b5
[1376] GraphBuilder::iterate_bytecodes_for_block(0xe776f320, 0x0, 0x0, 0xffffff9d), at 0xfe19de8e
[1377] GraphBuilder::connect_to_end(0xe776f320, 0x8226358, 0x82261f8, 0x8226358), at 0xfe3018e7
[1378] GraphBuilder::GraphBuilder(0xe776f320, 0xe776f5d0, 0x82260f0, 0xfe19ae79), at 0xfe303ff8
[1379] IR::IR(0x82260d8, 0xe776f5d0, 0x8195ce8, 0xffffffff), at 0xfe19b2c3
[1380] Compilation::compile_java_method(0xe776f5d0, 0xe776f940, 0xe776f4f8, 0xfe2f5984), at 0xfe2f4a20
[1381] Compilation::Compilation(0xe776f5d0, 0x8063308, 0xe776f940, 0x8195ce8, 0xffffffff), at 0xfe2f5cdf
[1382] Compiler::compile_method(0x8063308, 0xe776f940, 0x8195ce8, 0xffffffff), at 0xfe19084e
[1383] CompileBroker::invoke_compiler_on_method(0x819ee10, 0xfeac4000, 0xe776fd48, 0xfe180e1c), at 0xfe18ad49
[1384] CompileBroker::compiler_thread_loop(0x8176000, 0xfeac4000, 0xe776fdf8, 0xfe160f91, 0x8176000, 0x8176000), at 0xfe18136d
[1385] compiler_thread_entry(0x8176000), at 0xfe18088f
[1386] JavaThread::run(0x8176000, 0xfeb843a0, 0x0, 0xfe82271b), at 0xfe160f91
[1387] java_start(0x8176000, 0xfef02000, 0xe776ffe8, 0xfee63a89), at 0xfe8231b9
[1388] _thrp_setup(0xfde72a40), at 0xfee63adc
[1389] _lwp_start(0x809b818, 0x808ecf8, 0xfe77e530, 0x0, 0x0, 0x0), at 0xfee63d80

It fails with -XX:CompilerThreadStackSize=320, which matches the default thread stack size from globals_solaris_x86.hpp:

// ThreadStackSize 320 allows a couple of test cases to run while
// keeping the number of threads that can be created high.
define_pd_global(intx,  ThreadStackSize,         320);
define_pd_global(intx,  VMThreadStackSize,       512);
define_pd_global(uintx, JVMInvokeMethodSlack,    10*K);
#endif // AMD64
define_pd_global(intx,  CompilerThreadStackSize, 0);

If get_vmtarget() had returned correctly, another call to try_inline would likely have followed, so we cannot tell whether the recursion is endless or not. I looked at the core file with dbx: the crash happens at a call instruction; the call target appears to be sane; the call itself, as a side effect, pushes the return address on the stack; and the stack is aligned (esp=0xfb691000). The call triggers a SIGSEGV not because of a bad target but because it overflows the stack.

Christian, why do you think this is an endless recursion? According to the stack, GraphBuilder has a long but terminating chain of invocations, and the failure happened in:

fe18dd4f __1cPciObjectFactoryRcreate_new_object6MpnHoopDesc__pnIciObject__ (80fb8a0, d4df42d0, 87d35dc, fe18d6d1) + 5f

dbx showed:

terminated by signal SEGV
0xffffffffffffffff: <bad address 0xffffffffffffffff>
(dbx) lwps

I'm trying to reproduce it on the host where it failed.

Could someone try increasing the stack size to see whether this is an endless recursion (a real bug) or just a very deep one?

Looks like a stack overflow in a very long chain of:

GraphBuilder::try_method_handle_inline
GraphBuilder::try_inline
GraphBuilder::invoke
GraphBuilder::iterate_bytecodes_for_block
GraphBuilder::try_inline_full
GraphBuilder::try_inline

It almost looks like JDK-8011138, but this is a JSR 292-related test and the stack shows that there were some MethodHandle-related operations:

__1cOciMethodHandleMget_vmtarget6kM_pnIciMethod_

so a new bug was submitted.