JDK-8168926 : C2: Bytecode escape analyzer crashes due to stack overflow
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 7,8,9
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • Submitted: 2016-10-31
  • Updated: 2017-07-26
  • Resolved: 2017-01-11
JDK 10: 10 (Fixed)
JDK 9: 9 b156 (Fixed)
Related Reports
Duplicate :  
Duplicate :  
Relates :  
Description
VM crash from test sample/mergesort/MergeSortTest.java

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (bcEscapeAnalyzer.cpp:106), pid=21249, tid=21295
#  guarantee(_stack_height < _max_stack) failed: stack overflow
#

Comments
verified by nightly testing
26-07-2017

Below is the text I sent out in the RFR.
http://cr.openjdk.java.net/~zmajo/8168926/webrev.00/
This is a bug in C2's escape analyzer (EA) that I've been chasing for more than a year now. The bug reproduces very rarely (<10 appearances since Sep '15) and in different forms/with different tests (see JDK-8135159 for a set of different manifestations of the same bug). I tried to reproduce the crash on at least five different occasions and with different tests, but unfortunately did not succeed. So my findings (and the fix) rely only on source-code/test inspection and post-mortem analysis of the crashes I've seen.
The bug is caused by the EA having an inconsistent view of the number of parameters taken by a call site 'c'. If call site 'c' in a method 'm' is dynamic (i.e., 'c' is targeted by an invokehandle or invokedynamic instruction), the number of parameters taken by 'c' is different before and after 'c' is resolved. That is, after 'c' is resolved, 'c' takes one more argument than the number of arguments pushed onto the stack by 'm' (as 'c' is dynamic, it needs an extra appendix argument after resolution).
In its current state, EA can have two views of 'c' during the analysis of 'c'. I.e., EA can use both a "before-resolution" and an "after-resolution" view of 'c'. As a result, EA can pop fewer elements from the stack than were pushed onto the stack, which results in a stack overflow.
Here is a detailed scenario to illustrate the problem. Let's assume the following sequence of operations takes place while EA is analyzing method 'm'.
Step (1): EA obtains the method targeted by call site 'c' in 'm'. The result is saved into ciMethod 'target': http://hg.openjdk.java.net/jdk9/hs/hotspot/file/026ff073b5ad/src/share/vm/ci/bcEscapeAnalyzer.cpp#l895 Let's assume that 'c' is not yet resolved at this point in time, i.e., the number of arguments N of 'target' does not include the appendix argument (i.e., N is equal to the number of items pushed onto the stack by the bytecodes of method 'm').
Step (2): A thread different from the compiler thread performing EA of 'm' reaches call site 'c' and executes it. As a result, 'c' is resolved (and bootstrapped) and it now points to a method taking N+1 parameters (one more parameter than before, because the parameters also include the appendix argument).
Step (3): EA checks if call site 'c' has an appendix argument. http://hg.openjdk.java.net/jdk9/hs/hotspot/file/026ff073b5ad/src/share/vm/ci/bcEscapeAnalyzer.cpp#l899 As there is an appendix argument, an extra (unknown) argument is pushed onto the stack. I.e., there are N+1 elements on the stack at this point in time.
Step (4): EA continues with analyzing the call site http://hg.openjdk.java.net/jdk9/hs/hotspot/file/026ff073b5ad/src/share/vm/ci/bcEscapeAnalyzer.cpp#l903 After being done with the analysis, EA removes 'arg_size' arguments from the stack. For example, here: http://hg.openjdk.java.net/jdk9/hs/hotspot/file/026ff073b5ad/src/share/vm/ci/bcEscapeAnalyzer.cpp#l294 The number 'arg_size' of arguments is, however, only N. The reason is that 'arg_size' is obtained from ciMethod 'target' constructed back at Step (1), i.e., from the unresolved call site, and does not include the appendix argument.
Summary: If the sequence of operations is executed as outlined by Steps (1)-(4), the stack can overflow after the call site is analyzed, because some arguments pushed onto it are not popped when EA is done with analyzing call site 'c'. For the problem to appear, the resolution of call site 'c' has to happen concurrently with EA, exactly after Step (1) and before Step (3). That explains why the problem reproduces so rarely. For more information on the investigation please see [1].
The fix I propose determines if a call site 'c' needs an appendix argument solely by looking at the ciMethod 'target' and the current bytecode instruction. By that, EA has only one (consistent) view of call site 'c' (which is either resolved or not).
I tested the fix with
- RBT (all hotspot tests both with -Xmixed and -Xcomp);
- JPRT;
- locally executed all jdk/test/java/lang/invoke tests (both with -Xmixed and -Xcomp).
No (new) failures appeared.
11-01-2017
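
Illustration of Steps (1)-(4) from the RFR text above. This is a minimal, single-threaded sketch and not HotSpot code: CallSiteModel, SimStack, analyze_racy and analyze_consistent are hypothetical names invented for the example, and the concurrent resolution in Step (2) is modelled by a plain flag (resolve_in_between) rather than a second thread. The sketch only tracks the height of the simulated stack. analyze_consistent shows the general idea of deciding everything from one view of the call site; it does not reproduce the actual change in the webrev.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for a dynamic call site; not a HotSpot type.
struct CallSiteModel {
    int  declared_args;  // N: arguments pushed by the caller's bytecodes
    bool resolved;       // true once the site has been resolved (bootstrapped)

    // Argument count visible at this instant: N before resolution, N+1 after.
    int  arg_size() const     { return declared_args + (resolved ? 1 : 0); }
    bool has_appendix() const { return resolved; }
};

// Simulated operand stack that only tracks its height.
class SimStack {
public:
    explicit SimStack(int max_height) : max_(max_height), height_(0) {}
    void push() {
        if (height_ >= max_) throw std::overflow_error("simulated stack overflow");
        ++height_;
    }
    void pop(int n) {
        if (n > height_) throw std::underflow_error("simulated stack underflow");
        height_ -= n;
    }
    int height() const { return height_; }
private:
    int max_;
    int height_;
};

// Two separate reads of the call site, with an optional resolution in between.
// Returns the number of slots left on the stack (0 means balanced).
int analyze_racy(CallSiteModel site, bool resolve_in_between) {
    assert(site.declared_args >= 0);
    SimStack stack(site.declared_args + 1);          // room for one appendix
    for (int i = 0; i < site.declared_args; ++i) stack.push();
    const int arg_size = site.arg_size();            // Step (1): sample the target
    if (resolve_in_between) site.resolved = true;    // Step (2): other thread resolves
    if (site.has_appendix()) stack.push();           // Step (3): appendix check
    stack.pop(arg_size);                             // Step (4): pop arg_size slots
    return stack.height();
}

// Same analysis, but both decisions come from a single snapshot of the site.
int analyze_consistent(CallSiteModel site, bool resolve_in_between) {
    assert(site.declared_args >= 0);
    SimStack stack(site.declared_args + 1);
    for (int i = 0; i < site.declared_args; ++i) stack.push();
    const CallSiteModel snapshot = site;             // one view, taken once
    if (resolve_in_between) site.resolved = true;    // no longer affects the analysis
    if (snapshot.has_appendix()) stack.push();
    stack.pop(snapshot.arg_size());
    return stack.height();
}
```

With an unresolved one-argument site, analyze_racy(CallSiteModel{1, false}, true) leaves one slot on the simulated stack, while the same call without the interleaved resolution, or through analyze_consistent, leaves none.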

The newest failure (the one that appeared in the nightlies of 2016-12-19) gives a better indication of what may cause this problem than earlier failures did. Please find a summary of my investigation so far below.
The newest failure appears while C2 is compiling the TestCommon::exec() method. The failure happens during escape analysis and, according to the stack trace, it happens while C2 is processing the bytecodes of a direct callee of exec().
[...]
V [libjvm.so+0xa7893d] report_vm_error(char const*, int, char const*, char const*, ...)+0xdd;; report_vm_error(char const*, int, char const*, char const*, ...)+0xdd
V [libjvm.so+0x6908d7] BCEscapeAnalyzer::StateInfo::raw_push(BCEscapeAnalyzer::ArgumentMap)+0x37;; BCEscapeAnalyzer::StateInfo::raw_push(BCEscapeAnalyzer::ArgumentMap)+0x37
V [libjvm.so+0x68bf25] BCEscapeAnalyzer::iterate_one_block(ciBlock*, BCEscapeAnalyzer::StateInfo&, GrowableArray<ciBlock*>&)+0x2925;;
V [libjvm.so+0x68d1e6] BCEscapeAnalyzer::iterate_blocks(Arena*)+0x676;; BCEscapeAnalyzer::iterate_blocks(Arena*)+0x676
V [libjvm.so+0x68dcb5] BCEscapeAnalyzer::do_analysis()+0xa5;; BCEscapeAnalyzer::do_analysis()+0xa5
V [libjvm.so+0x68dfa5] BCEscapeAnalyzer::compute_escape_info()+0x225;; BCEscapeAnalyzer::compute_escape_info()+0x225
V [libjvm.so+0x68ed6b] BCEscapeAnalyzer::BCEscapeAnalyzer(ciMethod*, BCEscapeAnalyzer*)+0x57b;; BCEscapeAnalyzer::BCEscapeAnalyzer(ciMethod*, BCEscapeAnalyzer*)+0x57b
V [libjvm.so+0x8a8835] ciMethod::get_bcea()+0xd5;; ciMethod::get_bcea()+0xd5
V [libjvm.so+0xb89408] ConnectionGraph::add_call_node(CallNode*)+0x218;; ConnectionGraph::add_call_node(CallNode*)+0x218
V [libjvm.so+0xb98eb4] ConnectionGraph::compute_escape()+0x3d4;; ConnectionGraph::compute_escape()+0x3d4
V [libjvm.so+0xb9a775] ConnectionGraph::do_analysis(Compile*, PhaseIterGVN*)+0x185;; ConnectionGraph::do_analysis(Compile*, PhaseIterGVN*)+0x185
[...]
The TestCommon::exec method consists of only 22 bytes of bytecode (11 instructions); two of the instructions are invocations:
public static jdk.test.lib.process.OutputAnalyzer exec(java.lang.String, java.lang.String...) throws java.lang.Exception;
  descriptor: (Ljava/lang/String;[Ljava/lang/String;)Ljdk/test/lib/process/OutputAnalyzer;
  flags: (0x0089) ACC_PUBLIC, ACC_STATIC, ACC_VARARGS
  Code:
    stack=6, locals=2, args_size=2
       0: aconst_null
       1: aload_1
       2: iconst_2
       3: anewarray     #24                 // class java/lang/String
       6: dup
       7: iconst_0
       8: ldc           #41                 // String -cp
      10: aastore
      11: dup
      12: iconst_1
      13: aload_0
      14: aastore
      15: invokestatic  #28                 // Method concat:([Ljava/lang/String;[Ljava/lang/String;[Ljava/lang/String;)[Ljava/lang/String;
      18: invokestatic  #51                 // Method execCommon:([Ljava/lang/String;)Ljdk/test/lib/process/OutputAnalyzer;
      21: areturn
    LineNumberTable:
      line 129: 0
  Exceptions:
    throws java.lang.Exception
According to the hs_err file, the compilation of one of the invoked methods has recently completed.
Event: 28.917 Thread 0x00007f4b80578800 7161 b 2 TestCommon::execCommon (72 bytes)
That suggests that the escape analysis of TestCommon::execCommon in the context of TestCommon::exec's compilation and the C1 compilation (and/or the execution) of TestCommon::execCommon interact. Other failures I've looked at in the context of JDK-8135159 also showed this pattern (a method being escape analyzed is compiled/loaded/deoptimized concurrently with the escape analysis).
Here are the bytecodes of TestCommon::execCommon.
public static jdk.test.lib.process.OutputAnalyzer execCommon(java.lang.String...) throws java.lang.Exception;
  descriptor: ([Ljava/lang/String;)Ljdk/test/lib/process/OutputAnalyzer;
  flags: (0x0089) ACC_PUBLIC, ACC_STATIC, ACC_VARARGS
  Code:
    stack=6, locals=4, args_size=1
       0: ldc           #46                 // String test.timeout.factor
       2: ldc           #47                 // String 1.0
       4: invokestatic  #48                 // Method java/lang/System.getProperty:(Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String;
       7: astore_1
       8: aconst_null
       9: aload_0
      10: bipush        6
      12: anewarray     #24                 // class java/lang/String
      15: dup
      16: iconst_0
      17: aload_1
      18: invokedynamic #49,  0             // InvokeDynamic #4:makeConcatWithConstants:(Ljava/lang/String;)Ljava/lang/String;
      23: aastore
      24: dup
      25: iconst_1
      26: invokestatic  #36                 // Method getCurrentArchiveName:()Ljava/lang/String;
      29: invokedynamic #37,  0             // InvokeDynamic #2:makeConcatWithConstants:(Ljava/lang/String;)Ljava/lang/String;
      34: aastore
      35: dup
      36: iconst_2
      37: ldc           #35                 // String -XX:+UnlockCommercialFeatures
      39: aastore
      40: dup
      41: iconst_3
      42: ldc           #38                 // String -XX:+UseAppCDS
      44: aastore
      45: dup
      46: iconst_4
      47: ldc           #50                 // String -Xshare:on
      49: aastore
      50: dup
      51: iconst_5
      52: ldc           #25                 // String -showversion
      54: aastore
      55: invokestatic  #28                 // Method concat:([Ljava/lang/String;[Ljava/lang/String;[Ljava/lang/String;)[Ljava/lang/String;
      58: astore_2
      59: iconst_1
      60: aload_2
      61: invokestatic  #29                 // Method createJavaProcessBuilder:(Z[Ljava/lang/String;)Ljava/lang/ProcessBuilder;
      64: astore_3
      65: aload_3
      66: ldc           #33                 // String exec
      68: invokestatic  #31                 // Method executeAndLog:(Ljava/lang/ProcessBuilder;Ljava/lang/String;)Ljdk/test/lib/process/OutputAnalyzer;
      71: areturn
    LineNumberTable:
      line 115: 0
      line 117: 8
      line 119: 26
      line 117: 55
      line 124: 59
      line 125: 65
  Exceptions:
    throws java.lang.Exception
All bytecodes belong to a single basic block. So the question is how escape analysis (mis-)manages its simulated stack so that the stack overflows during analysis.
After looking at how the stack is initialized and also at how items are pushed onto/popped from the stack, the following code snippet in BCEscapeAnalyzer::iterate_one_block() got my attention:
    ...
    case Bytecodes::_invokevirtual:
    case Bytecodes::_invokespecial:
    case Bytecodes::_invokestatic:
    case Bytecodes::_invokedynamic:
    case Bytecodes::_invokeinterface:
      {
        bool ignored_will_link;
        ciSignature* declared_signature = NULL;
        ciMethod* target = s.get_method(ignored_will_link, &declared_signature);
        ciKlass* holder = s.get_declared_method_holder();
        assert(declared_signature != NULL, "cannot be null");
        // Push appendix argument, if one.
        if (s.has_appendix()) {
          state.apush(unknown_obj);
        }
        ...
http://hg.openjdk.java.net/jdk9/hs/hotspot/file/5fa1aab53b6c/src/share/vm/ci/bcEscapeAnalyzer.cpp#l888
If the target of an invoke instruction has an appendix argument, one more element (the appendix argument) is pushed onto the stack. The appendix argument is added to the constant pool when the invokedynamic/invokehandle instruction is bootstrapped. That means that for a bootstrapped invokedynamic/invokehandle call site one more argument is pushed onto the stack than for a non-bootstrapped call site. To accommodate the additional element on the stack, the analysis initializes the stack size to a value one larger than the value given by the bytecodes. (E.g., in the case of execCommon, the stack size is initialized to 7 instead of 6.)
The TestCommon::execCommon method contains two invokedynamic call sites.
      18: invokedynamic #49,  0             // InvokeDynamic #4:makeConcatWithConstants:(Ljava/lang/String;)Ljava/lang/String;
      ...
      29: invokedynamic #37,  0             // InvokeDynamic #2:makeConcatWithConstants:(Ljava/lang/String;)Ljava/lang/String;
Let's assume the first invokedynamic instruction pushes an appendix argument onto the stack but does not pop it. The stack height after the call will be 7 (instead of 6) and the stack will overflow when the appendix argument for the second invokedynamic call is pushed.
The question is: how is it possible that BCEscapeAnalyzer::invoke does not pop the appendix argument? http://hg.openjdk.java.net/jdk9/hs/hotspot/file/5fa1aab53b6c/src/share/vm/ci/bcEscapeAnalyzer.cpp#l249
BCEscapeAnalyzer::invoke() calculates the argument count based on the target of the call. The target is obtained before the check for the appendix argument is executed: http://hg.openjdk.java.net/jdk9/hs/hotspot/file/5fa1aab53b6c/src/share/vm/ci/bcEscapeAnalyzer.cpp#l895 If at that point the call site is not bootstrapped, the argument count is one less than after bootstrapping. Now, let's assume that bootstrapping the call site completes after the target was obtained (but before the presence of the appendix argument is checked). In this case the argument count is one less than the number of arguments pushed onto the stack, which will result in the failure described above.
For the current issue to happen, the events (bootstrapping and escape analysis) must occur in a well-defined sequence. That explains why the current bug (as well as JDK-8135159) happens rarely and is difficult to reproduce. The next step towards fixing this issue is to figure out how to offer a consistent view of the presence of appendix arguments/number of arguments.
23-12-2016
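
The stack-height argument in the comment above can be replayed on the first 17 instructions of execCommon (bci 0 to 29). This is a rough sketch, not the escape analyzer: Op, exec_common_prefix and first_overflow are hypothetical names, the per-instruction pop/push counts are taken from the bytecode listing and the standard JVM stack effects, and only the stack height is modelled. The leaky_site parameter is an assumption of the sketch: it names the one invokedynamic whose appendix is pushed but not popped.

```cpp
#include <cassert>
#include <vector>

// Stack effect of one instruction (hypothetical model type).
struct Op {
    int  pops;     // slots popped by the instruction itself
    int  pushes;   // slots pushed by the instruction itself
    bool dynamic;  // true for invokedynamic (an appendix is pushed before the call)
};

// execCommon, bci 0..29, as listed in the comment above.
std::vector<Op> exec_common_prefix() {
    return {
        {0, 1, false},  //  0: ldc
        {0, 1, false},  //  2: ldc
        {2, 1, false},  //  4: invokestatic getProperty
        {1, 0, false},  //  7: astore_1
        {0, 1, false},  //  8: aconst_null
        {0, 1, false},  //  9: aload_0
        {0, 1, false},  // 10: bipush 6
        {1, 1, false},  // 12: anewarray
        {1, 2, false},  // 15: dup
        {0, 1, false},  // 16: iconst_0
        {0, 1, false},  // 17: aload_1
        {1, 1, true},   // 18: invokedynamic (index 11)
        {3, 0, false},  // 23: aastore
        {1, 2, false},  // 24: dup
        {0, 1, false},  // 25: iconst_1
        {0, 1, false},  // 26: invokestatic getCurrentArchiveName
        {1, 1, true},   // 29: invokedynamic (index 16)
    };
}

// Returns the index of the first instruction at which a push would exceed
// max_stack, or -1 if there is none. leaky_site is the index of a dynamic
// call whose appendix is pushed but not popped (-1 for none).
int first_overflow(const std::vector<Op>& ops, int max_stack, int leaky_site) {
    assert(max_stack > 0);
    int height = 0;
    for (int i = 0; i < static_cast<int>(ops.size()); ++i) {
        const Op& op = ops[i];
        int pops = op.pops;
        if (op.dynamic) {
            if (height >= max_stack) return i;  // appendix push overflows
            ++height;
            if (i != leaky_site) ++pops;        // consistent view pops the appendix too
        }
        height -= pops;
        for (int p = 0; p < op.pushes; ++p) {
            if (height >= max_stack) return i;  // regular push overflows
            ++height;
        }
    }
    return -1;
}
```

In this model a limit of 7 is enough when both call sites pop their appendix, a limit of 6 already fails at the first invokedynamic (which is why one extra slot is reserved), and a leak at index 11 (bci 18) makes the appendix push at index 16 (bci 29) overflow.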

The machine where the failure originally appeared now has a different kernel version than when the failure appeared (3.8.13-35.3.1.el7uek.x86_64 vs. 3.8.13-55.1.6.el7uek.x86_64). Therefore, I tried to reproduce the problem only locally on my linux-x64 machine. I used the same build and executed all jdk tests. The failure did not reproduce. Also, kernel version 3.8.13-55.1.6.el7uek.x86_64 is affected by the XMM corruption bug on Linux (see JDK-8064919 for more details). Any testing results coming from machines with a broken kernel should be treated with suspicion. I'll close the issue as "Cannot reproduce" for now. Please reopen if the problem shows up again.
22-11-2016

This issue looks very similar to JDK-8135159. The failing assert/guarantee is different in the case of this bug; in both cases, however, the issue is related to the stack height computed by the bytecode escape analyzer. I've closed JDK-8135159 as "Cannot reproduce". Let's keep it that way; I'll do the investigation on this bug for now. If the investigation reveals more about JDK-8135159, I'll reopen that issue (or close it as a duplicate of this bug).
10-11-2016

ILW=crash (with failing assert),rare,no workaround=HLH=P2
01-11-2016