Bug ID: JDK-8024830 SEGV in org.apache.lucene.codecs.compressing.CompressingTermVectorsReader.get

JDK 7	JDK 8	Other
7u55Fixed	8Fixed	hs25,port-stage-ppc-aixFixed

SQE OK to take the fix into CPU14_02
17-01-2014
Customer verified the fix: I confirm that this patch fixes the problem encountered in LUCENE-5212. I've tested svn rev. 1523179 (trunk) against jdk8-b114 with and without Vladimir's patch. Without the patch the test sequence ends about 50% of the time in a sigsegv. With the patch all executions ended without any errors.
11-11-2013
It is very hard to get small test (to make vector spill in particular stack slot). I used lucene tests to verify the fix. After installing lucene test suite and setting up proxies as described above I run next script in lucene/core directory. Note, you need to run 64-bit PRODUCT VM on SandyBridge x86 (it is important for vectors generation): $ cat run.sh #!/bin/sh COUNT=1 while [ $COUNT -le 20 ] do echo "COUNT: " $COUNT $JAVA_HOME/bin/java -d64 -version \rm -rf ../../.caches/ ant test -Dtests.seed=43A1116E7F98BED4 -Dtests.jvms=1 -Dargs="-d64 -XX:-UseCompressedOops -XX:+UseParallelGC -XX:-TieredCompilation" if [ -d ../build/core/test/J0 ] then echo "FAILED!!!!!!!!!!!!!!!!" echo "COUNT: " $COUNT break fi COUNT=`expr $COUNT + 1` done You should hit one of the failure (crush or "CheckIndex failed" test error) in few iterations.
09-11-2013
Next is the fix with which lucene test does not fail anymore (I will run it with more iterations). Previous version of the fix had a typo which caused problems in fastdebug VM. @@ -464,17 +464,17 @@ C->FIRST_STACK_mask().Clear(); // Add in the incoming argument area - OptoReg::Name init = OptoReg::add(_old_SP, C->out_preserve_stack_slots()); - for (i = init; i < _in_arg_limit; i = OptoReg::add(i,1)) + OptoReg::Name init_in = OptoReg::add(_old_SP, C->out_preserve_stack_slots()); + for (i = init_in; i < _in_arg_limit; i = OptoReg::add(i,1)) { C->FIRST_STACK_mask().Insert(i); - + } // Add in all bits past the outgoing argument area guarantee(RegMask::can_represent_arg(OptoReg::add(_out_arg_limit,-1)), "must be able to represent all call arguments in reg mask"); - init = _out_arg_limit; - for (i = init; RegMask::can_represent(i); i = OptoReg::add(i,1)) + OptoReg::Name init = _out_arg_limit; + for (i = init; RegMask::can_represent(i); i = OptoReg::add(i,1)) { C->FIRST_STACK_mask().Insert(i); - + } // Finally, set the "infinite stack" bit. C->FIRST_STACK_mask().set_AllStack(); @@ -506,16 +506,36 @@ idealreg2spillmask[Op_VecS]->OR(C->FIRST_STACK_mask()); } if (Matcher::vector_size_supported(T_FLOAT,2)) { + // For VecD we need dual alignment and 8 bytes (2 slots) for spills. + // RA guarantee such alignment since it is needed for Double and Long values. idealreg2spillmask[Op_VecD] = idealreg2regmask[Op_VecD]; idealreg2spillmask[Op_VecD]->OR(aligned_stack_mask); } if (Matcher::vector_size_supported(T_FLOAT,4)) { + // For VecX we need quadro alignment and 16 bytes (4 slots) for spills. + // + // RA can use input arguments stack slots for spills but until RA + // we don't know frame size and offset of input arg stack slots. + // + // Exclude last arg stack slots to get enough space in all cases + // otherwise vector spills could stomp over stack slots in caller frame. + OptoReg::Name in = OptoReg::add(_in_arg_limit, -1); + for (int k = 1; (in >= init_in) && (k < RegMask::SlotsPerVecX); k++) { + aligned_stack_mask.Remove(in); + in = OptoReg::add(in, -1); + } aligned_stack_mask.clear_to_sets(RegMask::SlotsPerVecX); assert(aligned_stack_mask.is_AllStack(), "should be infinite stack"); idealreg2spillmask[Op_VecX] = idealreg2regmask[Op_VecX]; idealreg2spillmask[Op_VecX]->OR(aligned_stack_mask); } if (Matcher::vector_size_supported(T_FLOAT,8)) { + // For VecY we need octo alignment and 32 bytes (8 slots) for spills. + OptoReg::Name in = OptoReg::add(_in_arg_limit, -1); + for (int k = 1; (in >= init_in) && (k < RegMask::SlotsPerVecY); k++) { + aligned_stack_mask.Remove(in); + in = OptoReg::add(in, -1); + } aligned_stack_mask.clear_to_sets(RegMask::SlotsPerVecY); assert(aligned_stack_mask.is_AllStack(), "should be infinite stack"); idealreg2spillmask[Op_VecY] = idealreg2regmask[Op_VecY];
09-11-2013
Huraaa! :) Found it, next 16-bytes vector spill to stack in readPositions() corrupted the stack slot (with 0) in the caller frame: 0xfffffd7ff6dd0191: vmovdqu %xmm0,0xe0(%rsp) # {method} 'readPositions' '(IILorg/apache/lucene/util/packed/PackedInts$Reader;Lorg/apache/lucene/util/packed/PackedInts$Reader;[III[[I)[[I' in 'org/apache/lucene/codecs/compressing/CompressingTermVectorsReader' # this: rsi:rsi = 'org/apache/lucene/codecs/compressing/CompressingTermVectorsReader' # parm0: rdx = int # parm1: rcx = int # parm2: r8:r8 = 'org/apache/lucene/util/packed/PackedInts$Reader' # parm3: r9:r9 = 'org/apache/lucene/util/packed/PackedInts$Reader' # parm4: rdi:rdi = '[I' # parm5: [sp+0xd0] = int (sp of caller) # parm6: [sp+0xd8] = int # parm7: [sp+0xe0] = '[[I' Last 3 parameters are passed on stack. Last one 0xe0(rsp) is located just before the stack slot with 0. We incorrectly calculate frame size in method with vectors.
08-11-2013
There are 2 call to readPositions() in the java code: startOffsets = readPositions(skip, numFields, flags, numTerms, termFreqs, OFFSETS, totalOffsets, positionIndex); lengths = readPositions(skip, numFields, flags, numTerms, termFreqs, OFFSETS, totalOffsets, positionIndex); Stack was corrupted after first call.
08-11-2013
r11d is loop's increment variable (next assembler from other output so addresses do not match but code is the same): 0xfffffd7ff92e57a9: mov 0x11c(%rsp),%r11d 0xfffffd7ff92e57b1: inc %r11d ;iinc ; - org.apache.lucene.codecs.compressing.CompressingTermVectorsReader::get@1822 (line 388) 0xfffffd7ff92e57b4: mov %r11d,0x11c(%rsp) 0xfffffd7ff92e57bc: cmp 0x68(%rsp),%r11d 0xfffffd7ff92e57c1: jl 0xfffffd7ff92e5143 ;if_icmpge And r11d contains 0: R11=0x0000000000000000 is an unknown value Which means it is on the entry of the loop. Next I will check stack state before and after call.
08-11-2013
I instrumented compiled code to check state of stack slot which should have 0 at merge points (there is corresponding Phi node in the block which have type int:0): // Check stack spills which are holding constant zero. for( uint i=0; i<_cfg->_num_blocks; i++ ) { Block b = _cfg->_blocks[i]; for (uint j = 0; j < b->_nodes.size(); j++) { Node n = b->_nodes[j]; if (n != NULL && n->is_Phi() && (n->as_Phi()->find_int_con(-1) == 0)) { OptoReg::Name reg = _regalloc->get_reg_first(n); if (!OptoReg::is_reg(reg)) { // Stack int offset = _regalloc->reg2offset(reg); tty->print_cr("[SP + %d]: test for 0", offset); MachNode* test = new (this) MachTestSpillNode(offset); b->_nodes.insert(j++, test); _cfg->_bbs.map(test->_idx, b); } } } } and void MachTestSpillNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const { MacroAssembler _masm(&cbuf); Label good; __ cmpl(Address(rsp, _offset), 0); __ jccb(Assembler::zero, good); __ hlt(); __ bind(good); } And I hit it during run: fffffd7ff6eb986f e8 7c 0d f0 ff callq 0xfffffffff6dba5f0 fffffd7ff6eb9874 48 89 84 24 c8 00 00 00 mov %rax,0xc8(%rsp,1) fffffd7ff6eb987c 44 8b 5c 24 68 mov 0x68(%rsp,1),%r11d fffffd7ff6eb9881 45 85 db test %r11d,%r11d fffffd7ff6eb9884 7f 15 jg 0xfffffffff6eb989b fffffd7ff6eb9886 e9 ed 06 00 00 jmpq 0xfffffffff6eb9f78 fffffd7ff6eb988b 45 8b da mov %r10d,%r11d fffffd7ff6eb988e 44 89 9c 24 28 01 00 00 mov %r11d,0x128(%rsp,1) fffffd7ff6eb9896 e9 51 05 00 00 jmpq 0xfffffffff6eb9dec fffffd7ff6eb989b 45 33 db xor %r11d,%r11d fffffd7ff6eb989e 44 89 9c 24 1c 01 00 00 mov %r11d,0x11c(%rsp,1) fffffd7ff6eb98a6 81 7c 24 18 00 00 00 00 cmpl $0x0,0x18(%rsp,1) fffffd7ff6eb98ae 74 01 je 0xfffffffff6eb98b1 --------------- fffffd7ff6eb98b0 f4 hlt fffffd7ff6eb98b1 4c 8b 54 24 38 mov 0x38(%rsp,1),%r10 fffffd7ff6eb98b6 45 8b 52 10 mov 0x10(%r10),%r10d fffffd7ff6eb98ba 44 8b 9c 24 1c 01 00 00 mov 0x11c(%rsp,1),%r11d fffffd7ff6eb98c2 45 3b da cmp %r10d,%r11d fffffd7ff6eb98c5 0f 83 9c 43 00 00 jae 0xfffffffff6ebdc67 The call is: call,static org.apache.lucene.codecs.compressing.CompressingTermVectorsReader::readPositions # org.apache.lucene.codecs.compressing.CompressingTermVectorsReader::get @ bci:1588 The value on stack sp+0x18 is not 0 but 0xfffffffb: Top of Stack: (sp=0xfffffd7ff342e130) 0xfffffd7ff342e130: fffffd7fdff212f8 fffffd7fdff20af0 0xfffffd7ff342e140: fffffd7fdff24cd0 fffffffffffffffb 0xfffffd7ff342e150: 0000000100000001 fffffd7fdff20af0 0xfffffd7ff342e160: fffffd7fdff24d58 fffffd7fdff24d78 And the block fffffd7ff6eb98a6 is the top of the loop. So it is not clear now was it modified by call or during loop execution. I will continue investigate.
08-11-2013
I spent some time building infrastructure (based on ciReplay) to replay Inlining to be able reproduce the failure in debugger. Unfortunately it did not help :( I'm returning back to vectorization changes for 6340864. With global flag -XX:-UseSuperWord OFF but switched ON vectorization only for 2 not inlined methods which are called from CompressingTermVectorsReader.get() the crush is reproduced (7u40): -XX:-UseSuperWord -XX:CompileCommand=option,org.apache.lucene.util.packed.BlockPackedReaderIterator::next,UseSuperWord -XX:CompileCommand=option,org.apache.lucene.codecs.compressing.CompressingTermVectorsReader::readPositions,UseSuperWord 461 org.apache.lucene.util.packed.BlockPackedReaderIterator::next (64 bytes) (Max vector: 16) 1642 org.apache.lucene.codecs.compressing.CompressingTermVectorsReader::readPositions (292 bytes)(Max vector: 16) When I disabled vectorization for any of these two methods the failure does not happen. So I am back to the theory of memory stomp (which happens to be stack frame of caller method get()). I found that crush reproduced more frequently with -XX:-UseCISCSpill so I added to my run script. Another thing is the crush is reproduced only when running standart lucene testing using ant which fork java process and communicates with it. I can't reproduce failure when run tests with standalone java process (with huge command line).
07-11-2013
Reproduced the failure with latest jdk8 build: [junit4] # A fatal error has been detected by the Java Runtime Environment: [junit4] # [junit4] # SIGSEGV (0xb) at pc=0xfffffd7ff9185ddd, pid=10052, tid=342 [junit4] # [junit4] # JRE version: Java(TM) SE Runtime Environment (8.0-b108) (build 1.8.0-ea-b108) [junit4] # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b50 mixed mode solaris-amd64 ) [junit4] # Problematic frame: [junit4] # J 2123 C2 org.apache.lucene.codecs.compressing.CompressingTermVectorsReader.get(I)Lorg/apache/lucene/index/Fields; (2818 bytes) @ 0xfffffd7ff9185ddd [0xfffffd7ff9184440+0x199d] [junit4] #
25-09-2013
It stop failing in hs25-b21 after 8007439 changes ("C2: adding successful message of inlining") because of 8010222 "8007439 disabled inlining of cold accessor methods" fixed in b83.
25-09-2013
jdk8 stops failing after b80 (hs25-b21). I verified that it was Hotspot changes because jdk8-b80 + hs25-b20 fails (and I am testing reverse combination). I looked through Hotspot changes and all Compiler changes were backported into 7u40. I will do more investigation.
21-09-2013
I am suspecting that vectorization (which is not present in compiled method) affects inlining. I am looking on inlining decisions during compilation and will try to reproduce the problem by replaying the same inlining.
20-09-2013
Attached hs_err_pid26503.log file and assembl_crush.txt with assembler code of compiled get() method in which crush happened.
19-09-2013
It also pass with -XX:MaxVectorSize=8. I am still thinking that the problem somewhere in Register Allocator.
19-09-2013
Looks like my changes for vector arithmetic 6340864. Test runs with switched off vectors: -XX:-UseSuperWord. Currently I don't know why it has such effect. http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/006050192a5a
19-09-2013
Start failing around hs24-b21: [junit4] # SIGSEGV (0xb) at pc=0xfffffd7ff91d9f7d, pid=23810, tid=343 [junit4] # [junit4] # JRE version: Java(TM) SE Runtime Environment (8.0-b54) [junit4] # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.0-b21 mixed mode solaris-amd64 ) [junit4] # Problematic frame: [junit4] # J org.apache.lucene.codecs.compressing.CompressingTermVectorsReader.get(I)Lorg/apache/lucene/index/Fields; [junit4] # Note, first 7u40 build b01 has hs24-b24. Next, I will try to find changeset.
19-09-2013
I am now confident that it is the failing place. I was able to dump more assembler code around failing instruction and it is matching code in optoassembler output at that place. From what I see it could Register Allocator problem, specifically post_allocate_copy_removal(). Because I don't see zero initialization (xorl) for loop's iteration variable 0x18(%rsp). That could be why it contains some strange value (-5). An other issue is GCM - why it scheduled zeroing far away from the loop so it needs to be spilled on stack? Next I will try to find changeset which introduced regression because I was not able to reproduce the problem in debug VM so I can't debug the problem.
18-09-2013
I reproduced "CheckIndex failed" from LUCENE-5212 report and : [junit4] 1> java.lang.NegativeArraySizeException [junit4] 1> at org.apache.lucene.codecs.compressing.CompressingTermVectorsReader.get(CompressingTermVectorsReader.java:433) [junit4] 1> at org.apache.lucene.index.SegmentReader.getTermVectors(SegmentReader.java:169) [junit4] 1> at org.apache.lucene.index.CheckIndex.testTermVectors(CheckIndex.java:1524) [junit4] 1> at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:611) [junit4] 1> at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:223) It was run with fastdebug VM and no asserts were triggered.
18-09-2013
I was able to reproduce it on Solaris box but only with product VM. Looking on code pattern I believe it is sum() method inlined into get() at line 526 in CompressingTermVectorsReader.java private static int sum(int[] arr) { int sum = 0; for (int el : arr) { sum += el; } return sum; } public Fields get(int doc) throws IOException { 526: assert sum(fieldLengths) == docLen : sum(fieldLengths) + " != " + docLen; It is the only place where result of sum() is compared with other value (assembler code has such compare after loop). The interesting thing the loop (in crush case) is not unrolled even so it is definitely counted. In optoassembler output from fastdebug VM run I see the loop is unrolled.
17-09-2013
core files are useless since you can't open them on other systems. I will try to reproduce it myself since I have instructions now. Thank you, Igor!
17-09-2013
I cant reproduce it on fastdebug build or w/ enabled coops.
17-09-2013
I was able to reproduce SEGV on product build on my laptop - hs_err_pid12412.log
17-09-2013
From Apache bug report, the instructions to run the set of tests that generated this is: 1. svn checkout http://svn.apache.org/repos/asf/lucene/dev/trunk <your repo> 2. cd <your repo>/lucene/core 3. ant test -Dtests.seed=43A1116E7F98BED4 -Dtests.jvms=1 -Dtests.dynamicAssignmentRatio=0 -Dargs="-XX:-UseCompressedOops -XX:+UseParallelGC" I tried this on Mac 10.8.4 using 7u40-b43 and was not able to reproduce either the crash or corruption of index.
17-09-2013
I can suggest -XX:-UseLoopPredicate to switch off predicates. Igor, can you give me instructions how to run it? Thanks.
17-09-2013
from hs_err: 0x68(%rsp) : 0x0000001 which matches the arrays size (see R10 description above). Nothing wrong with loop code. The problem is arrays index which is negative -5: 0x18(%rsp) : 0xfffffffb So it is not clear why range check did not catch it. Note, range check is moved from the loop and should be before the loop.
16-09-2013
So it is small loop: jae .+0x2e88 xorl %r11d,%r11d adress_NOP(9 bytes to align loop) loop: // 0x00007f163d2d34d0 movq 0x80(%rsp),%r10 movl 0x18(%rsp),%r8d addl 0x18(%r10,%r8,4),%r11d incl %r8d movl %r8d,0x18(%rsp) cmpl 0x68(%rsp),%r8d jl loop // .-0x1f cmpl 0xd8(%rsp),%r11d jne .+0x32e0
16-09-2013
Disassembler at crush site: --------------- 00007f163d2d34dd 47 03 5c 82 18 add 0x18(%r10,%r8,4),%r11d 00007f163d2d34e2 41 ff c0 inc %r8d 00007f163d2d34e5 44 89 44 24 18 mov %r8d,0x18(%rsp,1) 00007f163d2d34ea 44 3b 44 24 68 cmp 0x68(%rsp,1),%r8d 00007f163d2d34ef 7c df jl 0x000000003d2d34d0 00007f163d2d34f1 44 3b 9c 24 d8 00 00 00 cmp 0xd8(%rsp,1),%r11d 00007f163d2d34f9 0f 85 da 32 ff ff jne 0x000000003d2c67d9 R10=0x00007f1633fc37d8 is an oop [I - klass: {type array int} - length: 1 R8 =0x00000000fffffffb is an unknown value
16-09-2013
From the Lucent bug report.
16-09-2013
I cant reproduce it. I have tried on 3 different hosts: - 'Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz' - 'Intel(R) Xeon(R) CPU X5675 @ 3.07GHz' - 'Intel(R) Core(TM) i7 CPU M 620 @ 2.67GHz ' '/proc/cpuinfo's are attached.
16-09-2013