United StatesChange Country, Oracle Worldwide Web Sites Communities I am a... I want to...
JDK-8024830 : SEGV in org.apache.lucene.codecs.compressing.CompressingTermVectorsReader.get

Submit Date:
Updated Date:
Project Name:
Resolved Date:
Affected Versions:
Fixed Versions:
hs25 (b59)

Related Reports

Sub Tasks

Someone tweeted a link to a Lucene crash with 7u40 (hs24.0-b56):


The issue in the Apache JIRA has a hs_err attached.


SQE OK to take the fix into CPU14_02
Customer verified the fix:

I confirm that this patch fixes the problem encountered in
LUCENE-5212. I've tested svn rev. 1523179 (trunk) against jdk8-b114
with and without Vladimir's patch. Without the patch the test sequence
ends about 50% of the time in a sigsegv. With the patch all executions
ended without any errors.

It is very hard to get small test (to make vector spill in particular stack slot). I used lucene tests to verify the fix.
After installing lucene test suite and setting up proxies as described above I run next script in lucene/core directory.
Note, you need to run 64-bit PRODUCT VM on SandyBridge x86 (it is important for vectors generation):

$ cat run.sh 
while [ $COUNT -le 20 ]
  echo "COUNT: " $COUNT
  $JAVA_HOME/bin/java -d64 -version
  \rm -rf ../../.caches/
  ant test -Dtests.seed=43A1116E7F98BED4 -Dtests.jvms=1 -Dargs="-d64 -XX:-UseCompressedOops -XX:+UseParallelGC -XX:-TieredCompilation"
  if [ -d ../build/core/test/J0 ]
    echo "FAILED!!!!!!!!!!!!!!!!"
    echo "COUNT: " $COUNT
  COUNT=`expr $COUNT + 1`

You should hit one of the failure (crush or "CheckIndex failed" test error) in few iterations.

Next is the fix with which lucene test does not fail anymore (I will run it with more iterations). Previous version of the fix had a typo which caused problems in fastdebug VM.

@@ -464,17 +464,17 @@
   // Add in the incoming argument area
-  OptoReg::Name init = OptoReg::add(_old_SP, C->out_preserve_stack_slots());
-  for (i = init; i < _in_arg_limit; i = OptoReg::add(i,1))
+  OptoReg::Name init_in = OptoReg::add(_old_SP, C->out_preserve_stack_slots());
+  for (i = init_in; i < _in_arg_limit; i = OptoReg::add(i,1)) {
+  }
   // Add in all bits past the outgoing argument area
             "must be able to represent all call arguments in reg mask");
-  init = _out_arg_limit;
-  for (i = init; RegMask::can_represent(i); i = OptoReg::add(i,1))
+  OptoReg::Name init = _out_arg_limit;
+  for (i = init; RegMask::can_represent(i); i = OptoReg::add(i,1)) {
+  }
   // Finally, set the "infinite stack" bit.
@@ -506,16 +506,36 @@
   if (Matcher::vector_size_supported(T_FLOAT,2)) {
+    // For VecD we need dual alignment and 8 bytes (2 slots) for spills.
+    // RA guarantee such alignment since it is needed for Double and Long values.
     *idealreg2spillmask[Op_VecD] = *idealreg2regmask[Op_VecD];
   if (Matcher::vector_size_supported(T_FLOAT,4)) {
+    // For VecX we need quadro alignment and 16 bytes (4 slots) for spills.
+    //
+    // RA can use input arguments stack slots for spills but until RA
+    // we don't know frame size and offset of input arg stack slots.
+    //
+    // Exclude last arg stack slots to get enough space in all cases
+    // otherwise vector spills could stomp over stack slots in caller frame.
+    OptoReg::Name in = OptoReg::add(_in_arg_limit, -1);
+    for (int k = 1; (in >= init_in) && (k < RegMask::SlotsPerVecX); k++) {
+      aligned_stack_mask.Remove(in);
+      in = OptoReg::add(in, -1);
+    }
      assert(aligned_stack_mask.is_AllStack(), "should be infinite stack");
     *idealreg2spillmask[Op_VecX] = *idealreg2regmask[Op_VecX];
   if (Matcher::vector_size_supported(T_FLOAT,8)) {
+    // For VecY we need octo alignment and 32 bytes (8 slots) for spills.
+    OptoReg::Name in = OptoReg::add(_in_arg_limit, -1);
+    for (int k = 1; (in >= init_in) && (k < RegMask::SlotsPerVecY); k++) {
+      aligned_stack_mask.Remove(in);
+      in = OptoReg::add(in, -1);
+    }
      assert(aligned_stack_mask.is_AllStack(), "should be infinite stack");
     *idealreg2spillmask[Op_VecY] = *idealreg2regmask[Op_VecY];

Huraaa! :)  Found it, next 16-bytes vector spill to stack in readPositions() corrupted the stack slot (with 0) in the caller frame:

0xfffffd7ff6dd0191: vmovdqu %xmm0,0xe0(%rsp)

  # {method} 'readPositions' '(IILorg/apache/lucene/util/packed/PackedInts$Reader;Lorg/apache/lucene/util/packed/PackedInts$Reader;[III[[I)[[I' in 'org/apache/lucene/codecs/compressing/CompressingTermVectorsReader'
  # this:     rsi:rsi   = 'org/apache/lucene/codecs/compressing/CompressingTermVectorsReader'
  # parm0:    rdx       = int
  # parm1:    rcx       = int
  # parm2:    r8:r8     = 'org/apache/lucene/util/packed/PackedInts$Reader'
  # parm3:    r9:r9     = 'org/apache/lucene/util/packed/PackedInts$Reader'
  # parm4:    rdi:rdi   = '[I'
  # parm5:    [sp+0xd0]   = int  (sp of caller)
  # parm6:    [sp+0xd8]   = int
  # parm7:    [sp+0xe0]   = '[[I'

Last 3 parameters are passed on stack. Last one 0xe0(rsp) is located just before the stack slot with 0.

We incorrectly calculate frame size in method with vectors.

There are 2 call to readPositions() in the java code:

      startOffsets = readPositions(skip, numFields, flags, numTerms, termFreqs, OFFSETS, totalOffsets, positionIndex);
      lengths = readPositions(skip, numFields, flags, numTerms, termFreqs, OFFSETS, totalOffsets, positionIndex);

Stack was corrupted after first call.

r11d is loop's increment variable (next assembler from other output so addresses do not match but code is the same):

  0xfffffd7ff92e57a9: mov    0x11c(%rsp),%r11d
  0xfffffd7ff92e57b1: inc    %r11d              ;*iinc
                                                ; - org.apache.lucene.codecs.compressing.CompressingTermVectorsReader::get@1822 (line 388)
  0xfffffd7ff92e57b4: mov    %r11d,0x11c(%rsp)
  0xfffffd7ff92e57bc: cmp    0x68(%rsp),%r11d
  0xfffffd7ff92e57c1: jl     0xfffffd7ff92e5143  ;*if_icmpge

And r11d contains 0:

R11=0x0000000000000000 is an unknown value

Which means it is on the entry of the loop. Next I will check stack state before and after call.
I instrumented compiled code to check state of stack slot which should have 0 at merge points (there is corresponding Phi node in the block which have type int:0):

    // Check stack spills which are holding constant zero.
    for( uint i=0; i<_cfg->_num_blocks; i++ ) {
      Block *b = _cfg->_blocks[i];
      for (uint j = 0; j < b->_nodes.size(); j++) {
        Node *n = b->_nodes[j];
        if (n != NULL && n->is_Phi() && (n->as_Phi()->find_int_con(-1) == 0)) {
          OptoReg::Name reg = _regalloc->get_reg_first(n);
          if (!OptoReg::is_reg(reg)) { // Stack
            int offset = _regalloc->reg2offset(reg);
            tty->print_cr("[SP + %d]: test for 0", offset);

            MachNode* test = new (this) MachTestSpillNode(offset);
            b->_nodes.insert(j++, test);
            _cfg->_bbs.map(test->_idx, b);


  void MachTestSpillNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const {
    MacroAssembler _masm(&cbuf);
    Label good;
    __ cmpl(Address(rsp, _offset), 0);
    __ jccb(Assembler::zero, good);
    __ hlt();
    __ bind(good);

And I hit it during run:

fffffd7ff6eb986f e8 7c 0d f0 ff          callq  0xfffffffff6dba5f0
fffffd7ff6eb9874 48 89 84 24 c8 00 00 00 mov    %rax,0xc8(%rsp,1)
fffffd7ff6eb987c 44 8b 5c 24 68          mov    0x68(%rsp,1),%r11d
fffffd7ff6eb9881 45 85 db                test   %r11d,%r11d
fffffd7ff6eb9884 7f 15                   jg     0xfffffffff6eb989b
fffffd7ff6eb9886 e9 ed 06 00 00          jmpq   0xfffffffff6eb9f78
fffffd7ff6eb988b 45 8b da                mov    %r10d,%r11d
fffffd7ff6eb988e 44 89 9c 24 28 01 00 00 mov    %r11d,0x128(%rsp,1)
fffffd7ff6eb9896 e9 51 05 00 00          jmpq   0xfffffffff6eb9dec
fffffd7ff6eb989b 45 33 db                xor    %r11d,%r11d
fffffd7ff6eb989e 44 89 9c 24 1c 01 00 00 mov    %r11d,0x11c(%rsp,1)
fffffd7ff6eb98a6 81 7c 24 18 00 00 00 00 cmpl   $0x0,0x18(%rsp,1)
fffffd7ff6eb98ae 74 01                   je     0xfffffffff6eb98b1
fffffd7ff6eb98b0 f4                      hlt    
fffffd7ff6eb98b1 4c 8b 54 24 38          mov    0x38(%rsp,1),%r10
fffffd7ff6eb98b6 45 8b 52 10             mov    0x10(%r10),%r10d
fffffd7ff6eb98ba 44 8b 9c 24 1c 01 00 00 mov    0x11c(%rsp,1),%r11d
fffffd7ff6eb98c2 45 3b da                cmp    %r10d,%r11d
fffffd7ff6eb98c5 0f 83 9c 43 00 00       jae    0xfffffffff6ebdc67

The call is:

       call,static  org.apache.lucene.codecs.compressing.CompressingTermVectorsReader::readPositions
        # org.apache.lucene.codecs.compressing.CompressingTermVectorsReader::get @ bci:1588

The value on stack sp+0x18 is not 0 but 0xfffffffb:

Top of Stack: (sp=0xfffffd7ff342e130)
0xfffffd7ff342e130:   fffffd7fdff212f8 fffffd7fdff20af0
0xfffffd7ff342e140:   fffffd7fdff24cd0 fffffffffffffffb
0xfffffd7ff342e150:   0000000100000001 fffffd7fdff20af0
0xfffffd7ff342e160:   fffffd7fdff24d58 fffffd7fdff24d78

And the block fffffd7ff6eb98a6 is the top of the loop. So it is not clear now was it modified by call or during loop execution. I will continue investigate.

I spent some time building infrastructure (based on ciReplay) to replay Inlining to be able reproduce the failure in debugger. Unfortunately it did not help :(

I'm returning back to vectorization changes for 6340864. With global flag -XX:-UseSuperWord OFF but switched ON vectorization only for 2 not inlined methods which are called from CompressingTermVectorsReader.get() the crush is reproduced (7u40):

-XX:-UseSuperWord -XX:CompileCommand=option,org.apache.lucene.util.packed.BlockPackedReaderIterator::next,UseSuperWord -XX:CompileCommand=option,org.apache.lucene.codecs.compressing.CompressingTermVectorsReader::readPositions,UseSuperWord

  461 org.apache.lucene.util.packed.BlockPackedReaderIterator::next (64 bytes) (Max vector: 16)
1642 org.apache.lucene.codecs.compressing.CompressingTermVectorsReader::readPositions (292 bytes)(Max vector: 16)

When I disabled vectorization for any of these two methods the failure does not happen.
So I am back to the theory of memory stomp (which happens to be stack frame of caller method get()).

I found that crush reproduced more frequently with -XX:-UseCISCSpill so I added to my run script. Another thing is the crush is reproduced only when running standart lucene testing using ant which fork java process and communicates with it. I can't reproduce failure when run tests with standalone java process (with huge command line).

Reproduced the failure with latest jdk8 build:

   [junit4] # A fatal error has been detected by the Java Runtime Environment:
   [junit4] #
   [junit4] #  SIGSEGV (0xb) at pc=0xfffffd7ff9185ddd, pid=10052, tid=342
   [junit4] #
   [junit4] # JRE version: Java(TM) SE Runtime Environment (8.0-b108) (build 1.8.0-ea-b108)
   [junit4] # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b50 mixed mode solaris-amd64 )
   [junit4] # Problematic frame:
   [junit4] # J 2123 C2 org.apache.lucene.codecs.compressing.CompressingTermVectorsReader.get(I)Lorg/apache/lucene/index/Fields; (2818 bytes) @ 0xfffffd7ff9185ddd [0xfffffd7ff9184440+0x199d]
   [junit4] #

It stop failing in hs25-b21 after 8007439 changes ("C2: adding successful message of inlining") because of 8010222 "8007439 disabled inlining of cold accessor methods" fixed in b83.
jdk8 stops failing after b80 (hs25-b21). I verified that it was Hotspot changes because jdk8-b80 + hs25-b20 fails (and I am testing reverse combination).
I looked through Hotspot changes and all Compiler changes were backported into 7u40. I will do more investigation.
I am suspecting that vectorization (which is not present in compiled method) affects inlining. I am looking on inlining decisions during compilation and will try to reproduce the problem by replaying the same inlining.
Attached hs_err_pid26503.log file and assembl_crush.txt with assembler code of compiled get() method in which crush happened.
It also pass with -XX:MaxVectorSize=8. I am still thinking that the problem somewhere in Register Allocator.

Looks like my changes for vector arithmetic 6340864. Test runs with switched off vectors: -XX:-UseSuperWord. Currently I don't know why it has such effect.

Start failing around hs24-b21:

   [junit4] #  SIGSEGV (0xb) at pc=0xfffffd7ff91d9f7d, pid=23810, tid=343
   [junit4] #
   [junit4] # JRE version: Java(TM) SE Runtime Environment (8.0-b54)
   [junit4] # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.0-b21 mixed mode solaris-amd64 )
   [junit4] # Problematic frame:
   [junit4] # J  org.apache.lucene.codecs.compressing.CompressingTermVectorsReader.get(I)Lorg/apache/lucene/index/Fields;
   [junit4] #

Note, first 7u40 build b01 has hs24-b24.

Next, I will try to find changeset.
I am now confident that it is the failing place. I was able to dump more assembler code around failing instruction and it is matching code in optoassembler output at that place.
From what I see it could Register Allocator problem, specifically post_allocate_copy_removal(). Because I don't see zero initialization (xorl) for loop's iteration variable 0x18(%rsp). That could be why it contains some strange value (-5). An other issue is GCM - why it scheduled zeroing far away from the loop so it needs to be spilled on stack?
Next I will try to find changeset which introduced regression because I was not able to reproduce the problem in debug VM so I can't debug the problem.

I reproduced "CheckIndex failed" from LUCENE-5212 report and :

   [junit4]   1> java.lang.NegativeArraySizeException
   [junit4]   1>        at org.apache.lucene.codecs.compressing.CompressingTermVectorsReader.get(CompressingTermVectorsReader.java:433)
   [junit4]   1>        at org.apache.lucene.index.SegmentReader.getTermVectors(SegmentReader.java:169)
   [junit4]   1>        at org.apache.lucene.index.CheckIndex.testTermVectors(CheckIndex.java:1524)
   [junit4]   1>        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:611)
   [junit4]   1>        at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:223)

It was run with fastdebug VM and no asserts were triggered.

I was able to reproduce it on Solaris box but only with product VM. Looking on code pattern I believe it is sum() method inlined into get() at line 526 in CompressingTermVectorsReader.java

  private static int sum(int[] arr) {
    int sum = 0;
    for (int el : arr) {
      sum += el;
    return sum;

  public Fields get(int doc) throws IOException {

526:    assert sum(fieldLengths) == docLen : sum(fieldLengths) + " != " + docLen;

It is the only place where result of sum() is compared with other value (assembler code has such compare after loop).
The interesting thing the loop (in crush case) is not unrolled even so it is definitely counted. In optoassembler output from fastdebug VM run I see the loop is unrolled.

core files are useless since you can't open them on other systems. I will try to reproduce it myself since I have instructions now. Thank you, Igor!

I cant reproduce it on fastdebug build or w/ enabled coops.
I was able to reproduce SEGV on product build on my laptop - hs_err_pid12412.log
From Apache bug report, the instructions to run the set of tests that generated this is:

1. svn checkout http://svn.apache.org/repos/asf/lucene/dev/trunk <your repo>
2. cd <your repo>/lucene/core
3. ant test -Dtests.seed=43A1116E7F98BED4 -Dtests.jvms=1 -Dtests.dynamicAssignmentRatio=0 -Dargs="-XX:-UseCompressedOops -XX:+UseParallelGC"

I tried this on Mac 10.8.4 using 7u40-b43 and was not able to reproduce either the crash or corruption of index.
I can suggest -XX:-UseLoopPredicate to switch off predicates.

Igor, can you give me instructions how to run it? Thanks.

from hs_err:  0x68(%rsp) : 0x0000001  which matches the arrays size (see R10 description above).

Nothing wrong with loop code. The problem is arrays index which is negative -5:   0x18(%rsp) : 0xfffffffb
So it is not clear why range check did not catch it. Note, range check is moved from the loop and should be before the loop.

So it is small loop:

      jae      .+0x2e88
      xorl     %r11d,%r11d
      adress_NOP(9 bytes to align loop)

loop: // 0x00007f163d2d34d0
      movq     0x80(%rsp),%r10
      movl     0x18(%rsp),%r8d
      addl     0x18(%r10,%r8,4),%r11d
      incl     %r8d
      movl     %r8d,0x18(%rsp)
      cmpl     0x68(%rsp),%r8d
      jl       loop // .-0x1f

      cmpl     0xd8(%rsp),%r11d
      jne      .+0x32e0

Disassembler at crush site:

00007f163d2d34dd 47 03 5c 82 18          add    0x18(%r10,%r8,4),%r11d
00007f163d2d34e2 41 ff c0                inc    %r8d
00007f163d2d34e5 44 89 44 24 18          mov    %r8d,0x18(%rsp,1)
00007f163d2d34ea 44 3b 44 24 68          cmp    0x68(%rsp,1),%r8d
00007f163d2d34ef 7c df                   jl     0x000000003d2d34d0
00007f163d2d34f1 44 3b 9c 24 d8 00 00 00 cmp    0xd8(%rsp,1),%r11d
00007f163d2d34f9 0f 85 da 32 ff ff       jne    0x000000003d2c67d9

R10=0x00007f1633fc37d8 is an oop
- klass: {type array int}
- length: 1

R8 =0x00000000fffffffb is an unknown value

From the Lucent bug report.
I cant reproduce it. I have tried on 3 different hosts:
 - 'Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz'
 - 'Intel(R) Xeon(R) CPU           X5675  @ 3.07GHz'
 - 'Intel(R) Core(TM) i7 CPU       M 620  @ 2.67GHz '

'/proc/cpuinfo's are attached.

Hardware and Software, Engineered to Work Together