Bug ID: JDK-6915005 G1: Hang in PtrQueueSet::completed_buffers_list_length with gcl001

Details
Type:
Bug
Submit Date:
2010-01-07
Status:
Closed
Updated Date:
2011-03-08
Project Name:
JDK
Resolved Date:
2011-03-08
Component:
hotspot
OS:
generic
Sub-Component:
gc
CPU:
generic
Priority:
P3
Resolution:
Fixed
Affected Versions:
7
Fixed Versions:
hs17 (b08)

Description
During PIT testing for jdk7 b79, the test case mentioned in 6899058 failed. As a result the fixes for 6899058 and 6908215 (which used the same test case) were recorded as PIT failures with G1.

An email from Leonid Mesnik indicated that the test case mentioned in 6899058 (gcl001) had failed with the same assertion as that reported in 6899058. More frequently, however, the test case would hang.

Here is Leonid's email:

Hi John

I am trying to verify 6899058 (G1: Internal error in ptrQueue.cpp:201) in nightly tests
for hsx 17 PIT b06 jdk7b79 by running test gcl001 on vm-x2270-01 (I tried other machines and Solaris, but the situation is the same) with the command:


/net/sqenfs-1.sfbay/export1/comp/vm/jdk/hsx/17/pit/b06/jdk7b79/fastdebug/linux-amd64/bin/java -d64 -server -Xmixed -XX:+PrintGCDetails -XX:-UseCompressedOops -XX:-PrintVMOptions -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -Djava.library.path=/net/sqenfs-1.sfbay/export1/comp/vm/testbase/sqe/vm/6/build/execution/vm//bin/lib/linux-amd64/nsk/stress/jni/gclocker  -cp /net/sqenfs-1.sfbay/export1/comp/vm/testbase/sqe/vm/6/build/execution/vm//bin/classes nsk.stress.jni.gclocker.gcl001


Unfortunately sometimes I get the same crash:
[GC pause (young)# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc:  SuppressErrorAt=/ptrQueue.cpp:217
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/tmp/jprt/P1/B/141120.et151817/source/src/share/vm/gc_implementation/g1/ptrQueue.cpp:217), pid=7322, tid=139670985931088
#  Error: guarantee(completed_buffers_list_length() == _n_completed_buffers,"Completed buffer length is wrong.")
#
# JRE version: 7.0-b78
# Java VM: Java HotSpot(TM) 64-Bit Server VM (17.0-b06-2009-12-23-141120.et151817.hs17b06-fastdebug mixed mode linux-amd64 )
# An error report file with more information is saved as:
# /home/lm153972/hs_err_pid7322.log
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
#
Current thread is 139670985931088


or, more often, the VM just "hangs" and I am not able to get any stack info. When I use -XX:G1PolicyVerbose=2, the last messages it prints are:

   (3955928 KB left in heap.)
 Added [0x00007f763c800000, 0x00007f763c900000) to CS.
   (3955752 KB left in heap.)
New alloc region [0x00007f763a600000, 0x00007f763a600000, 0x00007f763a700000) for survivors:    A     F    13 space 1024K,   0% used [0x00007f763a600000, 0x00007f763a600000, 0x00007f763a700000)
New alloc region [0x00007f763a500000, 0x00007f763a500000, 0x00007f763a600000) for survivors:    A  SU F    13 space 1024K,   0% used [0x00007f763a500000, 0x00007f763a500000, 0x00007f763a600000)

After that, the VM just hangs.

Could you please check whether I am running it incorrectly, or something like that? Or should I file a new CR for this issue?

Thanks
Leonid

                                    

Comments
EVALUATION

http://hg.openjdk.java.net/jdk7/hotspot/hotspot/rev/09646c4656ca
                                     
2010-01-17
EVALUATION

Test case fails PIT testing.
                                     
2010-01-07
SUGGESTED FIX

The code which adds a completed buffer looks like:

void PtrQueue::enqueue_known_active(void* ptr) {
  assert(0 <= _index && _index <= _sz, "Invariant.");
  assert(_index == 0 || _buf != NULL, "invariant");

  while (_index == 0) {
    handle_zero_index();
  }

  assert(_index > 0, "postcondition");
  _index -= oopSize;
  _buf[byte_index_to_index((int)_index)] = ptr;
  assert(0 <= _index && _index <= _sz, "Invariant.");
}

void PtrQueue::handle_zero_index() {
  assert(0 == _index, "Precondition.");
  // This thread records the full buffer and allocates a new one (while
  // holding the lock if there is one).
  if (_buf != NULL) {
    if (_lock) {
      locking_enqueue_completed_buffer(_buf);
    } else {
      if (qset()->process_or_enqueue_complete_buffer(_buf)) {
        // Recycle the buffer. No allocation.
        _sz = qset()->buffer_size();
        _index = _sz;
        return;
      }
    }
  }
  // Reallocate the buffer
  _buf = qset()->allocate_buffer();
  _sz = qset()->buffer_size();
  _index = _sz;
  assert(0 <= _index && _index <= _sz, "Invariant.");
}

void PtrQueue::locking_enqueue_completed_buffer(void** buf) {
  assert(_lock->owned_by_self(), "Required.");
  _lock->unlock();
  qset()->enqueue_complete_buffer(buf);
  // We must relock only because the caller will unlock, for the normal
  // case.
  _lock->lock_without_safepoint_check();
}

The read of _buf in handle_zero_index is done while holding _lock. If _buf is also set to NULL while holding _lock, and a local pointer to the buffer is passed to locking_enqueue_completed_buffer, then the thread holding _lock effectively "claims" the completed buffer. When another worker thread then acquires _lock, it sees that _buf is NULL, installs a new buffer, and starts writing card pointers into that buffer while holding _lock.
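
The claiming idea described above can be sketched roughly as follows. This is a minimal standalone illustration under stated assumptions, not the HotSpot change itself: SketchQueueSet, SketchSharedQueue, kBufferSize, and the use of std::mutex are hypothetical stand-ins for PtrQueueSet, PtrQueue, the buffer size, and _lock. The re-check of buf after relocking is also an assumption of this sketch, added so that a buffer installed by another thread in the meantime is not overwritten.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Entries per buffer. Assumed value, chosen only for illustration.
constexpr std::size_t kBufferSize = 4;

// Hypothetical stand-in for PtrQueueSet: owns the buffers and keeps the
// completed buffer list. These are not HotSpot's real classes.
struct SketchQueueSet {
  std::mutex list_lock;                          // guards both vectors
  std::vector<std::unique_ptr<void*[]>> owned;   // keeps buffers alive
  std::vector<void**> completed;                 // completed buffer list

  void** allocate_buffer() {
    std::lock_guard<std::mutex> guard(list_lock);
    owned.emplace_back(new void*[kBufferSize]());
    return owned.back().get();
  }

  void enqueue_complete_buffer(void** buf) {
    std::lock_guard<std::mutex> guard(list_lock);
    completed.push_back(buf);
  }
};

// Hypothetical stand-in for a shared, lock-protected PtrQueue.
struct SketchSharedQueue {
  std::mutex lock;         // plays the role of _lock
  void** buf = nullptr;    // plays the role of _buf
  std::size_t index = 0;   // plays the role of _index, counted in entries
  SketchQueueSet* qset;

  explicit SketchSharedQueue(SketchQueueSet* set) : qset(set) {}

  // Precondition: the caller holds `lock` through `held`, and index == 0.
  void handle_zero_index(std::unique_lock<std::mutex>& held) {
    assert(held.owns_lock() && index == 0);
    if (buf != nullptr) {
      // Claim the full buffer: copy the pointer to a local and clear the
      // shared field while the lock is still held.
      void** claimed = buf;
      buf = nullptr;

      // Mirror locking_enqueue_completed_buffer: unlock, enqueue, relock.
      held.unlock();
      qset->enqueue_complete_buffer(claimed);
      held.lock();

      // Assumption of this sketch: if another thread installed a fresh
      // buffer while the lock was released, keep that buffer.
      if (buf != nullptr) return;
    }
    buf = qset->allocate_buffer();
    index = kBufferSize;
  }

  void enqueue(void* ptr) {
    std::unique_lock<std::mutex> held(lock);
    while (index == 0) {
      handle_zero_index(held);
    }
    index -= 1;
    buf[index] = ptr;
  }
};
```

Because the shared field is cleared before the lock is dropped, a second thread entering handle_zero_index during the unlocked window takes the allocation path instead of enqueuing the same buffer again. In this sketch, enqueuing nine pointers with four entries per buffer leaves two distinct buffers on the completed list and one entry in the current buffer.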
                                     
2010-01-07
EVALUATION

http://hg.openjdk.java.net/jdk7/hotspot-gc/hotspot/rev/09646c4656ca
                                     
2010-01-15


