JDK-6915005 : G1: Hang in PtrQueueSet::completed_buffers_list_length with gcl001
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 7
  • Priority: P3
  • Status: Closed
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2010-01-07
  • Updated: 2013-09-18
  • Resolved: 2011-03-08
JDK 6         JDK 7       Other
6u21 Fixed    7 Fixed     hs17 Fixed
During PIT testing for jdk7 b79, the test case mentioned in 6899058 failed. As a result the fixes for 6899058 and 6908215 (which used the same test case) were recorded as PIT failures with G1.

An email from Leonid Mesnik indicated that the test case mentioned in 6899058 (gcl001) had failed with the same assertion as that reported in 6899058. More frequently, however, the test case would hang.

Here is Leonid's email:

Hi John

I am trying to verify the fix for 6899058 (G1: Internal error in ptrQueue.cpp:201) in nightly tests
for hsx 17 PIT b06 jdk7b79 by running test gcl001 on vm-x2270-01 (I tried other machines and Solaris, but the situation is the same) with the command:

/net/sqenfs-1.sfbay/export1/comp/vm/jdk/hsx/17/pit/b06/jdk7b79/fastdebug/linux-amd64/bin/java -d64 -server -Xmixed -XX:+PrintGCDetails -XX:-UseCompressedOops -XX:-PrintVMOptions -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -Djava.library.path=/net/sqenfs-1.sfbay/export1/comp/vm/testbase/sqe/vm/6/build/execution/vm//bin/lib/linux-amd64/nsk/stress/jni/gclocker  -cp /net/sqenfs-1.sfbay/export1/comp/vm/testbase/sqe/vm/6/build/execution/vm//bin/classes nsk.stress.jni.gclocker.gcl001

Unfortunately sometimes I get the same crash:
[GC pause (young)# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc:  SuppressErrorAt=/ptrQueue.cpp:217
# A fatal error has been detected by the Java Runtime Environment:
#  Internal Error (/tmp/jprt/P1/B/141120.et151817/source/src/share/vm/gc_implementation/g1/ptrQueue.cpp:217), pid=7322, tid=139670985931088
#  Error: guarantee(completed_buffers_list_length() == _n_completed_buffers,"Completed buffer length is wrong.")
# JRE version: 7.0-b78
# Java VM: Java HotSpot(TM) 64-Bit Server VM (17.0-b06-2009-12-23-141120.et151817.hs17b06-fastdebug mixed mode linux-amd64 )
# An error report file with more information is saved as:
# /home/lm153972/hs_err_pid7322.log
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
Current thread is 139670985931088

or, more often, the VM just "hangs" and I am not able to get any stack info. When I use -XX:G1PolicyVerbose=2, the last messages it prints are:

   (3955928 KB left in heap.)
 Added [0x00007f763c800000, 0x00007f763c900000) to CS.
   (3955752 KB left in heap.)
New alloc region [0x00007f763a600000, 0x00007f763a600000, 0x00007f763a700000) for survivors:    A     F    13 space 1024K,   0% used [0x00007f763a600000, 0x00007f763a600000, 0x00007f763a700000)
New alloc region [0x00007f763a500000, 0x00007f763a500000, 0x00007f763a600000) for survivors:    A  SU F    13 space 1024K,   0% used [0x00007f763a500000, 0x00007f763a500000, 0x00007f763a600000)

After that the VM just hangs.

Could you please check whether I am running it incorrectly or something like that? Or should I file a new CR for this issue?


EVALUATION http://hg.openjdk.java.net/jdk7/hotspot/hotspot/rev/09646c4656ca

EVALUATION http://hg.openjdk.java.net/jdk7/hotspot-gc/hotspot/rev/09646c4656ca

SUGGESTED FIX The code which adds a completed buffer looks like:

    void PtrQueue::enqueue_known_active(void* ptr) {
      assert(0 <= _index && _index <= _sz, "Invariant.");
      assert(_index == 0 || _buf != NULL, "invariant");

      while (_index == 0) {
        handle_zero_index();
      }

      assert(_index > 0, "postcondition");
      _index -= oopSize;
      _buf[byte_index_to_index((int)_index)] = ptr;
      assert(0 <= _index && _index <= _sz, "Invariant.");
    }

    void PtrQueue::handle_zero_index() {
      assert(0 == _index, "Precondition.");
      // This thread records the full buffer and allocates a new one (while
      // holding the lock if there is one).
      if (_buf != NULL) {
        if (_lock) {
          locking_enqueue_completed_buffer(_buf);
        } else {
          if (qset()->process_or_enqueue_complete_buffer(_buf)) {
            // Recycle the buffer. No allocation.
            _sz = qset()->buffer_size();
            _index = _sz;
            return;
          }
        }
      }
      // Reallocate the buffer
      _buf = qset()->allocate_buffer();
      _sz = qset()->buffer_size();
      _index = _sz;
      assert(0 <= _index && _index <= _sz, "Invariant.");
    }

    void PtrQueue::locking_enqueue_completed_buffer(void** buf) {
      assert(_lock->owned_by_self(), "Required.");
      _lock->unlock();
      qset()->enqueue_complete_buffer(buf);
      // We must relock only because the caller will unlock, for the normal
      // case.
      _lock->lock_without_safepoint_check();
    }

The read of _buf in handle_zero_index is done while holding _lock, but locking_enqueue_completed_buffer then releases _lock while _buf still points at the full buffer. If _buf is instead set to NULL while _lock is still held, and a local pointer to the buffer is passed to locking_enqueue_completed_buffer, then the thread that is holding _lock effectively "claims" the completed buffer. When another worker thread grabs _lock, it will see that _buf is NULL, install a new buffer, and start writing card pointers into that buffer while holding _lock.
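The "claim" idea in the suggested fix can be illustrated with a small, single-threaded model. This is only a sketch, not the HotSpot change itself: `SharedQueueModel`, `claim_before_unlock`, `while_unlocked`, and `times_first_buffer_enqueued` are hypothetical names introduced here, and the `while_unlocked` hook simply stands in for a second worker that happens to run during the window in which `_lock` is released. It also assumes that the failure mode is the same full buffer being handed to the completed buffer list more than once.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical, simplified model of a lock-protected shared PtrQueue.
// Field names mirror the report; everything else is illustrative only.
struct SharedQueueModel {
    static constexpr std::size_t kSlots = 4;

    std::mutex lock;                       // plays the role of _lock
    void** buf = nullptr;                  // plays the role of _buf
    std::size_t index = 0;                 // plays the role of _index (in slots)
    bool claim_before_unlock;              // true: apply the "claim" from the suggested fix
    std::function<void()> while_unlocked;  // simulates another worker in the unlocked window
    std::vector<void**> completed;         // stands in for the completed buffer list
    std::vector<std::unique_ptr<void*[]>> pool;  // owns every allocated buffer

    explicit SharedQueueModel(bool claim) : claim_before_unlock(claim) {}

    void** allocate_buffer() {
        pool.emplace_back(new void*[kSlots]());
        return pool.back().get();
    }

    // Caller holds `lock` through `held`.
    void handle_zero_index(std::unique_lock<std::mutex>& held) {
        if (buf != nullptr) {
            void** full = buf;                       // local pointer to the full buffer
            if (claim_before_unlock) buf = nullptr;  // claim it while still holding the lock
            held.unlock();                           // window in which others may run
            completed.push_back(full);               // model only; a real set would be synchronized
            if (while_unlocked) {
                auto other = std::move(while_unlocked);
                while_unlocked = nullptr;            // let the simulated worker run once
                other();
            }
            held.lock();
            // With the claim in place, keep a buffer another worker installed meanwhile.
            if (claim_before_unlock && buf != nullptr) return;
        }
        buf = allocate_buffer();
        index = kSlots;
    }

    void enqueue(void* p) {
        std::unique_lock<std::mutex> held(lock);
        while (index == 0) handle_zero_index(held);
        buf[--index] = p;
    }
};

// Counts how often the first full buffer lands on the completed list when a
// second (simulated) worker enqueues during the unlocked window.
std::size_t times_first_buffer_enqueued(bool claim) {
    static int card;  // dummy stand-in for a card pointer
    SharedQueueModel q(claim);
    for (std::size_t i = 0; i < SharedQueueModel::kSlots; ++i) q.enqueue(&card);
    void** first = q.buf;  // this buffer is now full (index == 0)
    q.while_unlocked = [&q] { q.enqueue(&card); };
    q.enqueue(&card);      // triggers handle_zero_index on the full buffer
    return static_cast<std::size_t>(
        std::count(q.completed.begin(), q.completed.end(), first));
}
```

In this model, calling `times_first_buffer_enqueued(false)` records the first buffer twice, while `times_first_buffer_enqueued(true)` records it once, because the second worker finds `buf` cleared and installs a fresh buffer instead of re-enqueuing the old one.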

EVALUATION Test case fails PIT testing.