United StatesChange Country, Oracle Worldwide Web Sites Communities I am a... I want to...
Bug ID: JDK-6980838 G1: guarantee(false) failed: thread has an unexpected active value in its SATB queue
JDK-6980838 : G1: guarantee(false) failed: thread has an unexpected active value in its SATB queue

Details
Type:
Bug
Submit Date:
2010-08-30
Status:
Closed
Updated Date:
2011-03-08
Project Name:
JDK
Resolved Date:
2011-03-08
Component:
hotspot
OS:
generic
Sub-Component:
gc
CPU:
generic
Priority:
P3
Resolution:
Fixed
Affected Versions:
hs19
Fixed Versions:
hs20 (b02)

Related Reports
Backport:
Backport:
Relates:

Sub Tasks

Description
Application crashes on solaris-i586 with G1 and -XX:+AggressiveOpts (JDK 7 b106/HS 19/b05):

#  Internal Error (/BUILD_AREA/jdk7/hotspot/src/share/vm/gc_implementation/g1/satbQueue.cpp:133), pid=11573, tid=11
#  guarantee(false) failed: thread has an unexpected active value in its SATB queue

Stack trace (full hs_err is attached):

Stack: [0xa717f000,0xa71ff000],  sp=0xa71fe1a0,  free space=1fca71ff000k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x1662b7f];;  void VMError::report(outputStream*)+0x70b
V  [libjvm.so+0x1663d06];;  void VMError::report_and_die()+0x582
V  [libjvm.so+0x7bb5f9];;  void report_vm_error(const char*,int,const char*,const char*)+0x5ad
V  [libjvm.so+0x13ce6f8];;  void SATBMarkQueueSet::set_active_all_threads(bool,bool)+0x204
V  [libjvm.so+0x6fc473];;  void ConcurrentMark::abort()+0x193
V  [libjvm.so+0x88e08b];;  void G1CollectedHeap::do_collection(bool,bool,unsigned)+0x41f
V  [libjvm.so+0x88eaf7];;  void G1CollectedHeap::do_full_collection(bool)+0x17
V  [libjvm.so+0x16a3bc0];;  void VM_G1CollectFull::doit()+0x9c
V  [libjvm.so+0x16a16c9];;  void VM_Operation::evaluate()+0xe1
V  [libjvm.so+0x169f3e1];;  void VMThread::evaluate_operation(VM_Operation*)+0x145
V  [libjvm.so+0x169fb9d];;  void VMThread::loop()+0x5c5
V  [libjvm.so+0x169efc1];;  void VMThread::run()+0xc1
V  [libjvm.so+0x1291ea4];;  java_start+0x708
C  [libc.so.1+0xbc203]  _thrp_setup+0x9b;;  _thrp_setup+0x9b
C  [libc.so.1+0xbc490]  _lwp_start+0x0;;  _lwp_start+0x0

VM_Operation (0xa665e4a0): G1CollectFull, mode: safepoint, requested by thread 0x09884800

                                    

Comments
EVALUATION

In G1, each Java thread has a local flag telling it whether its SATB buffer is activate or not (it is set to true during concurrent marking and false outside concurrent marking). This failure is telling us that there is at least one thread that has an "active" value that is inconsistent with the global active value. Thankfully, the code that does the consistency check dumps more information in the GC log when such a failure is detected:

SATB queue active values for Java Threads
 SATB queue set: active is FALSE
 expected_active is FALSE
  thread DestroyJavaVM, active is TRUE
  thread Thread-1, active is FALSE
  thread Thread-0, active is FALSE
  thread Low Memory Detector, active is FALSE
  thread CompilerThread0, active is FALSE
  thread Signal Dispatcher, active is FALSE
  thread Surrogate Locker Thread (CMS), active is FALSE
  thread Finalizer, active is FALSE
  thread Reference Handler, active is FALSE

So, it looks as if one thread does have a different active value than the rest. However, said thread is the DestroyJavaVM thread. It's unclear from the log whether the problem only happens for the DesotroyJavaVM thread (in which case this failure would be a non-issue for a product build) or for any Java thread (in which case the issue might be more serious). For completeness here's the stack trace:

Stack: [0xa717f000,0xa71ff000],  sp=0xa71fe1a0,  free space=1fca71ff000k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x1662b7f];;  __1cHVMErrorGreport6MpnMoutputStream__v_+0x70b
V  [libjvm.so+0x1663d06];;  __1cHVMErrorOreport_and_die6M_v_+0x582
V  [libjvm.so+0x7bb5f9];;  __1cPreport_vm_error6Fpkci11_v_+0x5ad
V  [libjvm.so+0x13ce6f8];;  __1cQSATBMarkQdDueueSetWset_active_all_threads6Mbb_v_+0x204
V  [libjvm.so+0x6fc473];;  __1cOConcurrentMarkFabort6M_v_+0x193
V  [libjvm.so+0x88e08b];;  __1cPG1CollectedHeapNdo_collection6MbbI_v_+0x41f
V  [libjvm.so+0x88eaf7];;  __1cPG1CollectedHeapSdo_full_collection6Mb_v_+0x17
V  [libjvm.so+0x16a3bc0];;  __1cQVM_G1CollectFullEdoit6M_v_+0x9c
V  [libjvm.so+0x16a16c9];;  __1cMVM_OperationIevaluate6M_v_+0xe1
V  [libjvm.so+0x169f3e1];;  __1cIVMThreadSevaluate_operation6MpnMVM_Operation__v_+0x145
V  [libjvm.so+0x169fb9d];;  __1cIVMThreadEloop6M_v_+0x5c5
V  [libjvm.so+0x169efc1];;  __1cIVMThreadDrun6M_v_+0xc1
V  [libjvm.so+0x1291ea4];;  java_start+0x708
C  [libc.so.1+0xbc203]  _thrp_setup+0x9b;;  _thrp_setup+0x9b
C  [libc.so.1+0xbc490]  _lwp_start+0x0;;  _lwp_start+0x0

So, we are about to do a Full GC and we abort the concurrent marking phase which is in progress. In this case, the call to set_active_all_threads() looks like this (from ConcurrentMark::abort()):

  satb_mq_set.set_active_all_threads(
                                 false, /* new active value */
                                 satb_mq_set.is_active() /* expected_active */);

So, we are de-activating the active flags just before the Full GC, but most likely the expected active value (read from satb_mq_set.is_active()) is also false (i.e., the buffers are already inactive. But we don't check in this case and go ahead and de-active them anyway).
                                     
2010-08-30
EVALUATION

This issue is related to the fix for 6935821: G1: threads created during marking do not active their SATB queues. It turns out the fix was not covering a subtle case.

When threads get attached to the JVM, a safepoint might happen between the JavaThread constructor being called and the thread being added to the Java thread list. If that happens, then we might have a safepoint that activates / deactivates the active values of the SATB buffers without the new thread that is being attached being on the Java thread list. This can result in its active value being inconsistent compared to that of all the other threads. Here's an example.

- new JavaThread is created with SATB active value being false (no marking cycle is taking place)
- a safepoint happens and the SATB active values are set to true for all threads as a marking cycle is starting
- the newly-created JavaThread is then added to the Java thread list with an inconsistent active value compared to all the other threads
- we will catch this when we try to set the SATB active values back to false when the marking cycle is complete

The DestroyJavaVM thread is an example of such a thread. Whether its SATB queue is activated or not it does not matter (it would not do any updates). So, minus the assert failure, the issue is benign for that thread. However, the same problem could arise for native threads being attached to the JVM (through, say, JNI), for which the issue would not be benign. (Thanks to Ramki for this observation.)
                                     
2010-09-30
SUGGESTED FIX

The first attempt was to introduce a flag to the PtrQueue class to force specific instances of its subblcasses to be ignored from being activated / de-activated. The DestroyJavaVM thread doesn't really need the buffers so that mechanism would have caused its queues never to be activated and hence the problem will disappear for that thread. This would solve this specific failure, but not the fact that other threads that are attached to the JVM might hit the same issue.

The real fix is to not set the active field of the SATB buffer when the JavaThread object is created. Instead, we set it in the Threads::add() method just before the thread is added to the Java thread list. That method holds the Threads_lock which ensures that no safepoint is in progress, so we can safely read the active field of the SATB queue set. (Thanks to Igor for suggesting it.)
                                     
2010-09-30
EVALUATION

I wrote up a test (AttachDetachTest) that launches several native threads which attach / detach to the JVM while a Java thread is generating concurrent marking cycles (with System.gc() and +ExplicitGCInvokesConcurrent). This test also hits the assert which proves that, indeed, the thread attaching operation was also prone to the same issue. The fix allows the test to run with no failures.

I've given the test to SQE for integration into the VM testbase (see CR 6989659).
                                     
2010-10-05
EVALUATION

http://hg.openjdk.java.net/jdk7/hotspot-gc/hotspot/rev/6e0aac35bfa9
                                     
2010-10-06
EVALUATION

http://hg.openjdk.java.net/jdk7/hotspot/hotspot/rev/6e0aac35bfa9
                                     
2010-10-09



Hardware and Software, Engineered to Work Together