JDK-4243978 : (ref) Race condition in Reference.enqueue()
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.lang
  • Affected Version: 1.2.0,7
  • Priority: P4
  • Status: Closed
  • Resolution: Fixed
  • OS: generic,windows_95,windows_nt
  • CPU: generic,x86
  • Submitted: 1999-06-04
  • Updated: 2013-05-21
  • Resolved: 2012-05-07
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 8
8 b14Fixed
Related Reports
Duplicate :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Description
Name: mf23781			Date: 06/04/99



     Test Case and Failure Data:

         Description of Problem:
      
                The problem is that the (private) next field is used     
                for two different linked lists. The first list is the    
                "Pending" list of reference objects, which the garbage   
                collector prepares for the referenceHandler thread.      
                The second linked list is the list that implements the   
                queue of the reference objects. The user program may     
                invoke the enqueue method on an object in the Pending    
                list. If that happens, the Pending list and/or the queue 
                may be destroyed. They indeed are destroyed as the       
                following test shows.                                    
                       
                This problem was found following a code review which 
                was a part of a porting exercise.
                
        Test Case description
                
                
                The purpose of this test is to let the program enqueue   
                a reference object that is currently in the Pending      
                list. In order to do this, We create "iterations" number 
                of objects with referenced objects referecing them .     
                We then make the objects become weakly reachable so that 
                the garbage collector will put the reference objects in  
                Pending list. At the same time, the program enqueues     
                some of these reference objects (all the ones with odd   
                numbers). The test shows that most of the objects are    
                not properly queued. The number of queued object is      
                somewhat arbitrary as expected. (If we don't let the     
                program enqueue reference objects explicitly, but let    
                the collector do all the work by itself, then all the    
                reference objects are properly queued).                  
                                                         
        Test Case Source Code
           ---- test case code starts here ----
/**************************************************************
 *   This is a test program meant to expose a bug in the      *
 *   current implementation of the reference objects.         *
 *                                                            *
 *   A description of the bug:                                *
 *   -------------------------                                *
 *   The problem is that the (private) next field is used     *
 *   for two different linked lists. The first list is the    *
 *   "Pending" list of reference objects, which the garbage   *
 *   collector prepares for the referenceHandler thread.      *
 *   The second linked list is the list that implements the   *
 *   queue of the reference objects. The user program may     *
 *   invoke the enqueue method on an object in the Pending    *
 *   list. If that happens, the Pending list and/or the queue *
 *   may be destroyed. They indeed are destroyed as the       *
 *   following test shows.                                    *
 *                                                            *
 *   A descritpion of the test:                               *
 *   --------------------------                               *
 *   The purpose of this test is to let the program enqueue   *
 *   a reference object that is currently in the Pending      * 
 *   list. In order to do this, We create "iterations" number *
 *   of objects with referenced objects referecing them .     *
 *   We then make the objects become weakly reachable so that *
 *   the garbage collector will put the reference objects in  *
 *   Pending list. At the same time, the program enqueues     *
 *   some of these reference objects (all the ones with odd   *
 *   numbers). The test shows that most of the objects are    *
 *   not properly queued. The number of queued object is      *
 *   somewhat arbitrary as expected. (If we don't let the     *
 *   program enqueue reference objects explicitly, but let    *
 *   the collector do all the work by itself, then all the    *
 *   reference objects are properly queued).                  *
 *                                                            *
 ************************************************************** */


import java.lang.ref.*;
import java.lang.*;


public class ref_check {

  final static boolean debug = true;
  final static int iterations = 100;
  final static int gc_trigger = 10;
  static int[] a = new int[2 * iterations];
  // Keep all weak references alive with the following array. 
  static NumberedWeakReference[] b = new NumberedWeakReference[iterations];
  static int length;

  public static void main(String[] argv){

    // Get the runtime "object" so that we can force GC.
    Runtime RunT = Runtime.getRuntime();
    if (debug) System.out.println("Starting the test.");
    // Raise thread priority to match the referenceHandler 
    // priority, so that they can race also on a uniprocessor. 
    raisePriority();

    int i;
    ReferenceQueue Q = new ReferenceQueue();

    // Create many reference objects.
    // Each points to a unique integer object. 
    // Then, get the integers collected, and race with the 
    // collector on queuing (half of) the reference objects. 
    // The weak references are numbered, so we can later 
    // check which of them are queued. 
    Integer Obj = new Integer(0); 
    NumberedWeakReference weaky = new NumberedWeakReference(Obj,Q,0);
    for (i=1; i<iterations; i++) {
       // Create one object and kill another. 
       Obj = new Integer(i);
       // Trigger gc each gc_trigger iterations.
       if ( (i % gc_trigger) == 0 ) RunT.gc();
       // Enqueue the odd objects (from previous iteration). 
       if ( (i % 2) == 0 ) weaky.enqueue();
       // Keep previous weaky alive.
       b[i-1] = weaky;
       // Get a new weaky for the new object. 
       weaky = new NumberedWeakReference(Obj,Q,i);
    }


    //  Now do a final collection to round up. 
        RunT.gc();


    // Now, after everything has hopefully been queued, let's check 
    // that everything is indeed in the queue.
    if (debug) System.out.println("Reading the queue");
    //  Empty queue and record numbers into a[];
    i = 0;
    NumberedWeakReference weakRead = (NumberedWeakReference) Q.poll();
    while (weakRead != null) {
         a[i++] = weakRead.number;
         weakRead = (NumberedWeakReference) Q.poll();
    }
    length = i;
    if (debug) System.out.println("Number of elements in the queue = " + length);
    // Use the last object or the comipler kills it. 
    System.out.println("Sorry, I must write " + Obj + " to prevent compiler optimizations.");

    

    // Sort the first "length" elements in array "a[]".
    sort();
    
    // Check results: Are all enqueued? 
    if (debug) System.out.println("Start of final check");
    boolean fail = (length != (iterations -1));
    for (i=0; i<length; i++) {
      if (a[i] != i) {
	if (debug) System.out.println("a["+i+"] is not "+i+" but "+a[i]);
        fail = true;
      }
    }
    if (fail) {
       System.out.println("********   Test failed.   *********");
       int shouldBe = iterations - 1;
       System.out.println("Only "+length+" reference objects have been queued out of "+shouldBe+".");
       System.out.println(" The following numbers have not been queued: ");
       int missing = 0;
       int element = 0;
       for (i=0; i<length; i++) {
          while ( (a[i] != element) & (element < shouldBe) ) {
               System.out.print(element+" ");
               if (missing%20 == 19) System.out.println(" ");
               missing++; 
               element++;
          }
          element++;
       }
       System.out.print("\n");
    }
    else {
       System.out.println(" ********   Test succeeded.   *********");
    }

    if (debug) System.out.println("End of test");
    
  }

  // This bubble sorts the first "length" elements in array "a".
  public static void sort(){
    int hold, pass, i;
    if (debug) System.out.println("Sorting. Length="+length);
    for (pass = 1; pass < length; pass++) { // passes over the array
      for (i=0; i < length-pass ; i++){  //  a single pass
        if ( a[i] > a[i+1] ) {  // then swap
           hold = a[i];
           a[i] = a[i+1];
           a[i+1] = hold;
        }

      }  // End of i loop 
    } // End of pass loop
  }


  // Raise thread priority to compete with the referce handler. 
  // This is (probably) only required for a uniprocessor.
  static void raisePriority(){
        	Thread tr = Thread.currentThread();
        	tr.setPriority(Thread.MAX_PRIORITY);
  }


}   // ENd of class ref_check 

class NumberedWeakReference extends WeakReference {
  //  Add an integer to identify the weak reference object.
  int number;
  
  public NumberedWeakReference(Object referent, ReferenceQueue q, int i){
    super(referent,q);
    number = i;
  }
  
}

           ---- test case code ends here ----
        
        

        
(4) Targeted FCS Release:
     
       1.2.2? As this was found as part of a porting exercise a "description
       of how it is to be fixed" is more important at this stage than the fix.

(5) Operational/Business Justification:

       Impact of bug on affected product:
           Will affect the design of a JVM port.
           This design (and others that use concurrent GC) could be affected
           by this problem.
       
(6) Suggested Fix:

        Suggested Fix:
           The problem arises when a enqueue requested is made to the reference
           object.  As this could be in the pending state, the chain of
           pending references/reference queue could be damaged.
           
           Could a doubly linked list be implemented that could 
           potentially remove an element from the pending list
           without losing any other information.
           
           Are these lists interacted with in any other location other
           than the java.lang.ref package?
        
        Documentation of how root cause was found:
            Found as part of code review.  Test case written to expose the problem.
        
        Alternative Fixes (advantages/disadvntages):
            A large fix is to use two different queues for the pending
            and the enqueue lists.  This would prevent any difficulties
            with confusing the two lists.
        
        Results of testing in application/customer environment:
            Test case only provided.  Attempting to incorporate the
            doubly linked list
        
        Regression Test Run Status/Results:
            Regression test case provided.  This indicates the results
            of what reference were able to be recovered from the queue.
        
        JCK Test run status:
           n/a
        


(Review ID: 83933)

======================================================================

Also from the Licensee :
Solaris does also show the problem based on the test case!  It tends
towards passing, but if you reduce the heap size to 2048kb then it fails
regularly.
On NT increasing the heap size, to the maximum 65Mb.  then it will pass -
the trend though is towards failure.

I'm beginning to suspect that the test case may be mistaken in what it is
testing.  Can you confirm what is actually happening to items on the
reference queue ?

Items are added to the reference queue when there is a change to its
reachability, eg a change from strongly reference to weak referenced.  How
do items get removed from the ReferenceQueue.  The api spec seems to
suggest that the application program generally would take responsbility for
clearing the queue.
But it does suggest that this may be required for weak or soft references.
When do the refereants actually get reclaimed for each of the weak, soft,
phantom refs?

mick.fleming@Ireland 1999-06-09

Name: akC45999			Date: 10/04/99


The following JCK-13beta test is failed due to this bug:
api/java_lang/ref/ReferenceQueue/index.html#ReferenceQueue0600


======================================================================
###@###.### 10/22/04 14:02 GMT

Comments
EVALUATION http://hg.openjdk.java.net/jdk8/tl/jdk/rev/5f2838744544
02-11-2011

SUGGESTED FIX The code below shows the diffs on the Reference.Enqueue method. This code ensures that the integrity of the pending queue is kepth. public boolean enqueue() { ! //ONE POSSIBLE SUGGESTED FIX - fix the list up when user does an enqueue ! ! if (this.next == null) { // enhancement #1 ! // if this reference is in active state then ok to enqueue ! return this.queue.enqueue(this); ! ! } else if (pending==null) { // used to be second ! // if there is nothing in the list then can't search it! ! return this.queue.enqueue(this); ! ! ! else if (this == pending) { // oriignal first check ! // item is at the head of the list ! ! synchronized(lock) // as we'll be playing with the pending list get the lock ! { ! pending = this.next; ! this.next = this; ! return this.queue.enqueue(this) ! } ! } ! ! else { ! // need to search for this in the list ! synchronized(lock) // as we'll be playing with the pending list get the lock ! { ! ! boolean found = false; ! Reference current = pending.next; ! Reference last = pending; ! while (!found) { ! // check for eol both current and its next will be the same ! if (current==current.next) { ! if (current == this) { ! found= true; ! // have found the item at the end of the list ! last.next=last; // end the list properly ! return this.queue.enqueue(this); ! ! } else { ! // haven't found the item in the list ! // do as would anyway ! return this.queue.enqueue(this); ! } ! } else { // ok somewhere in the middle of the list ! if (current == this) { ! // found the item at this location ! ! last.next = current.next; // maintain list's integrity ! current.next = current; ! return this.queue.enqueue(this); ! } else { ! // keep looking ! last = current; ! current = current.next; ! } ! } ! } ! } ! } ! ! ! ! return this.queue.enqueue(this); // <--- original code } --- 189,195 ---- * it was not registered with a queue when it was created */ public boolean enqueue() { !return this.queue.enqueue(this); } The explanation of the above is attached to this form. ###@###.### 1999-07-02
11-06-2004

EVALUATION The submitted analysis is correct: It is possible for a pending reference object to be enqueued via an explicit invocation of the public enqueue method in a way that corrupts the pending-reference list. I think, however, that in practice this is rather unlikely. The test case provided by IBM only fails when the priority of the enqueuing thread is set to be the same (high) priority of the reference-handler thread. While it is possible that the test might fail without this priority adjustment, I think the probability of that happening is pretty low. A low but nonzero probability is still cause for some worry though. We cannot remove the public Reference.enqueue method due to compatibility constraints. In my opinion we should pursue the suggested solution of revising the enqueue method to safely remove a pending reference object. Given the low probability of a pending reference being explicitly enqueued I suspect we can get away with a simple linear iteration through the pending list. Changing the code to use a doubly-linked list would require nontrivial changes to both the Classic and HotSpot garbage collectors. I'd rather not go that route unless someone can demonstrate that the iteration approach turns out to be a significant performance problem. I've raised the priority of this bug to 3 so that it will be considered for the Kestrel FCS release. -- mr@eng 1999/7/13 IBM have sent an update on a problem their suggest fix : # The suggested fix works by modifying the enqeue method to ensure that the # pointers making up the release maintain their integrity. # Concern has arisen that while the pointers are being checked changes could # be made to the lists by the Garbage Collection and Reference Handler # threads. A solution to this is to obtain the reference lock while these # operations occur. However this could result in a deadlock situation. # # Full details are below - extracted from inter-IBM communications:- # # 1) The problem: # The idea of removing an object from the pending list before enqueuing # the object is fine. # However, note that in parallel to this activity, the garbage collector # and the reference handler may also work. Therefore, various race conditions # may occur in the current solution. For example, take the first statement. # It checks if "this.next" is null. The answer may be true, but then the # thread is stopped and the garbage collector may run a collection putting # this reference object in the pending list. After that, the program thread # resumes and executes the actual enqueuing. The problem is that the if-then # operation is not atomic. The same problem occurs 4 times with each # of the "else" cases of the solution. # # 2) A possible solution: # A possible solution is to obtain the Reference.Lock all through the # Reference.enqueue method. This would solve all race problems but will # cause some undesired effects. Holding the lock for a long time # hinders the collector from working as well as other parallel program # threads. This poses a scalability problem. In addition, we might expose # the JVM to deadlocks. Note that the ReferenceQueue.enqueue # method synchronizes on the reference object itself. The way the proposed # solution works is that enqueuing an object causes obtaining the # Reference.lock, then synchronzing on the reference object itself, and last # obtaining the referenceQueue.Lock. All through this time, a garbage # collection cannot start since the Reference.enqueue method holds the # Reference.Lock that the collector should acquire. Now, a second program # thread may synchronize on the reference object and then allocate an object # causing a need of garbage collection. At the same time, the first program # thread that performs the Reference.enqueue method has already acquired the # Reference.lock and is waiting to be synchronized on the reference object. # In this case, the system gets into a deadlock: The thread that has the lock # on the actual reference object is waiting for collection, and the thread # that enqueues is waiting for that thread to release the lock on the # reference objects, but the collection cannot start since the collector # waits for the enqueuing thread to release the Reference.lock. # # Suggested Fix: # We believe that the best way around the original bug is to add a private field # to reference objects called pending_next, which should be used to link the # pending list without any interfering with the queue. The disadvantage of this # solution is that it takes 4-8 more bytes of each reference object # (depending on the length of a pointer), but it seems better than foiling # scalability (with long locked paths), and exposing the system to deadlocks. mick.fleming@Ireland 1999-08-16 Further investigation has shown that the described problem may occur with more probability than previously thought. The given test case fails quite regularly when native threads, as opposed to Solaris "green" threads, are used. I agree with the assessment that changing the Reference.enqueue method to acquire the Reference.lock is undesirable for the reasons given by IBM above. Another possible solution, and one that avoids adding another field to every reference object, is for the Reference.enqueue method to detect when an object is still on the pending queue and simply wait for the reference handler to enqueue it. I've prototyped this approach and it appears to work well. This is too risky a change for Kestrel FCS at this time, however, so I'm committing this bug to Merlin. Note that while the current behavior is not as useful as one might like it is, strictly speaking, within the bounds of the specification. The specification only requires that a reference object be enqueued "at the same time or at some later time" relative to the time when the collector detects the relevant reachability change. This wording was carefully chosen so as not to place an undue burden of GC promptness upon Java VM implementations and to allow the widest possible range of GC implementations. -- mr@eng 1999/9/30 After yet further investigation, the best solution seems to be to re-use the "discovered" field of the java.lang.Reference class for the purpose of maintaining each reference-queue list. This requires a minor change in the VM (4965777). -- ###@###.### 2003/12/10
12-11-0176