JDK-4262633 : java.lang.ref.Reference -Enqueue Race Condition - Fix may not be thread safe
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 1.2.0
  • Priority: P4
  • Status: Closed
  • Resolution: Duplicate
  • OS: windows_nt
  • CPU: x86
  • Submitted: 1999-08-16
  • Updated: 1999-08-16
  • Resolved: 1999-08-16
Related Reports
Duplicate :  
Description

Name: mf23781			Date: 08/16/99


(3)   Test Case and Failure Data:

         Description of Problem:
            This relates to Sun Bug 4243978, a potential race condition when using weak
            references and the publicly accessible enqueue method.
            
            There were a number of suggested fixes provided in the original bug report.
            Code was provided for one of the sample solutions.  Further analysis of the problem
            by teams here in IBM Hursley and IBM Haifa has raised concerns about the suitability of the
            coding of the suggested fix.
            
            It is believed that the suggested fix could expose additional race conditions relating
            to thread safety.  The most obvious solution to the problem is unfortunately also risky,
            as there may be cases where deadlock could occur.
      
         Problem Analysis:
                                                   
            The problem arises when the (public) enqueue method is called while the item is already
            on the pending list. This would result in the lists being destroyed.  (A test case that
            demonstrates this well is available for the previous Sun bug.)
            
            The suggested fix works by modifying the enqueue method to ensure that the pointers
            making up the lists maintain their integrity.
            
            Concern has arisen that while the pointers are being checked changes could be made to the
            lists by the Garbage Collection and Reference Handler threads.  A solution to this is to
            obtain the reference lock while these operations occur.  However this could result in a 
            deadlock situation.
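
            The check-then-act hazard described above can be sketched in a small single-threaded
            model. This is a minimal sketch and not the JDK source: the class CheckThenActSketch,
            its Ref type, the self-linked list tail, and the "window" callback that stands in for
            another thread running between the check and the update are all assumptions made for
            illustration.

```java
// Hypothetical model, not the JDK source. One `next` field links a reference
// into either the pending list or the queue, as the report describes.
final class CheckThenActSketch {
    static final class Ref {
        Ref next; // null = on no list; the tail of a list links to itself
    }

    private final Object lock = new Object(); // stand-in for the reference lock
    private Ref pendingHead;                  // list built by the simulated collector
    private Ref queueHead;                    // list built by enqueue

    // Simulated collector step: push r onto the pending list.
    void collectorDiscovers(Ref r) {
        synchronized (lock) {
            r.next = (pendingHead == null) ? r : pendingHead;
            pendingHead = r;
        }
    }

    // Unsynchronized enqueue. `window` models whatever another thread does
    // after the check has passed but before the link is written.
    boolean racyEnqueue(Ref r, Runnable window) {
        if (r.next != null) return false;             // check
        window.run();                                 // thread is "stopped" here
        r.next = (queueHead == null) ? r : queueHead; // act: overwrites the link
        queueHead = r;
        return true;
    }

    // Check and act under the same lock the collector step uses.
    boolean lockedEnqueue(Ref r) {
        synchronized (lock) {
            if (r.next != null) return false;
            r.next = (queueHead == null) ? r : queueHead;
            queueHead = r;
            return true;
        }
    }

    // Walk the pending list looking for target; a self-link marks the tail.
    boolean pendingContains(Ref target) {
        for (Ref cur = pendingHead; cur != null; cur = cur.next) {
            if (cur == target) return true;
            if (cur.next == cur) break;
        }
        return false;
    }

    // The collector discovers b inside the window: a is lost from the pending list.
    static boolean racyScenarioKeepsPendingList() {
        CheckThenActSketch s = new CheckThenActSketch();
        Ref a = new Ref(), b = new Ref();
        s.collectorDiscovers(a);
        s.racyEnqueue(b, () -> s.collectorDiscovers(b));
        return s.pendingContains(a);
    }

    // With the lock held, the collector step can only run before or after enqueue.
    static boolean lockedScenarioKeepsPendingList() {
        CheckThenActSketch s = new CheckThenActSketch();
        Ref a = new Ref(), b = new Ref();
        s.collectorDiscovers(a);
        s.collectorDiscovers(b);
        boolean enqueued = s.lockedEnqueue(b); // refused: b is already linked
        return !enqueued && s.pendingContains(a);
    }

    public static void main(String[] args) {
        System.out.println("racy keeps pending list:   " + racyScenarioKeepsPendingList());
        System.out.println("locked keeps pending list: " + lockedScenarioKeepsPendingList());
    }
}
```

            In this model the racy scenario reports false, because the write in enqueue replaces
            the link that the collector step had just set, and the locked scenario reports true.
            The locked variant only shows that the window is closed; it does not address the
            deadlock concern raised in this report.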
            
            Full details are below - extracted from inter-IBM communications:-
            
            1) The problem: 
                The idea of removing an object from the pending list before enqueuing the object is fine. 
                However, note that the garbage collector and the reference handler may be running in parallel 
                with this activity. Therefore, various race conditions may occur in the current solution. For example, 
                take the first statement. It checks if "this.next" is null. The answer may be true, but then the 
                thread is stopped and the garbage collector may run a collection putting this reference object 
                in the pending list. After that, the program thread resumes and executes the actual enqueuing. 
                The problem is that the if-then operation is not atomic. The same problem occurs 4 times with each 
                of the  "else" cases of the solution. 

            2) A possible solution:
                A possible solution is to obtain the Reference.Lock all through the Reference.enqueue method. This 
                would solve all race problems but will cause some undesired effects. Holding the lock for a long time 
                hinders the collector from working as well as other parallel program threads. This poses a scalability 
                problem. In addition,  we might expose the JVM to deadlocks. Note that the ReferenceQueue.enqueue 
                method synchronizes on the reference object itself. The way the proposed solution works is that enqueuing 
                an object causes obtaining the Reference.lock, then synchronizing on the reference object itself, and last 
                obtaining the referenceQueue.Lock. All through this time, a garbage collection cannot start since the 
                Reference.enqueue method holds the Reference.Lock that the collector should acquire. Now, a second program 
                thread may synchronize on the reference object and then allocate an object causing a need of garbage collection. 
                At the same time, the first program thread that performs the Reference.enqueue method has already acquired the 
                Reference.lock and is waiting to be synchronized on the reference object. In this case, the system gets into a 
                deadlock: The thread that has the lock on the actual reference object is waiting for collection, and the thread 
                that enqueues is waiting for that thread to release the lock on the reference object, but the collection cannot 
                start since the collector waits for the enqueuing thread to release the Reference.lock. 
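
            The deadlock described in item 2 can be written down as a wait-for graph. The sketch
            below is illustrative only: the class WaitForCycleSketch and the party names
            ("enqueuer", "mutator", "collector") are hypothetical labels for the two program
            threads and the collector in the scenario, and the second scenario assumes that an
            enqueue which does not hold the Reference.lock leaves the collector free to run.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the scenario in item 2, not JVM code.
final class WaitForCycleSketch {

    // Each party waits for at most one other party, so the graph fits in a map.
    static boolean hasCycle(Map<String, String> waitsFor) {
        for (String start : waitsFor.keySet()) {
            Set<String> seen = new HashSet<>();
            String cur = start;
            while (cur != null && seen.add(cur)) {
                cur = waitsFor.get(cur);
            }
            if (cur != null) return true; // a party was reached twice
        }
        return false;
    }

    // Enqueue holds the Reference.lock for its whole duration.
    static Map<String, String> lockedEnqueueScenario() {
        Map<String, String> waitsFor = new HashMap<>();
        // Holds Reference.lock, wants the monitor of the reference object.
        waitsFor.put("enqueuer", "mutator");
        // Holds the monitor of the reference object, allocation needs a collection.
        waitsFor.put("mutator", "collector");
        // Cannot start a collection until Reference.lock is released.
        waitsFor.put("collector", "enqueuer");
        return waitsFor;
    }

    // Assumption: enqueue does not hold Reference.lock, so the collector can run.
    static Map<String, String> enqueueWithoutReferenceLockScenario() {
        Map<String, String> waitsFor = new HashMap<>();
        waitsFor.put("enqueuer", "mutator");
        waitsFor.put("mutator", "collector");
        return waitsFor;
    }

    public static void main(String[] args) {
        System.out.println("locked enqueue deadlocks:  "
                + hasCycle(lockedEnqueueScenario()));
        System.out.println("without Reference.lock:    "
                + hasCycle(enqueueWithoutReferenceLockScenario()));
    }
}
```

            The first scenario contains the three-party cycle described above; removing the
            collector's dependency on the enqueuing thread breaks it.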




    
            
(4) Targeted FCS Release:
            
            Originally reported against 1.2.2.  The aim is to reach agreement on what form the fix is
            likely to take.


(5) Operational/Business Justification:

       Impact of bug on affected product:
             The original situation was raised as part of a porting exercise. The aim behind the
             first report was to obtain a potential direction that a fix would take.  The porting team
             are concerned that the fix has potential problems.
             
             A revised fix however involves more extensive changes to the JVM which require analysis.

       Time factors and deadlines involved:
            This problem has not yet been reported by a customer - it was found by code examination.
       
(6) Suggested Fix:

        Suggested Fix:
        
            We believe that the best way around the original bug is to add a private field to reference objects 
            called pending_next, which should be used to link the pending list without interfering with the 
            queue. The disadvantage of this solution is that it takes 4-8 more bytes of each reference object 
            (depending on the length of a pointer), but it seems better than foiling scalability (with long locked 
            paths), and exposing the system to deadlocks. 
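
            A minimal sketch of the separate-field idea follows. It is not the proposed patch:
            the class PendingNextSketch, the camel-case field name pendingNext, and the
            self-linked list tail are assumptions made for illustration, and all synchronization
            of the queue itself is left out.

```java
// Hypothetical model of the proposal, not the JDK source.
final class PendingNextSketch {
    static final class Ref {
        Ref next;        // queue link only
        Ref pendingNext; // pending-list link only (the proposed extra field)
    }

    private Ref pendingHead;
    private Ref queueHead;

    // Simulated collector step: touches only pendingNext.
    void collectorDiscovers(Ref r) {
        r.pendingNext = (pendingHead == null) ? r : pendingHead;
        pendingHead = r;
    }

    // Enqueue touches only next, so it cannot disturb the pending list.
    boolean enqueue(Ref r) {
        if (r.next != null) return false; // already enqueued
        r.next = (queueHead == null) ? r : queueHead;
        queueHead = r;
        return true;
    }

    // Count pending entries; a self-link marks the tail.
    int pendingLength() {
        int count = 0;
        Ref cur = pendingHead;
        while (cur != null) {
            count++;
            if (cur.pendingNext == cur) break;
            cur = cur.pendingNext;
        }
        return count;
    }

    // Enqueue a reference that is already pending and report the pending length.
    static int pendingLengthAfterEnqueue() {
        PendingNextSketch s = new PendingNextSketch();
        Ref a = new Ref(), b = new Ref();
        s.collectorDiscovers(a);
        s.collectorDiscovers(b);
        if (!s.enqueue(b)) throw new IllegalStateException("enqueue refused");
        return s.pendingLength();
    }

    public static void main(String[] args) {
        System.out.println("pending entries after enqueue: " + pendingLengthAfterEnqueue());
    }
}
```

            Both references stay on the pending list after the enqueue, at the cost of one more
            pointer per reference object, which is the trade-off described above.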
    
                                
        Documentation of how root cause was found:
            The original problem was found as a result of code examination and this potential problem was found
            in a similar manner.    
        
        Alternative Fixes (advantages/disadvantages):
        
        Results of IBM Testing in application/customer environment:
            This fix has not been coded so no results are available.  Changes would have to be made as the concept
            of "status" is being modified.
        
        Regression Test Run Status/Results:
                n/a        

        JCK Test run status:
                n/a
        


(Review ID: 93963)

======================================================================

Comments
EVALUATION This is more an update on 4243978 than a new bug. I've closed it as a duplicate, and added the latest info to the evaluation of 4243978. mick.fleming@Ireland 1999-08-16
16-08-1999