JDK-6379795 : memory fences missing in concurrent mark sweep GC code
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 1.4.2, 1.4.2_12
  • Priority: P3
  • Status: Closed
  • Resolution: Duplicate
  • OS: generic
  • CPU: generic
  • Submitted: 2006-02-01
  • Updated: 2013-01-26
  • Resolved: 2010-08-04
The Version table provides details related to the release in which this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

JDK 7 : Resolved
Related Reports
Duplicate :  
Relates :  
Relates :  
Description
Several memory fences are missing in the concurrent mark sweep GC source code. This has caused intermittent crashes that are very hard to trace.

1. In ConcurrentMarkSweepGeneration::par_promote():

First, in a call to alloc(), a free block is marked as not free by storing 0 to the _prev field, which corresponds to the _klass field in an oop header. Next, copy_words_aligned() is called to fill in the other fields of the object. Last, the _klass field is set to the correct value. These three steps must become visible in exactly this order, but there is no code that forces this ordering on a weakly ordered machine.
The fix is to add a call to membar() after the call to alloc(), and to make the final store to _klass a volatile (release) store.

2. Corresponding to 1, in CompactibleFreeListSpace::block_is_obj():
The read from _klass needs to be changed to a volatile (acquire) load, so that when the final store to _klass in ConcurrentMarkSweepGeneration::par_promote() is observed, all other stores to the promoted object are observed as well.

3. The following comment

    // Update BOT last so that other (parallel) GC threads see a consistent
    // view of the BOT and free blocks.
    // Above must occur before BOT is updated below.

appears in CompactibleFreeListSpace::getChunkFromLinearAllocBlock(), CompactibleFreeListSpace::getChunkFromLinearAllocBlockRemainder(), CompactibleFreeListSpace::splitChunkAndReturnRemainder(), and CompactibleFreeListSpace::par_get_chunk_of_blocks(), but no code forces this ordering on a weakly ordered machine. A membar is needed at each of these sites.
more missing fences

inline bool OopTaskQueue::push(Task t) {
  ...
  _elems[localBot] = (TskET*) t;
  Atomic::write_barrier();  // MISSING BARRIER HERE <-- NOT IN SUN CODE????
  _bottom = increment_index(localBot);
  ...
}



template<class E> class TaskQueue : public GenericTaskQueueSuper {
  // Slow paths for push, pop_local.  (pop_global has no fast path.)
  bool push_slow(E t, juint dirty_n_elems) {
    if (dirty_n_elems == n() - 1) {
      // Actually means 0, so do the push.
      juint localBot = _bottom;
      _elems[localBot] = (TskET*) t;
      Atomic::write_barrier();  // Yet Another Missing Barrier in SUN CODE????
      _bottom = increment_index(localBot);
      return true;
    } else
      return false;
  }


bool OopTaskQueue::pop_global(Task& t) {
  Age newAge;
  Age oldAge = get_age();
  Atomic::membar();  // Yet Another Missing Barrier in SUN CODE???
  juint localBot = _bottom;
  juint n_elems = size(localBot, oldAge.top());
  ...



bool GenericTaskQueueSuper::
pop_local_slow(juint localBot, Age oldAge) {
  ...
  set_age(newAge);
  Atomic::membar();  // Yet Another Missing Barrier in SUN CODE???
  assert(dirty_size(localBot, get_top()) != n() - 1,
     "Shouldn't be possible...");
  ...

Comments
EVALUATION The first two of the TaskQueue barriers mentioned in the description (the one in push() and the one in push_slow()) were added when 6888847 was fixed.
29-05-2010

EVALUATION This should be fixed; see comments section.
13-02-2006