JDK-6872049 : CMS: Failure in CompactibleFreeListSpace::block_size from ::block_start_unsafe()
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version:
    5.0u7,5.0u17,5.0u19,6u13,6u14,6u15,6u16,6u18,6u19,6u20
  • Priority: P2
  • Status: Closed
  • Resolution: Duplicate
  • OS: generic,solaris,solaris_10
  • CPU: generic,x86,sparc
  • Submitted: 2009-08-14
  • Updated: 2010-12-03
  • Resolved: 2010-08-20
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

Other                     JDK 6              Other
5.0-pool,hs18 Resolved    6-pool Resolved    hs18 Resolved
Description
See 6840775.
After the fix was applied, the test case still failed:
  [7] __sighndlr(0xb, 0xffffffff768ff140, 0xffffffff768fee60, 0xffffffff7ddf6cf8, 0x0, 0xa), at 0xffffffff7edd418c 
  ---- called from signal handler with signal 11 (SIGSEGV) ------
  [8] CompactibleFreeListSpace::block_size(0x4c2a8, 0xffffffff7e72ce38, 0x6ac00, 0xffffffff7e6c2000, 0x715b64, 0x4c000), at 0xffffffff7dfac4e0 
=>[9] BlockOffsetArrayNonContigSpace::block_start_unsafe(0x10000ff6f5de398, 0xffffffff6d7e0000, 0x10045a418, 0xffffffff75500000, 0xffffffff6d7de000, 0xffffffff7e6c2000), at 0xffffffff7df4e44c 
  [10] CardTableModRefBS::process_chunk_boundaries(0x100136cf0, 0x10045a3a0, 0x10012be10, 0xffffffff6d7e0000, 0xffffffff7dfb34d8, 0x1011544a0), at 0xffffffff7e301a90 
  [11] CardTableModRefBS::par_non_clean_card_iterate_work(0x100136cf0, 0xffffffff6d7c0000, 0xffffffff7bc2511d, 0x32, 0x0, 0x3200), at 0xffffffff7e301770 
  [12] CardTableModRefBS::non_clean_card_iterate(0x100136cf0, 0x10045a3a0, 0xffffffff768ff810, 0x10012be10, 0xffffffff768ff820, 0xffffffff6b4c0000), at 0xffffffff7df57054 
  [13] CardTableRS::younger_refs_in_space_iterate(0x1001108f0, 0x10045a3a0, 0xffffffff7dfabbb0, 0xffffffff7e7055e0, 0x7698fc, 0xffffffff7e6c2000), at 0xffffffff7df5879c 
  [14] ConcurrentMarkSweepGeneration::younger_refs_iterate(0x1001187d0, 0x1001108f0, 0xffffffff6b4c0000, 0x6ea800, 0xffffffff7dfd74d8, 0x10161a880), at 0xffffffff7dfd7528 
  [15] GenCollectedHeap::gen_process_strong_roots(0x1001191d0, 0x2, 0x1, 0x0, 0x100119260, 0x10161a880), at 0xffffffff7e078b08 
  [16] ParNewGenTask::work(0xffffffff74bff370, 0x2, 0x10011be80, 0x10161a6f0, 0x0, 0x10012bfc8), at 0xffffffff7e305164 
  [17] GangWorker::loop(0x10012ba30, 0x3, 0xffffffff7e41ca90, 0x100119340, 0x1afaec, 0x2), at 0xffffffff7e41cb10 
  [18] java_start(0x10012ba30, 0x1d71, 0xffffffff7e723d14, 0xffffffff7e6c2000, 0xffffffff7e512b4f, 0xffffffff7e73bc94), at 0xffffffff7e2f2a30
Modified synopsis verbiage.

Comments
EVALUATION This CR was fixed as a result of fixing 6948537, 6948538 and 6948539. Since 6948538 was, in chronological order, the last one of these three to be fixed (in hs19), this CR is being closed, nominally, as a duplicate of 6948538. Customers who run into this bug will need to get the fixes for (at least) the three bugs 694853{7,8,9}.
20-08-2010

EVALUATION Because this CR unearthed several distinct issues, we will break this into 3 bite-sized CRs and chew them separately. This CR will be deemed fixed when the three CRs below are fixed, but each will, on its own, reduce the cross-section of the occurrence of the crashes described here. The 3 CRs are:-
  • 6948537 CMS: BOT walkers observe out-of-thin-air zeroes on sparc4v
  • 6948538 CMS: BOT walkers can fall into object allocation and initialization cracks
  • 6948539 CMS+UseCompressedOops: placement of cms_free bit interferes with promoted object link
29-04-2010

WORK AROUND On Niagara, the crash can also result from the platform-specific memset code, which uses BIS, being used to update the BOT. Avoiding BIS for BOT updates also avoids that kind of crash. A big-hammer workaround is to use:- export LD_NOAUXFLTR=1 so as not to use the auxiliary filtered $PLATFORM/libc_psr.so version of memset. The downside of this workaround is that you lose other platform-specific code in libc and elsewhere as well. CAVEAT: if your application uses native code that depends on auxiliary filters on certain platforms for correctness, such applications may even break when auxiliary filters are turned off as above.
27-04-2010

EVALUATION On Niagara, memset uses BIS to reduce cache traffic, and is used in BOT updates. Because during CMS scavenges BOT updates are concurrent and lock-free wrt concurrent readers, such readers may end up seeing the 0'd cache-line before it has been updated. [This is a general "MT-unsafeness" of memset wrt concurrent reads which should probably be prominently documented.] This fleeting "0" can cause BOT walkers to go tumbling down the crevasse.
27-04-2010
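
To illustrate the evaluation above, here is a minimal C++ sketch of a BOT-style range fill that never exposes a transient zero. It is not the HotSpot code: the names BotEntry, fill_entries and read_entry are hypothetical, the table is modelled as one byte per card, and the sketch assumes the compiler does not itself lower the loop into a library memset call. Each entry is written with a single store, so a concurrent reader sees either the old value or the new one.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a block offset table: one byte per card.
using BotEntry = std::atomic<std::uint8_t>;

// Fill entries [first, last] (inclusive) with `value`, one store per entry.
// Every entry goes straight from its old value to `value`; no intermediate
// zero is written, unlike a memset that may first zero the whole cache line.
inline void fill_entries(std::vector<BotEntry>& table,
                         std::size_t first, std::size_t last,
                         std::uint8_t value) {
    for (std::size_t i = first; i <= last; ++i) {
        table[i].store(value, std::memory_order_relaxed);
    }
}

// Lock-free read of a single entry, as a concurrent BOT walker would do.
inline std::uint8_t read_entry(const std::vector<BotEntry>& table,
                               std::size_t index) {
    return table[index].load(std::memory_order_relaxed);
}
```

Relaxed ordering is enough for the narrow point made here (no out-of-thin-air zero in an individual entry); it says nothing about the ordering between BOT updates and object initialization, which is a separate issue.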

EVALUATION Versions of this problem may also affect block_start calls made concurrently with mutator execution, a la CMS's precleaning (which uses a specialized blk_start_careful() call, IIRC) and G1's concurrent refinement (which should use a similar blk_start_careful call to avoid the pitfalls of blk_start_unsafe(), which assumes that all objects being navigated are already initialized). All such code should be revisited and carefully reexamined in view of this race.
12-11-2009
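
The difference between a "careful" and an "unsafe" walk can be sketched as follows. This is a simplified model, not the HotSpot implementation: the heap is assumed to be an array of words, the first word of each block is assumed to hold the block size in words, a size of 0 is assumed to mean "not yet initialized", and find_block_start_careful is a hypothetical name. The careful walker gives up as soon as it meets a block it cannot trust, instead of assuming every block is initialized.

```cpp
#include <atomic>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical heap model: the first word of each block holds its size in
// words; a size of 0 marks a block that has not been initialized yet.
using Word = std::atomic<std::size_t>;

// Walk forward from `start` (a known block start, e.g. taken from the BOT)
// and return the start of the block containing `addr`. Returns std::nullopt
// if the walk meets an uninitialized or implausible block, so the caller
// can retry later instead of navigating garbage.
inline std::optional<std::size_t> find_block_start_careful(
        const std::vector<Word>& heap, std::size_t start, std::size_t addr) {
    if (addr >= heap.size()) return std::nullopt;
    std::size_t q = start;
    while (q <= addr) {
        const std::size_t size = heap[q].load(std::memory_order_acquire);
        if (size == 0) return std::nullopt;                 // not initialized
        if (size > heap.size() - q) return std::nullopt;    // implausible size
        if (addr < q + size) return q;                      // addr is in [q, q+size)
        q += size;                                          // step to next block
    }
    return std::nullopt;  // start was beyond addr
}
```

An "unsafe" variant would omit the two bail-out checks; with a zero size it would never advance, and with a garbage size it would step outside the heap.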

WORK AROUND Since the bug depends on card-scanning running concurrently with promotion into CMS, which can only happen when using ParNew+CMS, either of the following constitutes a workaround:- (a) switch off CMS, OR (b) switch off ParNew. Depending on platform, either may of course come at a potentially considerable loss in performance.
06-10-2009

EVALUATION It appears likely that this is the result of a (day-one) race between one worker thread copying an object into the old generation, onto a potentially dirty card, and another thread's block_start() call navigating that card during card-scanning. We are working on confirming the above diagnosis and on implementing a fix. MR's will be needed for porting this back to 5uXX and to 1.4.2_XX.
02-10-2009
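
As an illustration of the kind of race described in this evaluation, the following C++ sketch shows one generic way to order a copy against a concurrent reader: the writer copies the payload first and only then publishes the block size with a release store, while the reader checks the size with an acquire load before touching the payload. This is an assumption-laden sketch, not the fix that was applied in HotSpot; PromotedBlock, promote and first_word_if_published are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

constexpr std::size_t kMaxPayloadWords = 7;

// Hypothetical destination block in the old generation.
struct PromotedBlock {
    std::atomic<std::size_t> size_words{0};  // 0 = not yet published
    std::array<std::size_t, kMaxPayloadWords> payload{};
};

// Writer side: copy the object, then publish its size. The release store
// guarantees the copied payload is visible to any reader that observes a
// non-zero size through an acquire load.
inline bool promote(PromotedBlock& dst, const std::size_t* src, std::size_t n) {
    if (n == 0 || n > kMaxPayloadWords) return false;
    std::copy(src, src + n, dst.payload.begin());
    dst.size_words.store(n, std::memory_order_release);
    return true;
}

// Reader side: only look inside the block once it has been published.
inline std::optional<std::size_t> first_word_if_published(const PromotedBlock& b) {
    if (b.size_words.load(std::memory_order_acquire) == 0) return std::nullopt;
    return b.payload[0];
}
```

A card-scanning thread following this discipline would treat an unpublished block as "try again later" rather than reading a half-copied object.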