JDK-8170409 : CMS: Crash in CardTableModRefBSForCTRS::process_chunk_boundaries
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 9
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2016-11-28
  • Updated: 2017-11-29
  • Resolved: 2016-12-05
  • JDK 8: 8u152 (Fixed)
  • JDK 9: 9 b150 (Fixed)
Description
gunter.haug@sap.com reported the following crash:

Native frames: (J=compiled Java code, j=interpreted, V=VM code (C/C++), v=VM code (generated), C=native code)
V  [libjvm.so+0xffffffff]  CardTableModRefBS::process_chunk_boundaries(Space*,DirtyCardToOopClosure*,MemRegion,MemRegion,signed char**,unsigned long,unsigned long)+0xd4 (sp=0x000000011089eb40) (pc=0x0900000293b83154)
V  [libjvm.so+0xffffffff]  CardTableModRefBS::process_stride(Space*,MemRegion,int,int,OopsInGenClosure*,CardTableRS*,signed char**,unsigned long,unsigned long)+0x1a0 (sp=0x000000011089ec30) (pc=0x0900000293b82f20)
V  [libjvm.so+0xffffffff]  CardTableModRefBS::non_clean_card_iterate_parallel_work(Space*,MemRegion,OopsInGenClosure*,CardTableRS*,int)+0xe4 (sp=0x000000011089ed70) (pc=0x0900000293b827e4)
V  [libjvm.so+0xffffffff]  CardTableModRefBS::non_clean_card_iterate_possibly_parallel(Space*,MemRegion,OopsInGenClosure*,CardTableRS*)+0x54 (sp=0x000000011089ee60) (pc=0x0900000293b80db4)
V  [libjvm.so+0xffffffff]  CardTableRS::younger_refs_in_space_iterate(Space*,OopsInGenClosure*)+0x80 (sp=0x000000011089ef20) (pc=0x0900000293b892e0)
V  [libjvm.so+0xffffffff]  Generation::younger_refs_in_space_iterate(Space*,OopsInGenClosure*)+0x3c (sp=0x000000011089efc0) (pc=0x0900000293a306bc)
V  [libjvm.so+0xffffffff]  ConcurrentMarkSweepGeneration::younger_refs_iterate(OopsInGenClosure*)+0x4c (sp=0x000000011089f030) (pc=0x09000002933c956c)
V  [libjvm.so+0xffffffff]  CardTableRS::younger_refs_iterate(Generation*,OopsInGenClosure*)+0x4c (sp=0x000000011089f0b0) (pc=0x0900000293b88d2c)
V  [libjvm.so+0xffffffff]  GenCollectedHeap::gen_process_roots(int,bool,bool,SharedHeap::ScanningOption,bool,OopsInGenClosure*,OopsInGenClosure*,CLDClosure*)+0x19c (sp=0x000000011089f120) (pc=0x090000029341795c)
V  [libjvm.so+0xffffffff]  ParNewGenTask::work(unsigned int)+0x1c8 (sp=0x000000011089f230) (pc=0x0900000293a3a6c8)
V  [libjvm.so+0xffffffff]  GangWorker::loop()+0x164 (sp=0x000000011089f3e0) (pc=0x0900000293b850c4)
V  [libjvm.so+0xffffffff]  GangWorker::run()+0x58 (sp=0x000000011089f4c0) (pc=0x0900000293b84eb8)
V  [libjvm.so+0xffffffff]  java_start(Thread*)+0x1b8 (sp=0x000000011089f540) (pc=0x0900000292fd2f38)
C  [libpthread.a+0xffffffff]  _pthread_body+0xec (sp=0x000000011089f790) (pc=0x0900000000520fec)

This crash has occurred intermittently for several years, but only on non-TSO platforms (platforms without total store ordering).
- It only happens in opt builds.
- Analysis of the assembly code showed the actual crash site to be an array store through a pointer that is passed as an argument to process_chunk_boundaries.
- That pointer is computed in CardTableModRefBS::get_LNC_array_for_space.
- CardTableModRefBS::get_LNC_array_for_space doesn't enforce memory ordering on accesses to _last_LNC_resizing_collection[i], so a pointer to an uninitialized structure can become visible to other threads.

Solution:

Use OrderAccess::load_acquire and OrderAccess::release_store to access _last_LNC_resizing_collection[i].
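
The acquire/release pairing described in the solution can be illustrated with a minimal, standalone sketch. This is not the HotSpot change itself: it uses std::atomic from standard C++ in place of OrderAccess, and the names Slot, publish, try_get and run_demo are hypothetical, introduced only for illustration. The idea shown is that the writer fills in the array first and then publishes the collection number with a release store, while a reader that observes that number through an acquire load is guaranteed to also see the initialized array.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical stand-in for one per-space slot; not the HotSpot data layout.
struct Slot {
    int* array = nullptr;            // plain field, written before publication
    std::size_t size = 0;            // plain field, written before publication
    std::atomic<int> resized_in{0};  // plays the role of _last_LNC_resizing_collection[i]
};

// Writer side: initialize the payload, then publish it with a release store.
// The release store keeps the payload writes from being reordered after it.
void publish(Slot& s, std::size_t n, int collection) {
    int* a = new int[n];
    for (std::size_t i = 0; i < n; ++i) a[i] = 0;
    s.array = a;
    s.size = n;
    s.resized_in.store(collection, std::memory_order_release);
}

// Reader side: an acquire load of the flag. If it observes `collection`,
// every write made before the matching release store is visible as well.
int* try_get(Slot& s, int collection) {
    if (s.resized_in.load(std::memory_order_acquire) != collection) {
        return nullptr;  // not published yet for this collection
    }
    return s.array;
}

// Runs one writer and one reader; returns true if the reader saw a fully
// initialized array and its array store landed in it.
bool run_demo() {
    Slot s;
    const int collection = 1;
    const std::size_t n = 64;
    bool ok = true;

    std::thread reader([&] {
        int* a = nullptr;
        while ((a = try_get(s, collection)) == nullptr) {
            std::this_thread::yield();
        }
        for (std::size_t i = 0; i < n; ++i) {
            if (a[i] != 0) ok = false;
        }
        a[0] = 42;  // analogous to the array store at the reported crash site
    });
    std::thread writer([&] { publish(s, n, collection); });

    writer.join();
    reader.join();
    const bool result = ok && s.array[0] == 42;
    delete[] s.array;
    return result;
}
```

With plain (non-atomic, unordered) accesses to the flag, a weakly ordered CPU or an optimizing compiler may make the flag update visible before the array pointer, which matches the symptom described above. On TSO hardware the stores are not reordered by the CPU, which is consistent with the crash being reported only on non-TSO platforms.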
Comments
Added a 'noreg-hard' label because it is very hard to write a regression test for this issue. First, it only affects weak memory model architectures such as PowerPC or Itanium; second, it occurs very rarely, or only with huge data sets (e.g. Hadoop with a terabyte of data: http://mail.openjdk.java.net/pipermail/hotspot-dev/2016-December/025483.html).
13-12-2016