JDK-8039042 : G1: Phantom zeros in cardtable
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: hs24,7u80,8u20,9
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2014-04-02
  • Updated: 2015-07-20
  • Resolved: 2014-05-20
  • JDK 7: 7u76 (Fixed)
  • JDK 8: 8u20 (Fixed)
  • JDK 9: 9 b15 (Fixed)
Description
The hs_err head is:
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/tmp/jprt/P1/150850.aeriksso/s/src/share/vm/memory/cardTableModRefBS.cpp:648), pid=28378, tid=61
#  guarantee(!failures) failed: there should not have been any failures
#
# JRE version: Java(TM) SE Runtime Environment (7.0_40-b43) (build 1.7.0_40-fastdebug-b43)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.80-b05-internal-201403271508.aeriksso.hsx-jdk7u_hotspot-fastdebug mixed mode solaris-sparc compressed oops)
# Core dump written. Default location: /scratch/local/aurora/sandbox/results/results/latest/applications.c24.Century24Bench.executeBasicParse/core or core.28378
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.sun.com/bugreport/crash.jsp
#

---------------  T H R E A D  ---------------

Current thread (0x000000010a332800):  VMThread [stack: 0xffffffff1a800000,0xffffffff1a900000] [id=61]

Stack: [0xffffffff1a800000,0xffffffff1a900000],  sp=0xffffffff1a8fe840,  free space=1018k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x141c4ac]  void VMError::report_and_die()+0x79c
V  [libjvm.so+0x7ca768]  void report_vm_error(const char*,int,const char*,const char*)+0x78
V  [libjvm.so+0x58e300]  void CardTableModRefBS::verify_region(MemRegion,signed char,bool)+0x5b0
V  [libjvm.so+0x913fbc]  void G1SATBCardTableModRefBS::verify_g1_young_region(MemRegion)+0x28
V  [libjvm.so+0x8d035c]  void G1CollectedHeap::verify_dirty_region(HeapRegion*)+0x94
V  [libjvm.so+0x8d03ac]  void G1CollectedHeap::verify_dirty_young_list(HeapRegion*)+0x14
V  [libjvm.so+0x8c8980]  bool G1CollectedHeap::do_collection_pause_at_safepoint(double)+0x220
V  [libjvm.so+0x1451464]  void VM_G1IncCollectionPause::doit()+0x2cc
V  [libjvm.so+0x144d6d0]  void VM_Operation::evaluate()+0xe0
V  [libjvm.so+0x144985c]  void VMThread::evaluate_operation(VM_Operation*)+0x23c
V  [libjvm.so+0x144a308]  void VMThread::loop()+0x6a0
V  [libjvm.so+0x14492dc]  void VMThread::run()+0xe4
V  [libjvm.so+0x108eb10]  java_start+0x258
Comments
It should also be noted that this crash only occurs in debug builds. The guarantee() that fails is inside verification code that is never executed in product builds. The actual issue (phantom zeros in the card table) exists in all builds, but is benign: an incorrectly dirtied card will be processed by a refinement thread, which will detect that the card covers a young region and just ignore it.
Given that we now understand this problem and its side effects, the ILW needs to be adjusted:
I = Product builds unaffected. Crash in debug builds, causes noise in testing -> M
L = Reproducible with debug builds on Sparc T4 + G1 -> M
W = Don't use Sparc or G1 -> H
MMH = P3
13-05-2014

Mikael Gerdin was able to reproduce the bug, and we now understand the root cause and how to fix it. memset() on Sparc T4 (and later) uses some special instructions with the side effect that the memory buffer can temporarily be filled with zeros before the actual value is set. This problem has been observed and fixed before (JDK-6948537) when using memset on the BlockOffsetTable. Here's what's happening in this case:
1) Thread A allocates a TLAB and calls G1CollectedHeap::dirty_young_block() to memset the cards for this memory area with the value 32 (which means this is a "young" region).
2) Thread B allocates a TLAB, which ends up right after thread A's TLAB. The TLABs are not aligned to 512 bytes (the size represented by a card), which means that the card for the end of TLAB A is the same as the card for the start of TLAB B. Thread B also goes on to call G1CollectedHeap::dirty_young_block() to memset the cards to 32.
3) While thread B is doing dirty_young_block(), thread A writes a reference into the memory area covered by the card that is shared by the two TLABs. The write barrier first checks whether the card is young (32), in which case the write should not be logged. The card should be 32 in this case, but because of the memset() going on in thread B, thread A temporarily observes a zero here and continues to execute the barrier. The barrier then reloads the card to see if it's already dirty. It will now see 32 again (because the memset() in thread B has now taken effect). Anything non-zero is considered non-dirty, so the barrier goes on to write a zero into the card to mark it as dirty.
4) So, in the end the barrier dirties a young card (writes zero into it), which should never happen. Because the concurrent memset can temporarily set the buffer to zero, the barrier gets fooled and doesn't notice it's a young card.
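The interleaving in steps 1) to 4) can be replayed deterministically with a small model. This is only a sketch, not HotSpot code: card_index(), BarrierResult and post_barrier() are hypothetical names, and the two loads are passed in as parameters so that the value thread A observes at each point is explicit. The card values (0 = dirty, 32 = young) and the 512-byte card size are taken from the description above.

```cpp
#include <cstdint>

// Card values named in the comment above: 0 means dirty, 32 means young.
constexpr int8_t kDirty = 0;
constexpr int8_t kYoung = 32;
// One card covers 512 bytes of heap, so the card index is address / 512.
constexpr unsigned kCardShift = 9;

// Hypothetical helper: index of the card covering a heap address.
constexpr uintptr_t card_index(uintptr_t addr) { return addr >> kCardShift; }

// Outcome of one barrier execution (illustrative type, not a HotSpot one).
struct BarrierResult {
  int8_t card_after;  // value left in the card when the barrier is done
  bool   enqueued;    // whether the card was logged for refinement
};

// Models the two card loads the post-write barrier performs.
// first_load:  what the thread sees in the "is this card young?" check.
// second_load: what it sees when it reloads the card for the dirty check.
inline BarrierResult post_barrier(int8_t first_load, int8_t second_load) {
  if (first_load == kYoung)  return {second_load, false};  // young: skip
  if (second_load == kDirty) return {second_load, false};  // already dirty
  return {kDirty, true};  // write zero into the card and log it
}
```

In this model, post_barrier(32, 32) leaves the young card alone, while post_barrier(0, 32), the phantom-zero case, dirties it and logs it. The shared card from step 2) shows up as card_index(0x12FF) == card_index(0x1300) for a TLAB boundary at the (made-up) address 0x1300.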
Suggested fix: G1SATBCardTableModRefBS::g1_mark_as_young() should use the UseMemSetInBOT flag to check whether it's safe to use memset and, if not, fall back to a loop that sets the values. In JDK 9 we might want to follow up with an additional patch to remove the UseMemSetInBOT flag completely (who would want to use that?) and instead have an internal flag with a better name (memset_is_thread_safe or something similar).
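A minimal sketch of what such a fallback could look like, assuming a plain array of card bytes. The function mark_cards_as_young() and its memset_is_safe parameter are hypothetical stand-ins for g1_mark_as_young() and the UseMemSetInBOT check; the volatile qualifier is an assumption of this sketch, added so that the compiler does not turn the loop back into a memset call.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Card value for "young", as given in the description above.
constexpr int8_t kYoungCardValue = 32;

// Hypothetical stand-in for g1_mark_as_young(): sets `count` cards to the
// young value. memset_is_safe plays the role of the UseMemSetInBOT flag.
inline void mark_cards_as_young(volatile int8_t* first, std::size_t count,
                                bool memset_is_safe) {
  if (memset_is_safe) {
    // Fast path: fine where memset never exposes intermediate zeros.
    std::memset(const_cast<int8_t*>(first), kYoungCardValue, count);
  } else {
    // Slow path: one byte store per card, so a concurrent reader sees
    // either the old value or 32, never a transient zero.
    for (std::size_t i = 0; i < count; ++i) {
      first[i] = kYoungCardValue;
    }
  }
}
```

Both paths leave the same final card contents; they differ only in which intermediate values a concurrent reader might observe.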
12-05-2014

Looks like a possible problem with the G1 post-barrier in C2 (the test is using -XX:+UseG1GC -XX:-TieredCompilation). Has happened on both 32-bit and 64-bit Sparc. Doesn't seem to be related to JDK-8038335, which was a recently introduced regression, also on Sparc + G1. The old JDK-8027840 has also been backported to 7u, so that can be ruled out. Possibly related to JDK-8014555, but that fix has been in 8 and 9 for quite some time without showing any problems.
I = Crash -> H
L = Seen three times on Sparc + G1 -> M
W = Don't use G1 -> H
HMH = P1
04-04-2014