JDK-8026303 : CMS: JVM intermittently crashes with "FreeList of size 258 violates Conservation Principle" assert
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: hs25
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2013-10-11
  • Updated: 2015-02-02
  • Resolved: 2014-08-14
Versions: 7u76 (Fixed), 8u31 (Fixed), 9 b29 (Fixed)
The JVM intermittently fails with the following assert in a test of the Max/MinHeapFreeRatio flags:

# A fatal error has been detected by the Java Runtime Environment:
#  Internal Error (/HUDSON/workspace/8-2-build-linux-i586/jdk8/367/hotspot/src/share/vm/gc_implementation/concurrentMarkSweep/adaptiveFreeList.cpp:162), pid=3087, tid=3077081968
#  assert((_allocation_stats.prev_sweep() + _allocation_stats.split_births() + _allocation_stats.coal_births() + 1) >= (_allocation_stats.split_deaths() + _allocation_stats.coal_deaths() + (ssize_t)count())) failed: FreeList 0x9d0bac08 of size 258 violates Conservation Principle: prev_sweep(1) + split_births(1) + coal_births(1) + 1 >=  split_deaths(2) coal_deaths(0) + count(2)
# JRE version: Java(TM) SE Runtime Environment (8.0-b110) (build 1.8.0-ea-fastdebug-b110)
# Java VM: Java HotSpot(TM) Server VM (25.0-b52-fastdebug compiled mode linux-x86 )
# Core dump written. Default location: /tmp/core or core.3087
# If you would like to submit a bug report, please visit:
#   http://bugreport.sun.com/bugreport/crash.jsp

---------------  T H R E A D  ---------------

Current thread (0xb7511800):  GCTaskThread [stack: 0xb7608000,0xb7689000] [id=3091]

Stack: [0xb7608000,0xb7689000],  sp=0xb76877c0,  free space=509k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0xcbcef5]  VMError::report_and_die()+0x185
V  [libjvm.so+0x5929a8]  report_vm_error(char const*, int, char const*, char const*)+0x68
V  [libjvm.so+0x2a7175]  AdaptiveFreeList<FreeChunk>::verify_stats() const+0x95
V  [libjvm.so+0x503240]  CompactibleFreeListSpace::par_get_chunk_of_blocks(unsigned int, unsigned int, AdaptiveFreeList<FreeChunk>*)+0xa70
V  [libjvm.so+0x50356d]  CFLS_LAB::get_from_global_pool(unsigned int, AdaptiveFreeList<FreeChunk>*)+0x9d
V  [libjvm.so+0x50381c]  CFLS_LAB::alloc(unsigned int)+0x1ec
V  [libjvm.so+0x561997]  ConcurrentMarkSweepGeneration::expand_and_par_lab_allocate(CMSParGCThreadState*, unsigned int)+0x57
V  [libjvm.so+0x568b7b]  ConcurrentMarkSweepGeneration::par_promote(int, oopDesc*, markOopDesc*, unsigned int)+0x4fb
V  [libjvm.so+0xacfb40]  ParNewGeneration::copy_to_survivor_space_avoiding_promotion_undo(ParScanThreadState*, oopDesc*, unsigned int, markOopDesc*)+0x8a0
V  [libjvm.so+0x743508]  void ParScanClosure::do_oop_work<oopDesc*>(oopDesc**, bool, bool)+0x158
V  [libjvm.so+0x732fc2]  InstanceKlass::oop_oop_iterate_nv(oopDesc*, ParScanWithBarrierClosure*)+0xa2
V  [libjvm.so+0xad00da]  ParScanThreadState::trim_queues(int)+0x1aa
V  [libjvm.so+0xad01f2]  ParEvacuateFollowersClosure::do_void()+0x22
V  [libjvm.so+0xad0bee]  ParNewGenTask::work(unsigned int)+0x1ee
V  [libjvm.so+0xd00bdb]  GangWorker::loop()+0x30b
V  [libjvm.so+0xcff428]  GangWorker::run()+0x18
V  [libjvm.so+0xaa82e9]  java_start(Thread*)+0x119
C  [libpthread.so.0+0x69e9]  abort@@GLIBC_2.0+0x69e9

The failing test loads and unloads a certain amount of data and verifies that the old generation is resized to fit the Max/MinHeapFreeRatio values.

I've attached a reproducer extracted from this test.
SQE is OK with having this fix in PSU15_01 if it reaches jdk7u76 b05. This means we need the backport to 7u76 ASAP; otherwise we would prefer to postpone the fix to 7u80.

Critical Request Template
  • Justification: Caused test failure in Linux-Sparc testing, see JDK-8055730
  • Risk Analysis: Low, fix improves book-keeping to avoid the assert
  • Webrev: See above
  • Testing (done/to-be-done): Found during testing, run same tests to verify
  • Back ports (done/to-be-done): All backports done
  • Fix For Release: 7 PSU

Please justify the absence of regression tests. I think "noreg-sqe" is applicable here.

CompactibleFreeListSpace::par_get_chunk_of_blocks() replenishes the free list of a given size by splitting a larger chunk. The code searched the dictionary for a chunk that was large enough to split. If such a chunk was found, it was removed from the dictionary and a split death was recorded. However, if the remainder after splitting would have been too small, the chunk was returned to the dictionary without correcting the split death accounting.
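The accounting problem described above can be illustrated with a minimal sketch. This is not the HotSpot code: the types Stats and SizeClass, the functions try_split() and conserves_after_failed_splits(), and the undo_accounting flag are all hypothetical names invented for illustration. The sketch only assumes that an abandoned split puts the chunk back on its list, and it checks the same inequality that the assert in the crash log reports.

```cpp
#include <cassert>

// Hypothetical, simplified per-size statistics (not HotSpot's real classes).
struct Stats {
    long prev_sweep = 0;
    long split_births = 0;
    long coal_births = 0;
    long split_deaths = 0;
    long coal_deaths = 0;
};

struct SizeClass {
    Stats stats;
    long count = 0;  // chunks currently on this free list

    // The inequality reported by the failing assert.
    bool conserves() const {
        return stats.prev_sweep + stats.split_births + stats.coal_births + 1
            >= stats.split_deaths + stats.coal_deaths + count;
    }
};

// Remove a chunk from `donor` in order to split it. If the remainder would be
// too small, the split is abandoned and the chunk is put back.
// `undo_accounting` selects whether the split death is reverted as well.
bool try_split(SizeClass& donor, bool remainder_too_small, bool undo_accounting) {
    assert(donor.count > 0);
    donor.count--;
    donor.stats.split_deaths++;      // recorded before we know the split succeeds
    if (remainder_too_small) {
        donor.count++;               // chunk goes back on the list
        if (undo_accounting) {
            donor.stats.split_deaths--;  // the step the buggy path skipped
        }
        return false;
    }
    return true;
}

// Start with one chunk left over from the previous sweep, abandon `attempts`
// splits, and report whether the inequality still holds afterwards.
bool conserves_after_failed_splits(int attempts, bool undo_accounting) {
    SizeClass donor;
    donor.stats.prev_sweep = 1;
    donor.count = 1;
    for (int i = 0; i < attempts; i++) {
        try_split(donor, /*remainder_too_small=*/true, undo_accounting);
    }
    return donor.conserves();
}
```

In this model a single abandoned split is absorbed by the "+ 1" slack in the inequality, and only the second one violates it. That may be one reason a failure of this kind shows up only intermittently.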

Release team: Approved for deferral to 8-pool

According to ILW, my opinion is:
  • Impact = Medium, since the assert only triggers in debug builds; the product build is not affected.
  • Likelihood = Low, since I have not been able to reproduce this. I've tried both on the host where the problem occurred and on another host.
  • Workaround = High, since there is no workaround for debug builds (Low for product builds, as they are not affected).
Medium, Low, High => P4

SQE team has no objections to defer.

8-defer-request justification: After reading through the code (I never managed to reproduce it), it looks like someone is adding or removing an item from the free lists without updating the statistics. These statistics are not used to make any decisions in CMS; they are only used to keep track of why we add/remove things on the free lists. This does not affect the product at all, as the verification is only done in debug builds.

Bengt: Thanks for your comment, great explanation. Since all values printed by the err_msg are correct except one (coal_births), the assertion could only have failed if coal_births == 0.

It looks like there is a bug in the assert code. We don't log the coal_births() value; instead we log the split_births() value twice:

    err_msg("FreeList " PTR_FORMAT " of size " SIZE_FORMAT
            " violates Conservation Principle: "
            "prev_sweep(" SIZE_FORMAT ")"
            " + split_births(" SIZE_FORMAT ")"
            " + coal_births(" SIZE_FORMAT ") + 1 >= "
            " split_deaths(" SIZE_FORMAT ")"
            " coal_deaths(" SIZE_FORMAT ")"
            " + count(" SSIZE_FORMAT ")",
            this, size(), _allocation_stats.prev_sweep(), _allocation_stats.split_births(),
            _allocation_stats.split_births(), _allocation_stats.split_deaths(),
            _allocation_stats.coal_deaths(), count()));

So, my guess is that it is the coal_births() value that is wrong. Unfortunately we need to fix the error message before we can know for sure. The assert itself uses the correct values, so the fact that it fires means that we probably have a bug in the statistics gathering too:

    assert((_allocation_stats.prev_sweep() + _allocation_stats.split_births()
            + _allocation_stats.coal_births() + 1)   // Total Production Stock + 1
           >= (_allocation_stats.split_deaths() + _allocation_stats.coal_deaths()
               + (ssize_t)count()),   // Total Current Stock + depletion

Interestingly, the Conservation Principle (the inequality) seems to hold from the print info; but it's still possible that it didn't hold when checking the assertion.
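The arithmetic behind these comments can be checked with a small standalone sketch. It is not the HotSpot assert: LoggedStats, conservation_holds() and describe() are hypothetical names, and the numbers are the ones printed in the crash log. With coal_births taken as the printed 1 the inequality holds (4 >= 4); with coal_births == 0 it fails (3 >= 4). The describe() helper shows a message that passes coal_births, rather than split_births a second time, for the coal_births field.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical snapshot of the values that appear in the assert message.
struct LoggedStats {
    long prev_sweep;
    long split_births;
    long coal_births;
    long split_deaths;
    long coal_deaths;
    long count;
};

// The Conservation Principle inequality from the crash log.
bool conservation_holds(const LoggedStats& s) {
    return s.prev_sweep + s.split_births + s.coal_births + 1
        >= s.split_deaths + s.coal_deaths + s.count;
}

// Formats the inequality with each field printed from its own value.
std::string describe(const LoggedStats& s) {
    char buf[256];
    int n = std::snprintf(buf, sizeof buf,
        "prev_sweep(%ld) + split_births(%ld) + coal_births(%ld) + 1 >= "
        "split_deaths(%ld) + coal_deaths(%ld) + count(%ld)",
        s.prev_sweep, s.split_births, s.coal_births,
        s.split_deaths, s.coal_deaths, s.count);
    assert(n > 0 && static_cast<std::size_t>(n) < sizeof buf);
    (void)n;  // silence the unused-variable warning when asserts are compiled out
    return std::string(buf);
}
```

For example, conservation_holds(LoggedStats{1, 1, 1, 2, 0, 2}) is true, while conservation_holds(LoggedStats{1, 1, 0, 2, 0, 2}) is false. This is consistent with the observation that the printed values appear to satisfy the inequality even though the assert fired.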

It may take a lot of time to reproduce the issue manually. With the regression test the VM crashes about once per 50 runs, and with the attached reproducer it crashes about once per 100 iterations.