Bug ID: JDK-8026303 CMS: JVM intermittently crashes with "FreeList of size 258 violates Conservation Principle" assert

Type: Bug
Component: hotspot
Sub-Component: gc
Affected Version: hs25

Priority: P3
Status: Resolved
Resolution: Fixed
OS: generic
CPU: generic

Submitted: 2013-10-11
Updated: 2015-02-02
Resolved: 2014-08-14

Versions (Unresolved/Resolved/Fixed)

JDK 7	JDK 8	JDK 9
7u76 Fixed	8u31 Fixed	9 b29 Fixed

Related Reports

Relates :

JDK-8026784 - Error message in AdaptiveFreeList::verify_stats is wrong

Description

The JVM intermittently fails with the following assert when running a test of the MaxHeapFreeRatio/MinHeapFreeRatio flags:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/HUDSON/workspace/8-2-build-linux-i586/jdk8/367/hotspot/src/share/vm/gc_implementation/concurrentMarkSweep/adaptiveFreeList.cpp:162), pid=3087, tid=3077081968
#  assert((_allocation_stats.prev_sweep() + _allocation_stats.split_births() + _allocation_stats.coal_births() + 1) >= (_allocation_stats.split_deaths() + _allocation_stats.coal_deaths() + (ssize_t)count())) failed: FreeList 0x9d0bac08 of size 258 violates Conservation Principle: prev_sweep(1) + split_births(1) + coal_births(1) + 1 >=  split_deaths(2) coal_deaths(0) + count(2)
#
# JRE version: Java(TM) SE Runtime Environment (8.0-b110) (build 1.8.0-ea-fastdebug-b110)
# Java VM: Java HotSpot(TM) Server VM (25.0-b52-fastdebug compiled mode linux-x86 )
# Core dump written. Default location: /tmp/core or core.3087
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.sun.com/bugreport/crash.jsp
#

---------------  T H R E A D  ---------------

Current thread (0xb7511800):  GCTaskThread [stack: 0xb7608000,0xb7689000] [id=3091]

Stack: [0xb7608000,0xb7689000],  sp=0xb76877c0,  free space=509k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0xcbcef5]  VMError::report_and_die()+0x185
V  [libjvm.so+0x5929a8]  report_vm_error(char const*, int, char const*, char const*)+0x68
V  [libjvm.so+0x2a7175]  AdaptiveFreeList<FreeChunk>::verify_stats() const+0x95
V  [libjvm.so+0x503240]  CompactibleFreeListSpace::par_get_chunk_of_blocks(unsigned int, unsigned int, AdaptiveFreeList<FreeChunk>*)+0xa70
V  [libjvm.so+0x50356d]  CFLS_LAB::get_from_global_pool(unsigned int, AdaptiveFreeList<FreeChunk>*)+0x9d
V  [libjvm.so+0x50381c]  CFLS_LAB::alloc(unsigned int)+0x1ec
V  [libjvm.so+0x561997]  ConcurrentMarkSweepGeneration::expand_and_par_lab_allocate(CMSParGCThreadState*, unsigned int)+0x57
V  [libjvm.so+0x568b7b]  ConcurrentMarkSweepGeneration::par_promote(int, oopDesc*, markOopDesc*, unsigned int)+0x4fb
V  [libjvm.so+0xacfb40]  ParNewGeneration::copy_to_survivor_space_avoiding_promotion_undo(ParScanThreadState*, oopDesc*, unsigned int, markOopDesc*)+0x8a0
V  [libjvm.so+0x743508]  void ParScanClosure::do_oop_work<oopDesc*>(oopDesc**, bool, bool)+0x158
V  [libjvm.so+0x732fc2]  InstanceKlass::oop_oop_iterate_nv(oopDesc*, ParScanWithBarrierClosure*)+0xa2
V  [libjvm.so+0xad00da]  ParScanThreadState::trim_queues(int)+0x1aa
V  [libjvm.so+0xad01f2]  ParEvacuateFollowersClosure::do_void()+0x22
V  [libjvm.so+0xad0bee]  ParNewGenTask::work(unsigned int)+0x1ee
V  [libjvm.so+0xd00bdb]  GangWorker::loop()+0x30b
V  [libjvm.so+0xcff428]  GangWorker::run()+0x18
V  [libjvm.so+0xaa82e9]  java_start(Thread*)+0x119
C  [libpthread.so.0+0x69e9]  abort@@GLIBC_2.0+0x69e9

The failing test loads and unloads a certain amount of data and verifies that the old generation is resized to fit the MaxHeapFreeRatio/MinHeapFreeRatio values:
http://cr.openjdk.java.net/~kshefov/8025166/webrev.01/raw_files/new/test/gc/arguments/TestMaxMinHeapFreeRatioFlags.java

I've attached a reproducer extracted from this test.

Comments

SQE is OK with having this fix in PSU15_01 if it reaches jdk7u76 b05. This means we need the backport to 7u76 ASAP; otherwise we would prefer to postpone the fix to 7u80.
20-10-2014
Critical Request Template
Justification : Caused test failure in Linux-Sparc testing, see JDK-8055730
Risk Analysis : Low, fix improves book-keeping to avoid the assert
Webrev : See above
Testing (done/to-be-done) : Found during testing, run same tests to verify
Back ports (done/to-be-done) : All backports done
Fix For Release : 7 PSU
17-10-2014
Please justify the absence of regression tests. I think "noreg-sqe" is applicable here.
09-12-2013
CompactibleFreeListSpace::par_get_chunk_of_blocks() replenishes the free list of a given size by splitting a larger chunk. The code searched for a chunk that was large enough to split. If a large enough chunk was found, it was removed from the dictionary and a split death was recorded. But if the remainder after splitting was too small, the chunk was returned to the dictionary and the code forgot to fix the split death accounting.
21-11-2013
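The accounting error described in the comment above can be illustrated with a small model. This is only a sketch under simplifying assumptions, not the HotSpot code: `Stats`, `conservation_holds`, `failed_split_buggy`, and `failed_split_fixed` are hypothetical names, the statistics are reduced to plain counters for a single list, and the "fixed" variant simply undoes the recorded split death, which may differ from how the actual fix corrects the book-keeping.

```cpp
// Hypothetical, simplified stand-in for one free list's allocation statistics.
struct Stats {
  long prev_sweep   = 0;
  long split_births = 0;
  long coal_births  = 0;
  long split_deaths = 0;
  long coal_deaths  = 0;
  long count        = 0;  // chunks currently on the list
};

// Same inequality as the assert quoted in the report.
bool conservation_holds(const Stats& s) {
  return (s.prev_sweep + s.split_births + s.coal_births + 1)
      >= (s.split_deaths + s.coal_deaths + s.count);
}

// A chunk is taken off the list to be split, the remainder turns out to be
// too small, and the chunk is put back. The buggy variant leaves the split
// death recorded even though no split happened.
void failed_split_buggy(Stats& s) {
  s.count--;
  s.split_deaths++;  // chunk removed, split death recorded
  s.count++;         // chunk returned, split death NOT undone
}

// The fixed variant restores the statistics when the chunk is returned.
void failed_split_fixed(Stats& s) {
  s.count--;
  s.split_deaths++;
  s.count++;
  s.split_deaths--;  // undo the split death that never took place
}
```

In this model, starting from `prev_sweep = 1` and `count = 1`, one failed split in the buggy variant is absorbed by the `+ 1` slack in the inequality, and the second one violates it. That is consistent with the failure being intermittent, although the model does not claim to reproduce the real sequence of events.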
Release team: Approved for deferral to 8-pool
22-10-2013
According to ILW, my opinion is:
Impact = Medium, since the assert only triggers in debug builds; the product build is not affected.
Likelihood = Low, since I have not been able to reproduce this. I've tried both on the host where the problem occurred and on another host.
Workaround = High, since there is no workaround for debug builds (Low for product builds, as they are not affected).
Medium, Low, High => P4
22-10-2013
SQE team has no objections to defer.
21-10-2013
8-defer-request justification: After reading through the code (I never managed to reproduce it), it looks like someone is adding or removing an item from the free lists without updating the statistics. These statistics are not used to make any decisions in CMS; they are only used to keep track of why we add/remove things on the free lists. This does not affect the product at all, since the verification is only done in debug builds.
17-10-2013
Bengt: Thanks for your comment, great explanation. Since all values in the assertion are correct and only one value in the err_msg is wrong (coal_births), the assertion could only have failed if coal_births == 0.
16-10-2013
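The arithmetic behind the comment above can be checked directly. The following is a minimal sketch that plugs the values printed in the crash message into the inequality and treats coal_births as the unknown; the function names `production`, `stock`, and `holds_with_coal_births` are hypothetical and exist only for this illustration.

```cpp
// Left-hand side of the inequality: total production stock + 1.
long production(long prev_sweep, long split_births, long coal_births) {
  return prev_sweep + split_births + coal_births + 1;
}

// Right-hand side of the inequality: deaths plus chunks still on the list.
long stock(long split_deaths, long coal_deaths, long count) {
  return split_deaths + coal_deaths + count;
}

// Values taken from the crash message: prev_sweep(1), split_births(1),
// split_deaths(2), coal_deaths(0), count(2). The printed coal_births value
// is unreliable, so it is passed in as a parameter.
bool holds_with_coal_births(long coal_births) {
  return production(1, 1, coal_births) >= stock(2, 0, 2);
}
```

With these values the right-hand side is 4. The left-hand side is 3 when coal_births is 0, so the inequality fails, and it is at least 4 for any coal_births of 1 or more, so the inequality holds.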
It looks like there is a bug in the assert code. We don't log the coal_births() value; instead we log the split_births() value twice.

    err_msg("FreeList " PTR_FORMAT " of size " SIZE_FORMAT
            " violates Conservation Principle: "
            "prev_sweep(" SIZE_FORMAT ")"
            " + split_births(" SIZE_FORMAT ")"
            " + coal_births(" SIZE_FORMAT ") + 1 >= "
            " split_deaths(" SIZE_FORMAT ")"
            " coal_deaths(" SIZE_FORMAT ")"
            " + count(" SSIZE_FORMAT ")",
            this, size(), _allocation_stats.prev_sweep(), _allocation_stats.split_births(),
            _allocation_stats.split_births(), _allocation_stats.split_deaths(),
            _allocation_stats.coal_deaths(), count()));

So, my guess is that it is the coal_births() value that is wrong. Unfortunately we need to fix the error message before we can know for sure.

The assert itself uses the correct values. So, the fact that it fires means that we probably have a bug in the statistics gathering too:

    assert((_allocation_stats.prev_sweep() + _allocation_stats.split_births()
            + _allocation_stats.coal_births() + 1)   // Total Production Stock + 1
           >= (_allocation_stats.split_deaths() + _allocation_stats.coal_deaths()
               + (ssize_t)count()),                  // Total Current Stock + depletion
15-10-2013
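To make the argument mix-up in the message concrete, here is a small sketch of a message builder in which each placeholder receives its matching counter. It is an illustration only: `Stats` and `conservation_message` are hypothetical names, standard `snprintf` conversions stand in for the HotSpot format macros, and the free-list address is left out.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical, simplified stand-in for the allocation statistics.
struct Stats {
  long prev_sweep   = 0;
  long split_births = 0;
  long coal_births  = 0;
  long split_deaths = 0;
  long coal_deaths  = 0;
  long count        = 0;
};

// Builds the diagnostic text with the arguments in the same order as the
// placeholders, so coal_births(...) really shows the coal_births counter.
std::string conservation_message(const Stats& s, std::size_t size) {
  char buf[256];
  std::snprintf(buf, sizeof buf,
                "FreeList of size %zu violates Conservation Principle: "
                "prev_sweep(%ld) + split_births(%ld) + coal_births(%ld) + 1 >= "
                "split_deaths(%ld) coal_deaths(%ld) + count(%ld)",
                size, s.prev_sweep, s.split_births, s.coal_births,
                s.split_deaths, s.coal_deaths, s.count);
  return std::string(buf);
}
```

Called with the counters from the crash and a coal_births of 0, the resulting text contains `coal_births(0)` rather than a repeat of the split_births value.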
Interestingly, the Conservation Principle (the inequality) seems to hold according to the printed values, but it is still possible that it did not hold when the assertion was checked.
14-10-2013
It may take a long time to reproduce the issue manually. With the regression test the VM crashes once per 50 runs, and with the attached reproducer it crashes once per 100 iterations.
11-10-2013