Bug ID: JDK-6538488 CMS: crashed in removeChunkReplaceIfNeeded

Type: Bug
Component: hotspot
Sub-Component: gc
Affected Version: 6u1,6u16

Priority: P3
Status: Closed
Resolution: Duplicate
OS: generic,solaris
CPU: generic,x86

Submitted: 2007-03-24
Updated: 2012-02-11
Resolved: 2010-01-11

JDK 7: Resolved

I saw the following crash when running the jdk6u1 b05 product build on the ATG application with the CMS collector. It happened after about 35 hours of running, but I have not been able to reproduce it since.

Here is Ramki's comment regarding the crash:
Some of the code related to these methods was indeed
changed fairly recently in connection with CR 6459113, which was
backported to 6u1. [FWIW, the same changes are also present
in 7.0, 5u10 and 1.4.2_14.]

The webrev for this set of changes in 6u1 can be found here:

http://analemma.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/1.6/update1/baseline/2006/20061121150426.ysr.6u1/workspace/webrevs/webrev-2006.11.21/index

In particular, look at the changes in compactibleFreeListSpace.cpp.

Testing information:
--------------------
hostname: jtg-i119.sfbay.sun.com (a hyperthreading-enabled machine)
platform:
SunOS jtg-i119 5.10 Generic_118855-36 i86pc i386 i86pc
VM option: "-server -XX:+UseConcMarkSweepGC"

log/core file location: /bt/atgrun.6807.-server
-rw-r--r--   1 root     root     7422998 Mar 15 00:03 atgserver.log
-rw-r--r--   1 root     root     205737145 Mar 16 11:02 core.8343

bash-3.00# tail atgserver.log
Unexpected Error
------------------------------------------------------------------------------
SIGSEGV (0xb) at pc=0xd0a3a777, pid=8343, tid=6

Do you want to debug the problem?

To debug, run 'dbx - 8343'; then switch to thread 6
Enter 'yes' to launch dbx automatically (PATH must include dbx)
Otherwise, press RETURN to abort...

(dbx) where
current thread: t@6
=>[1] ___nanosleep(0xcd298c58, 0xcd298c60), at 0xd0f303d5
[2] _sleep(0x64), at 0xd0f24a33
[3] os::message_box(0xd0d93f99, 0xd0e21900), at 0xd0c42ace
[4] VMError::show_message_box(0xcd298f14, 0xd0e21900, 0x7d0), at 0xd0cd2f3d
[5] VMError::report_and_die(0xcd298f14), at 0xd0cd29bf
[6] JVM_handle_solaris_signal(0xb, 0xcd299234, 0xcd299034, 0x1), at 0xd08f1704
[7] signalHandler(0xb, 0xcd299234, 0xcd299034), at 0xd08f0fce
[8] __sighndlr(0xb, 0xcd299234, 0xcd299034, 0xd08f0fa8), at 0xd0f3014f
---- called from signal handler with signal 11 (SIGSEGV) ------
[9] TreeList::removeChunkReplaceIfNeeded(0x200023, 0xc54d49c0), at 0xd0a3a777
[10] BinaryTreeDictionary::removeChunkFromTree(0x8076950, 0xc54d49c0), at 0xd0a3ac8f
[11] BinaryTreeDictionary::getChunkFromTree(0x8076950, 0x100, 0x0, 0x0), at 0xd0a3abae
[12] BinaryTreeDictionary::getChunk(0x8076950, 0x100, 0x0), at 0xd0a3c0fa
[13] CompactibleFreeListSpace::par_get_chunk_of_blocks(0x80bac10, 0x10, 0x10, 0x80c541c), at 0xd0a6ff78
[14] CFLS_LAB::alloc(0x80c4e58, 0x10), at 0xd0a6fb6a
[15] ConcurrentMarkSweepGeneration::par_promote(0x80bab08, 0x0, 0xc1e988c8, 0x21, 0x10), at 0xd0a7b24d
[16] ParNewGeneration::copy_to_survivor_space_avoiding_promotion_undo(0x8078ed8, 0xcd299ae0, 0xc1e988c8, 0x10, 0x21, 0x0), at 0xd0c4a3fb
[17] ParScanClosure::do_oop_work(0xcd299c6c, 0xc5426168, 0x1, 0x0, 0x0), at 0xd0acf52f
[18] instanceKlass::oop_oop_iterate_nv(0xc900c730, 0xc5426160, 0xcd299c6c), at 0xd0acb4f9
[19] ParScanThreadState::trim_queues(0xcd299ae0, 0x28), at 0xd0c48f0d
[20] ParScanClosure::do_oop_work(0xcd299cb4, 0xc3c61bac, 0x1, 0x1, 0x0), at 0xd0acf552
[21] ParRootScanWithBarrierTwoGensClosure::do_oop(0xcd299cb4, 0xc3c61bac), at 0xd0c4add6
[22] objArrayKlass::oop_oop_iterate_nv_m(0xc908e650, 0xc3c5f590, 0xcd2996e0, 0xcd299638), at 0xd0c3cc68
[23] FreeListSpace_DCTOC::walk_mem_region_with_cl_par(0x80787a8, 0xcd299690, 0xc3c5f590, 0xc3c61e00, 0xcd2996e0), at 0xd0a6c057
[24] FreeListSpace_DCTOC::walk_mem_region_with_cl(0x80787a8, 0xcd2996d0, 0xc3c5f590, 0xc3c61e00, 0xcd2996e0), at 0xd0a6bf1f
[25] Filtering_DCTOC::walk_mem_region(0x80787a8, 0xcd299738, 0xc3c5f590, 0xc3c61e00), at 0xd0c8206b
[26] DirtyCardToOopClosure::do_MemRegion(0x80787a8, 0xcd299778), at 0xd0c81ea9
[27] ClearNoncleanCardWrapper::do_MemRegion(0xcd2999c0, 0xcd2997e8), at 0xd0a43925
[28] CardTableModRefBS::non_clean_card_iterate_work(0x8078e78, 0xcd299868, 0xcd2999c0, 0x0), at 0xd0a42293
[29] CardTableModRefBS::process_stride(0x8078e78, 0x80bac10, 0xcd2998f8, 0x3, 0x8, 0x80787a8, 0xcd2999c0, 0x0, 0x82da598, 0xcd1ff0, 0x1c5), at 0xd0a42974
[30] CardTableModRefBS::par_non_clean_card_iterate_work(0x8078e78, 0x80bac10, 0xcd299960, 0x80787a8, 0xcd2999c0, 0x0, 0x4), at 0xd0a42390
[31] CardTableModRefBS::non_clean_card_iterate(0x8078e78, 0x80bac10, 0xcd2999b8, 0x80787a8, 0xcd2999c0, 0x0), at 0xd0a41fb7
[32] CardTableRS::younger_refs_in_space_iterate(0x8078e70, 0x80bac10, 0xcd299cb4), at 0xd0a430a5
[33] Generation::younger_refs_in_space_iterate(0x80bab08, 0x80bac10, 0xcd299cb4), at 0xd0abdc15
[34] ConcurrentMarkSweepGeneration::younger_refs_iterate(0x80bab08, 0xcd299cb4), at 0xd0a7dd91
[35] CardTableRS::younger_refs_iterate(0x8078e70, 0x80bab08, 0xcd299cb4), at 0xd0a42fa4
[36] GenCollectedHeap::gen_process_strong_roots(0x8074208, 0x0, 0x1, 0x0, 0x1, 0xcd299cb4, 0xcd299c90), at 0xd0ab7b4d
[37] ParNewGenTask::work(0xcd1329bc, 0x0), at 0xd0c49560
[38] GangWorker::loop(0x8078000), at 0xd0cd6504
[39] GangWorker::run(0x8078000), at 0xd0cd63ef
[40] java_start(0x8078000), at 0xd0c403eb
[41] _thr_setup(0xcd320c00), at 0xd0f2fd46
[42] _lwp_start(), at 0xd0f30030
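
The innermost frames of the trace (getChunk, getChunkFromTree, removeChunkFromTree, removeChunkReplaceIfNeeded) operate on the CMS free-chunk dictionary. As we understand that structure, BinaryTreeDictionary indexes free chunks by size in a binary tree, each tree node holds a TreeList of equal-size chunks, and the head chunk of that list doubles as the tree node. Removing the head therefore requires promoting the next chunk and rewiring its tree links, which is the "replace if needed" step. The following is a simplified, hypothetical model under that assumption only; `Chunk` and `remove_chunk_replace_if_needed` are illustrative names, not HotSpot source.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified model (NOT HotSpot code): free chunks of one size
// form a doubly linked list, and the list head also acts as the tree node.
struct Chunk {
  std::size_t size;
  Chunk* prev = nullptr;    // list links among equal-size chunks
  Chunk* next = nullptr;
  Chunk* parent = nullptr;  // tree links, meaningful only on the list head
  Chunk* left = nullptr;
  Chunk* right = nullptr;
  explicit Chunk(std::size_t s) : size(s) {}
};

// Removes tc from the size list headed by `head`.
// Returns the (possibly new) tree node for this size, or nullptr when tc is
// the only chunk left; in that case tc is left untouched and the caller is
// expected to delete the whole node from the tree.
Chunk* remove_chunk_replace_if_needed(Chunk* head, Chunk* tc, Chunk** root) {
  if (tc != head) {
    // Non-head chunk: ordinary doubly-linked-list unlink.
    tc->prev->next = tc->next;
    if (tc->next != nullptr) tc->next->prev = tc->prev;
    tc->prev = tc->next = nullptr;
    return head;
  }
  Chunk* repl = tc->next;
  if (repl == nullptr) {
    return nullptr;  // last chunk of this size
  }
  // Promote the next chunk: it inherits the old head's tree links.
  repl->prev = nullptr;
  repl->parent = tc->parent;
  repl->left = tc->left;
  repl->right = tc->right;
  if (repl->left != nullptr) repl->left->parent = repl;
  if (repl->right != nullptr) repl->right->parent = repl;
  // Point the parent (or the root) at the replacement node.
  if (tc->parent == nullptr) {
    *root = repl;
  } else if (tc->parent->left == tc) {
    tc->parent->left = repl;
  } else {
    tc->parent->right = repl;
  }
  tc->next = tc->parent = tc->left = tc->right = nullptr;
  return repl;
}
```

In a routine shaped like this, list and tree pointers are dereferenced directly, so a stale or corrupted link would fault at exactly this kind of frame. That is consistent with the SIGSEGV above but says nothing about where such corruption would originate.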

In view of recently reported crashes with this signature, seen
with some frequency on 6u11 and with greater frequency on 6u16
by a customer using a fixed-size heap, the conjecture above
linking this crash to the changeset for 6459113 is incorrect;
this is a different issue.

The customer will be checking whether:
(1) the problem reproduces on 6u16 with -XX:-UseParNewGC, and
(2) the problem reproduces with the hs17 JVM.

EVALUATION Three crashes with the same symptom have been reported by ###@###.###. All three occurred on 2-CPU Intel boxes running Linux (none so far on the Niagara/Solaris boxes run by the customer, but that is probably just a matter of timing). Most crashes occurred within a day of VM startup, probably when GC load was high (conjecture). See the comments section for further details. Since the crash has been seen as recently as the latest 6u16, we are reopening this bug while we check reproducibility with the latest hs17 JVM (and, per ongoing customer testing, whether -XX:-UseParNewGC avoids the problem).

04-09-2009

EVALUATION It is possible that this symptom indicates a bug that we have subsequently fixed in the product; please refer to CR 6642634.

16-06-2008

EVALUATION Examination of the GC logs indicated that old gen occupancy was approaching the threshold at which one would expect a concurrent collection to start soon. In other words, the free list cache was likely approaching exhaustion (actually that's not quite true; that point is usually reached around the time the CMS-remark phase starts). [Concurrent GC cycles were occurring roughly once an hour, and we were roughly 40 hours into the run.] In any case, Li-Feng reports that the bug has not been seen since that first sighting. As such, I am closing this bug as not reproducible. Should the bug reappear in later testing, please either reopen this bug or open a new bug referencing this one (as appropriate).

12-07-2007

Duplicate :	JDK-6912018 - CMS: guarantee(head() != 0,"The head of the list cannot be NULL")
Relates :	JDK-6901609 - CMS crash in GCTaskThread BinaryTreeDictionary::getChunkFromTree
Relates :	JDK-6642634 - Test nsk/regression/b6186200 crashed with SIGSEGV