Bug ID: JDK-6409002 Crash due to fatal error in Par_PushAndMarkClosure::do_oop()

Details
Type: Bug
Submit Date: 2006-04-05
Status: Resolved
Updated Date: 2011-12-15
Project Name: JDK
Resolved Date: 2006-11-20
Component: hotspot
OS: solaris_8, solaris_10
Sub-Component: gc
CPU: sparc
Priority: P4
Resolution: Fixed
Affected Versions: 1.4.2_09, 1.4.2_11
Fixed Versions: 1.4.2_14 (b01)

Description
OPERATING SYSTEM(S): Solaris 5.8

FULL JDK VERSION(S): 1.4.2_11

DESCRIPTION:
The JVM crashes frequently with CMS enabled. The following hs_err output is produced (multiple full hs_err files are available on request):

# An unexpected error has been detected by HotSpot Virtual Machine:
#
#  Internal Error (434F4E43555252454E542D41524B335745455027454E45524154494F4E0E4350501143 01), pid=17359, tid=4
#
# Java VM: Java HotSpot(TM) Client VM (1.4.2_11-b06 mixed mode)

The crash also produced a core file. Here is the output from pstack:

ff3692f0 __sigprocmask (ff36b8f4, 0, 0, fd681d98, ff37c000, 0) + 8
ff35dd34 _sigon   (fd681d98, ff3838b0, 6, fd68177c, fd681d98, 0) + d0
ff360d90 _thrp_kill (0, 5, 6, ff37c000, 5, ff2c0458) + f8
ff24bcec raise    (6, 0, 0, ffffffff, ff2c03c4, 0) + 40
ff235984 abort    (ff2bc008, a, 0, 1, fe3470f8, 0) + 100
fe3415b0 __1cCosFabort6Fi_v_ (1, fe3e313d, 1, 7efefeff, 81010100, ff00) + 54
fe39afd8 __1cHVMErrorOreport_and_die6M_v_ (fe3f5a60, fe3f5a6f, fe3f5a7f, fe3c5f97, 1143, e0000000) + 984
fe23b370 __1cMreport_fatal6Fpkci1_v_ (fe3c5f97, 1143, fe3c5ff3, 0, 1fff, fc01ec00) + 24
fe236040 __1cWPar_PushAndMarkClosureGdo_oop6MppnHoopDesc__v_ (fd681bf4, e7161334, 1, 1fff, 11d2a0, fffe0000) + 104
fe25be4c __1cNinstanceKlassSoop_oop_iterate_nv6MpnHoopDesc_pnWPar_PushAndMarkClosure__i_ (f74699f0, e7161328, fd681bf4, fd681bc0, f4c00000, 1e72bc) + c4
fe2372c8 __1cbEPar_MarkRefsIntoAndScanClosureKtrim_queue6MI_v_ (fe40a000, 0, fd681bcc, fd681bc0, 1, 0) + 110
fe231f88 __1cQCMSParRemarkTaskEwork6Mi_v_ (fb281924, 3, 0, 3, 42ac, 0) + 27c
fe39e2d8 __1cKGangWorkerDrun6M_v_ (ca2e0, ffffffe2, fe42a6c0, ffff8000, 0, ff37c000) + b0
fe340e10 java_start (ca2e0, ff37d660, 1, 1, ff37c000, 0) + 134
ff36b11c _thread_start (ca2e0, 0, 0, 0, 0, 0) + 40

The Internal Error ID in the hs_err output translates to "concurrentMarkSweepGeneration.cpp, line 4419".
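
For reference, the translation can be reproduced by hand. The sketch below is not taken from the HotSpot sources; it assumes, based only on this one error ID and the translation stated above, that the first hex token stores each file-name character with 0x20 subtracted and that its last four hex digits are the line number. The class and method names are hypothetical, and the trailing "01" token is ignored.

```java
// Hypothetical helper: decodes a 1.4.2-style "Internal Error" ID.
// ASSUMPTION (inferred from this report only, not from documentation):
//   - every file-name byte is the ASCII character minus 0x20
//   - the last 4 hex digits of the first token are the line number
class HsErrIdDecoder {
    private static final int ASSUMED_OFFSET = 0x20;

    // Returns the first whitespace-separated token of the error ID.
    private static String firstToken(String errorId) {
        return errorId.trim().split("\\s+")[0];
    }

    // Decodes the file-name part (everything except the last 4 hex digits).
    static String decodeFileName(String errorId) {
        String hex = firstToken(errorId);
        String nameHex = hex.substring(0, hex.length() - 4);
        StringBuilder name = new StringBuilder();
        for (int i = 0; i < nameHex.length(); i += 2) {
            int encoded = Integer.parseInt(nameHex.substring(i, i + 2), 16);
            name.append((char) (encoded + ASSUMED_OFFSET));
        }
        return name.toString();
    }

    // Decodes the line-number part (the last 4 hex digits).
    static int decodeLine(String errorId) {
        String hex = firstToken(errorId);
        return Integer.parseInt(hex.substring(hex.length() - 4), 16);
    }

    public static void main(String[] args) {
        String id = "434F4E43555252454E542D41524B335745455027454E45524154494F4E0E4350501143 01";
        System.out.println(decodeFileName(id) + ", line " + decodeLine(id));
    }
}
```

Run against the ID from the hs_err output above, this prints "concurrentMarkSweepGeneration.cpp, line 4419", which matches the translation given in this report.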

Checking the source reveals that the JVM died due to a fatal condition, namely: "Task queue overflow in Par_PushAndMarkClosure".

We do not know what can cause this task queue to overflow, so we cannot produce a testcase or devise a workaround. The only workaround currently available to us is disabling CMS, which is not an acceptable solution.

Any input on what is causing the crash and how we can avoid it would be most beneficial.

                                    

Comments
WORK AROUND

Disable CMS, but this is unacceptable.
                                     
2006-04-05
SUGGESTED FIX

- CR4615723 includes two putbacks, and thus two webrev URLs. The second putback is the relevant one that needs to be
  backported into 1.4.2 for this fix (i.e. CR6409002) to work. Here is the webrev URL:
  http://analemma.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/main/gc_baseline/2003/20031010014636.ysr.ovflw2/workspace/webrevs/webrev-2003.10.10/
- Note: It would be nice if the first putback were backported into 1.4.2 as well, though it is not necessary for this fix to work.
                                     
2006-04-10
EVALUATION

This is a duplicate of (a part of) 4615723. See suggested fix section for
more details.
                                     
2006-04-10
SUGGESTED FIX

The link above is broken (it actually moved when PRT's archives moved).
Here is its current location:

http://analemma.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/main/gc_baseline/2003/20031010014636.ysr.ovflw2/workspace/webrevs/webrev-2003.10.10/
                                     
2006-04-10
WORK AROUND

-XX:-CMSParallelRemarkEnabled -XX:CMSMarkStackSize=64m (or some suitably large value)

[However, disabling parallel remark when using CMS on large MP machines
can adversely impact the "CMS-remark" GC pause times.]
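
A minimal sketch of assembling a launch command with the workaround flags follows. The two workaround options are the ones listed above; -XX:+UseConcMarkSweepGC is the standard switch that enables CMS and is included on the assumption that the application already runs with it. The class name, the "java" binary path, and the main class com.example.Main are hypothetical placeholders.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that builds the java command line for the workaround.
class CmsWorkaroundCommand {
    // markStackSize is passed through verbatim, e.g. "64m" as suggested above.
    static List<String> build(String javaBin, String markStackSize, String mainClass) {
        List<String> cmd = new ArrayList<String>();
        cmd.add(javaBin);
        cmd.add("-XX:+UseConcMarkSweepGC");        // assumed: CMS is already in use
        cmd.add("-XX:-CMSParallelRemarkEnabled");  // workaround: serial remark
        cmd.add("-XX:CMSMarkStackSize=" + markStackSize); // workaround: larger mark stack
        cmd.add(mainClass);
        return cmd;
    }

    public static void main(String[] args) {
        // Placeholder values; substitute the real JVM path and main class.
        List<String> cmd = build("java", "64m", "com.example.Main");
        System.out.println(String.join(" ", cmd));
    }
}
```

The mark stack size is left as a parameter because the comment above only asks for "some suitably large value".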
                                     
2008-05-12


