JDK-6409002 : Crash due to fatal error in Par_PushAndMarkClosure::do_oop()
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 1.4.2_09,1.4.2_11
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: solaris_8,solaris_10
  • CPU: sparc
  • Submitted: 2006-04-05
  • Updated: 2011-12-15
  • Resolved: 2006-11-20
Fixed in: 1.4.2_14 b01
Description
OPERATING SYSTEM(S): Solaris 5.8

FULL JDK VERSION(S): 1.4.2_11

DESCRIPTION:
The JVM crashes frequently with CMS enabled. The following hs_err output is produced (multiple full hs_err files are available on request):

# An unexpected error has been detected by HotSpot Virtual Machine:
#
#  Internal Error (434F4E43555252454E542D41524B335745455027454E45524154494F4E0E4350501143 01), pid=17359, tid=4
#
# Java VM: Java HotSpot(TM) Client VM (1.4.2_11-b06 mixed mode)

The crash also produced a core file. Here is the output from pstack:

ff3692f0 __sigprocmask (ff36b8f4, 0, 0, fd681d98, ff37c000, 0) + 8
ff35dd34 _sigon   (fd681d98, ff3838b0, 6, fd68177c, fd681d98, 0) + d0
ff360d90 _thrp_kill (0, 5, 6, ff37c000, 5, ff2c0458) + f8
ff24bcec raise    (6, 0, 0, ffffffff, ff2c03c4, 0) + 40
ff235984 abort    (ff2bc008, a, 0, 1, fe3470f8, 0) + 100
fe3415b0 __1cCosFabort6Fi_v_ (1, fe3e313d, 1, 7efefeff, 81010100, ff00) + 54
fe39afd8 __1cHVMErrorOreport_and_die6M_v_ (fe3f5a60, fe3f5a6f, fe3f5a7f, fe3c5f97, 1143, e0000000) + 984
fe23b370 __1cMreport_fatal6Fpkci1_v_ (fe3c5f97, 1143, fe3c5ff3, 0, 1fff, fc01ec00) + 24
fe236040 __1cWPar_PushAndMarkClosureGdo_oop6MppnHoopDesc__v_ (fd681bf4, e7161334, 1, 1fff, 11d2a0, fffe0000) + 104
fe25be4c __1cNinstanceKlassSoop_oop_iterate_nv6MpnHoopDesc_pnWPar_PushAndMarkClosure__i_ (f74699f0, e7161328, fd681bf4, fd681bc0, f4c00000, 1e72bc) + c4
fe2372c8 __1cbEPar_MarkRefsIntoAndScanClosureKtrim_queue6MI_v_ (fe40a000, 0, fd681bcc, fd681bc0, 1, 0) + 110
fe231f88 __1cQCMSParRemarkTaskEwork6Mi_v_ (fb281924, 3, 0, 3, 42ac, 0) + 27c
fe39e2d8 __1cKGangWorkerDrun6M_v_ (ca2e0, ffffffe2, fe42a6c0, ffff8000, 0, ff37c000) + b0
fe340e10 java_start (ca2e0, ff37d660, 1, 1, ff37c000, 0) + 134
ff36b11c _thread_start (ca2e0, 0, 0, 0, 0, 0) + 40

The Internal Error ID in the hs_err output translates to "concurrentMarkSweepGeneration.cpp, line 4419".
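
For reference, the translation above can be reproduced with a short script. This is only a sketch inferred from this single error ID, not from HotSpot documentation: it assumes that each byte of the source file name is stored with 0x20 subtracted, that the last four hex digits are the line number, and that the trailing "01" field can be ignored. The function name decode_error_id is hypothetical.

```python
def decode_error_id(error_id):
    """Decode a 1.4.2-style Internal Error ID into (file name, line number).

    Assumptions (inferred from one example only): file-name bytes are
    stored minus 0x20, and the last four hex digits are the line number.
    """
    # Drop the trailing space-separated field ("01" here); it is not interpreted.
    hex_part = error_id.split()[0]
    # Split into the encoded file name and the four-digit hex line number.
    name_hex, line_hex = hex_part[:-4], hex_part[-4:]
    # Undo the assumed offset of 0x20 on every file-name byte.
    name = "".join(chr(b + 0x20) for b in bytes.fromhex(name_hex))
    return name, int(line_hex, 16)


ERROR_ID = "434F4E43555252454E542D41524B335745455027454E45524154494F4E0E4350501143 01"
file_name, line = decode_error_id(ERROR_ID)
print(f"{file_name}, line {line}")
# prints: concurrentMarkSweepGeneration.cpp, line 4419
```

The decoded line number, 0x1143, also appears as the second argument to report_fatal in the pstack output above.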

Checking the source reveals that the JVM died due to a fatal condition, namely: "Task queue overflow in Par_PushAndMarkClosure".

We do not know what can cause this task queue to overflow, so we cannot produce a testcase or devise a workaround. The only workaround currently available to us is disabling CMS, which is not an acceptable solution.

Any input on what is causing the crash and how we can avoid it would be most beneficial.

Comments
WORK AROUND Run with -XX:-CMSParallelRemarkEnabled -XX:CMSMarkStackSize=64m (or some other suitably large value). However, disabling parallel remark when using CMS on large MP machines can adversely affect "CMS-remark" GC pause times.
12-05-2008

SUGGESTED FIX The link above is broken (it actually moved when PRT's archives moved). Here is its current location: http://analemma.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/main/gc_baseline/2003/20031010014636.ysr.ovflw2/workspace/webrevs/webrev-2003.10.10/
10-04-2006

SUGGESTED FIX CR 4615723 includes two putbacks, and thus two webrev URLs. The second putback is the relevant one that needs to be backported into 1.4.2 for this fix (i.e. CR 6409002) to work. Here is the webrev URL: http://analemma.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/main/gc_baseline/2003/20031010014636.ysr.ovflw2/workspace/webrevs/webrev-2003.10.10/ Note: It would be nice if the first putback were backported into 1.4.2 as well, though it is not necessary for this fix to work.
10-04-2006

EVALUATION This is a duplicate of (a part of) 4615723. See suggested fix section for more details.
10-04-2006

WORK AROUND Disable CMS, but this is unacceptable.
05-04-2006