JDK-6642634 : Test nsk/regression/b6186200 crashed with SIGSEGV
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 6u4p,6u7-rev,7
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • OS: generic,solaris,solaris_10
  • CPU: generic,x86,sparc
  • Submitted: 2007-12-17
  • Updated: 2021-10-05
  • Resolved: 2011-03-07
Release family   Version             Status
Other            1.4.2_17-rev,hs11   Fixed
Other            1.4.2_18-rev        Fixed
Other            1.4.2_19            Fixed
Other            5.0u16-rev          Fixed
Other            5.0u17              Fixed
JDK 6            6u10                Fixed
Other            hs11                Fixed
Test nsk/regression/b6186200 crashed with 6u4p-b02 during promotion testing on solaris-sparcv9 bits.

This failure reproduces consistently on all current trains (6u4p, 6u4, and 7).

hs_err file:
# An unexpected error has been detected by Java Runtime Environment:
#  SIGSEGV (0xb) at pc=0xffffffff7dfac208, pid=28359, tid=3
# Java VM: Java HotSpot(TM) 64-Bit Server VM (11.0-b09 mixed mode solaris-sparc)
# Problematic frame:
# V  [libjvm.so+0x3ac208]
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp

---------------  T H R E A D  ---------------

Current thread (0x0000000100132400):  ConcurrentGCThread [stack: 0xffffffff6c600000,0xffffffff6c700000] [id=3]

siginfo:si_signo=SIGSEGV: si_errno=0, si_code=1 (SEGV_MAPERR), si_addr=0x0000000000000018

 O0=0x0000000000042000 O1=0xffffffff72218868 O2=0x0000000000000000 O3=0x0000000000000000
 O4=0x0000000000000000 O5=0x0000000000000001 O6=0xffffffff6c6fead1 O7=0xffffffff7dfac0d0
 G1=0x0000000000042d50 G2=0x0000000000000000 G3=0xffffffff7e5f5630 G4=0x0000000000000001
 G5=0xffffffff7e5ae000 G6=0x0000000000000000 G7=0xffffffff6c500000 Y=0x0000000000000000
 PC=0xffffffff7dfac208 nPC=0xffffffff7dfac20c

Top of Stack: (sp=0xffffffff6c6ff2d0)
0xffffffff6c6ff2d0:   0000000000000001 ffffffff6d100000
0xffffffff6c6ff2e0:   00000000000ecc00 000000000001d980
0xffffffff6c6ff2f0:   0000000000766000 0000000000766000
0xffffffff6c6ff300:   0000000003b30000 0000000100131c00
0xffffffff6c6ff310:   ffffffff6c6ff570 ffffffff723f0000
0xffffffff6c6ff320:   ffffffff6c6ff570 0000000000000000
0xffffffff6c6ff330:   0000000000000000 ffffffff7e5f0d50
0xffffffff6c6ff340:   ffffffff6c6feb81 ffffffff7df8245c
0xffffffff6c6ff350:   0000000000000000 0000000000298000
0xffffffff6c6ff360:   ffffffff73cc0000 0000000000000000
0xffffffff6c6ff370:   0000000100131d40 0000000100131e60
0xffffffff6c6ff380:   ffffffff7e5d9b78 ffffffff7df2ce70
0xffffffff6c6ff390:   ffffffff7e5daf38 ffffffff6c6ff418
0xffffffff6c6ff3a0:   0000000000000000 ffffffff7e5ae000
0xffffffff6c6ff3b0:   ffffffff723f0000 00000000001d7798
0xffffffff6c6ff3c0:   000000010011c170 ffffffff6c6ff570 

Instructions: (pc=0xffffffff7dfac208)
0xffffffff7dfac1f8:   00 00 00 00 00 00 00 00 9d e3 bf 50 c4 5e 60 08
0xffffffff7dfac208:   c2 00 a0 18 80 a0 60 00 14 40 00 1a 91 38 60 03 

Stack: [0xffffffff6c600000,0xffffffff6c700000],  sp=0xffffffff6c6ff2d0,  free space=1020k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x3ac208]
V  [libjvm.so+0x382464]
V  [libjvm.so+0x3a7058]
V  [libjvm.so+0x3a66ac]
V  [libjvm.so+0x39c2e4]
V  [libjvm.so+0x3b0588]
V  [libjvm.so+0x638cdc]

EVALUATION Fix putback to hotspot-gc; see comments section and suggested fix section for jprt id and comment.

SUGGESTED FIX JPRT: [sfbay] job notification - success with job 2008-02-21-190720.ysr.hg-gc

JPRT Job ID: 2008-02-21-190720.ysr.hg-gc
JPRT System Used: sfbay
JPRT Version Used: Feb 15 2008 - Case of the Bartered Bikini [50c84a85177a]
Job URL: http://javaweb.sfbay/jdk/jprt/archive/2008/02/2008-02-21-190720.ysr.hg-gc
Job ARCHIVE: /net/prt-archiver.sfbay/data/jprt/archive/2008/02/2008-02-21-190720.ysr.hg-gc
User: ysr
Email: ###@###.###
Release: jdk7
Job Source: Mercurial: /net/neeraja/export/ysr/hg-gc/{.}
Parent: /net/jano2.sfbay/export2/hotspot/hg/hotspot-gc
Push Parent: /net/jano2.sfbay/export2/hotspot/hg/hotspot-gc
File List: {.}
Command Line: jprt submit -m jprt.txt -cr 6642634 -p /net/jano2.sfbay/export2/hotspot/hg/hotspot-gc
Job submitted at: Thursday February 21, 2008 11:07:22 PST
Total time in queue: 1h 58m 59s
Job started at: Thursday February 21, 2008 11:08:58 PST
Job integrated at: Thursday February 21, 2008 13:06:03 PST
Job finished at: Thursday February 21, 2008 13:06:21 PST
Job run time: 1h 57m 23s
Job state: success
Job flags: SYNC INTEGRATE PRECIOUS
Bundles:
USE: jprt install 2008-02-21-190720.ysr.hg-gc
HINT: Use 'jprt rerun -comment <arg> -retest 2008-02-21-190720.ysr.hg-gc' to rerun the tests for this job (you can also add tests with 'jprt rerun').
NOTE: Zip files containing exe or dll files on windows have had problems with execute permissions. You may need to 'chmod a+x' the windows exe and dll files.

User Comments:
6642634: Test nsk/regression/b6186200 crashed with SIGSEGV
Summary: Use correct allocation path in expand_and_allocate() so object's mark and p-bits are set as appropriate.
Reviewed-by: jmasa, pbk

Fixed 6642634: Test nsk/regression/b6186200 crashed with SIGSEGV

This is a rather old bug, and it is not clear why it started showing up recently in testing, except that some timing change may have made it easier to reproduce. With the right stress options (see below) the crash can be reproduced with older JVMs as well.

When a direct allocation occurs in the old generation collected by the CMS collector while a CMS cycle is in progress, the object must be allocated live, and P-bits must be used to mark its size so that the precleaning and sweeping phases can determine the size of objects that have been allocated but not yet initialized. This requires specialized allocation paths, which were normally used, except when the allocation failed and the generation had to be expanded to accommodate it. In that case the correct allocation path was not used, and consequently the object was not allocated live. Depending on when the allocation occurred, this could cause a crash either in a sweeping phase (because the size of an uninitialized block could not be determined) or in a later marking phase (because a reachable block had been reclaimed prematurely).

A temporary workaround, as documented in the bug report, is to fix the size of the old generation.

Fix Verified: yes
Verification Test: nsk/regression/b6186200 with the set of stress options documented in the bug report for greater reproducibility. Without the fix the test fails within 2-5 iterations (about 2-5 minutes on the test machine). With the fix the test ran successfully for more than 48 hours. This fix is also expected to address a couple of other very hard to reproduce heisenbugs that we have seen occasionally in nightly testing.

Epilogue: the allocation paths can be cleaned up further, since they seem to have evolved organically over time and collected a fair amount of cruft. That will be done in a separate CR; meanwhile this more local fix is being put back.
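
The failure mode described above can be illustrated with a small toy model. This is a sketch under stated assumptions, not HotSpot code: ToyOldGen, Block, SweepResult, raw_allocate, allocate_live and simulate are hypothetical names invented for illustration. allocate_live stands in for the cycle-aware allocation path, and the size_recorded flag stands in for the P-bits. The model also assumes that every block is reachable, so any block the sweep cannot size or would free counts as an error.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only; none of these types are HotSpot's.
struct Block {
    std::size_t words = 0;       // true size, known only to the allocator
    bool initialized = false;    // has the mutator written the object header yet?
    bool marked = false;         // mark bit: treated as live in the current cycle
    bool size_recorded = false;  // stand-in for the P-bits that encode the size
};

enum class SweepResult { Ok, UnknownBlockSize, LiveBlockReclaimed };

class ToyOldGen {
public:
    explicit ToyOldGen(std::size_t capacity) : capacity_(capacity) {}

    void start_cycle() { cycle_in_progress_ = true; }

    // Plain allocation: no mark bit, no size record. Returns -1 when full.
    int raw_allocate(std::size_t words) {
        assert(words > 0);
        if (used_ + words > capacity_) return -1;
        used_ += words;
        Block b;
        b.words = words;
        blocks_.push_back(b);
        return static_cast<int>(blocks_.size()) - 1;
    }

    // Cycle-aware allocation: during a cycle the block is allocated live
    // and its size is recorded for the later phases.
    int allocate_live(std::size_t words) {
        int idx = raw_allocate(words);
        if (idx >= 0 && cycle_in_progress_) {
            blocks_[static_cast<std::size_t>(idx)].marked = true;
            blocks_[static_cast<std::size_t>(idx)].size_recorded = true;
        }
        return idx;
    }

    // Expansion path: grow the space, then allocate. Without the fix the
    // cycle-aware path is bypassed; with it, the same path is reused.
    int expand_and_allocate(std::size_t words, bool use_fix) {
        capacity_ += words;
        return use_fix ? allocate_live(words) : raw_allocate(words);
    }

    void initialize(int idx) { blocks_[static_cast<std::size_t>(idx)].initialized = true; }

    // Every block in this toy is reachable, so a block that cannot be sized
    // or that is unmarked (and would be freed) is reported as an error.
    SweepResult sweep() const {
        for (const Block& b : blocks_) {
            if (!b.initialized && !b.size_recorded) return SweepResult::UnknownBlockSize;
            if (!b.marked) return SweepResult::LiveBlockReclaimed;
        }
        return SweepResult::Ok;
    }

private:
    std::size_t capacity_;
    std::size_t used_ = 0;
    bool cycle_in_progress_ = false;
    std::vector<Block> blocks_;
};

// Allocate one object through the expansion path while a cycle is running,
// then sweep. The second flag controls whether the mutator got to
// initialize the object before the sweep reached it.
SweepResult simulate(bool use_fix, bool initialized_before_sweep) {
    ToyOldGen gen(4);
    gen.start_cycle();
    int idx = gen.allocate_live(8);  // does not fit in 4 words: returns -1
    if (idx < 0) idx = gen.expand_and_allocate(8, use_fix);
    if (initialized_before_sweep) gen.initialize(idx);
    return gen.sweep();
}
```

In this model, simulate(false, false) returns UnknownBlockSize and simulate(false, true) returns LiveBlockReclaimed, loosely mirroring the two crash scenarios in the description. With use_fix set to true, both cases return Ok.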

SUGGESTED FIX <deleted; obsolete; see above for diffs>

SUGGESTED FIX
changeset:   6:df2fc160f817
tag:         tip
user:        ysr
date:        Thu Feb 21 11:03:54 2008 -0800
summary:     6642634: Test nsk/regression/b6186200 crashed with SIGSEGV

diff --git a/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp b/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
--- a/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
+++ b/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
@@ -3121,12 +3121,7 @@ ConcurrentMarkSweepGeneration::expand_an
   if (GCExpandToAllocateDelayMillis > 0) {
     os::sleep(Thread::current(), GCExpandToAllocateDelayMillis, false);
   }
-  size_t adj_word_sz = CompactibleFreeListSpace::adjustObjectSize(word_size);
-  if (parallel) {
-    return cmsSpace()->par_allocate(adj_word_sz);
-  } else {
-    return cmsSpace()->allocate(adj_word_sz);
-  }
+  return have_lock_and_allocate(word_size, tlab);
 }
 
 // YSR: All of this generation expansion/shrinking stuff is an exact copy of

EVALUATION As far as I can tell, this bug has been in the product since at least 2002: we are not careful to handle a direct allocation that follows an expansion while a CMS cycle is in progress. It is not yet clear why the bug is difficult to reproduce once we use ParNew. See the workaround section for a workaround. The "Suggested Fix" section will be updated with a fix over the next few days. A bit more archeology is in progress, but it appears that all current versions of the JDK, going all the way back to 1.4.2, are vulnerable to this problem. Watch this space for further updates soon.

WORK AROUND Fixing the size of the old generation (or the entire heap via -Xmx == -Xms) and of the perm gen (via -XX:PermSize == -XX:MaxPermSize) appears to be a good workaround since the bug is in the expand_and_allocate() path.

SUGGESTED FIX CMSGen::expand_and_allocate() should call CMSGen::have_lock_and_allocate() instead of calling CMSSpace::allocate() as it does currently.