Bug ID: JDK-6642634 Test nsk/regression/b6186200 crashed with SIGSEGV

Details
Type:
Bug
Submit Date:
2007-12-17
Status:
Closed
Updated Date:
2011-03-07
Project Name:
JDK
Resolved Date:
2011-03-07
Component:
hotspot
OS:
solaris,generic,solaris_10
Sub-Component:
gc
CPU:
x86,sparc,generic
Priority:
P2
Resolution:
Fixed
Affected Versions:
6u4p,6u7-rev,7
Fixed Versions:
hs12 (b02)

Description
Test nsk/regression/b6186200 crashed with 6u4p-b02 during promotion testing on solaris-sparcv9 bits.
Results are available here
http://sqeweb.sfbay/nfs/tools/gtee/results/JDK_PERFORMANCE/PROMOTION/VM/6u4p/b02/CMS/2007-12-14/vm/64BITSOLSPARC5.10/server/mixed/vm-64BITSOLSPARC5.10_server_mixed_nsk.regression.testlist2007-12-14-12-52-21/analysis.html
or here
/net/sqenfs-1.sfbay/export1/tools/gtee/results/JDK_PERFORMANCE/PROMOTION/VM/6u4p/b02/CMS/2007-12-14/vm/64BITSOLSPARC5.10/server/mixed/vm-64BITSOLSPARC5.10_server_mixed_nsk.regression.testlist2007-12-14-12-52-21

This failure reproduces consistently with all current trains (6u4p, 6u4 and 7).

hs_err file:
#
# An unexpected error has been detected by Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0xffffffff7dfac208, pid=28359, tid=3
#
# Java VM: Java HotSpot(TM) 64-Bit Server VM (11.0-b09 mixed mode solaris-sparc)
# Problematic frame:
# V  [libjvm.so+0x3ac208]
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
#

---------------  T H R E A D  ---------------

Current thread (0x0000000100132400):  ConcurrentGCThread [stack: 0xffffffff6c600000,0xffffffff6c700000] [id=3]

siginfo:si_signo=SIGSEGV: si_errno=0, si_code=1 (SEGV_MAPERR), si_addr=0x0000000000000018

Registers:
 O0=0x0000000000042000 O1=0xffffffff72218868 O2=0x0000000000000000 O3=0x0000000000000000
 O4=0x0000000000000000 O5=0x0000000000000001 O6=0xffffffff6c6fead1 O7=0xffffffff7dfac0d0
 G1=0x0000000000042d50 G2=0x0000000000000000 G3=0xffffffff7e5f5630 G4=0x0000000000000001
 G5=0xffffffff7e5ae000 G6=0x0000000000000000 G7=0xffffffff6c500000 Y=0x0000000000000000
 PC=0xffffffff7dfac208 nPC=0xffffffff7dfac20c


Top of Stack: (sp=0xffffffff6c6ff2d0)
0xffffffff6c6ff2d0:   0000000000000001 ffffffff6d100000
0xffffffff6c6ff2e0:   00000000000ecc00 000000000001d980
0xffffffff6c6ff2f0:   0000000000766000 0000000000766000
0xffffffff6c6ff300:   0000000003b30000 0000000100131c00
0xffffffff6c6ff310:   ffffffff6c6ff570 ffffffff723f0000
0xffffffff6c6ff320:   ffffffff6c6ff570 0000000000000000
0xffffffff6c6ff330:   0000000000000000 ffffffff7e5f0d50
0xffffffff6c6ff340:   ffffffff6c6feb81 ffffffff7df8245c
0xffffffff6c6ff350:   0000000000000000 0000000000298000
0xffffffff6c6ff360:   ffffffff73cc0000 0000000000000000
0xffffffff6c6ff370:   0000000100131d40 0000000100131e60
0xffffffff6c6ff380:   ffffffff7e5d9b78 ffffffff7df2ce70
0xffffffff6c6ff390:   ffffffff7e5daf38 ffffffff6c6ff418
0xffffffff6c6ff3a0:   0000000000000000 ffffffff7e5ae000
0xffffffff6c6ff3b0:   ffffffff723f0000 00000000001d7798
0xffffffff6c6ff3c0:   000000010011c170 ffffffff6c6ff570 

Instructions: (pc=0xffffffff7dfac208)
0xffffffff7dfac1f8:   00 00 00 00 00 00 00 00 9d e3 bf 50 c4 5e 60 08
0xffffffff7dfac208:   c2 00 a0 18 80 a0 60 00 14 40 00 1a 91 38 60 03 

Stack: [0xffffffff6c600000,0xffffffff6c700000],  sp=0xffffffff6c6ff2d0,  free space=1020k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x3ac208]
V  [libjvm.so+0x382464]
V  [libjvm.so+0x3a7058]
V  [libjvm.so+0x3a66ac]
V  [libjvm.so+0x39c2e4]
V  [libjvm.so+0x3b0588]
V  [libjvm.so+0x638cdc]
...

                                    

Comments
EVALUATION

As far as I can tell, this has been a bug in the product since at least 2002.
We are not careful to handle direct allocation that follows an expansion
while a CMS cycle is in progress. It is not yet clear why the bug is
difficult to reproduce once we use ParNew.
See the WORK AROUND section for a workaround. The SUGGESTED FIX
section will be updated with a fix over the next few days.
A bit more archaeology is in progress, but it appears that all current
versions of the JDK, going all the way back to 1.4.2, would be vulnerable
to this problem.

Watch this space for further updates soon.
                                     
2008-01-02
SUGGESTED FIX

CMSGen::expand_and_allocate() should call CMSGen::have_lock_and_allocate()
instead of calling CMSSpace::allocate() as it does currently.
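A small toy model can illustrate the difference between the two allocation paths.

This is a hypothetical sketch, not HotSpot code. ToyOldGen, raw_allocate() and the single mark bit per block are assumptions made for illustration. The real have_lock_and_allocate() also takes a tlab argument and records P-bits, and this model omits both.

The model shows one thing: only the collector-aware path marks a new block live while a CMS cycle is active. Routing the post-expansion allocation through that path therefore closes the gap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model (hypothetical names, not HotSpot code): an old generation of word
// slots plus a mark bitmap that a concurrent collector consults during a cycle.
struct ToyOldGen {
  std::vector<bool> mark_bits;    // one bit per word; true = live for this cycle
  std::size_t top = 0;            // bump-pointer allocation cursor
  std::size_t capacity;           // words currently committed
  bool cms_cycle_active = false;  // a concurrent cycle is marking/sweeping

  ToyOldGen(std::size_t reserved, std::size_t committed)
      : mark_bits(reserved, false), capacity(committed) {
    assert(committed <= reserved);
  }

  // Raw space allocation, similar in spirit to CMSSpace::allocate(): it knows
  // nothing about a cycle in progress. Returns SIZE_MAX on failure.
  std::size_t raw_allocate(std::size_t words) {
    if (top + words > capacity) return SIZE_MAX;
    std::size_t start = top;
    top += words;
    return start;
  }

  // Collector-aware allocation: if a cycle is active, the new block is
  // marked live so that the sweeper will not reclaim it.
  std::size_t have_lock_and_allocate(std::size_t words) {
    std::size_t start = raw_allocate(words);
    if (start != SIZE_MAX && cms_cycle_active) mark_bits[start] = true;
    return start;
  }

  // Pre-fix shape: expand to the reserved size, then use the raw path.
  std::size_t expand_and_allocate_buggy(std::size_t words) {
    capacity = mark_bits.size();
    return raw_allocate(words);
  }

  // Post-fix shape: expand, then use the collector-aware path.
  std::size_t expand_and_allocate_fixed(std::size_t words) {
    capacity = mark_bits.size();
    return have_lock_and_allocate(words);
  }
};
```

In this model, filling the committed space and then requesting more forces the expand path. The buggy variant returns a block whose mark bit is clear while the cycle is still running. The fixed variant returns the same block already marked.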
                                     
2008-01-02
WORK AROUND

Fixing the size of the old generation (or of the entire heap, via
-Xmx == -Xms) and of the perm gen (via -XX:PermSize == -XX:MaxPermSize)
appears to be a good workaround, since the bug is in the expand_and_allocate() path.
                                     
2008-01-02
SUGGESTED FIX

changeset:   6:df2fc160f817
tag:         tip
user:        ysr
date:        Thu Feb 21 11:03:54 2008 -0800
summary:     6642634: Test nsk/regression/b6186200 crashed with SIGSEGV

diff --git a/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp b/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
--- a/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
+++ b/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
@@ -3121,12 +3121,7 @@ ConcurrentMarkSweepGeneration::expand_an
   if (GCExpandToAllocateDelayMillis > 0) {
     os::sleep(Thread::current(), GCExpandToAllocateDelayMillis, false);
   }
-  size_t adj_word_sz = CompactibleFreeListSpace::adjustObjectSize(word_size);
-  if (parallel) {
-    return cmsSpace()->par_allocate(adj_word_sz);
-  } else {
-    return cmsSpace()->allocate(adj_word_sz);
-  }
+  return have_lock_and_allocate(word_size, tlab);
 }
 
 // YSR: All of this generation expansion/shrinking stuff is an exact copy of
                                     
2008-01-03
SUGGESTED FIX

<deleted; obsolete; see above for diffs>
                                     
2008-01-08
EVALUATION

Fix putback to hotspot-gc; see comments section and suggested fix section
for jprt id and comment.
                                     
2008-02-21
SUGGESTED FIX

JPRT: [sfbay] job notification - success with job 2008-02-21-190720.ysr.hg-gc



JPRT Job ID:            2008-02-21-190720.ysr.hg-gc
JPRT System Used:       sfbay
JPRT Version Used:      Feb 15 2008 - Case of the Bartered Bikini
  [50c84a85177a]
Job URL:
  http://javaweb.sfbay/jdk/jprt/archive/2008/02/2008-02-21-190720.ysr.hg-gc
Job ARCHIVE:
  /net/prt-archiver.sfbay/data/jprt/archive/2008/02/2008-02-21-190720.ysr.hg-gc
User:                   ysr
Email:                  ###@###.###
Release:                jdk7
Job Source:             Mercurial: /net/neeraja/export/ysr/hg-gc/{.}
Parent:                 /net/jano2.sfbay/export2/hotspot/hg/hotspot-gc
Push Parent:            /net/jano2.sfbay/export2/hotspot/hg/hotspot-gc
File List:              {.}
Command Line:           jprt submit -m jprt.txt -cr 6642634 -p
  /net/jano2.sfbay/export2/hotspot/hg/hotspot-gc
Job submitted at:       Thursday February 21, 2008 11:07:22 PST
Total time in queue:    1h 58m 59s
Job started at:         Thursday February 21, 2008 11:08:58 PST
Job integrated at:      Thursday February 21, 2008 13:06:03 PST
Job finished at:        Thursday February 21, 2008 13:06:21 PST
Job run time:           1h 57m 23s
Job state:              success
Job flags:              SYNC INTEGRATE PRECIOUS
Bundles:                USE: jprt install 2008-02-21-190720.ysr.hg-gc

HINT: Use 'jprt rerun -comment <arg> -retest 2008-02-21-190720.ysr.hg-gc' to
rerun the tests for this job (you can also add tests with 'jprt
rerun').
NOTE: Zip files containing exe or dll files on windows have had problems with
execute permissions. You may need to 'chmod a+x' the windows exe and
dll files.

User Comments:

6642634: Test nsk/regression/b6186200 crashed with SIGSEGV
Summary: Use correct allocation path in expand_and_allocate() so object's
  mark and p-bits are set as appropriate.
Reviewed-by: jmasa, pbk

Fixed 6642634: Test nsk/regression/b6186200 crashed with SIGSEGV

This is a rather old bug, and it is not clear why it started showing up
in testing recently, except that some timing change may have made
the bug easier to reproduce. With the right stress options
(see below), the crash can be reproduced with older JVMs as well.

When direct allocation occurs in the old generation (collected by the
CMS collector) concurrently with a CMS cycle, objects must be
allocated live. P-bits are used to record the sizes of those objects,
so that the precleaning and sweeping phases can determine the sizes
of objects that have been allocated but not yet initialized. This requires
specialized allocation paths, and these paths were normally used.
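A minimal sketch can show how size bits in a mark bitmap let a sweeper find the extent of a block whose header has not been written yet.

The class ToyMarkBitMap and the bit layout are assumptions for illustration, not a description of HotSpot's data structures. The assumed layout sets the live bit at start, plus extra bits at start + 1 and start + size - 1. The toy requires blocks of at least 3 words so that these three bits are distinct.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy mark bitmap, one bit per heap word (hypothetical; not HotSpot's CMSBitMap).
class ToyMarkBitMap {
 public:
  explicit ToyMarkBitMap(std::size_t words) : bits_(words, false) {}

  bool is_marked(std::size_t i) const { return bits_.at(i); }

  // Record a block allocated during a concurrent cycle. Assumed encoding:
  // bit at `start` = live; bits at `start + 1` and `start + words - 1`
  // record the block's extent before its header is initialized.
  void mark_allocated_live(std::size_t start, std::size_t words) {
    assert(words >= 3 && start + words <= bits_.size());
    bits_[start] = true;
    bits_[start + 1] = true;
    bits_[start + words - 1] = true;
  }

  // Recover a block's size from the bitmap alone: the first set bit after
  // `start + 1` marks the last word. Returns 0 if no size bits are present.
  std::size_t size_from_bits(std::size_t start) const {
    if (!bits_.at(start) || !bits_.at(start + 1)) return 0;
    for (std::size_t i = start + 2; i < bits_.size(); ++i) {
      if (bits_[i]) return i - start + 1;
    }
    return 0;
  }

 private:
  std::vector<bool> bits_;
};
```

In this model, a block allocated without these bits returns 0 from size_from_bits(). That means its size cannot be recovered from the bitmap, which is the situation the sweeper faces for an uninitialized block allocated on the wrong path.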

The exception was when the allocation failed and the generation had to be
expanded to accommodate it. In that case the correct allocation path
was not used, so the object was not allocated live. Depending on when
the allocation occurred, this could cause a crash in one of two places:
- a sweeping phase, because the size of an uninitialized block could
  not be determined;
- a later marking phase, because a reachable block had been reclaimed
  prematurely.

A temporary workaround, as documented in the bug report, is
to fix the size of the old generation.

Fix Verified: yes

Verification Test: nsk/regression/b6186200
with the set of stress options documented in the
bug report for greater reproducibility.

Without the fix, the test fails within 2-5 iterations (about 2-5 minutes
on the test machine). With the fix, the test ran successfully for more
than 48 hours.

This fix is also expected to address a couple of other very
hard-to-reproduce heisenbugs that we have seen occasionally
in nightly testing.

Epilogue: the allocation paths can be further cleaned up,
since they seem to have organically evolved over a period of
time and collected a bunch of cruft. That will be done in
a separate CR, meanwhile putting back this more local fix.
                                     
2008-02-21


