Bug ID: JDK-6372906 JVM crashes when classes.jsa file is corrupted

Type: Bug
Component: hotspot
Sub-Component: runtime
Affected Version: 6

Priority: P2
Status: Closed
Resolution: Fixed
OS: solaris
CPU: sparc

Submitted: 2006-01-16
Updated: 2012-02-01
Resolved: 2006-02-15

JDK 6
6 b72: Fixed

During nightly testing, many tests sometimes fail with SIGBUS.
An example log is available at

http://vmsqe.sfbay/nightly/mantis/DTWS/results/01-13-06/ServerVM/64BITSOLSPARC/mixed/Main_Baseline/vm.gc-NIGHTLY-Main_Baseline-ServerVM-mixed-64BITSOLSPARC-2006-01-14-06-52-56/ResultDir/allocate001/allocate001.tlog

#
# An unexpected error has been detected by Java Runtime Environment:
#
#  SIGBUS (0xa) at pc=0xfec6658c, pid=13791, tid=2
#
# Java VM: Java HotSpot(TM) Client VM (20060113030552.ap159146.gc_merge mixed mode, sharing)
# Problematic frame:
# Segmentation Fault (core dumped)

Here is pstack output.

-----------------  lwp# 2 / thread# 2  --------------------
 fee59944 void frame::print_on_error(outputStream*,char*,int,bool)const (f788, fe9fb608, ff1b4d18, 7d0, fe9fb558, ff1a7ca8) + 30
 ff06b08c void VMError::report(outputStream*) (fe9fb608, ff1b4d18, fe9fb6a4, 20d18, ff13f016, ff194000) + 350
 ff06bf24 void VMError::report_and_die() (fe9fb6a4, c9fe, 13c00, 0, 214e8, 2) + 474
 fed647a4 JVM_handle_solaris_signal (a, fe9fbb88, fe9fb8d0, 97400, 30000, fec6658c) + 9bc
 ff385fec __sighndlr (a, fe9fbb88, fe9fb8d0, fed63dc8, 0, 0) + c
 ff37fdd8 call_user_handler (a, fe9fbb88, fe9fb8d0, 0, 0, 0) + 234
 ff37ff88 sigacthandler (a, fe9fbb88, fe9fb8d0, 54fe04, ff194000, 34e20) + 64
 --- called from signal handler with signal 10 (SIGBUS) ---
 fec6658c void CompactingPermGenGen::initialize_oops() (32b70, ff19fac0, bac0, ff194000, 52daa4, b800) + 34
 fec5f7d4 int universe_init() (16800, 16800, 32b70, cc00, 32d78, ff1aaa88) + 388
 fec4e51c int init_globals() (16000, 16170, 0, ff1aa178, fe9fbd5c, ff1232b0) + 44
 ff031d40 int Threads::create_vm(JavaVMInitArgs*,bool*) (12db8, fe9fbf1b, 30000, 16400, ff1aa504, ff194000) + 290
 fec45938 JNI_CreateJavaVM (fe9fbf94, fe9fbf90, fe9fbf80, 10002, 54e794, ff194000) + d0
 00012664 JavaMain (fec45868, 2b0cc, 0, 0, 0, 0) + 188
 ff385c94 _lwp_start (0, 0, 0, 0, 0, 0)
-----------------  lwp# 1 / thread# 1  --------------------
 ff31cb30 _lwp_wait (2, ffbff234, 110a0, ff371d18, 5, 0) + 8
 ff379844 _thrp_join (2, 0, ffbff2f8, 1, 0, ffbff2fc) + 44
 ff3799b8 thr_join (2, 0, ffbff2f8, ffbff388, 0, ffbff2fc) + 10
 000188c0 ContinueInNewThread (124dc, 0, 0, ffbff388, fffe7e0c, 0) + 30
 0001249c main     (18000, 2ab28, 10000, 2b0cc, 458, 10001) + eac
 000111c0 _start   (0, 0, 0, 0, 0, 0) + 108

Investigation shows that on some machines, for example starwars.sfbay.sun.com, the Main_Baseline java and javac cannot be started at all. The JDK distribution is located on starwars in /var/tmp/Work/Work/JDK/NIGHTLY/Main_Baseline/solaris-sparc/bin; I have also copied it to /net/sqesvr-nfs.sfbay/global/nfs/vm1/users/nh161220/jdk-bad in case it gets overwritten.

The problem seems to be a corrupted classes.jsa file. Removing it solves the problem. The problematic file is attached.

The crash is reproducible only when java is started with default options or with -client -XX:+UseSerialGC. The crash also seems to be hardware dependent: for example, there is no crash on gtee.sfbay.sun.com.

This bug severely impacts testing.

EVALUATION The corrupted classes.jsa is only 8192 bytes, so the most likely cause is an incomplete write during java -Xshare:dump. (I cannot reproduce the bug, and it shows up only occasionally on particular machines, so strictly speaking this evaluation is based on observation only.)

To be more defensive, if the code that generates classes.jsa cannot finish correctly, it should remove the file rather than leave it in place, because a partially written archive can crash Java applications during startup.

There is at least one significant problem in the dumping code: it exits the VM right away if the write system call fails while writing data to classes.jsa. Here is the problematic code, from src/share/vm/memory/filemap.cpp:

    void FileMapInfo::write_bytes(const void* buffer, int nbytes) {
      if (_file_open) {
        int n = ::write(_fd, buffer, nbytes);
        if (n != nbytes) {
          fail_stop("Unable to write to shared archive file.", NULL);
        }
      }
      _file_offset += nbytes;
    }

So one possible fix is to remove classes.jsa before calling fail_stop.
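The "remove before failing" idea can be sketched outside HotSpot as follows. This is only an illustration under stated assumptions, not the actual change that went into b72: write_or_remove is a hypothetical helper, and it returns false instead of calling fail_stop so that the behaviour can be exercised without exiting the process. In the VM, the caller would presumably invoke fail_stop after the file has been removed.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper, not HotSpot code.
// Writes nbytes from buffer to fd. If the write fails or is short, the
// descriptor is closed and the partially written file at path is unlinked,
// so that no truncated archive is left behind for a later JVM to map.
// Returns true only when the whole buffer was written.
bool write_or_remove(int fd, const char* path, const void* buffer, size_t nbytes) {
  assert(path != nullptr);
  ssize_t n = ::write(fd, buffer, nbytes);
  if (n >= 0 && static_cast<size_t>(n) == nbytes) {
    return true;
  }
  // Failed or incomplete write: discard the archive before reporting failure.
  ::close(fd);
  ::unlink(path);
  return false;
}
```

Unlinking before reporting the error means that the next JVM start finds no archive at all and, in the default sharing mode, runs without sharing instead of mapping a truncated file.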

20-01-2006

WORK AROUND Re-create the shared archive with java -Xshare:dump.

17-01-2006

WORK AROUND Use -Xshare:off or remove the corrupted classes.jsa file.

16-01-2006