A licensee found a core dump when the program creates a child process
via Runtime.exec().
The crash occurs about once a month when running 24 hours a day on a
multiprocessor box (4 CPUs).
The crash (SIGABRT) occurs in close(), called from
java/lang/UNIXProcess#forkAndExec, and a libthread panic message
is printed.
We will get more details later.
Their investigation is attached below.
INVESTIGATION:
There seems to be a problem in
j2se/src/solaris/native/java/lang/UNIXProcess_md.c.solaris.
Specifically, the problem may be in the lines that close()
file descriptors after fork1().
=== UNIXProcess_md.c.solaris 1.3.1_0X ====
.....
resultPid = fork1();
.......
if (resultPid == 0) {
    /* Child process */
    int i, max_fd;
    .......
    /* close everything */
    max_fd = sysconf(_SC_OPEN_MAX);
    for (i = 3; i < max_fd; i++) close(i); // (*1), may have problem
    ......
    execvp(cmdv[0], cmdv);
    _exit(-1);
}
<=======
They think the (*1) part above should be changed as follows.
=======>
.....
resultPid = fork1();
.......
if (resultPid == 0) {
    /* Child process */
    int i, max_fd;
    .......
    /* close everything */
    max_fd = sysconf(_SC_OPEN_MAX);
    for (i = 3; i < max_fd; i++) {           // change
        int flags = fcntl(i, F_GETFD);       // change
        if (flags >= 0) {                    // change
            flags |= FD_CLOEXEC;             // change
            fcntl(i, F_SETFD, flags);        // change
        }                                    // change
    }                                        // change
    ......
    execvp(cmdv[0], cmdv);
    _exit(-1);
}
<=======
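For reference, the idea behind the proposed change can be sketched as a
standalone helper. Instead of calling close(2) on every descriptor in the
child, it sets FD_CLOEXEC on each open descriptor so that the descriptors
are closed at exec time. The helper name mark_cloexec_range and its
explicit upper bound are assumptions made for illustration only; they do
not appear in UNIXProcess_md.c.solaris, where the bound would presumably
be sysconf(_SC_OPEN_MAX) as in the excerpt above.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the JDK sources): mark every open
 * descriptor in [lowfd, highfd) close-on-exec instead of closing it.
 * Returns the number of descriptors that were marked.
 */
static int mark_cloexec_range(int lowfd, int highfd)
{
    int fd;
    int marked = 0;

    for (fd = lowfd; fd < highfd; fd++) {
        int flags = fcntl(fd, F_GETFD);
        if (flags < 0) {
            continue;           /* fd is not open; nothing to do */
        }
        if (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == 0) {
            marked++;           /* fd will be closed at exec time */
        }
    }
    return marked;
}
```

In the child branch this would be called as
mark_cloexec_range(3, (int)sysconf(_SC_OPEN_MAX)) just before execvp().
If execvp() fails, the descriptors are still open when _exit() runs, which
is harmless because _exit() releases them anyway.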
NOTE:
According to their investigation, the program dumps core when the
following occur in libthread on Solaris 7 (patch 106980-12 or later):
- libthread fails to create an LWP because of a resource shortage (e.g. memory)
- the application program close(2)s an fd which libthread open(2)ed
==========================================================================