Other | Other |
---|---|
1.4.1_05 05 Fixed | 1.4.2 Fixed |
I'm running:

```
datsun$ /opt/jdk1.4/bin/java -version
java version "1.4.2-beta"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2-beta-b16)
Java HotSpot(TM) Client VM (build 1.4.2-beta-b16, mixed mode)
```

and the following problem occurs.

I have a server process that runs forever. It sometimes uses Runtime.exec to start subprocesses.

```
datsun$ ps ax | grep java
4100 ? S 3:42 /opt/jdk1.4/bin/java -Djava.ext.dirs=/home/shannon/ext
```

This is my server process. It has been running since Mar 17.

```
datsun$ pfiles 4100
4100: /opt/jdk1.4/bin/java -Djava.ext.dirs=/home/shannon/ext -jar /home/shan
  Current rlimit: 1024 file descriptors
   0: S_IFCHR mode:0666 dev:136,0 ino:60568 uid:0 gid:3 rdev:13,2
      O_RDONLY|O_LARGEFILE
   1: S_IFREG mode:0600 dev:32,16 ino:745340 uid:1011 gid:10 size:6037573
      O_WRONLY|O_LARGEFILE
   2: S_IFIFO mode:0000 dev:231,0 ino:131335 uid:0 gid:25 size:0
      O_RDWR
   3: S_IFCHR mode:0666 dev:136,0 ino:60570 uid:0 gid:3 rdev:13,12
      O_RDWR
   4: S_IFDOOR mode:0444 dev:236,0 ino:61825 uid:0 gid:0 size:0
      O_RDONLY|O_LARGEFILE FD_CLOEXEC door to nscd[223]
   5: S_IFIFO mode:0664 dev:32,16 ino:745581 uid:1011 gid:10 size:780
      O_RDONLY|O_LARGEFILE
   6: S_IFREG mode:0600 dev:32,16 ino:746048 uid:1011 gid:10 size:7607
      O_RDONLY|O_LARGEFILE
   7: S_IFIFO mode:0664 dev:32,16 ino:745581 uid:1011 gid:10 size:780
      O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE
   8: S_IFREG mode:0600 dev:32,16 ino:745922 uid:1011 gid:10 size:1034402
      O_WRONLY|O_APPEND|O_CREAT|O_LARGEFILE
  10: S_IFREG mode:0600 dev:32,16 ino:745922 uid:1011 gid:10 size:1034402
      O_WRONLY|O_APPEND|O_CREAT|O_LARGEFILE
```

Note that fd 9 and fd 11 are the two lowest-numbered unused fds. If I were to create a pipe, the read end would be fd 9 and the write end would be fd 11.

```
datsun$ ps alx | grep 4100
8 1011 4100 4097 0 29 3011865664992 0002d934 S ? 3:42 /opt/jdk1.4/
8 1011 17934 4100 0 49 23 1096 744 300037f4c40 S ? 0:00 /bin/sh /hom
datsun$ ps alx | grep 17934
8 1011 17934 4100 0 49 23 1096 744 300037f4c40 S ? 0:00 /bin/sh /hom
8 1011 17936 17934 0 49 23 976 704 300043d8a80 S ? 0:00 cat
```

Process 17934 is the shell started by 4100.

```
datsun$ pfiles 17934
17934: /bin/sh /home/shannon/.mail/filtermsg metadata-jsr +In/metadata
  Current rlimit: 1024 file descriptors
   0: S_IFIFO mode:0000 dev:231,0 ino:178470 uid:1011 gid:10 size:0
      O_RDWR
   1: S_IFREG mode:0600 dev:0,2 ino:1624593 uid:1011 gid:10 size:7
      O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE
   2: S_IFREG mode:0600 dev:0,2 ino:1624593 uid:1011 gid:10 size:7
      O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE
  11: S_IFIFO mode:0000 dev:231,0 ino:178470 uid:1011 gid:10 size:0
      O_RDWR
  19: S_IFREG mode:0775 dev:32,16 ino:745551 uid:1011 gid:10 size:933
      O_RDONLY|O_LARGEFILE FD_CLOEXEC
```

Note that fd 0 and fd 11 are the same pipe (ino:178470). The shell script redirected fd 1 and fd 2 to a debug output file. Process 17936 is a `cat` command capturing the data from the pipe; it should terminate when the pipe is closed.

```
datsun$ pfiles 17936
17936: cat
  Current rlimit: 1024 file descriptors
   0: S_IFIFO mode:0000 dev:231,0 ino:178470 uid:1011 gid:10 size:0
      O_RDWR
   1: S_IFREG mode:0600 dev:0,2 ino:4450889 uid:1011 gid:10 size:7649
      O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE
   2: S_IFREG mode:0600 dev:0,2 ino:1624593 uid:1011 gid:10 size:7
      O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE
  11: S_IFIFO mode:0000 dev:231,0 ino:178470 uid:1011 gid:10 size:0
      O_RDWR
```

Note that `cat` also has the pipe open on fd 11. Because both the shell (17934) and `cat` (17936) still hold the pipe open on fd 11, the JVM closing its end makes no difference: the pipe never reaches EOF, so `cat` hangs forever.

It looks like there might be a bug in the JVM where it doesn't always close the write end of the pipe before starting a subprocess that should receive only the read end. The parent should keep the write end open and close the read end; the child should keep the read end open and close the write end.

The odd thing is that this doesn't happen every time. It only seems to fail after the server has been running for a long time, and even then only intermittently. This also fails on 1.4.1.
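The prediction that a new pipe would land on fd 9 and fd 11 rests on the POSIX rule that each newly created descriptor takes the lowest-numbered free slot. The following is a minimal C sketch, assuming a single-threaded POSIX process (another thread opening descriptors in between would invalidate the prediction). It makes the same prediction from the set of open descriptors and checks it against a real `pipe()` call. `lowest_unused_fd` and `pipe_matches_prediction` are hypothetical helper names, not part of any JDK code.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: return the lowest descriptor >= start that is not
 * open in this process. fcntl(F_GETFD) fails with EBADF exactly when the
 * descriptor is not open. */
static int lowest_unused_fd(int start)
{
    for (int fd = start; ; fd++) {
        if (fcntl(fd, F_GETFD) == -1 && errno == EBADF)
            return fd;
    }
}

/* Hypothetical helper: predict the two descriptors pipe() will return, the
 * way the report predicts fd 9 and fd 11 from the pfiles listing, then
 * check the prediction against a real pipe. Returns 1 on a match, 0 on a
 * mismatch, -1 if pipe() fails. */
static int pipe_matches_prediction(void)
{
    int expected_read = lowest_unused_fd(0);
    int expected_write = lowest_unused_fd(expected_read + 1);
    int p[2];

    if (pipe(p) == -1)
        return -1;

    int match = (p[0] == expected_read && p[1] == expected_write);
    close(p[0]);
    close(p[1]);
    return match;
}
```

To mirror the pfiles listing, open two descriptors and close the lower one, leaving a hole below an open descriptor (like fd 9 below fd 10); the read end then fills the hole and the write end skips past the descriptor that is still open.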
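The hang described above can be reproduced in miniature outside the JVM. This is a minimal POSIX C sketch, not the JVM's actual Runtime.exec implementation. The helper `child_sees_eof` and its `timeout_ms` parameter are hypothetical, added so the hang becomes observable instead of permanent. The child follows the protocol the report describes, keeping only the read end as stdin. With `leak_write_end` set, however, it also keeps its inherited copy of the write end, as the shell and `cat` do on fd 11, so it never sees EOF even after the parent closes its end.

```c
#include <assert.h>
#include <poll.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a cat-like child that reads stdin from a pipe
 * until EOF. If leak_write_end is nonzero, the child keeps its inherited
 * copy of the write end open instead of closing it. Returns 1 if the child
 * reached EOF, 0 if it saw no data and no EOF for timeout_ms (where a real
 * cat would block forever), -1 on error. */
static int child_sees_eof(int leak_write_end, int timeout_ms)
{
    int p[2];
    if (pipe(p) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: make the read end its stdin. */
        if (p[0] != STDIN_FILENO) {
            if (dup2(p[0], STDIN_FILENO) == -1)
                _exit(3);
            close(p[0]);
        }
        if (!leak_write_end)
            close(p[1]);   /* correct protocol: the child drops the write end */

        char buf[256];
        for (;;) {
            struct pollfd pfd = { .fd = STDIN_FILENO, .events = POLLIN };
            int ready = poll(&pfd, 1, timeout_ms);
            if (ready == 0)
                _exit(2);  /* silence with a write end still open somewhere */
            if (ready < 0)
                _exit(3);
            ssize_t n = read(STDIN_FILENO, buf, sizeof buf);
            if (n == 0)
                _exit(0);  /* EOF: every copy of the write end is closed */
            if (n < 0)
                _exit(3);
        }
    }

    /* Parent: drop the read end, send some data, then close the write end. */
    close(p[0]);
    const char msg[] = "hello\n";
    ssize_t written = write(p[1], msg, strlen(msg));
    close(p[1]);

    int status;
    if (waitpid(pid, &status, 0) == -1 || written < 0 || !WIFEXITED(status))
        return -1;
    switch (WEXITSTATUS(status)) {
    case 0:  return 1;
    case 2:  return 0;
    default: return -1;
    }
}
```

Called as `child_sees_eof(0, 1000)` it should return 1, because closing the parent's write end leaves no writers. Called as `child_sees_eof(1, 200)` it should return 0 after roughly 200 ms, because the child itself still holds a write end, which is the situation the pfiles output shows for the shell and `cat`.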