JDK-4656697 : Linux: VM hang when Java program exits
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 1.4.1
  • Priority: P4
  • Status: Closed
  • Resolution: Won't Fix
  • OS: linux
  • CPU: generic
  • Submitted: 2002-03-22
  • Updated: 2002-03-22
  • Resolved: 2002-03-22
Description
This bug is filed to document a known glibc problem.

Sometimes the VM hangs when a Java program has finished executing and tries
to exit, either by calling "System.exit()" or by returning from the main() function.
When that happens, only one Java thread of the program remains.

This is a bug in how glibc-2.2.x handles program exit. The program hangs
on exit if one of its threads happens to be allocating or deallocating memory
at the time. LinuxThreads up to 2.2.4 tries to "free" the manager thread stack
after all user threads have been killed. This is unsafe because a user thread
might be killed while it is holding the malloc lock.
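
The failure pattern described above can be sketched in isolation. The following is a minimal illustration, not the glibc code: an ordinary pthread mutex stands in for the malloc lock, and the names die_holding_lock and exit_path_would_block are hypothetical. A worker thread ends while still holding the lock, and the caller then probes the lock with pthread_mutex_trylock() instead of blocking on it, so the sketch reports the problem rather than hanging.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical worker: takes the lock and terminates without releasing it,
 * mimicking a user thread that is killed in the middle of malloc()/free(). */
static void *die_holding_lock(void *arg)
{
    pthread_mutex_lock((pthread_mutex_t *)arg);
    return NULL; /* the lock is deliberately never unlocked */
}

/* Returns 1 if a later lock attempt (such as the free() on the exit path)
 * would block forever, 0 if the lock could be taken, -1 on error. */
int exit_path_would_block(void)
{
    pthread_mutex_t lock; /* stand-in for the malloc lock (assumption) */
    pthread_t worker;
    int rc;

    rc = pthread_mutex_init(&lock, NULL);
    assert(rc == 0);

    if (pthread_create(&worker, NULL, die_holding_lock, &lock) != 0)
        return -1;
    if (pthread_join(worker, NULL) != 0)
        return -1;

    /* The owner is gone, but a normal (non-robust) mutex stays locked.
     * A blocking pthread_mutex_lock() here would never return. */
    rc = pthread_mutex_trylock(&lock);
    if (rc == 0) {
        pthread_mutex_unlock(&lock);
        return 0;
    }
    return rc == EBUSY ? 1 : -1;
}
```

Calling exit_path_would_block() is expected to return 1 on a typical Linux pthreads implementation, because nothing releases a lock whose owner has disappeared. In the real bug the lock attempt is the blocking one inside free(), which is why the process never exits.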

Here is a Java testcase:

---------------------------------- ShutdownMallocTest.java -------------
import java.io.*;

public class ShutdownMallocTest extends Thread{

   public native void foo();

   static {
     System.loadLibrary("ShutdownMallocTest");
   }

   public void run() {
     foo();
   }

   public static void main(String args[]) {
     System.out.println("- ShutdownMallocTest -");

     for (int i = 0; i < 4; i++) {
       ShutdownMallocTest smt = new ShutdownMallocTest();
       smt.setDaemon(true);
       smt.start();
     }
   }
}
---------------------------------- ShutdownMallocTest.c -----------------
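#include <jni.h>
#include <stdlib.h>   /* malloc() */

JNIEXPORT void JNICALL Java_ShutdownMallocTest_foo (JNIEnv * env, jobject obj)
{
   /* Allocate forever so the thread is likely to hold the malloc lock at exit. */
   while (1) {
     malloc(1);
   }
}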
-------------------------------------------------------------------------
To build the testcase:
  javac ShutdownMallocTest.java
  gcc -g -shared -fPIC -I${JAVA_HOME}/include -I${JAVA_HOME}/include/linux ShutdownMallocTest.c -o libShutdownMallocTest.so
-------------------------------------------------------------------------

When the program hangs, "ps -A|grep java" shows only one java thread:

raq:~> ps -A|grep java
17111 pts/2    00:00:00 java

If you attach gdb to that thread, you can see that it is blocked in __libc_free,
waiting for a mutex:

(gdb) where
#0  0x40075aa5 in __sigsuspend (set=0xbffff600)
    at ../sysdeps/unix/sysv/linux/sigsuspend.c:45
#1  0x40037079 in __pthread_wait_for_restart_signal (self=0x4003fd60)
    at pthread.c:967
#2  0x40038d39 in __pthread_alt_lock (lock=0x4017ba40, self=0x0)
    at restart.h:34
#3  0x40035c16 in __pthread_mutex_lock (mutex=0x4017ba30) at mutex.c:120
#4  0x400c7be8 in __libc_free (mem=0x8081188) at malloc.c:3152
#5  0x4003732c in pthread_onexit_process (retcode=0, arg=0x0) at pthread.c:796
#6  0x4007842b in exit (status=0) at exit.c:54
#7  0x40063510 in __libc_start_main (main=0x8048c60 <strcpy+200>, argc=2, 
    ubp_av=0xbffff8b4, init=0x80488e8, fini=0x804ba0c <strcpy+11892>, 
    rtld_fini=0x4000dc14 <_dl_fini>, stack_end=0xbffff8ac)
    at ../sysdeps/generic/libc-start.c:129

The bug was introduced in glibc-2.2 and it has been fixed in 2.2.5.

Most Linux distributions today include some version of glibc-2.2.x, so all of
them could be affected. But since the hang happens at the very end of a program's
life cycle, the user can simply kill the last remaining thread when it occurs.
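
To tell whether a given system falls in the affected range (introduced in glibc-2.2, fixed in 2.2.5), a version check along the following lines may help. This is a sketch under stated assumptions: the function names glibc_version_is_affected and running_glibc_is_affected are hypothetical, the version string is assumed to have the form "major.minor" or "major.minor.patch", and gnu_get_libc_version() is only used when building against glibc.

```c
#include <assert.h>
#include <stdio.h>
#ifdef __GLIBC__
#include <gnu/libc-version.h> /* gnu_get_libc_version() */
#endif

/* Returns 1 if the version string names a glibc release from 2.2 up to and
 * including 2.2.4, and 0 otherwise (including strings that cannot be parsed). */
int glibc_version_is_affected(const char *version)
{
    int major = 0, minor = 0, patch = 0;

    assert(version != NULL);
    /* A plain "2.2" parses as two fields and leaves patch at 0. */
    if (sscanf(version, "%d.%d.%d", &major, &minor, &patch) < 2)
        return 0;
    return major == 2 && minor == 2 && patch >= 0 && patch <= 4;
}

/* Checks the C library the current process is running against.
 * On non-glibc systems this reports "not affected". */
int running_glibc_is_affected(void)
{
#ifdef __GLIBC__
    return glibc_version_is_affected(gnu_get_libc_version());
#else
    return 0;
#endif
}
```

A launcher script or diagnostic tool could call running_glibc_is_affected() and print a warning that a hang at exit is possible. The check only identifies the glibc version; it does not detect distribution packages that may have backported the fix.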

Comments
EVALUATION

The problematic code is in pthread_onexit_process() (pthread.c):

    /* Main thread should accumulate times for thread manager and its
       children, so that timings for main thread account for all threads. */
    if (self == __pthread_main_thread) {
      waitpid(__pthread_manager_thread.p_pid, NULL, __WCLONE);
(*)   free (__pthread_manager_thread_bos);
      __pthread_manager_thread_bos = __pthread_manager_thread_tos = NULL;
    }

(*) This "free" may hang if the manager thread kills a user thread while that
thread is holding the malloc lock. Deallocating the manager thread stack is
unnecessary because the operating system will soon reclaim everything once
this onexit function returns.

The problem is fixed in glibc-2.2.5 by removing the "free" call. There is
nothing we can do in the VM for this bug.

###@###.### 2002-03-21