JDK-4701980 : HPROF: -Xrunhprof option crashes and restarts S1AS app server
  • Type: Bug
  • Component: vm-legacy
  • Sub-Component: jvmpi
  • Affected Version: 1.3.0,1.3.1_07,1.4.0
  • Priority: P2
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic,solaris_8,windows_2000
  • CPU: generic,x86,sparc
  • Submitted: 2002-06-13
  • Updated: 2003-04-12
  • Resolved: 2002-09-02
The Version table provides details related to the release in which this issue/RFE will be addressed.

Unresolved: Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed: Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

Other: 1.4.0_04 04 (Fixed)
Other: 1.4.1_02 (Fixed)
Other: 1.4.2 (Fixed)
Related Reports
Duplicate :  
Duplicate :  
Relates :  
Relates :  
Description
Using -Xrunhprof:cpu=samples on S1AS 7.0 does not produce the desired profiling output, and it also crashes the server. The server works fine without the -Xrunhprof option. (For background on how a -Xrun JVM/PI agent such as hprof is loaded, see the sketch after this description.)

Update:

Please follow http://siva.sfbay:8080/sunrise/profiler.html for the latest on this issue.
Related Bug IDs: 4674906, 4701995
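
Background sketch: hprof is a profiler agent library that the VM loads through the -Xrun<libname> mechanism (e.g. java -Xrunhprof:cpu=samples,depth=10 ...) and that talks to the VM over the JVM/PI call interface. The following minimal skeleton shows how such an agent is loaded, assuming the 1.3/1.4-era jvmpi.h; the library name "mysampler" and everything it does are invented for illustration, and this is not the hprof source.

    /* mysampler.c -- illustrative JVM/PI agent skeleton (NOT the real hprof).
     * Build as a shared library and load it the same way hprof is loaded, e.g.
     *   java -Xrunmysampler:depth=10 ...
     * "mysampler" and its option string are inventions for this sketch.
     */
    #include <stdio.h>
    #include <jni.h>
    #include <jvmpi.h>

    static JVMPI_Interface *jvmpi;   /* call interface supplied by the VM */

    /* Event hook; a real sampler would track THREAD_START/THREAD_END here
     * to maintain the list of threads it later suspends and samples. */
    static void notify_event(JVMPI_Event *event) {
        if (event->event_type == JVMPI_EVENT_THREAD_START) {
            fprintf(stderr, "mysampler: thread started\n");
        }
    }

    /* Entry point the VM looks for in a -Xrun<libname> agent. */
    JNIEXPORT jint JNICALL JVM_OnLoad(JavaVM *jvm, char *options, void *reserved) {
        if ((*jvm)->GetEnv(jvm, (void **)&jvmpi, JVMPI_VERSION_1) != JNI_OK) {
            return JNI_ERR;          /* this VM does not support JVM/PI */
        }
        jvmpi->NotifyEvent = notify_event;
        jvmpi->EnableEvent(JVMPI_EVENT_THREAD_START, NULL);
        jvmpi->EnableEvent(JVMPI_EVENT_THREAD_END, NULL);
        fprintf(stderr, "mysampler: loaded, options=\"%s\"\n",
                options != NULL ? options : "");
        return JNI_OK;
    }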

Comments
CONVERTED DATA
BugTraq+ Release Management Values
COMMIT TO FIX: 1.4.0_04 1.4.1_02 mantis mantis-b02
FIXED IN: 1.4.0_04 1.4.1_02 mantis mantis-b02
INTEGRATED IN: 1.4.0_04 1.4.1_02 mantis mantis-b02
14-06-2004

EVALUATION

###@###.### 2002-08-02

There are four problems being tracked by this bug:
1) Dan's Client VM SIGSEGV running vignette when an OutOfMemoryError occurs
2) unable to get complete java.hprof.txt output
3) fastdebug VM fails an assertion on app server start-up
4) 1.4.0-U1 Server VM SIGSEGV running vignette when an OutOfMemoryError occurs

Problem #2 has been resolved:
- delete the -Xrs option
- use 'kill -3' or 'kill -QUIT' on the appservd process to force java.hprof.txt to be flushed

Problem #3 will be deferred for now. This issue doesn't impact MDE, but should be resolved with the iPlanet dev team. Problem #1 will be deferred in favor of problem #4. Crashes with my bits are less important than crashes with 1.4.0-U1 bits.

###@###.### 2002-08-07

The -Xrunhprof:cpu=samples option uses SuspendThread(), GetCallTrace() and ResumeThread() to gather sample data. JVM/PI requires that GC be disabled before SuspendThread() is called, and GC cannot be re-enabled before all threads have been resumed. This means that GC can be disabled for a long time when there are many threads. Combined with a low-memory situation, this can result in OutOfMemoryErrors.

###@###.### 2002-08-07 (update 1)

The hprof sampler thread disables GC, grabs the hprof_dump_lock and finally grabs the data_access_lock. GC is disabled for the thread suspend operations (per the JVM/PI spec). The hprof_dump_lock is grabbed to prevent hprof data from being dumped while actively sampling. The data_access_lock is grabbed to prevent thread list changes and to safely save the sampling data (with possible table updates).

GC is disabled before grabbing the hprof_dump_lock to prevent a deadlock between the sampler thread (trying to disable GC) and the VM thread trying to post the JVM_SHUT_DOWN event; see Karen's fix for 4325941. GC is disabled before grabbing the data_access_lock to prevent a deadlock between the sampler thread (trying to disable GC) and the VM thread trying to post a GC_START event. The hprof_dump_lock has to be grabbed before the data_access_lock because the routines that dump the hprof data grab and release the data_access_lock as needed.

The hprof_dump_lock is hot simply because it is held for a long time (relative to other locks). The data_access_lock is hot because it is used to control access to so many things. The hprof_dump_lock can be made less hot by holding it only long enough to set various flags; local variables can be used to remember caller-sensitive state. The data_access_lock can be made less hot by splitting off its protection of the thread lists into a new lock, thread_list_lock. The data_access_lock will still need to be held to safely save the sampling data, but that happens after the threads are resumed.

By changing the sampler thread to hold the hprof_dump_lock for less time, there is no longer a race with the VM thread trying to post the JVM_SHUT_DOWN event. By splitting off thread list control into the thread_list_lock, the data_access_lock is also held for less time, so there is no longer a race with the VM thread trying to post a GC_START event. The GC disable call can be moved to just before the SuspendThread() calls and the GC enable call to just after the ResumeThread() calls. This will greatly reduce the amount of time that GC is disabled, but it probably won't be enough: disabling GC doesn't scale as the thread count grows. Also, the JVM/DI SuspendThread() API does not require GC to be disabled.
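
To make the original ordering concrete, here is a rough sketch of a sampling pass as described in the update above (disable GC, then hprof_dump_lock, then data_access_lock, suspend/sample/resume, release). It is illustrative only: the raw monitors stand in for hprof's real locks, and all_threads[], num_threads and record_trace() are hypothetical helpers, not actual hprof identifiers.

    /* Illustrative sketch of the sampling pass with the original lock ordering. */
    #include <jni.h>
    #include <jvmpi.h>

    #define MAX_DEPTH 10

    extern JVMPI_Interface *jvmpi;             /* obtained via GetEnv in JVM_OnLoad */
    extern JVMPI_RawMonitor hprof_dump_lock;   /* created with RawMonitorCreate()   */
    extern JVMPI_RawMonitor data_access_lock;
    extern JNIEnv *all_threads[];              /* in JVM/PI a JNIEnv* names a thread */
    extern int num_threads;
    extern void record_trace(JVMPI_CallTrace *trace);   /* hypothetical bookkeeping */

    static void sample_pass_original(void) {
        JVMPI_CallFrame frames[MAX_DEPTH];
        JVMPI_CallTrace trace;
        int i;

        jvmpi->DisableGC();                        /* JVM/PI: GC off before SuspendThread() */
        jvmpi->RawMonitorEnter(hprof_dump_lock);   /* no data dump while sampling           */
        jvmpi->RawMonitorEnter(data_access_lock);  /* freeze thread list and data tables    */

        for (i = 0; i < num_threads; i++) {
            jvmpi->SuspendThread(all_threads[i]);
        }
        for (i = 0; i < num_threads; i++) {
            trace.env_id = all_threads[i];
            trace.frames = frames;
            jvmpi->GetCallTrace(&trace, MAX_DEPTH);
            record_trace(&trace);                  /* tables updated while threads suspended */
        }
        for (i = 0; i < num_threads; i++) {
            jvmpi->ResumeThread(all_threads[i]);
        }

        jvmpi->RawMonitorExit(data_access_lock);
        jvmpi->RawMonitorExit(hprof_dump_lock);
        jvmpi->EnableGC();                         /* GC was off for the whole pass */
    }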
I wonder if this JVM/PI spec "requirement" is due to the original lock order implementation described above.
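
For contrast, a sketch of the same pass restructured along the lines proposed in the evaluation: hprof_dump_lock held only long enough to set a flag, a new thread_list_lock guarding just the thread list, GC disabled only around the suspend/sample/resume window, and the samples saved under data_access_lock after the threads are running again. This is not the actual fix; it reuses the hypothetical helpers of the previous sketch, and sampling_in_progress and MAX_THREADS are further inventions for illustration.

    /* Illustrative sketch of the restructured sampling pass. */
    #include <jni.h>
    #include <jvmpi.h>

    #define MAX_DEPTH   10
    #define MAX_THREADS 256

    extern JVMPI_Interface *jvmpi;
    extern JVMPI_RawMonitor hprof_dump_lock;
    extern JVMPI_RawMonitor data_access_lock;
    extern JVMPI_RawMonitor thread_list_lock;   /* the proposed new lock */
    extern JNIEnv *all_threads[];
    extern int num_threads;
    extern int sampling_in_progress;            /* hypothetical flag      */
    extern void record_trace(JVMPI_CallTrace *trace);

    static void sample_pass_restructured(void) {
        static JVMPI_CallFrame frames[MAX_THREADS][MAX_DEPTH];
        static JVMPI_CallTrace traces[MAX_THREADS];
        int i, n;

        /* Hold hprof_dump_lock only long enough to flag that sampling is active. */
        jvmpi->RawMonitorEnter(hprof_dump_lock);
        sampling_in_progress = 1;
        jvmpi->RawMonitorExit(hprof_dump_lock);

        /* Only the thread list needs to be stable while suspending threads. */
        jvmpi->RawMonitorEnter(thread_list_lock);
        n = (num_threads < MAX_THREADS) ? num_threads : MAX_THREADS;

        jvmpi->DisableGC();                      /* GC off only around suspend..resume */
        for (i = 0; i < n; i++) {
            jvmpi->SuspendThread(all_threads[i]);
        }
        for (i = 0; i < n; i++) {
            traces[i].env_id = all_threads[i];
            traces[i].frames = frames[i];
            jvmpi->GetCallTrace(&traces[i], MAX_DEPTH);
        }
        for (i = 0; i < n; i++) {
            jvmpi->ResumeThread(all_threads[i]);
        }
        jvmpi->EnableGC();
        jvmpi->RawMonitorExit(thread_list_lock);

        /* Save the samples only after the threads are resumed; this is where
         * data_access_lock is still needed (possible table updates).        */
        jvmpi->RawMonitorEnter(data_access_lock);
        for (i = 0; i < n; i++) {
            record_trace(&traces[i]);
        }
        jvmpi->RawMonitorExit(data_access_lock);

        jvmpi->RawMonitorEnter(hprof_dump_lock);
        sampling_in_progress = 0;
        jvmpi->RawMonitorExit(hprof_dump_lock);
    }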
11-06-2004

PUBLIC COMMENTS

Picking up an escalation under 1.4.0_01 for this bug. ###@###.### 2002-08-29
29-08-2002

SUGGESTED FIX

###@###.### 2002-08-12 See attached 4701980-webrev.tar for the proposed fixes (pre-code review). This webrev is relative to my Batch-20020522 proposed changes for Merlin Update 2 (4683023).

###@###.### 2002-08-13 See attached 4701980-webrev-hopper.tar for the Hopper version of the proposed fixes (pre-code review). This webrev is relative to my Batch-20020522 proposed changes for Hopper (4683023).

###@###.### 2002-08-14 See attached 4701980-webrev-cr1.tar for the proposed fixes after the first round of code review.

###@###.### 2002-08-16 See attached 4701980-webrev-hopper-cr1.tar for the Hopper version of the proposed fixes after the first round of code review.
16-08-2002