Bug ID: JDK-6375302 (thread) Thread.currentThread().getStackTrace(); is 10x slower vs getting stacktrace from throwable

Fixed in: JDK 6 (b78)

Thread.currentThread().getStackTrace() appears to be about 10x slower
than getting the stack trace from a Throwable. The overhead is probably
due to context switching, since nearly all the time in the first case
is system time rather than user time.

In my environment, where I run some stress tests, I can max out the
4-way Linux box with 20 client threads when I enable some diagnostic
probes within our code. In this case almost all the CPU is system time,
and it appears to be due to Thread.currentThread().getStackTrace().

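Here is the test case. Run it with any command-line argument to use the
Exception version, or with none to use the Thread version. You will
notice that one version is about 10x slower than the other.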


public class Test {

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        for (long i = 0; i < 100000; i++) {
            StackTraceElement[] stackTrace;
            if (args.length > 0) {
                // Any command-line argument selects the Throwable-based path.
                stackTrace = new Exception().getStackTrace();
            } else {
                // No arguments: ask the current thread for its own stack trace.
                stackTrace = Thread.currentThread().getStackTrace();
            }
        }
        System.out.println("Total time = " + (System.currentTimeMillis() - start) + "ms.");
    }
}

I looked into the VM code, and here is what I can make of it.

Both versions retrieve the stack by calling
vframeStream st((JavaThread*) THREAD), which is good. The exception
version calls this directly. The getThreadDumps() path performs a few
"extra" steps that can be more expensive:

1. Put the request into a shared queue.

2. Schedule a VM thread to run the command.

3. The scheduled VM thread does some work to make sure it is safe
   to get the stack trace.

4. Allocate C++ objects to collect the stack trace. This is much more
   expensive than allocating a Java object: you have to new/delete,
   which acquires a mutex.

5. Finally, call vframeStream().

6. The VM thread returns the result, and the caller gets the result
   out of the shared queue.

There is a comment in file threadService.cpp that says:

// TODO: Optimization if only the current thread or maxDepth = 1
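
The queue-and-reply pattern in steps 1 to 6 can be illustrated with a plain Java analogy. This is not HotSpot code, and it does not model safepoints or C++ allocation. It only shows the shape of the handoff: the caller queues a request, a second thread does the work, and the caller blocks until the reply arrives. The class name HandoffAnalogy, the method names, and the iteration count are all made up for this sketch, and the timings it prints will vary by JVM and machine.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class HandoffAnalogy {

    // Direct path: the calling thread walks its own stack.
    static StackTraceElement[] direct() {
        return new Exception().getStackTrace();
    }

    // Handoff path: queue a request, let another thread do the work,
    // and block until the reply is available.
    static StackTraceElement[] viaWorker(ExecutorService worker, Thread target) {
        Callable<StackTraceElement[]> request = target::getStackTrace;
        try {
            return worker.submit(request).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int iterations = 10_000; // arbitrary count, chosen for illustration
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            long t0 = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                direct();
            }
            long t1 = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                viaWorker(worker, Thread.currentThread());
            }
            long t2 = System.nanoTime();
            System.out.println("direct:  " + (t1 - t0) / 1_000_000 + " ms");
            System.out.println("handoff: " + (t2 - t1) / 1_000_000 + " ms");
        } finally {
            worker.shutdown();
        }
    }
}
```

Every call on the handoff path needs two thread switches (caller to worker, then worker back to caller), which is the kind of cost that would show up as system time rather than user time.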

EVALUATION Being able to answer the question "how did control get here?" is sometimes very useful and the lower the overhead the better.

16-02-2006

SUGGESTED FIX As suggested, the trivial, obvious approach is best for now.

16-02-2006
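
One reading of "the trivial, obvious approach" is a fast path for the current thread: if the target is the calling thread, take the stack trace from a new Exception, and otherwise fall back to the general mechanism. The sketch below shows that idea as a user-level helper. The class StackTraceFastPath and the method stackTraceOf are hypothetical names for illustration, not JDK API, and this is not presented as the actual change made in the JDK.

```java
public class StackTraceFastPath {

    // Hypothetical helper, not part of the JDK API.
    static StackTraceElement[] stackTraceOf(Thread target) {
        if (target == Thread.currentThread()) {
            // Current thread: walk our own stack, as Throwable does.
            return new Exception().getStackTrace();
        }
        // Any other thread: use the general path, which needs VM help.
        return target.getStackTrace();
    }

    public static void main(String[] args) {
        StackTraceElement[] trace = stackTraceOf(Thread.currentThread());
        for (StackTraceElement frame : trace) {
            System.out.println("  at " + frame);
        }
    }
}
```

Note that on the fast path the first element of the returned array is the helper itself (stackTraceOf), so a caller that wants only its own frames would skip that entry.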

EVALUATION Throwable.fillInStackTrace() knows it always works with the current thread's stack, so it can walk the stack directly. Thread.getStackTrace(), on the other hand, makes no assumptions about whether the target thread is the current thread. It simply requests a thread dump for a single thread ('this'), which in this case happens to be the current thread, creating a Thread[] and a StackTraceElement[][] in the process. Obtaining the stack trace of an arbitrary thread requires a safepoint, so we take a large performance hit compared to the fillInStackTrace approach. Using the given "benchmark" I got the following results:

    new Exception().getStackTrace()            734ms
    Thread.currentThread().getStackTrace()   10823ms

31-01-2006