JDK-6237552 : HPROF: cpu=times reports incorrect percentages when rolling up time spent
  • Type: Bug
  • Component: tools
  • Sub-Component: hprof
  • Affected Version: 5.0
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: linux
  • CPU: x86
  • Submitted: 2005-03-08
  • Updated: 2010-04-02
  • Resolved: 2005-04-01
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
Other JDK 6
5.0u4Fixed 6 b31Fixed
Description
FULL PRODUCT VERSION :
java version "1.5.0_01"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_01-b08)
Java HotSpot(TM) Client VM (build 1.5.0_01-b08, mixed mode)


ADDITIONAL OS VERSION INFORMATION :
Linux andromeda.basis.com 2.6.10-1.766_FC3 #1 Wed Feb 9 23:06:42 EST 2005 i686 i686 i386 GNU/Linux


A DESCRIPTION OF THE PROBLEM :
Running java -agentlib:hprof=cpu=times  gives an incorrect report.  The header in the java.hprof.txt file says the following:

CPU TIME        is a profile of program execution obtained by measuring
                the time spent in individual methods (excluding the time
                spent in callees), as well as by counting the number of
                times each method is called. Entries in this record are
                TRACEs ranked by the percentage of total CPU time. The
                "count" field indicates the number of times each TRACE
                is invoked.

In hprof_tls.c, included with the jdk in demos/jvmti/hprof/src, there is a function pop_method() that makes a call to adjust_stats().  It passes in a pointer  to a StackElement, but the pointer it passes in is a pointer to a temporary local variable instead of a pointer to the StackElement that's really in the tlsinfo->stack.  Because of this, the time_in_callees field is only changed on a temporary copy.  The local variable self_count in pop_method() depends on this value, and since the value of time_in_callees is always zero in the real StackElement, self_count is always equal to the local variable total_count.

In addition, in hprof_trace.c, there is a function collect_iterator, which has the line:

    iterate->grand_total_cost += info->total_cost;
                                                                  ^^^^

This line counts the total time spent in each method.  If one method is a callee of another method, the time spent in the callee is counted once for the caller and once for the callee.  If a method at stack depth d costs n with all its children, the total contribution to grand_total_cost will be d * n for that method.  The line should be changed to read:

    iterate->grand_total_cost += info->self_cost;
                                                                    ^^^^

Finally, also in hprof_trace.c, the function trace_output_cost() determines its percentages in terms of total_cost / grand_total_cost.  In order to match documentation, these lines should be changed from:

            percent = (double)info->total_cost / (double)iterate.grand_total_cost;
                                                      ^^^^
to

            percent = (double)info->self_cost / (double)iterate.grand_total_cost;
                                                       ^^^^

This seems to help hprof report values that are more useful.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Here is a test java program to profile:
}

1) Save this file to TestAnalyze.java.
2) Compile it:  javac TestAnalyze.java
2) Run it with the command line:
   java -agentlib:hprof=cpu=times,depth=12 TestAnalyze
3) When it prints out "Waiting...", hit Ctrl-\ and then the Enter key in very quick succession.  If done correctly, the enter will be done before the dump completes, and it will say:

"VM Thread" prio=1 tid=0x08114020 nid=0x2dd2 runnable

"VM Periodic Task Thread" prio=1 tid=0x0816d5a0 nid=0x2dd9 waiting on condition

Dumping CPU usage by timing methods ...
 done.

(including the carriage return between "..." and "done".  This allows us to ignore all the class loading nonsense).

4)  The program will take a little while to complete, but it will again dump CPU usage on exit, and produce the file java.hprof.txt

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
With the changes above, and some more changes to give more detailed costs of methods, the last few lines of java.hprof.txt should look something like below.  Notice that fooA() and main() do not even make the list.  Each method which does nearly the same amount of work has very similar values instead of foo() being so much more than bar() or baz().

CPU TIME (ms) BEGIN (total = 24625550) Fri Mar  4 17:57:23 2005
rank   self  accum   count trace method
   1 19.38% 19.38%       2 302129 TestAnalyze.foo
   2 14.19% 33.57%       2 302128 TestAnalyze.baz
   3 13.46% 47.03%       2 302123 TestAnalyze.baz
   4 13.35% 60.38%       2 302127 TestAnalyze.bar
   5 13.23% 73.61%       2 302125 TestAnalyze.baz
   6 13.22% 86.83%       2 302126 TestAnalyze.foo
   7 13.16% 99.99%       2 302124 TestAnalyze.foo
CPU TIME (ms) END
ACTUAL -
The times portion below show a great deal of time spent in TestAnalyze.main, even though main() only calls fooA() after the System.in.readLine().  Simillarly, it shows that fooA takes up 22.51%, where it should probably not even make the trace.

CPU TIME (ms) BEGIN (total = 113182) Fri Mar  4 17:49:58 2005
rank   self  accum   count trace method
   1 26.11% 26.11%       1 302131 TestAnalyze.main
   2 22.51% 48.62%       2 302129 TestAnalyze.foo
   3 22.51% 71.14%       1 302130 TestAnalyze.fooA
   4  8.68% 79.82%       2 302127 TestAnalyze.bar
   5  5.82% 85.64%       2 302126 TestAnalyze.foo
   6  5.74% 91.38%       2 302124 TestAnalyze.foo
   7  2.88% 94.27%       2 302123 TestAnalyze.baz
   8  2.87% 97.14%       2 302125 TestAnalyze.baz
   9  2.86% 100.00%       2 302128 TestAnalyze.baz
CPU TIME (ms) END

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
public class TestAnalyze
{
    public static void main(String[] argv)
        throws Exception
    {
        TestAnalyze a = new TestAnalyze();
        System.out.println("Waiting...");
        System.in.read();
        a.fooA();
    }

    public void fooA()
    {
        foo(true);
        foo(true);
    }

    public void foo(boolean p_recurse)
    {
        double c = 0.0;
        for (int i = 1; i <= 10000; ++i)
            for (int j = 1; j <= 10000; ++j)
                c += (double)i / (double)j;

        if (p_recurse)
        {
            foo(false);
            bar();
        }
        baz();
    }

    public void bar()
    {
        double c = 0.0;
        for (int i = 1; i <= 10000; ++i)
            for (int j = 1; j <= 10000; ++j)
                c += (double)i / (double)j;

        foo(false);
    }

    public void baz()
    {
        double c = 0.0;
        for (int i = 1; i <= 10000; ++i)
            for (int j = 1; j <= 10000; ++j)
                c += (double)i / (double)j;

    }
}
---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
Implement the fixes I mentioned above into the source code provided with the jdk.
###@###.### 2005-03-08 17:30:50 GMT

Comments
EVALUATION Excellent webbig. See suggested fix. Exactly as stated in the webbug. ###@###.### 2005-03-12 00:34:36 GMT After testing a little (our testing of the actual output from hprof is limited, so this is manual testing) I discovered that the cpu=samples needed to treat self_cost and total_cost the same, and the sorting needed to change to be based on the self_cost, and the cost printed out needed to be the self_cost. See suggested fix for the additional changes. ###@###.### 2005-03-21 19:42:36 GMT
12-03-2005

SUGGESTED FIX ######### File: ./hprof_tls.c ######### (cd . && sccs diffs -w -s -b hprof_tls.c) ------- hprof_tls.c ------- 437d436 < StackElement parent; 494,495c493 < parent = *(StackElement*)p; < adjust_stats(total_time, self_time, trace_index, &parent); --- > adjust_stats(total_time, self_time, trace_index, (StackElement*)p); ######### File: ./hprof_trace.c ######### (cd . && sccs diffs -w -s -b hprof_trace.c) ------- hprof_trace.c ------- 341c341 < iterate->grand_total_cost += info->total_cost; --- > iterate->grand_total_cost += info->self_cost; 684c684 < percent = (double)info->total_cost / (double)iterate.grand_total_cost; --- > percent = (double)info->self_cost / (double)iterate.grand_total_cost; 710c710 < percent = ((double)info->total_cost / (double)iterate.grand_total_cost) * 100.0; --- > percent = ((double)info->self_cost / (double)iterate.grand_total_cost) * 100.0; bonsai<9> ###@###.### 2005-03-12 00:34:36 GMT Needed to make these additional changes: ------- hprof_trace.c ------- *** /tmp/sccs.6Layc3 Mon Mar 21 11:03:25 2005 --- hprof_trace.c Mon Mar 21 10:57:40 2005 *************** *** 356,362 **** info1 = get_info(trace1); info2 = get_info(trace2); /*LINTED*/ ! return (int)(info2->total_cost - info1->total_cost); } static int --- 357,363 ---- info1 = get_info(trace1); info2 = get_info(trace2); /*LINTED*/ ! return (int)(info2->self_cost - info1->self_cost); } static int *************** *** 629,634 **** --- 630,636 ---- info = get_info(traces[i]); info->num_hits += 1; + info->self_cost += (jlong)1; info->total_cost += (jlong)1; } } *************** *** 720,726 **** } io_write_cpu_samples_elem(i+1, percent, accum, info->num_hits, ! (jint)info->total_cost, info->serial_num, key->n_frames, csig, mname); jvmtiDeallocate(csig); --- 722,728 ---- } io_write_cpu_samples_elem(i+1, percent, accum, info->num_hits, ! (jint)info->self_cost, info->serial_num, key->n_frames, csig, mname); jvmtiDeallocate(csig); ###@###.### 2005-03-21 19:42:36 GMT
12-03-2005