FULL PRODUCT VERSION : java version "1.5.0_01" Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_01-b08) Java HotSpot(TM) Client VM (build 1.5.0_01-b08, mixed mode) ADDITIONAL OS VERSION INFORMATION : Linux andromeda.basis.com 2.6.10-1.766_FC3 #1 Wed Feb 9 23:06:42 EST 2005 i686 i686 i386 GNU/Linux A DESCRIPTION OF THE PROBLEM : Running java -agentlib:hprof=cpu=times gives an incorrect report. The header in the java.hprof.txt file says the following: CPU TIME is a profile of program execution obtained by measuring the time spent in individual methods (excluding the time spent in callees), as well as by counting the number of times each method is called. Entries in this record are TRACEs ranked by the percentage of total CPU time. The "count" field indicates the number of times each TRACE is invoked. In hprof_tls.c, included with the jdk in demos/jvmti/hprof/src, there is a function pop_method() that makes a call to adjust_stats(). It passes in a pointer to a StackElement, but the pointer it passes in is a pointer to a temporary local variable instead of a pointer to the StackElement that's really in the tlsinfo->stack. Because of this, the time_in_callees field is only changed on a temporary copy. The local variable self_count in pop_method() depends on this value, and since the value of time_in_callees is always zero in the real StackElement, self_count is always equal to the local variable total_count. In addition, in hprof_trace.c, there is a function collect_iterator, which has the line: iterate->grand_total_cost += info->total_cost; ^^^^ This line counts the total time spent in each method. If one method is a callee of another method, the time spent in the callee is counted once for the caller and once for the callee. If a method at stack depth d costs n with all its children, the total contribution to grand_total_cost will be d * n for that method. The line should be changed to read: iterate->grand_total_cost += info->self_cost; ^^^^ Finally, also in hprof_trace.c, the function trace_output_cost() determines its percentages in terms of total_cost / grand_total_cost. In order to match documentation, these lines should be changed from: percent = (double)info->total_cost / (double)iterate.grand_total_cost; ^^^^ to percent = (double)info->self_cost / (double)iterate.grand_total_cost; ^^^^ This seems to help hprof report values that are more useful. STEPS TO FOLLOW TO REPRODUCE THE PROBLEM : Here is a test java program to profile: } 1) Save this file to TestAnalyze.java. 2) Compile it: javac TestAnalyze.java 2) Run it with the command line: java -agentlib:hprof=cpu=times,depth=12 TestAnalyze 3) When it prints out "Waiting...", hit Ctrl-\ and then the Enter key in very quick succession. If done correctly, the enter will be done before the dump completes, and it will say: "VM Thread" prio=1 tid=0x08114020 nid=0x2dd2 runnable "VM Periodic Task Thread" prio=1 tid=0x0816d5a0 nid=0x2dd9 waiting on condition Dumping CPU usage by timing methods ... done. (including the carriage return between "..." and "done". This allows us to ignore all the class loading nonsense). 4) The program will take a little while to complete, but it will again dump CPU usage on exit, and produce the file java.hprof.txt EXPECTED VERSUS ACTUAL BEHAVIOR : EXPECTED - With the changes above, and some more changes to give more detailed costs of methods, the last few lines of java.hprof.txt should look something like below. Notice that fooA() and main() do not even make the list. Each method which does nearly the same amount of work has very similar values instead of foo() being so much more than bar() or baz(). CPU TIME (ms) BEGIN (total = 24625550) Fri Mar 4 17:57:23 2005 rank self accum count trace method 1 19.38% 19.38% 2 302129 TestAnalyze.foo 2 14.19% 33.57% 2 302128 TestAnalyze.baz 3 13.46% 47.03% 2 302123 TestAnalyze.baz 4 13.35% 60.38% 2 302127 TestAnalyze.bar 5 13.23% 73.61% 2 302125 TestAnalyze.baz 6 13.22% 86.83% 2 302126 TestAnalyze.foo 7 13.16% 99.99% 2 302124 TestAnalyze.foo CPU TIME (ms) END ACTUAL - The times portion below show a great deal of time spent in TestAnalyze.main, even though main() only calls fooA() after the System.in.readLine(). Simillarly, it shows that fooA takes up 22.51%, where it should probably not even make the trace. CPU TIME (ms) BEGIN (total = 113182) Fri Mar 4 17:49:58 2005 rank self accum count trace method 1 26.11% 26.11% 1 302131 TestAnalyze.main 2 22.51% 48.62% 2 302129 TestAnalyze.foo 3 22.51% 71.14% 1 302130 TestAnalyze.fooA 4 8.68% 79.82% 2 302127 TestAnalyze.bar 5 5.82% 85.64% 2 302126 TestAnalyze.foo 6 5.74% 91.38% 2 302124 TestAnalyze.foo 7 2.88% 94.27% 2 302123 TestAnalyze.baz 8 2.87% 97.14% 2 302125 TestAnalyze.baz 9 2.86% 100.00% 2 302128 TestAnalyze.baz CPU TIME (ms) END REPRODUCIBILITY : This bug can be reproduced always. ---------- BEGIN SOURCE ---------- public class TestAnalyze { public static void main(String[] argv) throws Exception { TestAnalyze a = new TestAnalyze(); System.out.println("Waiting..."); System.in.read(); a.fooA(); } public void fooA() { foo(true); foo(true); } public void foo(boolean p_recurse) { double c = 0.0; for (int i = 1; i <= 10000; ++i) for (int j = 1; j <= 10000; ++j) c += (double)i / (double)j; if (p_recurse) { foo(false); bar(); } baz(); } public void bar() { double c = 0.0; for (int i = 1; i <= 10000; ++i) for (int j = 1; j <= 10000; ++j) c += (double)i / (double)j; foo(false); } public void baz() { double c = 0.0; for (int i = 1; i <= 10000; ++i) for (int j = 1; j <= 10000; ++j) c += (double)i / (double)j; } } ---------- END SOURCE ---------- CUSTOMER SUBMITTED WORKAROUND : Implement the fixes I mentioned above into the source code provided with the jdk. ###@###.### 2005-03-08 17:30:50 GMT
|