JDK-6669441 : Performance regression running javac in 6u4 using -server
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 6u4
  • Priority: P3
  • Status: Closed
  • Resolution: Won't Fix
  • OS: solaris_10
  • CPU: sparc
  • Submitted: 2008-02-29
  • Updated: 2010-04-02
  • Resolved: 2009-07-31
Description
This table shows the user CPU time reported by /bin/time for 6u3 and 6u4 when running
32-bit javac on various numbers of files, on a Sun Blade 2500 (2 CPUs) running Solaris 10.
Options used are:
   -J-XX:ThreadStackSize=768 -J-Xmx896m -J-Xms128m -J-XX:PermSize=32m -J-XX:MaxPermSize=160m

                6u3     6u4     Degradation
Server:
    122 files   12.3    13.2    7%
    203 files   13.7    15.6    14%
    435 files   31.6    36.2    15%

Client:
    203 files   5.3     5.4     2%

I added the option
    -XX:+PrintGCDetails
and the two log files show that the GC time is the same for 6u3 and 6u4.

I swapped libjvm.so between 6u3 and 6u4, and the degradation followed libjvm.so.

The attached script can be used to repro this problem.

Comments
EVALUATION The actual cause of the regression is mainly the change to ciMethod::instructions_size. Here is the webrev for the change (look at the ciMethod.cpp file):
file:///net/prt-archiver.sfbay/data/archived_workspaces/main/c2_baseline/2006/20061002120630.jrose.dolphin-cleanups/workspace/webrevs/webrev-2006.10.02/index.html

 // ------------------------------------------------------------------
 // ciMethod::instructions_size
+// This is a rough metric for "fat" methods, compared
+// before inlining with InlineSmallCode.
+// The CodeBlob::instructions_size accessor includes
+// junk like exception handler, stubs, and constant table,
+// which are not highly relevant to an inlined method.
+// So we use the more specific accessor nmethod::code_size.
 int ciMethod::instructions_size() {
   GUARDED_VM_ENTRY(
     nmethod* code = get_methodOop()->code();
     // if there's no compiled code or the code was produced by the
     // tier1 profiler return 0 for the code size.  This should
@@ -889,11 +901,11 @@
     // that currently isn't properly recorded.
     if (code == NULL ||
         (TieredCompilation && code->compiler() != NULL && code->compiler()->is_c1())) {
       return 0;
     }
-    return code->instructions_size();
+    return code->code_size();
   )
 }

As a result, ciMethod::instructions_size is correctly reduced to a smaller value. Given the same InlineSmallCode, this change effectively increases the number of methods that get inlined, and thus causes the regression. We may want to retune the flag -XX:InlineSmallCode to regain the javac -server performance. For 6u14, -XX:InlineSmallCode=800 performs the same as 6u3 with -XX:InlineSmallCode=1000. Unfortunately, for 6u14 the default of InlineSmallCode is 1500 (presumably good for some other benchmarks), which may prevent us from lowering the default value.
30-06-2009
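
A standalone sketch of the effect described above (the byte counts and function names are made up for illustration; this is not HotSpot source). It models how a smaller reported code size lets an already-compiled callee slip under the same InlineSmallCode cutoff, so more methods get inlined:

    // Illustrative only: compares the "size the inliner sees" before and
    // after the ciMethod::instructions_size change.
    #include <cstdio>

    static const int InlineSmallCode = 1000;   // the 6u3 value mentioned above

    // 6u3 behaviour: reported size included exception handlers, stubs,
    // and the constant table.
    static int reported_size_6u3(int code_bytes, int handler_stub_bytes) {
      return code_bytes + handler_stub_bytes;
    }

    // 6u4 behaviour: only the instruction bytes (nmethod::code_size()).
    static int reported_size_6u4(int code_bytes, int /*handler_stub_bytes*/) {
      return code_bytes;
    }

    int main() {
      const int code_bytes = 900, handler_stub_bytes = 300;  // a method near the cutoff
      const bool inlined_6u3 =
          reported_size_6u3(code_bytes, handler_stub_bytes) < InlineSmallCode;
      const bool inlined_6u4 =
          reported_size_6u4(code_bytes, handler_stub_bytes) < InlineSmallCode;
      std::printf("6u3: %s, 6u4: %s\n",
                  inlined_6u3 ? "inlined" : "rejected as fat",
                  inlined_6u4 ? "inlined" : "rejected as fat");
      // prints: 6u3: rejected as fat, 6u4: inlined
      return 0;
    }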

EVALUATION
> The reason for the regression is the -XX:Tier2CompileThreshold default value change.
> On sparc, it changed from 15000 (6u3) to 10000 (6u4). The lower threshold will
> definitely incur more compilation.

The above comment is not quite appropriate. In the current VM, -XX:Tier2CompileThreshold is only used to set the default of -XX:CompileThreshold (and only from 6u4 on). It is -XX:CompileThreshold that actually affects C2 compilation. So for 6u4, -XX:CompileThreshold defaults to -XX:Tier2CompileThreshold (10000), which is the same effective value as in 6u3. Of course, if I raise -XX:Tier2CompileThreshold to 15000, or directly raise -XX:CompileThreshold to 15000, the regression disappears for javac -server.
30-06-2009
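
A minimal sketch of the flag relationship described above (illustrative only; the variable names mirror the -XX flags but this is not the VM's argument-processing code):

    #include <cstdio>

    int main() {
      // 6u4 sparc defaults, per the comment above.
      int Tier2CompileThreshold = 10000;   // 15000 in 6u3
      int CompileThreshold      = 0;       // 0 stands for "not set on the command line"

      // From 6u4 on, Tier2CompileThreshold only seeds CompileThreshold's default.
      if (CompileThreshold == 0) {
        CompileThreshold = Tier2CompileThreshold;
      }

      // CompileThreshold is what actually triggers a C2 compile, so the effective
      // trigger in 6u4 is 10000, the same as in 6u3; raising either flag to 15000
      // (as noted above) delays compilation and makes the regression disappear.
      std::printf("effective CompileThreshold = %d\n", CompileThreshold);
      return 0;
    }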

EVALUATION The reason for the regression is the -XX:Tier2CompileThreshold default value change. On sparc, it changed from 15000 (6u3) to 10000 (6u4). The lower threshold will definitely incur more compilation.
29-06-2009

EVALUATION I think I should post the client timing here also. For compiling 435 files on sparc with Solaris 10, javac -client is approximately 2X faster than javac -server.

6u4 client
    real    0m9.95s
    user    0m10.44s
    sys     0m0.66s

6u3 client
    real    0m10.07s
    user    0m10.49s
    sys     0m0.64s

6u4 server
    real    0m19.92s
    user    0m33.96s
    sys     0m2.83s

6u3 server
    real    0m18.97s
    user    0m29.68s
    sys     0m0.83s
29-06-2009

EVALUATION With -XX:+TimeCompiler on a fastdebug build, I could see that 6u4 spends much more time on parsing. Also, with -XX:-Inline the number of parsed methods is the same for 6u4 and 6u3, and the regression disappears. Based on these observations, I think 6u4 performs more aggressive inlining than 6u3, and the additional parsing is the major contributor to the regression.

6u3:
Accumulated compiler times:
---------------------------
Total compilation: 60.666 sec.
  method compilation : 60.637 sec/232295 bytes (3831 bytes per sec)
  stub compilation   : 0.028 sec.
Phases:
  parse        : 20.778 sec
  optimizer    :  8.403 sec
  matcher      :  3.431 sec
  scheduler    :  3.610 sec
  regalloc     : 17.869 sec
  macroExpand  :  0.330 sec
  removeEmpty  :  0.099 sec
  peephole     :  0.000 sec
  codeGen      :  4.258 sec
  install_code :  1.044 sec
  ------------ : ----------
  total        : 59.921 sec, 98.82 %
  output       :  4.259 sec
    isched     :  0.809 sec
    bldOopMaps :  0.633 sec

6u4:
Accumulated compiler times:
---------------------------
Total compilation: 78.351 sec.
  method compilation : 78.325 sec/235534 bytes (3007 bytes per sec)
  stub compilation   : 0.025 sec.
Phases:
  parse        : 34.776 sec
  optimizer    :  9.818 sec
  matcher      :  3.226 sec
  scheduler    :  3.864 sec
  regalloc     : 22.452 sec
  macroExpand  :  0.421 sec
  removeEmpty  :  0.189 sec
  peephole     :  0.000 sec
  codeGen      :  3.132 sec
  install_code :  0.480 sec
  ------------ : ----------
  total        : 78.457 sec, 100.17 %
  output       :  3.134 sec
    isched     :  0.846 sec
    bldOopMaps :  0.701 sec
29-06-2009

EVALUATION Here are some compilation statistics with -XX:+LogCompilation (6u4 compiled about 10% more methods, and created about 8% more nodes, than 6u3):

6u3:
--- Compiler Statistics ---
Methods seen: 7429    Methods parsed: 7429    Nodes created: 1365878
32096 original NULL checks - 22439 elided (69%); optimizer leaves 8544,

6u4:
--- Compiler Statistics ---
Methods seen: 8169    Methods parsed: 8169    Nodes created: 1476945
35921 original NULL checks - 24695 elided (68%); optimizer leaves 8677,

Also, there is no regression in the client VM from 6u3 to 6u4.
26-06-2009

EVALUATION I could reproduce the javac -server regression with the script provided in the Description. The data listed below are based on the 6u3 JDK run with different VMs (6u3, 6u4 and 6u14). All VMs are 32-bit. The tests were performed on sparc with Solaris 10 (escuintla). 435 files were compiled by javac. The numbers in the table are in seconds.

Run     6u3     6u4     6u14
 #1     18.21   21.22   21.17
 #2     19.03   20.27   20.00
 #3     18.10   19.11   19.74
 #4     19.40   21.80   20.45
 #5     18.29   19.46   19.62
 #6     18.21   18.93   19.59
 #7     19.27   22.46   21.66
 #8     17.74   21.90   20.55
 #9     18.05   19.95   20.01
 #10    19.22   20.18   21.35
avg.    18.55   20.53   20.41

We can see there is a ~10% regression from 6u3 to 6u4, and the regression is still present with the current VM (6u14). The regression seems to be cumulative, i.e., I could not find a single changeset that is responsible for the whole 10%.
26-06-2009