United StatesChange Country, Oracle Worldwide Web Sites Communities I am a... I want to...
Bug ID: JDK-6565458 10x performance regression on x86 in micro test with Reflection.
JDK-6565458 : 10x performance regression on x86 in micro test with Reflection.

Details
Type:
Enhancement
Submit Date:
2007-06-04
Status:
Open
Updated Date:
2014-07-01
Project Name:
JDK
Resolved Date:
Component:
hotspot
OS:
solaris_10
Sub-Component:
compiler
CPU:
x86
Priority:
P4
Resolution:
Unresolved
Affected Versions:
7
Targeted Versions:

Related Reports
Relates:
Relates:
Relates:

Sub Tasks

Description
During investigation of old bug 4939638 I found the next regression
caused by 20061023120554.jrose.dolphin-cleanups putback.
I attached the java file and the class file compiled by jikes.

hsdev-5% cp_prt6 20061020142756.sgoldman.6481940/ i486 2 product
hsdev-5% gamma -XX:LoopUnrollLimit=0 -XX:LoopOptsCount=0 Micro
Reflection: test(10) x 10000000
  elapsed time: 924ms
  elapsed time: 616ms
  elapsed time: 582ms
  elapsed time: 581ms
  elapsed time: 584ms
Reflection: loop in sub-method: test(10) x 10000000
  elapsed time: 630ms
  elapsed time: 637ms
  elapsed time: 584ms
  elapsed time: 583ms
  elapsed time: 583ms
Reflection: test(1..10000000)
  elapsed time: 983ms
  elapsed time: 956ms
  elapsed time: 960ms
  elapsed time: 957ms
  elapsed time: 960ms
Reflection: loop in sub-method: test(1..10000000)
  elapsed time: 975ms
  elapsed time: 933ms
  elapsed time: 950ms
  elapsed time: 952ms
  elapsed time: 977ms
result: 9999999

hsdev-5% cp_prt6 20061023120554.jrose.dolphin-cleanups/ i486 2 product
hsdev-5% gamma -XX:LoopUnrollLimit=0 -XX:LoopOptsCount=0 Micro
Reflection: test(10) x 10000000
  elapsed time: 7461ms
  elapsed time: 7363ms
  elapsed time: 7350ms
  elapsed time: 7359ms
  elapsed time: 7351ms
Reflection: loop in sub-method: test(10) x 10000000
  elapsed time: 7340ms
  elapsed time: 7339ms
  elapsed time: 7342ms
  elapsed time: 7340ms
  elapsed time: 7326ms
Reflection: test(1..10000000)
  elapsed time: 7690ms
  elapsed time: 7691ms
  elapsed time: 7694ms
  elapsed time: 7680ms
  elapsed time: 7676ms
Reflection: loop in sub-method: test(1..10000000)
  elapsed time: 7455ms
  elapsed time: 7431ms
  elapsed time: 7420ms
  elapsed time: 7401ms
  elapsed time: 7406ms
result: 9999999

                                    

Comments
EVALUATION

Compiled method is larger in newer JVM, because of increased size of
uncommon (cold) paths.  The hot code is about the same size, but our
heuristic does not measure hot code size.  It should; this is an old
problem.  See comparative nmethod dumps below.

A reasonable solution would be to record, on the nmethod, an offset
which estimates the end of the hot code.  (The compiler attempts to
lay all cold code out after all hot code.)  This offset can be
estimated (for example) in Compile::Output, near the comment which
says "Insert epilogs before every return", by noting the offset of the
latest return (usually the only return).

Note that AMD64 does not manifest this bug because the invoke call is
*never* inlined, because InlineSmallCode is too small for a 64-bit
machine.  This also should be fixed.

In this case, the extra cold code comes from a path taken just once.
The older compiler treats this path as never-taken, because the MDO
has no record of the path.  But the newer compiler treats the path as
seldom-taken (which is more accurate), and therefore emits cold code
for it, calling ensureMemberAccess from Method.invoke (at bci 111).
This extra cold code bloats the compiled method and fools the inlining
heuristic.


--- Comparative nmethod dumps ---
Bad version of code:

Compiled   4   nmethod java.lang.reflect.Method::invoke (167 bytes)
 total in heap  [0xfa2ce588,0xfa2cf050] = 2760
 relocation     [0xfa2ce634,0xfa2ce698] = 100
 main code      [0xfa2ce6a0,0xfa2ceba0] = 1280 (** greater than InlineSmallCode)
   main code, hot part                  = 531 (** up to RET opcode only)
 exception code [0xfa2ceba0,0xfa2cebc0] = 32
 stub code      [0xfa2cebc0,0xfa2cebe8] = 40
 scopes data    [0xfa2cebe8,0xfa2cedfc] = 532
 scopes pcs     [0xfa2cedfc,0xfa2cefbc] = 448
 dependencies   [0xfa2cefbc,0xfa2cefc0] = 4
 handler table  [0xfa2cefc0,0xfa2cf020] = 96
 nul chk table  [0xfa2cf020,0xfa2cf044] = 36
 oops           [0xfa2cf044,0xfa2cf050] = 12


Good version of code:

Compiled   4   nmethod java.lang.reflect.Method::invoke (167 bytes)
 total in heap  [0xfa4cec88,0xfa4cf510] = 2184
 relocation     [0xfa4ced34,0xfa4ced8c] = 88
 main code      [0xfa4ceda0,0xfa4cf100] = 864 (** less than InlineSmallCode)
   main code, hot part                  = 519 (** up to RET opcode only)
 exception code [0xfa4cf100,0xfa4cf120] = 32
 stub code      [0xfa4cf120,0xfa4cf13e] = 30
 constants      [0xfa4cf13e,0xfa4cf140] = 2
 scopes data    [0xfa4cf140,0xfa4cf30c] = 460
 scopes pcs     [0xfa4cf30c,0xfa4cf49c] = 400
 dependencies   [0xfa4cf49c,0xfa4cf4a0] = 4
 handler table  [0xfa4cf4a0,0xfa4cf4e8] = 72
 nul chk table  [0xfa4cf4e8,0xfa4cf504] = 28
 oops           [0xfa4cf504,0xfa4cf510] = 12
                                     
2007-06-05
This is somewhat fixed but not really.  Why is "Reflection: test(1..10000000)" slower than in the original report?

$ java -showversion Micro
java version "1.8.0-ea"
Java(TM) SE Runtime Environment (build 1.8.0-ea-b111)
Java HotSpot(TM) 64-Bit Server VM (build 25.0-b53, mixed mode)

Reflection: test(10) x 10000000
  elapsed time: 124ms
  elapsed time: 41ms
  elapsed time: 41ms
  elapsed time: 40ms
  elapsed time: 40ms
Reflection: loop in sub-method: test(10) x 10000000
  elapsed time: 58ms
  elapsed time: 36ms
  elapsed time: 22ms
  elapsed time: 22ms
  elapsed time: 22ms
Reflection: test(1..10000000)
  elapsed time: 2646ms
  elapsed time: 2523ms
  elapsed time: 2500ms
  elapsed time: 2521ms
  elapsed time: 2476ms
Reflection: loop in sub-method: test(1..10000000)
  elapsed time: 77ms
  elapsed time: 53ms
  elapsed time: 38ms
  elapsed time: 41ms
  elapsed time: 42ms
result: 9999999

                                     
2013-10-16



Hardware and Software, Engineered to Work Together