JDK-6565458 : 10x performance regression on x86 in micro test with Reflection.
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 7,9,10
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • OS: generic
  • CPU: x86
  • Submitted: 2007-06-04
  • Updated: 2018-10-05
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
Other
tbdUnresolved
Related Reports
Relates :  
Relates :  
Relates :  
Description
During investigation of old bug 4939638 I found the next regression
caused by 20061023120554.jrose.dolphin-cleanups putback.
I attached the java file and the class file compiled by jikes.

hsdev-5% cp_prt6 20061020142756.sgoldman.6481940/ i486 2 product
hsdev-5% gamma -XX:LoopUnrollLimit=0 -XX:LoopOptsCount=0 Micro
Reflection: test(10) x 10000000
  elapsed time: 924ms
  elapsed time: 616ms
  elapsed time: 582ms
  elapsed time: 581ms
  elapsed time: 584ms
Reflection: loop in sub-method: test(10) x 10000000
  elapsed time: 630ms
  elapsed time: 637ms
  elapsed time: 584ms
  elapsed time: 583ms
  elapsed time: 583ms
Reflection: test(1..10000000)
  elapsed time: 983ms
  elapsed time: 956ms
  elapsed time: 960ms
  elapsed time: 957ms
  elapsed time: 960ms
Reflection: loop in sub-method: test(1..10000000)
  elapsed time: 975ms
  elapsed time: 933ms
  elapsed time: 950ms
  elapsed time: 952ms
  elapsed time: 977ms
result: 9999999

hsdev-5% cp_prt6 20061023120554.jrose.dolphin-cleanups/ i486 2 product
hsdev-5% gamma -XX:LoopUnrollLimit=0 -XX:LoopOptsCount=0 Micro
Reflection: test(10) x 10000000
  elapsed time: 7461ms
  elapsed time: 7363ms
  elapsed time: 7350ms
  elapsed time: 7359ms
  elapsed time: 7351ms
Reflection: loop in sub-method: test(10) x 10000000
  elapsed time: 7340ms
  elapsed time: 7339ms
  elapsed time: 7342ms
  elapsed time: 7340ms
  elapsed time: 7326ms
Reflection: test(1..10000000)
  elapsed time: 7690ms
  elapsed time: 7691ms
  elapsed time: 7694ms
  elapsed time: 7680ms
  elapsed time: 7676ms
Reflection: loop in sub-method: test(1..10000000)
  elapsed time: 7455ms
  elapsed time: 7431ms
  elapsed time: 7420ms
  elapsed time: 7401ms
  elapsed time: 7406ms
result: 9999999

Comments
We should look on Christian results before closing this.
10-03-2017

This is somewhat fixed but not really. Why is "Reflection: test(1..10000000)" slower than in the original report? $ java -showversion Micro java version "1.8.0-ea" Java(TM) SE Runtime Environment (build 1.8.0-ea-b111) Java HotSpot(TM) 64-Bit Server VM (build 25.0-b53, mixed mode) Reflection: test(10) x 10000000 elapsed time: 124ms elapsed time: 41ms elapsed time: 41ms elapsed time: 40ms elapsed time: 40ms Reflection: loop in sub-method: test(10) x 10000000 elapsed time: 58ms elapsed time: 36ms elapsed time: 22ms elapsed time: 22ms elapsed time: 22ms Reflection: test(1..10000000) elapsed time: 2646ms elapsed time: 2523ms elapsed time: 2500ms elapsed time: 2521ms elapsed time: 2476ms Reflection: loop in sub-method: test(1..10000000) elapsed time: 77ms elapsed time: 53ms elapsed time: 38ms elapsed time: 41ms elapsed time: 42ms result: 9999999
16-10-2013

EVALUATION Compiled method is larger in newer JVM, because of increased size of uncommon (cold) paths. The hot code is about the same size, but our heuristic does not measure hot code size. It should; this is an old problem. See comparative nmethod dumps below. A reasonable solution would be to record, on the nmethod, an offset which estimates the end of the hot code. (The compiler attempts to lay all cold code out after all hot code.) This offset can be estimated (for example) in Compile::Output, near the comment which says "Insert epilogs before every return", by noting the offset of the latest return (usually the only return). Note that AMD64 does not manifest this bug because the invoke call is *never* inlined, because InlineSmallCode is too small for a 64-bit machine. This also should be fixed. In this case, the extra cold code comes from a path taken just once. The older compiler treats this path as never-taken, because the MDO has no record of the path. But the newer compiler treats the path as seldom-taken (which is more accurate), and therefore emits cold code for it, calling ensureMemberAccess from Method.invoke (at bci 111). This extra cold code bloats the compiled method and fools the inlining heuristic. --- Comparative nmethod dumps --- Bad version of code: Compiled 4 nmethod java.lang.reflect.Method::invoke (167 bytes) total in heap [0xfa2ce588,0xfa2cf050] = 2760 relocation [0xfa2ce634,0xfa2ce698] = 100 main code [0xfa2ce6a0,0xfa2ceba0] = 1280 (** greater than InlineSmallCode) main code, hot part = 531 (** up to RET opcode only) exception code [0xfa2ceba0,0xfa2cebc0] = 32 stub code [0xfa2cebc0,0xfa2cebe8] = 40 scopes data [0xfa2cebe8,0xfa2cedfc] = 532 scopes pcs [0xfa2cedfc,0xfa2cefbc] = 448 dependencies [0xfa2cefbc,0xfa2cefc0] = 4 handler table [0xfa2cefc0,0xfa2cf020] = 96 nul chk table [0xfa2cf020,0xfa2cf044] = 36 oops [0xfa2cf044,0xfa2cf050] = 12 Good version of code: Compiled 4 nmethod java.lang.reflect.Method::invoke (167 bytes) total in heap [0xfa4cec88,0xfa4cf510] = 2184 relocation [0xfa4ced34,0xfa4ced8c] = 88 main code [0xfa4ceda0,0xfa4cf100] = 864 (** less than InlineSmallCode) main code, hot part = 519 (** up to RET opcode only) exception code [0xfa4cf100,0xfa4cf120] = 32 stub code [0xfa4cf120,0xfa4cf13e] = 30 constants [0xfa4cf13e,0xfa4cf140] = 2 scopes data [0xfa4cf140,0xfa4cf30c] = 460 scopes pcs [0xfa4cf30c,0xfa4cf49c] = 400 dependencies [0xfa4cf49c,0xfa4cf4a0] = 4 handler table [0xfa4cf4a0,0xfa4cf4e8] = 72 nul chk table [0xfa4cf4e8,0xfa4cf504] = 28 oops [0xfa4cf504,0xfa4cf510] = 12
05-06-2007