Bug ID: JDK-7097546 Optimize use of CMOVE instructions

Details
Type: Enhancement
Submit Date: 2011-10-03
Status: Closed
Updated Date: 2014-02-13
Project Name: JDK
Resolved Date: 2012-01-23
Component: hotspot
OS: generic
Sub-Component: compiler
CPU: generic
Priority: P4
Resolution: Fixed
Affected Versions: 8-pool
Fixed Versions: hs23 (b06)

Description
Performance testing of the 6890673 implementation showed a regression in scimark.Monte:

  Benchmark         Samples        Mean     Stdev   %Diff    P   Significant
    Monte                20      411.60      1.49  -14.91 0.000          Yes

Analyzing the generated code, I found that the regression is caused by a generated CMOVE instruction:

338   	movl    RBX, R10	# spill
33b   	decl    RBX	# int
33d   	testl   R10, R10
340   	movl    R10, RBX	# spill
343   	cmovle R10, RDX	# signed, int

instead of a branch and decrement, with the infrequent code (movl R10, #16) moved off the hot path by the BlockLayoutByFrequency optimization:

298   B44: #	B56 B45 <- B43  Freq: 69040
298   	testl   R10, R10
29b   	je     B56  P=0.058864 C=8749.000000
29b
2a1   B45: #	B46 <- B44  Freq: 64976.1
2a1   	decl    R10	# int
2a4
2a4   B46: #	B58 B47 <- B45 B56  Freq: 69040

...

34a   B56: #	B46 <- B44  Freq: 4063.96
34a   	movl    R10, #16	# int
350   	jmp     B46
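
For illustration, a loop of roughly the following shape could produce the pattern in the listings above. This is a minimal sketch, not the benchmark source and not the microbenchmark attached to the report: the class name WrapCounterDemo, the method run, and the iteration count are all hypothetical. It assumes a counter that is decremented on the frequent path and reset to 16 when it reaches zero, so the reset is taken about once in 17 iterations, in line with the P=0.058864 shown for B44.

```java
// Hypothetical sketch; names and constants are illustrative only.
final class WrapCounterDemo {
    // Walks an index down from `start`, resetting it to 16 when it reaches zero,
    // and returns the sum of the index values seen.
    static long run(int start, int iterations) {
        int i = start;
        long sum = 0;
        for (int n = 0; n < iterations; n++) {
            sum += i;
            if (i == 0) {
                i = 16;   // infrequent path, compare "movl R10, #16" in B56
            } else {
                i--;      // frequent path, compare "decl R10" in B45
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(run(16, 17_000_000));
    }
}
```

Whether the compiler emits a branch or a CMOVE for the if/else depends on the JVM version and the collected profile, so the generated code should be inspected rather than assumed.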

                                    

Comments
EVALUATION

Avoid CMOVE if possible. May generate CMOVE if it could be moved outside a loop.
                                     
2011-10-21
EVALUATION

Avoid CMove in a loop if possible. CMove may still be generated if it can be moved outside the loop. Don't generate CMoveD/CMoveF: it is expensive to compute both float/double values plus the cmove. Note that on x86 with SSE>=2 (all modern CPUs) the CMoveD/CMoveF mach instructions are implemented as jmp+move. Don't generate CMove when the BlockLayoutByFrequency optimization moves the infrequent branch off the hot path. Added CMove mach instructions implemented as jmp+move on x86 for the case where there is no HW cmove instruction.
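
The cost argument for CMoveD/CMoveF can be sketched at the source level. The example below is an assumption-laden illustration, not HotSpot code: the class SelectVsBranchDemo and both method names are hypothetical. The branch form evaluates only the expression on the taken path, while the select form computes both double values before choosing one, which is the extra work a conditional move implies.

```java
// Hypothetical sketch; it models the two code shapes, not the compiler's actual output.
final class SelectVsBranchDemo {
    // Branch form: only the expression on the taken path is evaluated.
    static double branchForm(double x, boolean takeSqrt) {
        if (takeSqrt) {
            return Math.sqrt(x);
        }
        return x * x + 1.0;
    }

    // Select form: both candidate values are computed, then one is chosen.
    static double selectForm(double x, boolean takeSqrt) {
        double a = Math.sqrt(x);
        double b = x * x + 1.0;
        return takeSqrt ? a : b;
    }

    public static void main(String[] args) {
        System.out.println(branchForm(4.0, true));
        System.out.println(selectForm(4.0, false));
    }
}
```

Both methods return the same results; the JIT is free to compile either one with a branch or a conditional move, so this only shows why computing both operands is wasted work when one path is rarely taken.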

The main part of the changes in loopopts.cpp is coding style correction.

Print the size of the compiled method and the compilation time when both PrintCompilation and PrintInlining are specified on the command line. I first thought of printing it with PrintCompilation alone, but that would double the output.

No effect on refworkload, but it will help later with the 6890673 fix. Verified with a microbenchmark I wrote (attached to the bug report).
                                     
2011-10-26
EVALUATION

http://hg.openjdk.java.net/hsx/hotspot-rt/hotspot/rev/d8cb48376797
                                     
2011-11-15
EVALUATION

http://hg.openjdk.java.net/hsx/hotspot-emb/hotspot/rev/d8cb48376797
                                     
2011-11-29
EVALUATION

See main CR
                                     
2011-11-30
EVALUATION

http://hg.openjdk.java.net/lambda/lambda/hotspot/rev/d8cb48376797
                                     
2012-03-22
EVALUATION

http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/d8cb48376797
                                     
2011-10-26


