JDK-7097546 : Optimize use of CMOVE instructions
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 8-pool
  • Priority: P4
  • Status: Closed
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2011-10-03
  • Updated: 2014-02-13
  • Resolved: 2012-01-23
  Release family   Version   Status
    JDK 7            7u4       Fixed
    JDK 8            8         Fixed
    Other            hs23      Fixed
Performance testing of the 6890673 implementation showed a regression in scimark.Monte:

  Benchmark         Samples        Mean     Stdev   %Diff    P   Significant
    Monte                20      411.60      1.49  -14.91 0.000          Yes

Analyzing the generated code, I found that the regression is caused by a generated CMOVE instruction:

338   	movl    RBX, R10	# spill
33b   	decl    RBX	# int
33d   	testl   R10, R10
340   	movl    R10, RBX	# spill
343   	cmovle R10, RDX	# signed, int

instead of a branch and decrement, with the infrequent code (movl R10, #16) moved off the hot path by the BlockLayoutByFrequency optimization:

298   B44: #	B56 B45 <- B43  Freq: 69040
298   	testl   R10, R10
29b   	je     B56  P=0.058864 C=8749.000000
2a1   B45: #	B46 <- B44  Freq: 64976.1
2a1   	decl    R10	# int
2a4   B46: #	B58 B47 <- B45 B56  Freq: 69040


34a   B56: #	B46 <- B44  Freq: 4063.96
34a   	movl    R10, #16	# int
350   	jmp     B46
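
The two listings above look like a wrap-around counter update: decrement on the hot path, reset to 16 on the rare path (the listed branch probability, P=0.058864, is close to 1/17). A minimal Java sketch of that shape follows. The class and method names are hypothetical and the code is not taken from the scimark source. It only shows the two forms the compiler chooses between; which machine code C2 emits for either method is up to the JIT, and the source form does not force a branch or a CMOVE.

```java
// Hypothetical illustration; names are not from the benchmark source.
final class WrapIndexDemo {
    // Branch form: the reset to 16 happens once every 17 steps, so a
    // well-predicted branch leaves only the decrement on the hot path.
    static int stepBranch(int i) {
        if (i == 0) {
            i = 16;
        } else {
            i--;
        }
        return i;
    }

    // Select form: both candidate values exist before one is chosen.
    // This is the shape a conditional move (CMOVE) implements.
    static int stepSelect(int i) {
        int decremented = i - 1;
        return (i == 0) ? 16 : decremented;
    }

    // Steps the counter n times and counts how often the reset path runs.
    static int run(int start, int n) {
        int i = start;
        int resets = 0;
        for (int k = 0; k < n; k++) {
            if (i == 0) {
                resets++;
            }
            i = stepBranch(i);
        }
        return resets;
    }

    public static void main(String[] args) {
        System.out.println("resets in 170 steps: " + run(16, 170));
    }
}
```

With the select form, the decrement is computed on every iteration and the result of the CMOVE depends on both inputs, which lengthens the dependency chain in a tight loop. With the branch form, the rare reset can be laid out away from the hot path.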

EVALUATION http://hg.openjdk.java.net/lambda/lambda/hotspot/rev/d8cb48376797


EVALUATION http://hg.openjdk.java.net/hsx/hotspot-emb/hotspot/rev/d8cb48376797

EVALUATION http://hg.openjdk.java.net/hsx/hotspot-rt/hotspot/rev/d8cb48376797

EVALUATION Avoid CMove in a loop if possible. A CMove may still be generated if it can be moved outside the loop.

Don't generate CMoveD/CMoveF: computing both float/double values and then doing the cmove is expensive. Note that on x86 with SSE>=2 (all modern CPUs) the CMoveD/CMoveF mach instructions are implemented as jmp+move.

Don't generate CMove when the BlockLayoutByFrequency optimization moves the infrequent branch off the hot path.

Added CMove mach instructions implemented as jmp+move on x86 for the case where there is no hardware cmove instruction.

The main part of the changes in loopopts.cpp is coding style correction.

Print the size of the compiled method and the compilation time when both PrintCompilation and PrintInlining are specified on the command line. I first thought of printing it with just PrintCompilation, but that would double the output.

No effect on refworkload, but it will help later with the 6890673 fix. Verified with a microbenchmark I wrote (attached to the bug report).
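
The attached microbenchmark is not reproduced on this page. As a rough sketch of what such a test could look like, the hypothetical class below (all names and iteration counts are assumptions, not the author's code) exercises the two cases the evaluation describes: a rarely taken integer reset inside a hot loop, and a selection between two computed double values.

```java
// Hypothetical microbenchmark sketch; not the attachment from the bug report.
final class CmoveBench {
    // Integer case: the reset to 16 is taken once every 17 iterations.
    static long wrapLoop(int n) {
        int i = 16;
        long sum = 0;
        for (int k = 0; k < n; k++) {
            if (i == 0) {
                i = 16;
            } else {
                i--;
            }
            sum += i;
        }
        return sum;
    }

    // Floating-point case: a conditional move would need both a and b
    // computed on every iteration before one of them is selected.
    static double selectLoop(int n) {
        double acc = 0.0;
        for (int k = 0; k < n; k++) {
            double a = k * 0.5;
            double b = k * 0.25;
            acc += (k % 17 == 0) ? a : b;
        }
        return acc;
    }

    public static void main(String[] args) {
        int n = 20_000_000; // assumed iteration count
        // Warm-up so the timed calls run compiled code.
        for (int w = 0; w < 5; w++) {
            wrapLoop(n);
            selectLoop(n);
        }
        long t0 = System.nanoTime();
        long s = wrapLoop(n);
        long t1 = System.nanoTime();
        double d = selectLoop(n);
        long t2 = System.nanoTime();
        System.out.printf("wrapLoop: sum=%d, %d ms%n", s, (t1 - t0) / 1_000_000);
        System.out.printf("selectLoop: acc=%.2f, %d ms%n", d, (t2 - t1) / 1_000_000);
    }
}
```

Running it with `-XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining` shows the compilation events for the two loop methods; in product builds PrintInlining is a diagnostic flag, so it needs UnlockDiagnosticVMOptions. Timings from a hand-rolled loop like this are only indicative.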

EVALUATION http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/d8cb48376797

EVALUATION Avoid CMOVE if possible. May generate CMOVE if it could be moved outside a loop.