Bug ID: JDK-7097546 Optimize use of CMOVE instructions

Versions (Unresolved/Resolved/Fixed)

JDK 7	JDK 8	Other
7u4 (Fixed)	8 (Fixed)	hs23 (Fixed)

Performance testing of the 6890673 implementation showed a regression in scimark.Monte:

  Benchmark         Samples        Mean     Stdev   %Diff    P   Significant
    Monte                20      411.60      1.49  -14.91 0.000          Yes

Analyzing the generated code, I found that the regression is caused by a generated CMOVE instruction:

338   	movl    RBX, R10	# spill
33b   	decl    RBX	# int
33d   	testl   R10, R10
340   	movl    R10, RBX	# spill
343   	cmovle R10, RDX	# signed, int

instead of a branch and decrement, with the infrequent code (movl R10, #16) moved off the hot path by the BlockLayoutByFrequency optimization:

298   B44: #	B56 B45 <- B43  Freq: 69040
298   	testl   R10, R10
29b   	je     B56  P=0.058864 C=8749.000000
29b
2a1   B45: #	B46 <- B44  Freq: 64976.1
2a1   	decl    R10	# int
2a4
2a4   B46: #	B58 B47 <- B45 B56  Freq: 69040

...

34a   B56: #	B46 <- B44  Freq: 4063.96
34a   	movl    R10, #16	# int
350   	jmp     B46
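
For illustration, a Java source pattern of roughly the following shape could produce the two code shapes above. This is a sketch, not the benchmark's actual code: the class name `WrapIndexDemo` and the methods `step` and `run` are hypothetical, and the wrap value 16 is taken from the `movl R10, #16` in the listing. The reset path runs once every 17 steps, which is roughly in line with the P=0.058864 branch probability shown for B44.

```java
public class WrapIndexDemo {
    // Hypothetical helper: counts an index down and wraps it back to 16 at zero.
    static int step(int i) {
        if (i == 0) {
            i = 16;   // infrequent path: reached once every 17 steps
        } else {
            i--;      // hot path: a plain decrement
        }
        return i;
    }

    // Runs the wrap-around index for a number of iterations and
    // returns how many times the infrequent reset path was taken.
    static int run(int start, int iterations) {
        int i = start;
        int wraps = 0;
        for (int n = 0; n < iterations; n++) {
            if (i == 0) {
                wraps++;
            }
            i = step(i);
        }
        return wraps;
    }

    public static void main(String[] args) {
        System.out.println("wraps = " + run(16, 1700));
    }
}
```

Whether the JIT compiles the diamond in `step` to a CMOVE or to a branch depends on the compiler version and the collected profile, so the sketch shows only the source shape, not a guaranteed code shape.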

EVALUATION http://hg.openjdk.java.net/lambda/lambda/hotspot/rev/d8cb48376797
22-03-2012
EVALUATION See main CR
30-11-2011
EVALUATION http://hg.openjdk.java.net/hsx/hotspot-emb/hotspot/rev/d8cb48376797
29-11-2011
EVALUATION http://hg.openjdk.java.net/hsx/hotspot-rt/hotspot/rev/d8cb48376797
15-11-2011
EVALUATION Avoid CMove in a loop if possible. May generate CMove if it can be moved outside a loop. Don't generate CMoveD/CMoveF: it is expensive to compute both float/double values plus the cmove. Note that on x86 with SSE>=2 (all modern CPUs), the CMoveD/CMoveF mach instructions are implemented as jmp+move. Don't generate CMove when the BlockLayoutByFrequency optimization moves the infrequent branch off the hot path. Added CMove mach instructions implemented as jmp+move on x86 for when there is no HW cmove instruction. The main part of the changes in loopopts.cpp is coding style correction. Print the size of the compiled method and the compilation time when PrintCompilation and PrintInlining are both specified on the command line. I first considered printing it with just PrintCompilation, but that would double the output. No effect on refworkload, but it will help later with the 6890673 fix. Verified with a microbenchmark I wrote (attached to the bug report).
26-10-2011
EVALUATION http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/d8cb48376797
26-10-2011
EVALUATION Avoid CMOVE if possible. May generate CMOVE if it could be moved outside a loop.
21-10-2011
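
The rules stated in the evaluation notes above can be read as a simple decision procedure. The following is only a sketch of that reading and is not the logic in loopopts.cpp: the class `CmoveHeuristicSketch`, the method `preferCmove`, its parameters, and the `INFREQUENT_PROB` cutoff are all hypothetical names and values chosen for illustration.

```java
public class CmoveHeuristicSketch {
    enum ValueKind { INT, LONG, POINTER, FLOAT, DOUBLE }

    // Hypothetical cutoff for calling a branch "infrequent"; not a HotSpot value.
    static final double INFREQUENT_PROB = 0.1;

    // Returns true if, under the rules paraphrased from the notes,
    // a conditional move would be preferred over a branch.
    static boolean preferCmove(ValueKind kind,
                               boolean inLoop,
                               boolean canMoveOutOfLoop,
                               double takenProb,
                               boolean blockLayoutByFrequency) {
        // Rule: no CMoveD/CMoveF, since both float/double values must be computed.
        if (kind == ValueKind.FLOAT || kind == ValueKind.DOUBLE) {
            return false;
        }
        // Rule: avoid CMove in a loop unless it can be moved outside the loop.
        if (inLoop && !canMoveOutOfLoop) {
            return false;
        }
        // Rule: keep the branch when block layout would move the
        // infrequent side off the hot path anyway.
        boolean oneSideInfrequent =
                takenProb < INFREQUENT_PROB || takenProb > 1.0 - INFREQUENT_PROB;
        if (blockLayoutByFrequency && oneSideInfrequent) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // The B44 branch from the listing: in a loop, taken with P=0.058864.
        System.out.println(preferCmove(ValueKind.INT, true, false, 0.058864, true));
    }
}
```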

Relates :	JDK-8034833 - Strange performance behaviour of cmov vs branch on x86
Relates :	JDK-6890673 - Eliminate allocations immediately after EA