JDK-6932855 : Save superfluous CMP instruction from while loop
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 7,8,9
  • Priority: P4
  • Status: Closed
  • Resolution: Won't Fix
  • OS: generic
  • CPU: x86
  • Submitted: 2010-03-08
  • Updated: 2016-12-19
  • Resolved: 2014-02-11
Description
A DESCRIPTION OF THE REQUEST :
1.) Compare last lines from EXPECTED section against ACTUAL.
    Loop termination doesn't need to compare the counter against constant -1.
2.) At PC 0x00b8838a in compiled code you can see superfluous comparison.
3.) Additionally the swapping of counter n between %ebx and %ecx seems superfluous boilerplate.
4.) Pushing all values o1, o2, v1 liberately to stack seems avoidable if registers were better assigned, e.g. %eax is never used.

Using JDK7 fastdebug build b84.

JUSTIFICATION :
1.) Especially short loops would perform better.
2.-4.) Could perform better too and saves footprint.


EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -

_> load v2[o2+n] in %ebp
  0x00b88404: movzwl 0xc(%esi,%ecx,2),%ebp  ;*caload
                                        ; - java.lang.String::equals4@80 (line 1146)
_> load v1[o1+n] in %ebx
  0x00b88409: movzwl 0xc(%edx,%ecx,2),%ebx  ;*caload
                                        ; - java.lang.String::equals4@73 (line 1146)
  0x00b8840e: cmp    %ebp,%ebx
  0x00b88410: jne    0x00b8841d         ;*if_icmpeq
                                        ; - java.lang.String::equals4@81 (line 1146)
_> decrement --> n--
  0x00b88412: dec    %ecx               ;*iinc
                                        ; - java.lang.String::equals4@60 (line 1144)
_> break while_1 if n < 0
  0x00b88413: jge     0x00b88404         ;*ifeq
                                        ; - java.lang.String::equals4@33 (line 1141)
  0x00b88418:


ACTUAL -

- this          in %edx
- anotherString in %ebp
_> load count --> countLocal in %edi
  0x00b882ff: mov    0x10(%edx),%edi    ;*getfield count
                                        ; - java.lang.String::equals4@20 (line 1139)
_> load anotherString.count in %ecx
  0x00b88302: mov    0x10(%ebp),%ecx
  0x00b88305: cmp    %ecx,%edi
  0x00b88307: jne    0x00b8841d         ;*if_icmpne
                                        ; - java.lang.String::equals4@29 (line 1140)
_> return true if countLocal == 0 (seems to be superfluous)
  0x00b8830d: test   %edi,%edi
  0x00b8830f: je     0x00b88418         ;*ifeq
                                        ; - java.lang.String::equals4@33 (line 1141)
_> copy countLocal --> n-- in %ebx
  0x00b88315: mov    %edi,%ebx
  0x00b88317: dec    %ebx               ;*iinc
                                        ; - java.lang.String::equals4@60 (line 1144)
_> break while() (return true) if n < 0
  0x00b88318: test   %ebx,%ebx
  0x00b8831a: jl     0x00b88418         ;*iflt
                                        ; - java.lang.String::equals4@64 (line 1144)
_> load anotherString.offset --> o2 in %ecx + 0x4(%esp)
  0x00b88320: mov    0xc(%ebp),%ecx     ;*getfield offset
                                        ; - java.lang.String::equals4@55 (line 1143)
  0x00b88323: mov    %ecx,0x4(%esp)
_> move anotherString from %ebp in %ecx
  0x00b88327: mov    %ebp,%ecx
_> load value --> v1 in %esi + 0x8(%esp)
  0x00b88329: mov    0x8(%edx),%esi     ;*getfield value
                                        ; - java.lang.String::equals4@37 (line 1142)
  0x00b8832c: mov    %esi,0x8(%esp)
_> load offset --> o1 in %ebp + 0xc(%esp)
  0x00b88330: mov    0xc(%edx),%ebp     ;*getfield offset
                                        ; - java.lang.String::equals4@49 (line 1143)
  0x00b88333: mov    %ebp,0xc(%esp)
_> load anotherString.value --> v2 in %ebp
  0x00b88337: mov    0x8(%ecx),%ebp     ;*getfield value
                                        ; - java.lang.String::equals4@43 (line 1142)
_> copy v1.length in %ecx via %esi
  0x00b8833a: mov    0x8(%esi),%ecx     ; implicit exception: dispatches to 0x00b88428
_> copy countLocal to %esi
  0x00b8833d: mov    %edi,%esi
_> check o1 + countLocal - 1 < v1.length
  0x00b8833f: add    0xc(%esp),%esi
  0x00b88343: dec    %esi
  0x00b88344: cmp    %ecx,%esi
  0x00b88346: jae    0x00b88428

_> copy v2.length in %edx via %ebp
  0x00b8834c: mov    0x8(%ebp),%edx     ; implicit exception: dispatches to 0x00b88428
_> copy countLocal to %ecx
  0x00b8834f: mov    %edi,%ecx
_> check o2 + countLocal - 1 < v2.length
  0x00b88351: add    0x4(%esp),%ecx
  0x00b88355: dec    %ecx
  0x00b88356: cmp    %edx,%ecx
  0x00b88358: jae    0x00b88428

_> copy &v2[o2] in %esi via %ecx
  0x00b8835e: mov    0x4(%esp),%ecx
  0x00b88362: lea    0x0(%ebp,%ecx,2),%esi
_> copy &v1[o1] in %edx via %ecx
  0x00b88366: mov    0x8(%esp),%ebp
  0x00b8836a: mov    0xc(%esp),%ecx
  0x00b8836e: lea    0x0(%ebp,%ecx,2),%edx
_> decrement countLocal -= 2 in %edi
  0x00b88372: add    $0xfffffffe,%edi   ;*aload
                                        ; - java.lang.String::equals4@67 (line 1146)
_> load v2[o2+n] in %ebp
  0x00b88375: movzwl 0xc(%esi,%ebx,2),%ebp  ;*caload
                                        ; - java.lang.String::equals4@80 (line 1146)
_> load v1[o1+n] in %ecx
  0x00b8837a: movzwl 0xc(%edx,%ebx,2),%ecx  ;*caload
                                        ; - java.lang.String::equals4@73 (line 1146)
  0x00b8837f: cmp    %ebp,%ecx
  0x00b88381: jne    0x00b8841d         ;*if_icmpeq
                                        ; - java.lang.String::equals4@81 (line 1146)
_> decrement --> n-- via %ecx
  0x00b88387: mov    %ebx,%ecx
  0x00b88389: dec    %ecx               ;*iinc
                                        ; - java.lang.String::equals4@60 (line 1144)
_> break while_1 if n <= countLocal (occurs always, so cmp is superfluos)
  0x00b8838a: cmp    %edi,%ecx
  0x00b8838c: jle    0x00b88392  (n equals countLocal after 1st loop iteration)
  0x00b8838e: mov    %ecx,%ebx
  0x00b88390: jmp    0x00b88375  (...so this line will never be reached)
  0x00b88392:
  .....
  .....
  .....
_> load v2[o2+n] in %ebp
  0x00b88404: movzwl 0xc(%esi,%ecx,2),%ebp  ;*caload
                                        ; - java.lang.String::equals4@80 (line 1146)
_> load v1[o1+n] in %ebx
  0x00b88409: movzwl 0xc(%edx,%ecx,2),%ebx  ;*caload
                                        ; - java.lang.String::equals4@73 (line 1146)
  0x00b8840e: cmp    %ebp,%ebx
  0x00b88410: jne    0x00b8841d         ;*if_icmpeq
                                        ; - java.lang.String::equals4@81 (line 1146)
_> decrement --> n--
  0x00b88412: dec    %ecx               ;*iinc
                                        ; - java.lang.String::equals4@60 (line 1144)
_> break while_1 if n <= -1
  0x00b88413: cmp    $0xffffffff,%ecx
  0x00b88416: jg     0x00b88404         ;*ifeq
                                        ; - java.lang.String::equals4@33 (line 1141)
  0x00b88418:



---------- BEGIN SOURCE ----------

    public boolean equals4(Object anObject) {
        ...;
            int n = count;
            if (n == anotherString.count) { // 3rd check lengths
                if (n != 0) { // 4th avoid loading registers from members if length == 0
                    char[] v1 = value, v2 = anotherString.value;
                    int o1 = offset, o2 = anotherString.offset;
                    while (--n >= 0) // only decrement 1 register
                        // to benefit from CPU's complex addressing modes
                        if (v1[o1+n] != v2[o2+n]) // compare the chars backwards
                            return false;
                }
                return true;
            }
        ...;
    }


---------- END SOURCE ----------

Comments
See Vladimir's comment
11-02-2014

Unfortunately, it is limitation of our JIT. In our IR nodes can't produce arithmetic and flag results at the same time. Rickard recently tried to do that for MathExact but have to use different approach at the end. I doubt we will be able to fix it. Note, modern x86 cpus fuse tst+jcc instructions into one micro-instruction.
10-02-2014

This simple example reproduces the superfluous CMP instruction. 00b MOV EBX,java/lang/Class:exact * 010 MOV EDI,[EBX + #80] # int ! Field: volatile Test.X 013 MEMBAR-acquire ! (empty encoding) 013 DEC EDI 014 TEST EDI,EDI 016 Jge,s B3 P=0.000001 C=-1.000000
10-02-2014