JDK-6763452 : Loop induction variable incremented on back edge
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 6,9,10
  • Priority: P3
  • Status: Open
  • Resolution: Unresolved
  • OS: generic
  • CPU: x86
  • Submitted: 2008-10-24
  • Updated: 2020-11-03
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
Other
tbdUnresolved
Related Reports
Relates :  
Description
FULL PRODUCT VERSION :
java version "1.6.0_10-rc"
Java(TM) SE Runtime Environment (build 1.6.0_10-rc-b28)
Java HotSpot(TM) 64-Bit Server VM (build 11.0-b15, mixed mode)

FULL OS VERSION :
Linux mac.config 2.6.25.9-76.fc9.x86_64 #1 SMP Fri Jun 27 15:58:30 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

EXTRA RELEVANT SYSTEM CONFIGURATION :
Since this is a JIT optimization issue, the CPU architecture might be useful, my machine is a dual quad core Xeon 5400 series 2.8Ghz (Mac Pro)

A DESCRIPTION OF THE PROBLEM :
When I run the code added to the "Source code for an executable test case" section, the performance from the third iteration onwards is worse than the first two. Based on the output of -XX:+PrintCompilation it seems like the method is JIT'd a second time causing the slowdown. That seems like a bug. Note that this is only relevant with -server and the 64-bit JVM.

THE PROBLEM WAS REPRODUCIBLE WITH -Xint FLAG: Did not try

THE PROBLEM WAS REPRODUCIBLE WITH -server FLAG: Yes

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run the test case with -server and a 64-bit JVM.

EXPECTED VERSUS ACTUAL BEHAVIOR :
I expect each iteration in the output to perform at similar or better times than the previous one. Instead I get:

Time: 4915
Value: 10000000000
Time: 4869
Value: 10000000000
Time: 7183
Value: 10000000000
Time: 7186
Value: 10000000000
Time: 7184
Value: 10000000000
Time: 7193
Value: 10000000000
Time: 7186
Value: 10000000000
Time: 7190
Value: 10000000000
Time: 7204
Value: 10000000000
Time: 7210
Value: 10000000000

Note that the time after the second iteration is much worse.
ERROR MESSAGES/STACK TRACES THAT OCCUR :
The hotspot.log generated from -XX:+LogCompilation is:

<?xml version='1.0' encoding='UTF-8'?>
<hotspot_log version='160 1' process='519' time_ms='1222474495317'>
<vm_version>
<name>
Java HotSpot(TM) 64-Bit Server VM
</name>
<release>
11.0-b15
</release>
<info>
Java HotSpot(TM) 64-Bit Server VM (11.0-b15) for linux-amd64 JRE (1.6.0_10-rc-b28), built on Jul 21 2008 01:02:51 by &quot;java_re&quot; with gcc 3.2.2 (SuSE Linux)
</info>
</vm_version>
<vm_arguments>
<args>
-Xbatch -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:MaxPermSize=256m -Xms312m -Xmx5500m -XX:+UseConcMarkSweepGC -Dfile.encoding=UTF-8
</args>
<command>
test.PerfTest
</command>
<launcher>
SUN_STANDARD
</launcher>
<properties>
java.vm.specification.version=1.0
java.vm.specification.name=Java Virtual Machine Specification
java.vm.specification.vendor=Sun Microsystems Inc.
java.vm.version=11.0-b15
java.vm.name=Java HotSpot(TM) 64-Bit Server VM
java.vm.vendor=Sun Microsystems Inc.
java.vm.info=mixed mode, sharing
java.ext.dirs=/usr/java/jdk1.6.0_10/jre/lib/ext:/usr/java/packages/lib/ext
java.endorsed.dirs=/usr/java/jdk1.6.0_10/jre/lib/endorsed
sun.boot.library.path=/usr/java/jdk1.6.0_10/jre/lib/amd64
java.library.path=/usr/java/jdk1.6.0_10/jre/lib/amd64/server:/usr/java/jdk1.6.0_10/jre/lib/amd64:/usr/java/jdk1.6.0_10/jre/../lib/amd64:/usr/java/jdk1.6.0_10.i386/jre/lib/i386/server:/usr/java/jdk1.6.0_10.i386/jre/lib/i386::/usr/lib64/firefox-2.0.0.1:/usr/lib64/firefox-2.0.0.1:/usr/lib64/firefox-2.0.0.1:/usr/java/packages/lib/amd64:/lib:/usr/lib
java.home=/usr/java/jdk1.6.0_10/jre
java.class.path=/home/ijuma/workspaces/likecube/test/bin
sun.boot.class.path=/usr/java/jdk1.6.0_10/jre/lib/resources.jar:/usr/java/jdk1.6.0_10/jre/lib/rt.jar:/usr/java/jdk1.6.0_10/jre/lib/sunrsasign.jar:/usr/java/jdk1.6.0_10/jre/lib/jsse.jar:/usr/java/jdk1.6.0_10/jre/lib/jce.jar:/usr/java/jdk1.6.0_10/jre/lib/charsets.jar:/usr/java/jdk1.6.0_10/jre/classes
file.encoding=UTF-8
sun.java.launcher=SUN_STANDARD
</properties>
</vm_arguments>
<tty>
<writer thread='44530000'/>
<thread_logfile thread='44530000' filename='/tmp/hs_c44530000_pid519.log'/>
<writer thread='43477328'/>
<thread_logfile thread='43477328' filename='/tmp/hs_c43477328_pid519.log'/>
<writer thread='12933456'/>
<task_queued compile_id='1' compile_kind='osr' method='test/PerfTest inner ()J' bytes='19' count='1' backedge_count='14563' iicount='1' osr_bci='5' blocking='1' stamp='0.050' comment='backedge_count' hot_count='14563'/>
<writer thread='43477328'/>
  1%  b   test.PerfTest::inner @ 5 (19 bytes)
<nmethod compile_id='1' compile_kind='osr' compiler='C2' entry='0x000000002ee0d000' size='520' address='0x000000002ee0ced0' relocation_offset='264' code_offset='304' stub_offset='400' consts_offset='420' scopes_data_offset='424' scopes_pcs_offset='456' dependencies_offset='504' oops_offset='512' method='test/PerfTest inner ()J' bytes='19' count='1' backedge_count='14563' iicount='1' stamp='0.054'/>
<writer thread='12933456'/>
<task_queued compile_id='1' method='test/PerfTest inner ()J' bytes='19' count='2' backedge_count='5000' iicount='2' blocking='1' stamp='4.965' comment='count' hot_count='2'/>
<writer thread='43477328'/>
  1   b   test.PerfTest::inner (19 bytes)
<nmethod compile_id='1' compiler='C2' entry='0x000000002ee0d240' size='488' address='0x000000002ee0d110' relocation_offset='264' code_offset='304' stub_offset='368' consts_offset='388' scopes_data_offset='392' scopes_pcs_offset='424' dependencies_offset='472' oops_offset='480' method='test/PerfTest inner ()J' bytes='19' count='2' backedge_count='5000' iicount='2' stamp='4.966'/>
<writer thread='41371984'/>
<destroy_vm stamp='67.400'/>
<tty_done stamp='67.400'/>
</tty>
<compilation_log thread='43477328'>
<start_compile_thread thread='43477328' process='519' stamp='0.046'/>
<task compile_id='1' compile_kind='osr' method='test/PerfTest inner ()J' bytes='19' count='1' backedge_count='14563' iicount='1' osr_bci='5' blocking='1' stamp='0.050'>
<phase name='matcher' nodes='47' stamp='0.051'>
<phase_done nodes='32' stamp='0.051'/>
</phase>
<phase name='regalloc' nodes='37' stamp='0.051'>
<regalloc attempts='0' success='1'/>
<phase_done nodes='47' stamp='0.051'/>
</phase>
<phase name='output' nodes='47' stamp='0.051'>
<phase_done nodes='59' stamp='0.051'/>
</phase>
<phase name='matcher' nodes='48' stamp='0.052'>
<phase_done nodes='33' stamp='0.052'/>
</phase>
<phase name='regalloc' nodes='38' stamp='0.052'>
<regalloc attempts='0' success='1'/>
<phase_done nodes='50' stamp='0.052'/>
</phase>
<phase name='output' nodes='50' stamp='0.052'>
<phase_done nodes='62' stamp='0.052'/>
</phase>
<phase name='matcher' nodes='49' stamp='0.052'>
<phase_done nodes='34' stamp='0.052'/>
</phase>
<phase name='regalloc' nodes='39' stamp='0.052'>
<regalloc attempts='0' success='1'/>
<phase_done nodes='53' stamp='0.052'/>
</phase>
<phase name='output' nodes='53' stamp='0.052'>
<phase_done nodes='65' stamp='0.052'/>
</phase>
<phase name='matcher' nodes='50' stamp='0.052'>
<phase_done nodes='35' stamp='0.052'/>
</phase>
<phase name='regalloc' nodes='40' stamp='0.052'>
<regalloc attempts='0' success='1'/>
<phase_done nodes='56' stamp='0.052'/>
</phase>
<phase name='output' nodes='56' stamp='0.052'>
<phase_done nodes='68' stamp='0.052'/>
</phase>
<phase name='matcher' nodes='51' stamp='0.052'>
<phase_done nodes='36' stamp='0.052'/>
</phase>
<phase name='regalloc' nodes='41' stamp='0.052'>
<regalloc attempts='0' success='1'/>
<phase_done nodes='59' stamp='0.052'/>
</phase>
<phase name='output' nodes='59' stamp='0.052'>
<phase_done nodes='71' stamp='0.052'/>
</phase>
<phase name='matcher' nodes='52' stamp='0.052'>
<phase_done nodes='37' stamp='0.052'/>
</phase>
<phase name='regalloc' nodes='42' stamp='0.053'>
<regalloc attempts='0' success='1'/>
<phase_done nodes='62' stamp='0.053'/>
</phase>
<phase name='output' nodes='62' stamp='0.053'>
<phase_done nodes='74' stamp='0.053'/>
</phase>
<phase name='matcher' nodes='43' stamp='0.053'>
<phase_done nodes='31' stamp='0.053'/>
</phase>
<phase name='regalloc' nodes='36' stamp='0.053'>
<regalloc attempts='0' success='1'/>
<phase_done nodes='46' stamp='0.053'/>
</phase>
<phase name='output' nodes='46' stamp='0.053'>
<phase_done nodes='58' stamp='0.053'/>
</phase>
<phase name='matcher' nodes='47' stamp='0.053'>
<phase_done nodes='33' stamp='0.053'/>
</phase>
<phase name='regalloc' nodes='38' stamp='0.053'>
<regalloc attempts='0' success='1'/>
<phase_done nodes='53' stamp='0.053'/>
</phase>
<phase name='output' nodes='53' stamp='0.053'>
<phase_done nodes='65' stamp='0.053'/>
</phase>
<phase name='matcher' nodes='46' stamp='0.053'>
<phase_done nodes='34' stamp='0.053'/>
</phase>
<phase name='regalloc' nodes='39' stamp='0.053'>
<regalloc attempts='0' success='1'/>
<phase_done nodes='55' stamp='0.053'/>
</phase>
<phase name='output' nodes='55' stamp='0.053'>
<phase_done nodes='67' stamp='0.053'/>
</phase>
<phase name='matcher' nodes='42' stamp='0.053'>
<phase_done nodes='30' stamp='0.053'/>
</phase>
<phase name='regalloc' nodes='35' stamp='0.053'>
<regalloc attempts='0' success='1'/>
<phase_done nodes='43' stamp='0.053'/>
</phase>
<phase name='output' nodes='43' stamp='0.053'>
<phase_done nodes='55' stamp='0.053'/>
</phase>
<phase name='parse' nodes='3' stamp='0.054'>
<type id='489' name='long'/>
<klass id='594' name='test/PerfTest' flags='1'/>
<method id='595' holder='594' name='inner' return='489' flags='8' bytes='19' iicount='1'/>
<parse method='595' uses='1' osr_bci='5' stamp='0.054'>
<bc code='155' bci='14'/>
<branch target_bci='5' taken='11264' not_taken='0' cnt='11264' prob='always'/>
<parse_done nodes='63' memory='26784' stamp='0.054'/>
</parse>
<phase_done nodes='64' stamp='0.054'/>
</phase>
<phase name='optimizer' nodes='64' stamp='0.054'>
<phase name='idealLoop' nodes='64' stamp='0.054'>
<phase_done nodes='68' stamp='0.054'/>
</phase>
<phase name='idealLoop' nodes='68' stamp='0.054'>
<phase_done nodes='68' stamp='0.054'/>
</phase>
<phase name='idealLoop' nodes='68' stamp='0.054'>
<phase_done nodes='68' stamp='0.054'/>
</phase>
<phase name='ccp' nodes='68' stamp='0.054'>
<phase_done nodes='68' stamp='0.054'/>
</phase>
<phase name='idealLoop' nodes='68' stamp='0.054'>
<phase_done nodes='71' stamp='0.054'/>
</phase>
<phase_done nodes='71' stamp='0.054'/>
</phase>
<phase name='matcher' nodes='71' stamp='0.054'>
<phase_done nodes='29' stamp='0.054'/>
</phase>
<phase name='regalloc' nodes='36' stamp='0.054'>
<regalloc attempts='0' success='1'/>
<phase_done nodes='41' stamp='0.054'/>
</phase>
<phase name='output' nodes='41' stamp='0.054'>
<phase_done nodes='56' stamp='0.054'/>
</phase>
<task_done success='1' nmsize='120' count='1' backedge_count='14563' stamp='0.054'/>
</task>
<task compile_id='1' method='test/PerfTest inner ()J' bytes='19' count='2' backedge_count='5000' iicount='2' blocking='1' stamp='4.965'>
<phase name='parse' nodes='3' stamp='4.966'>
<type id='489' name='long'/>
<klass id='594' name='test/PerfTest' flags='1'/>
<method id='595' holder='594' name='inner' return='489' flags='8' bytes='19' iicount='2'/>
<parse method='595' uses='2' stamp='4.966'>
<bc code='155' bci='14'/>
<branch target_bci='5' taken='11264' not_taken='0' cnt='11264' prob='always'/>
<bc code='155' bci='14'/>
<branch target_bci='5' taken='11264' not_taken='0' cnt='11264' prob='always'/>
<parse_done nodes='66' memory='28240' stamp='4.966'/>
</parse>
<phase_done nodes='67' stamp='4.966'/>
</phase>
<phase name='optimizer' nodes='67' stamp='4.966'>
<phase name='idealLoop' nodes='67' stamp='4.966'>
<phase_done nodes='72' stamp='4.966'/>
</phase>
<phase name='idealLoop' nodes='72' stamp='4.966'>
<phase_done nodes='72' stamp='4.966'/>
</phase>
<phase name='idealLoop' nodes='72' stamp='4.966'>
<phase_done nodes='72' stamp='4.966'/>
</phase>
<phase name='ccp' nodes='72' stamp='4.966'>
<phase_done nodes='72' stamp='4.966'/>
</phase>
<phase name='idealLoop' nodes='73' stamp='4.966'>
<phase_done nodes='73' stamp='4.966'/>
</phase>
<phase_done nodes='73' stamp='4.966'/>
</phase>
<phase name='matcher' nodes='73' stamp='4.966'>
<phase_done nodes='22' stamp='4.966'/>
</phase>
<phase name='regalloc' nodes='28' stamp='4.966'>
<regalloc attempts='0' success='1'/>
<phase_done nodes='28' stamp='4.966'/>
</phase>
<phase name='output' nodes='28' stamp='4.966'>
<phase_done nodes='42' stamp='4.966'/>
</phase>
<task_done success='1' nmsize='88' count='2' backedge_count='5000' stamp='4.966'/>
</task>
</compilation_log>
<compilation_log thread='44530000'>
<fragment>
<![CDATA[
<start_compile_thread thread='44530000' process='519' stamp='0.045'/>
]]>
</fragment>
</compilation_log>
<hotspot_log_done stamp='67.400'/>
</hotspot_log>

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
public class PerfTest {

  public static void main(String[] args) {
    test();
  }

  static void test() {
    for (int i = 0; i < 10; ++i) {
      long time = System.currentTimeMillis();
      long result = inner();
      System.out.println("Time: " + (System.currentTimeMillis() - time));
      System.out.println("Value: " + result);
    }
  }
  
  static long inner() {
    long i = 0L;
    for (; i < 10000000000L; ++i);
    return i;
  }
}
---------- END SOURCE ----------

Comments
Have to check if Graal has the same problem.
08-03-2018

The problem is that loop variable in the test case is Long type which we don't optimize. But we can at least optimize empty loops with constant boundaries. Not for int loop variables we use Long arithmetic to calculate final values and fold loops. That is why we can't do general solution for loop with Long variable (non constant) boundaries - we can't use long expressions - 64 bit is not enough. I thought we had other RFE regarding Long loop variable optimization but I found only one issue JDK-8149745. Anyway this optimization will become important only when huge Java arrays will be introduced with Long type indexing. which is not happening soon. We can treat this case as Starter task and optimize only for constant boundaries when we can use Long expressions to calculate final values.
10-03-2017

EVALUATION The supplied test measures the speed on an empty loop as it contains no work. The test case demonstrates the cost of putting work on the back branch of the loop. Prior to recompilation, the server compiler put the iv increment in the loop body, after recompilation, on the back branch. Fixes forthcoming in 6743900 should greatly mitigate the cost of the backbranch code, however, we should investigate better placement of the iv increment
24-10-2008