United StatesChange Country, Oracle Worldwide Web Sites Communities I am a... I want to...
JDK-6763452 : loop induction variable incremented on back edge

Details
Type:
Enhancement
Submit Date:
2008-10-24
Status:
Open
Updated Date:
2014-06-17
Project Name:
JDK
Resolved Date:
Component:
hotspot
OS:
linux
Sub-Component:
compiler
CPU:
x86
Priority:
P3
Resolution:
Unresolved
Affected Versions:
6
Targeted Versions:

Related Reports
Relates:

Sub Tasks

Description
FULL PRODUCT VERSION :
java version "1.6.0_10-rc"
Java(TM) SE Runtime Environment (build 1.6.0_10-rc-b28)
Java HotSpot(TM) 64-Bit Server VM (build 11.0-b15, mixed mode)

FULL OS VERSION :
Linux mac.config 2.6.25.9-76.fc9.x86_64 #1 SMP Fri Jun 27 15:58:30 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

EXTRA RELEVANT SYSTEM CONFIGURATION :
Since this is a JIT optimization issue, the CPU architecture might be useful, my machine is a dual quad core Xeon 5400 series 2.8Ghz (Mac Pro)

A DESCRIPTION OF THE PROBLEM :
When I run the code added to the "Source code for an executable test case" section, the performance from the third iteration onwards is worse than the first two. Based on the output of -XX:+PrintCompilation it seems like the method is JIT'd a second time causing the slowdown. That seems like a bug. Note that this is only relevant with -server and the 64-bit JVM.

THE PROBLEM WAS REPRODUCIBLE WITH -Xint FLAG: Did not try

THE PROBLEM WAS REPRODUCIBLE WITH -server FLAG: Yes

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run the test case with -server and a 64-bit JVM.

EXPECTED VERSUS ACTUAL BEHAVIOR :
I expect each iteration in the output to perform at similar or better times than the previous one. Instead I get:

Time: 4915
Value: 10000000000
Time: 4869
Value: 10000000000
Time: 7183
Value: 10000000000
Time: 7186
Value: 10000000000
Time: 7184
Value: 10000000000
Time: 7193
Value: 10000000000
Time: 7186
Value: 10000000000
Time: 7190
Value: 10000000000
Time: 7204
Value: 10000000000
Time: 7210
Value: 10000000000

Note that the time after the second iteration is much worse.
ERROR MESSAGES/STACK TRACES THAT OCCUR :
The hotspot.log generated from -XX:+LogCompilation is:

<?xml version='1.0' encoding='UTF-8'?>
<hotspot_log version='160 1' process='519' time_ms='1222474495317'>
<vm_version>
<name>
Java HotSpot(TM) 64-Bit Server VM
</name>
<release>
11.0-b15
</release>
<info>
Java HotSpot(TM) 64-Bit Server VM (11.0-b15) for linux-amd64 JRE (1.6.0_10-rc-b28), built on Jul 21 2008 01:02:51 by &quot;java_re&quot; with gcc 3.2.2 (SuSE Linux)
</info>
</vm_version>
<vm_arguments>
<args>
-Xbatch -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:MaxPermSize=256m -Xms312m -Xmx5500m -XX:+UseConcMarkSweepGC -Dfile.encoding=UTF-8
</args>
<command>
test.PerfTest
</command>
<launcher>
SUN_STANDARD
</launcher>
<properties>
java.vm.specification.version=1.0
java.vm.specification.name=Java Virtual Machine Specification
java.vm.specification.vendor=Sun Microsystems Inc.
java.vm.version=11.0-b15
java.vm.name=Java HotSpot(TM) 64-Bit Server VM
java.vm.vendor=Sun Microsystems Inc.
java.vm.info=mixed mode, sharing
java.ext.dirs=/usr/java/jdk1.6.0_10/jre/lib/ext:/usr/java/packages/lib/ext
java.endorsed.dirs=/usr/java/jdk1.6.0_10/jre/lib/endorsed
sun.boot.library.path=/usr/java/jdk1.6.0_10/jre/lib/amd64
java.library.path=/usr/java/jdk1.6.0_10/jre/lib/amd64/server:/usr/java/jdk1.6.0_10/jre/lib/amd64:/usr/java/jdk1.6.0_10/jre/../lib/amd64:/usr/java/jdk1.6.0_10.i386/jre/lib/i386/server:/usr/java/jdk1.6.0_10.i386/jre/lib/i386::/usr/lib64/firefox-2.0.0.1:/usr/lib64/firefox-2.0.0.1:/usr/lib64/firefox-2.0.0.1:/usr/java/packages/lib/amd64:/lib:/usr/lib
java.home=/usr/java/jdk1.6.0_10/jre
java.class.path=/home/ijuma/workspaces/likecube/test/bin
sun.boot.class.path=/usr/java/jdk1.6.0_10/jre/lib/resources.jar:/usr/java/jdk1.6.0_10/jre/lib/rt.jar:/usr/java/jdk1.6.0_10/jre/lib/sunrsasign.jar:/usr/java/jdk1.6.0_10/jre/lib/jsse.jar:/usr/java/jdk1.6.0_10/jre/lib/jce.jar:/usr/java/jdk1.6.0_10/jre/lib/charsets.jar:/usr/java/jdk1.6.0_10/jre/classes
file.encoding=UTF-8
sun.java.launcher=SUN_STANDARD
</properties>
</vm_arguments>
<tty>
<writer thread='44530000'/>
<thread_logfile thread='44530000' filename='/tmp/hs_c44530000_pid519.log'/>
<writer thread='43477328'/>
<thread_logfile thread='43477328' filename='/tmp/hs_c43477328_pid519.log'/>
<writer thread='12933456'/>
<task_queued compile_id='1' compile_kind='osr' method='test/PerfTest inner ()J' bytes='19' count='1' backedge_count='14563' iicount='1' osr_bci='5' blocking='1' stamp='0.050' comment='backedge_count' hot_count='14563'/>
<writer thread='43477328'/>
  1%  b   test.PerfTest::inner @ 5 (19 bytes)
<nmethod compile_id='1' compile_kind='osr' compiler='C2' entry='0x000000002ee0d000' size='520' address='0x000000002ee0ced0' relocation_offset='264' code_offset='304' stub_offset='400' consts_offset='420' scopes_data_offset='424' scopes_pcs_offset='456' dependencies_offset='504' oops_offset='512' method='test/PerfTest inner ()J' bytes='19' count='1' backedge_count='14563' iicount='1' stamp='0.054'/>
<writer thread='12933456'/>
<task_queued compile_id='1' method='test/PerfTest inner ()J' bytes='19' count='2' backedge_count='5000' iicount='2' blocking='1' stamp='4.965' comment='count' hot_count='2'/>
<writer thread='43477328'/>
  1   b   test.PerfTest::inner (19 bytes)
<nmethod compile_id='1' compiler='C2' entry='0x000000002ee0d240' size='488' address='0x000000002ee0d110' relocation_offset='264' code_offset='304' stub_offset='368' consts_offset='388' scopes_data_offset='392' scopes_pcs_offset='424' dependencies_offset='472' oops_offset='480' method='test/PerfTest inner ()J' bytes='19' count='2' backedge_count='5000' iicount='2' stamp='4.966'/>
<writer thread='41371984'/>
<destroy_vm stamp='67.400'/>
<tty_done stamp='67.400'/>
</tty>
<compilation_log thread='43477328'>
<start_compile_thread thread='43477328' process='519' stamp='0.046'/>
<task compile_id='1' compile_kind='osr' method='test/PerfTest inner ()J' bytes='19' count='1' backedge_count='14563' iicount='1' osr_bci='5' blocking='1' stamp='0.050'>
<phase name='matcher' nodes='47' stamp='0.051'>
<phase_done nodes='32' stamp='0.051'/>
</phase>
<phase name='regalloc' nodes='37' stamp='0.051'>
<regalloc attempts='0' success='1'/>
<phase_done nodes='47' stamp='0.051'/>
</phase>
<phase name='output' nodes='47' stamp='0.051'>
<phase_done nodes='59' stamp='0.051'/>
</phase>
<phase name='matcher' nodes='48' stamp='0.052'>
<phase_done nodes='33' stamp='0.052'/>
</phase>
<phase name='regalloc' nodes='38' stamp='0.052'>
<regalloc attempts='0' success='1'/>
<phase_done nodes='50' stamp='0.052'/>
</phase>
<phase name='output' nodes='50' stamp='0.052'>
<phase_done nodes='62' stamp='0.052'/>
</phase>
<phase name='matcher' nodes='49' stamp='0.052'>
<phase_done nodes='34' stamp='0.052'/>
</phase>
<phase name='regalloc' nodes='39' stamp='0.052'>
<regalloc attempts='0' success='1'/>
<phase_done nodes='53' stamp='0.052'/>
</phase>
<phase name='output' nodes='53' stamp='0.052'>
<phase_done nodes='65' stamp='0.052'/>
</phase>
<phase name='matcher' nodes='50' stamp='0.052'>
<phase_done nodes='35' stamp='0.052'/>
</phase>
<phase name='regalloc' nodes='40' stamp='0.052'>
<regalloc attempts='0' success='1'/>
<phase_done nodes='56' stamp='0.052'/>
</phase>
<phase name='output' nodes='56' stamp='0.052'>
<phase_done nodes='68' stamp='0.052'/>
</phase>
<phase name='matcher' nodes='51' stamp='0.052'>
<phase_done nodes='36' stamp='0.052'/>
</phase>
<phase name='regalloc' nodes='41' stamp='0.052'>
<regalloc attempts='0' success='1'/>
<phase_done nodes='59' stamp='0.052'/>
</phase>
<phase name='output' nodes='59' stamp='0.052'>
<phase_done nodes='71' stamp='0.052'/>
</phase>
<phase name='matcher' nodes='52' stamp='0.052'>
<phase_done nodes='37' stamp='0.052'/>
</phase>
<phase name='regalloc' nodes='42' stamp='0.053'>
<regalloc attempts='0' success='1'/>
<phase_done nodes='62' stamp='0.053'/>
</phase>
<phase name='output' nodes='62' stamp='0.053'>
<phase_done nodes='74' stamp='0.053'/>
</phase>
<phase name='matcher' nodes='43' stamp='0.053'>
<phase_done nodes='31' stamp='0.053'/>
</phase>
<phase name='regalloc' nodes='36' stamp='0.053'>
<regalloc attempts='0' success='1'/>
<phase_done nodes='46' stamp='0.053'/>
</phase>
<phase name='output' nodes='46' stamp='0.053'>
<phase_done nodes='58' stamp='0.053'/>
</phase>
<phase name='matcher' nodes='47' stamp='0.053'>
<phase_done nodes='33' stamp='0.053'/>
</phase>
<phase name='regalloc' nodes='38' stamp='0.053'>
<regalloc attempts='0' success='1'/>
<phase_done nodes='53' stamp='0.053'/>
</phase>
<phase name='output' nodes='53' stamp='0.053'>
<phase_done nodes='65' stamp='0.053'/>
</phase>
<phase name='matcher' nodes='46' stamp='0.053'>
<phase_done nodes='34' stamp='0.053'/>
</phase>
<phase name='regalloc' nodes='39' stamp='0.053'>
<regalloc attempts='0' success='1'/>
<phase_done nodes='55' stamp='0.053'/>
</phase>
<phase name='output' nodes='55' stamp='0.053'>
<phase_done nodes='67' stamp='0.053'/>
</phase>
<phase name='matcher' nodes='42' stamp='0.053'>
<phase_done nodes='30' stamp='0.053'/>
</phase>
<phase name='regalloc' nodes='35' stamp='0.053'>
<regalloc attempts='0' success='1'/>
<phase_done nodes='43' stamp='0.053'/>
</phase>
<phase name='output' nodes='43' stamp='0.053'>
<phase_done nodes='55' stamp='0.053'/>
</phase>
<phase name='parse' nodes='3' stamp='0.054'>
<type id='489' name='long'/>
<klass id='594' name='test/PerfTest' flags='1'/>
<method id='595' holder='594' name='inner' return='489' flags='8' bytes='19' iicount='1'/>
<parse method='595' uses='1' osr_bci='5' stamp='0.054'>
<bc code='155' bci='14'/>
<branch target_bci='5' taken='11264' not_taken='0' cnt='11264' prob='always'/>
<parse_done nodes='63' memory='26784' stamp='0.054'/>
</parse>
<phase_done nodes='64' stamp='0.054'/>
</phase>
<phase name='optimizer' nodes='64' stamp='0.054'>
<phase name='idealLoop' nodes='64' stamp='0.054'>
<phase_done nodes='68' stamp='0.054'/>
</phase>
<phase name='idealLoop' nodes='68' stamp='0.054'>
<phase_done nodes='68' stamp='0.054'/>
</phase>
<phase name='idealLoop' nodes='68' stamp='0.054'>
<phase_done nodes='68' stamp='0.054'/>
</phase>
<phase name='ccp' nodes='68' stamp='0.054'>
<phase_done nodes='68' stamp='0.054'/>
</phase>
<phase name='idealLoop' nodes='68' stamp='0.054'>
<phase_done nodes='71' stamp='0.054'/>
</phase>
<phase_done nodes='71' stamp='0.054'/>
</phase>
<phase name='matcher' nodes='71' stamp='0.054'>
<phase_done nodes='29' stamp='0.054'/>
</phase>
<phase name='regalloc' nodes='36' stamp='0.054'>
<regalloc attempts='0' success='1'/>
<phase_done nodes='41' stamp='0.054'/>
</phase>
<phase name='output' nodes='41' stamp='0.054'>
<phase_done nodes='56' stamp='0.054'/>
</phase>
<task_done success='1' nmsize='120' count='1' backedge_count='14563' stamp='0.054'/>
</task>
<task compile_id='1' method='test/PerfTest inner ()J' bytes='19' count='2' backedge_count='5000' iicount='2' blocking='1' stamp='4.965'>
<phase name='parse' nodes='3' stamp='4.966'>
<type id='489' name='long'/>
<klass id='594' name='test/PerfTest' flags='1'/>
<method id='595' holder='594' name='inner' return='489' flags='8' bytes='19' iicount='2'/>
<parse method='595' uses='2' stamp='4.966'>
<bc code='155' bci='14'/>
<branch target_bci='5' taken='11264' not_taken='0' cnt='11264' prob='always'/>
<bc code='155' bci='14'/>
<branch target_bci='5' taken='11264' not_taken='0' cnt='11264' prob='always'/>
<parse_done nodes='66' memory='28240' stamp='4.966'/>
</parse>
<phase_done nodes='67' stamp='4.966'/>
</phase>
<phase name='optimizer' nodes='67' stamp='4.966'>
<phase name='idealLoop' nodes='67' stamp='4.966'>
<phase_done nodes='72' stamp='4.966'/>
</phase>
<phase name='idealLoop' nodes='72' stamp='4.966'>
<phase_done nodes='72' stamp='4.966'/>
</phase>
<phase name='idealLoop' nodes='72' stamp='4.966'>
<phase_done nodes='72' stamp='4.966'/>
</phase>
<phase name='ccp' nodes='72' stamp='4.966'>
<phase_done nodes='72' stamp='4.966'/>
</phase>
<phase name='idealLoop' nodes='73' stamp='4.966'>
<phase_done nodes='73' stamp='4.966'/>
</phase>
<phase_done nodes='73' stamp='4.966'/>
</phase>
<phase name='matcher' nodes='73' stamp='4.966'>
<phase_done nodes='22' stamp='4.966'/>
</phase>
<phase name='regalloc' nodes='28' stamp='4.966'>
<regalloc attempts='0' success='1'/>
<phase_done nodes='28' stamp='4.966'/>
</phase>
<phase name='output' nodes='28' stamp='4.966'>
<phase_done nodes='42' stamp='4.966'/>
</phase>
<task_done success='1' nmsize='88' count='2' backedge_count='5000' stamp='4.966'/>
</task>
</compilation_log>
<compilation_log thread='44530000'>
<fragment>
<![CDATA[
<start_compile_thread thread='44530000' process='519' stamp='0.045'/>
]]>
</fragment>
</compilation_log>
<hotspot_log_done stamp='67.400'/>
</hotspot_log>

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
public class PerfTest {

  public static void main(String[] args) {
    test();
  }

  static void test() {
    for (int i = 0; i < 10; ++i) {
      long time = System.currentTimeMillis();
      long result = inner();
      System.out.println("Time: " + (System.currentTimeMillis() - time));
      System.out.println("Value: " + result);
    }
  }
  
  static long inner() {
    long i = 0L;
    for (; i < 10000000000L; ++i);
    return i;
  }
}
---------- END SOURCE ----------

                                    

Comments
EVALUATION

The supplied test measures the speed on an empty loop as
it contains no work.

The test case demonstrates the cost of putting work on
the back branch of the loop. Prior to recompilation, the
server compiler put the iv increment in the loop body, 
after recompilation, on the back branch.

Fixes forthcoming in 6743900 should greatly mitigate the
cost of the backbranch code, however, we should investigate
better placement of the iv increment
                                     
2008-10-24



Hardware and Software, Engineered to Work Together