Bug ID: JDK-7054211 No loop unrolling done in jdk7b144 for a test update() while loop

Details
Type: Bug
Submit Date: 2011-06-13
Status: Closed
Updated Date: 2011-10-07
Project Name: JDK
Resolved Date: 2011-09-30
Component: hotspot
OS: generic
Sub-Component: compiler
CPU: generic
Priority: P3
Resolution: Fixed
Affected Versions: 7
Fixed Versions: hs22 (b06)


Description
A performance regression of approximately 13.5% was observed in jdk7b144 compared with jdk6u25 (553MB/s vs. 623MB/s). The
benchmark comes from the Hadoop Common community; it is a pure Java CRC32 implementation of update(). (I changed the
test to report scores only for the 65536-byte size.)
The method has a while loop. When I compared the generated code for jdk7b144 with that for jdk6u25, I saw that
jdk6 had unrolled the loop, so I tried setting -XX:LoopUnrollLimit=0 for both to put them on common ground.
This didn't change the jdk7 score at all (as expected) and dropped the jdk6 score to 601MB/s (so the difference is now 8.7%).
Then I tried -XX:-UseLoopPredicate: the jdk6 score dropped a bit to 610MB/s and the jdk7 score rose a bit to 572MB/s
(so the difference is now 6.6%). Combining -XX:LoopUnrollLimit=0 and -XX:-UseLoopPredicate, I get 528MB/s for jdk6 and
542MB/s for jdk7, which is a little confusing to me...
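
The 8.7% and 6.6% figures above appear to use the jdk7 score as the baseline. That reading is an assumption inferred from the numbers, and the helper below (ScoreGap and gapPercent are illustrative names, not part of the benchmark) is only a sketch of that arithmetic:

```java
import java.util.Locale;

final class ScoreGap {
    // Gap of the jdk6 score over the jdk7 score, as a percentage of the jdk7 score.
    // Assumption: the report uses jdk7 as the baseline.
    static double gapPercent(double jdk6MBps, double jdk7MBps) {
        return (jdk6MBps - jdk7MBps) / jdk7MBps * 100.0;
    }

    public static void main(String[] args) {
        // -XX:LoopUnrollLimit=0: prints 8.7%
        System.out.printf(Locale.ROOT, "%.1f%%%n", gapPercent(601, 553));
        // -XX:-UseLoopPredicate: prints 6.6%
        System.out.printf(Locale.ROOT, "%.1f%%%n", gapPercent(610, 572));
        // Both flags combined: prints -2.6% (jdk7 is ahead)
        System.out.printf(Locale.ROOT, "%.1f%%%n", gapPercent(528, 542));
    }
}
```

With both flags combined the sign flips, which is the confusing result mentioned above: jdk7 comes out slightly ahead of jdk6.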

I have attached the generated outputs (Solaris Studio print) for both.

Here's the source:

    public void update(byte[] b, int off, int len) {
      while(len > 7) {
        int c0 = b[off++] ^ crc;
        int c1 = b[off++] ^ (crc >>>= 8);
        int c2 = b[off++] ^ (crc >>>= 8);
        int c3 = b[off++] ^ (crc >>>= 8);
        crc = (T8_7[c0 & 0xff] ^ T8_6[c1 & 0xff])
            ^ (T8_5[c2 & 0xff] ^ T8_4[c3 & 0xff]);

        crc ^= (T8_3[b[off++] & 0xff] ^ T8_2[b[off++] & 0xff])
             ^ (T8_1[b[off++] & 0xff] ^ T8_0[b[off++] & 0xff]);

        len -= 8;
      }
      while(len > 0) {
        crc = (crc >>> 8) ^ T8_0[(crc ^ b[off++]) & 0xff];
        len--;
      }
    }
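
The excerpt above omits the crc field and the T8_0..T8_7 tables. To run the loop in isolation, the sketch below wraps it in a self-contained class. The class name SlicedCrc32, the table generation (standard slicing-by-8 tables for the reflected polynomial 0xEDB88320), the initial value 0xffffffff, and the getValue() and of() helpers are assumptions added for illustration. They are not taken from the attached benchmark, which may build its tables differently.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.CRC32;

final class SlicedCrc32 {
    // T[0] is the classic byte-at-a-time CRC-32 table (assumed polynomial 0xEDB88320).
    // T[k] is T[k-1] advanced by one additional zero byte.
    private static final int[][] T = new int[8][256];
    static {
        for (int i = 0; i < 256; i++) {
            int c = i;
            for (int bit = 0; bit < 8; bit++) {
                c = (c & 1) != 0 ? (c >>> 1) ^ 0xEDB88320 : c >>> 1;
            }
            T[0][i] = c;
        }
        for (int k = 1; k < 8; k++) {
            for (int i = 0; i < 256; i++) {
                int prev = T[k - 1][i];
                T[k][i] = (prev >>> 8) ^ T[0][prev & 0xff];
            }
        }
    }
    private static final int[] T8_0 = T[0], T8_1 = T[1], T8_2 = T[2], T8_3 = T[3],
                               T8_4 = T[4], T8_5 = T[5], T8_6 = T[6], T8_7 = T[7];

    private int crc = 0xffffffff; // assumed initial value (standard CRC-32 preset)

    // Same loop structure as the update() method in the report.
    public void update(byte[] b, int off, int len) {
        while (len > 7) {
            int c0 = b[off++] ^ crc;
            int c1 = b[off++] ^ (crc >>>= 8);
            int c2 = b[off++] ^ (crc >>>= 8);
            int c3 = b[off++] ^ (crc >>>= 8);
            crc = (T8_7[c0 & 0xff] ^ T8_6[c1 & 0xff])
                ^ (T8_5[c2 & 0xff] ^ T8_4[c3 & 0xff]);

            crc ^= (T8_3[b[off++] & 0xff] ^ T8_2[b[off++] & 0xff])
                 ^ (T8_1[b[off++] & 0xff] ^ T8_0[b[off++] & 0xff]);

            len -= 8;
        }
        while (len > 0) {
            crc = (crc >>> 8) ^ T8_0[(crc ^ b[off++]) & 0xff];
            len--;
        }
    }

    // Final inversion, returned as an unsigned 32-bit value.
    public long getValue() {
        return (~crc) & 0xffffffffL;
    }

    // Convenience helper (hypothetical): checksum of a whole array.
    static long of(byte[] data) {
        SlicedCrc32 c = new SlicedCrc32();
        c.update(data, 0, data.length);
        return c.getValue();
    }

    public static void main(String[] args) {
        byte[] check = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.println(Long.toHexString(of(check))); // prints cbf43926

        // Compare against the JDK's CRC32 on a 65536-byte buffer, the size used in the report.
        byte[] buf = new byte[65536];
        new Random(1).nextBytes(buf);
        CRC32 ref = new CRC32();
        ref.update(buf, 0, buf.length);
        System.out.println(of(buf) == ref.getValue()); // prints true
    }
}
```

Running this with and without -XX:LoopUnrollLimit=0 gives a small, standalone target for inspecting the generated code, although it is not the benchmark harness that produced the scores in this report.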

Matching sections: line 2b0 onwards in jdk6u25.txt and line e0 onwards for jdk7b144.txt.

                                    

Comments
EVALUATION

In the fix for 5091921 (b142) I removed the unrolling case for CaffeineMark that applied when a loop has Xor nodes:

- if (xors_in_loop >= 4 && body_size < (uint)LoopUnrollLimit*4) return true;

The loop in update() is large (111 nodes), so it is not unrolled without that check. It seems I have to put the check back.
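
To make the effect of the removed line concrete, here is a rough Java model of the decision. Only the Xor condition and the 111-node body size come from this report. The base rule "body size must not exceed LoopUnrollLimit", the limit of 60, and the Xor count of 11 (counted by hand from the Java source, not from the compiler's node graph) are assumptions for illustration. The real HotSpot unrolling policy has more conditions than this.

```java
final class UnrollPolicySketch {
    // Hypothetical base rule: only bodies no larger than the limit are unrolled.
    static boolean withoutXorCheck(int bodySize, int loopUnrollLimit) {
        return bodySize <= loopUnrollLimit;
    }

    // Base rule plus the Xor exception quoted in the evaluation:
    // xors_in_loop >= 4 && body_size < LoopUnrollLimit*4
    static boolean withXorCheck(int xorsInLoop, int bodySize, int loopUnrollLimit) {
        if (xorsInLoop >= 4 && bodySize < loopUnrollLimit * 4) {
            return true;
        }
        return withoutXorCheck(bodySize, loopUnrollLimit);
    }

    public static void main(String[] args) {
        int bodySize = 111; // from the evaluation
        int xors = 11;      // assumption: '^' operators counted in the main loop source
        int limit = 60;     // assumption: illustrative LoopUnrollLimit value

        System.out.println(withoutXorCheck(bodySize, limit));    // prints false
        System.out.println(withXorCheck(xors, bodySize, limit)); // prints true
        System.out.println(withXorCheck(xors, bodySize, 0));     // prints false
    }
}
```

Under these assumptions the model matches the behaviour described in the report: the loop is unrolled only when the Xor check is present, and a limit of 0 disables unrolling either way.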

The regression in b136 will be addressed in the fix for 7035946.
                                     
2011-06-14
EVALUATION

http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/da6a29fb0da5
                                     
2011-09-07
EVALUATION

See main CR
                                     
2011-09-12
EVALUATION

See main CR
                                     
2011-09-24


