Bug ID: JDK-7054211 No loop unrolling done in jdk7b144 for a test update() while loop

Versions (Unresolved/Resolved/Fixed)

JDK 7          JDK 8        Other
7u2 (Fixed)    8 (Fixed)    hs22 (Fixed)

A performance regression of approximately 13.5% was observed in jdk7b144 compared with jdk6u25 (553MB/s vs 623MB/s). The
benchmark comes from the Hadoop Common community; it is a pure Java CRC32 implementation of update() (I changed the
test to report scores only for the 65536-byte size).
update() has a while loop. When I compared the generated code for jdk7b144 with that for jdk6u25, I saw that the loop
was unrolled in jdk6 but not in jdk7. So I tried setting -XX:LoopUnrollLimit=0 for both (to bring them to common
ground). This didn't change jdk7 at all (as expected) and dropped jdk6's score to 601MB/s (so now the difference is
8.7%). Then I tried -XX:-UseLoopPredicate: the score for jdk6 dropped a bit to 610MB/s and the score for jdk7
increased a bit to 572MB/s (so now the difference is 6.6%). Combining -XX:LoopUnrollLimit=0 and
-XX:-UseLoopPredicate, I get 528MB/s for jdk6 and 542MB/s for jdk7, which is a little confusing to me...
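For anyone who wants to repeat the measurement, here is a minimal throughput harness sketch. It is not the original Hadoop benchmark: the class name ThroughputSketch, the iteration counts, and the definition of 1 MB as 1024*1024 bytes are my assumptions, and java.util.zip.CRC32 is only a stand-in for the pure Java implementation (any java.util.zip.Checksum can be passed in).

```java
import java.util.Random;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Hypothetical harness, not the Hadoop benchmark from this report.
public class ThroughputSketch {

    // Returns throughput in MB/s for repeated update() calls on a buffer
    // of the given size. 1 MB is assumed to be 1024*1024 bytes here.
    static double measureMBps(Checksum sum, int size, int iterations) {
        byte[] data = new byte[size];
        new Random(42).nextBytes(data);

        // Warm-up pass so the JIT has a chance to compile update().
        for (int i = 0; i < iterations; i++) {
            sum.reset();
            sum.update(data, 0, size);
        }

        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sum.reset();
            sum.update(data, 0, size);
            sink += sum.getValue(); // keep the result live
        }
        long elapsedNanos = Math.max(1, System.nanoTime() - start);
        if (sink == 42) {
            System.out.print(""); // use sink so the loop is not dead code
        }

        double megabytes = ((double) size * iterations) / (1024.0 * 1024.0);
        return megabytes / (elapsedNanos / 1e9);
    }

    public static void main(String[] args) {
        // 65536 is the buffer size used in the report; 20000 is arbitrary.
        System.out.printf("%.1f MB/s%n", measureMBps(new CRC32(), 65536, 20000));
    }
}
```

Running it once per JVM with the flags under comparison, for example `java -XX:LoopUnrollLimit=0 -XX:-UseLoopPredicate ThroughputSketch`, should give numbers comparable in spirit to the ones above, though absolute scores will differ by machine.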

I have attached the generated outputs (Solaris Studio print) for both.

Here's the source:

    public void update(byte[] b, int off, int len) {
      while(len > 7) {
        int c0 = b[off++] ^ crc;
        int c1 = b[off++] ^ (crc >>>= 8);
        int c2 = b[off++] ^ (crc >>>= 8);
        int c3 = b[off++] ^ (crc >>>= 8);
        crc = (T8_7[c0 & 0xff] ^ T8_6[c1 & 0xff])
            ^ (T8_5[c2 & 0xff] ^ T8_4[c3 & 0xff]);

        crc ^= (T8_3[b[off++] & 0xff] ^ T8_2[b[off++] & 0xff])
             ^ (T8_1[b[off++] & 0xff] ^ T8_0[b[off++] & 0xff]);

        len -= 8;
      }
      while(len > 0) {
        crc = (crc >>> 8) ^ T8_0[(crc ^ b[off++]) & 0xff];
        len--;
      }
    }

Matching sections: line 2b0 onwards in jdk6u25.txt and line e0 onwards for jdk7b144.txt.
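
The update() method relies on a crc field and on lookup tables T8_0 through T8_7 that are not part of this report. The sketch below is one way to make it self-contained for experiments. It assumes the tables are the usual slicing-by-8 tables for the reflected CRC-32 polynomial 0xEDB88320, and that crc holds the inverted running value (initialized to 0xffffffff). The class name Crc32Sketch, the buildTables() helper and getValue() are my additions, not taken from the benchmark source.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical self-contained wrapper around the update() loop from the report.
public class Crc32Sketch {

    // Assumption: standard slicing-by-8 tables for polynomial 0xEDB88320.
    private static final int[][] T = buildTables();
    private static final int[] T8_0 = T[0], T8_1 = T[1], T8_2 = T[2], T8_3 = T[3],
                               T8_4 = T[4], T8_5 = T[5], T8_6 = T[6], T8_7 = T[7];

    // Assumption: crc stores the inverted running checksum.
    private int crc = 0xffffffff;

    private static int[][] buildTables() {
        int[][] t = new int[8][256];
        // Table 0: classic byte-at-a-time CRC-32 table.
        for (int i = 0; i < 256; i++) {
            int c = i;
            for (int bit = 0; bit < 8; bit++) {
                c = (c & 1) != 0 ? (c >>> 1) ^ 0xEDB88320 : c >>> 1;
            }
            t[0][i] = c;
        }
        // Table k: table k-1 advanced by one more zero byte.
        for (int i = 0; i < 256; i++) {
            for (int k = 1; k < 8; k++) {
                int prev = t[k - 1][i];
                t[k][i] = (prev >>> 8) ^ t[0][prev & 0xff];
            }
        }
        return t;
    }

    // Same loop structure as in the report: 8 bytes per iteration, then a tail.
    public void update(byte[] b, int off, int len) {
        while (len > 7) {
            int c0 = b[off++] ^ crc;
            int c1 = b[off++] ^ (crc >>>= 8);
            int c2 = b[off++] ^ (crc >>>= 8);
            int c3 = b[off++] ^ (crc >>>= 8);
            crc = (T8_7[c0 & 0xff] ^ T8_6[c1 & 0xff])
                ^ (T8_5[c2 & 0xff] ^ T8_4[c3 & 0xff]);

            crc ^= (T8_3[b[off++] & 0xff] ^ T8_2[b[off++] & 0xff])
                 ^ (T8_1[b[off++] & 0xff] ^ T8_0[b[off++] & 0xff]);

            len -= 8;
        }
        while (len > 0) {
            crc = (crc >>> 8) ^ T8_0[(crc ^ b[off++]) & 0xff];
            len--;
        }
    }

    public long getValue() {
        return (~crc) & 0xffffffffL;
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        Crc32Sketch mine = new Crc32Sketch();
        mine.update(data, 0, data.length);
        CRC32 reference = new CRC32();
        reference.update(data, 0, data.length);
        // Cross-check against the JDK's own CRC32.
        System.out.println(mine.getValue() == reference.getValue());
    }
}
```

The cross-check against java.util.zip.CRC32 is only there to confirm that the assumed tables are consistent with the loop. It says nothing about how the original benchmark initializes its state.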

EVALUATION See main CR
24-09-2011
EVALUATION See main CR
12-09-2011
EVALUATION http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/da6a29fb0da5
07-09-2011
EVALUATION In the fix for 5091921 (b142) I removed the unrolling case for CaffeineMark when a loop has Xor nodes:
  - if (xors_in_loop >= 4 && body_size < (uint)LoopUnrollLimit*4) return true;
The loop in update() is large (111 nodes), so it is not unrolled without that check. It seems I have to restore the check. The regression in b136 will be addressed in the fix for 7035946.
14-06-2011
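
To illustrate the reasoning in the 14-06-2011 evaluation, here is a deliberately simplified model of the size check, written in Java rather than HotSpot's C++. It is not the actual policy code: the class and method names are hypothetical, the LoopUnrollLimit value of 60 is an assumed setting that this report does not state, and the Xor count of 11 is simply what I count in the Java source of the 8-byte loop (the compiler's node count may differ). Only the 111-node body size and the shape of the removed condition come from the evaluation.

```java
// Hypothetical, simplified model of the unrolling size check; not HotSpot code.
public class UnrollPolicySketch {

    // Returns true if a loop of bodySize nodes would pass the size check.
    // xorCheckEnabled models whether the removed Xor special case is present.
    static boolean wouldUnroll(int bodySize, int xorsInLoop,
                               int loopUnrollLimit, boolean xorCheckEnabled) {
        // Special case from the evaluation: Xor-heavy loops get a 4x budget.
        if (xorCheckEnabled && xorsInLoop >= 4 && bodySize < loopUnrollLimit * 4) {
            return true;
        }
        // Simplified normal case: the body must fit within the limit.
        return bodySize <= loopUnrollLimit;
    }

    public static void main(String[] args) {
        int bodySize = 111; // from the evaluation
        int xors = 11;      // assumption: Xors counted in the Java source
        int limit = 60;     // assumption: not stated in this report

        System.out.println("without Xor check: " + wouldUnroll(bodySize, xors, limit, false));
        System.out.println("with Xor check:    " + wouldUnroll(bodySize, xors, limit, true));
        System.out.println("limit = 0:         " + wouldUnroll(bodySize, xors, 0, true));
    }
}
```

Under these assumptions the 111-node loop exceeds the plain limit but fits within four times the limit, which matches the observation that it stops being unrolled once the Xor case is removed. With a limit of 0 neither branch passes, consistent with -XX:LoopUnrollLimit=0 turning unrolling off.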