Hi Tom, Christian, and others,

Here's a patch I'd like to contribute:

http://cr.openjdk.java.net/~rasbold/69XXXXX/webrev.00/

With it, C2 generates shorter long multiplication sequences on x86_32 when the high 32 bits are known to be zero. Particularly, this applies to the loop in BigInteger.mulAdd():

    private final static long LONG_MASK = 0xffffffffL;

    static int mulAdd(int[] out, int[] in, int offset, int len, int k) {
        long kLong = k & LONG_MASK;
        long carry = 0;

        offset = out.length-offset - 1;
        for (int j=len-1; j >= 0; j--) {
            long product = (in[j] & LONG_MASK) * kLong +
                           (out[offset] & LONG_MASK) + carry;
            out[offset--] = (int)product;
            carry = product >>> 32;
        }
        return (int)carry;
    }

In my measurements, one of our internal microbenchmarks that uses BigInteger.mulAdd sped up about 12%. Also, SPECjvm2008's crypto.rsa and crypto.signverify improved about 7% and 2.3%, respectively.
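To illustrate why known-zero high halves allow a shorter sequence, here is a rough sketch of the arithmetic only. It is not taken from the webrev and does not show what C2 actually emits; the class and method names (MulHiZeroSketch, mulGeneral, mulHighZero) are made up for this example. A general 64x64->64 multiply split into 32-bit halves needs three partial products, but when both high halves are zero the two cross terms vanish and a single 32x32->64 unsigned multiply is enough.

```java
public class MulHiZeroSketch {
    private static final long LONG_MASK = 0xffffffffL;

    // General 64x64->64 multiply expressed through 32-bit halves,
    // roughly the work a 32-bit target has to do for an arbitrary long multiply.
    static long mulGeneral(long a, long b) {
        long aLo = a & LONG_MASK, aHi = a >>> 32;
        long bLo = b & LONG_MASK, bHi = b >>> 32;
        // Only the low 32 bits of the cross terms survive the shift;
        // the aHi * bHi term is shifted out of the 64-bit result entirely.
        long cross = (aHi * bLo + aLo * bHi) << 32;
        return aLo * bLo + cross;
    }

    // Both operands are zero-extended ints, so aHi == bHi == 0 and the
    // cross terms disappear: one 32x32->64 unsigned multiply remains.
    static long mulHighZero(int a, int b) {
        return (a & LONG_MASK) * (b & LONG_MASK);
    }

    public static void main(String[] args) {
        int a = 0xfffffffe, b = 0x89abcdef;
        long x = a & LONG_MASK, y = b & LONG_MASK;
        // Both checks print true: the short form matches the general one
        // whenever the high 32 bits of both operands are zero.
        System.out.println(mulGeneral(x, y) == mulHighZero(a, b));
        System.out.println(x * y == mulHighZero(a, b));
    }
}
```

In mulAdd(), both (in[j] & LONG_MASK) and kLong have this zero-extended shape, which is why the loop is a natural candidate for the shorter sequence.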