JDK-6823354 : Add intrinsics for {Integer,Long}.{numberOfLeadingZeros,numberOfTrailingZeros}()
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: hs15
  • Priority: P5
  • Status: Closed
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2009-03-27
  • Updated: 2014-02-27
  • Resolved: 2011-03-08
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 6 JDK 7 Other
6u18Fixed 7Fixed hs16Fixed
Related Reports
Relates :  
Relates :  
Relates :  
Relates :  
Description
These methods can be instrinsified by using bit scan, bit test, and population count instructions.  It seems it's not worth to instrinsify lowestOneBit() too, as it already boilds down to about 2 or 3 machine instructions.

Some of these methods can be used in BigInteger and maybe other classes in the JDK.
I removed highestOneBit from that CR as it's not clear the benefit is worth the changes.  I might open another one in the future.

Comments
EVALUATION http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/93c14e5562c4
07-05-2009

EVALUATION These are the numbers with Vladimir's suggested changes on a Nehalem box: 32-bit (instrinsic): $ gamma bits Integer.numberOfLeadingZeros: sum: -2, time: 2317 sum: -2, time: 2190 sum: -2, time: 2560 sum: -2, time: 2560 sum: -2, time: 2561 Integer.numberOfTrailingZeros: sum: -2147483648, time: 1836 sum: -2147483648, time: 1832 sum: -2147483648, time: 1921 sum: -2147483648, time: 1922 sum: -2147483648, time: 1922 Long.numberOfLeadingZeros: sum: -34, time: 6986 sum: -34, time: 6981 sum: -34, time: 7082 sum: -34, time: 7071 sum: -34, time: 7074 Long.numberOfTrailingZeros: sum: -2147483616, time: 5162 sum: -2147483616, time: 5164 sum: -2147483616, time: 5448 sum: -2147483616, time: 5488 sum: -2147483616, time: 5452 64-bit (intrinsic): $ gamma bits Integer.numberOfLeadingZeros: sum: -2, time: 2113 sum: -2, time: 2249 sum: -2, time: 2169 sum: -2, time: 2169 sum: -2, time: 2168 Integer.numberOfTrailingZeros: sum: -2147483648, time: 1882 sum: -2147483648, time: 1876 sum: -2147483648, time: 1830 sum: -2147483648, time: 1831 sum: -2147483648, time: 1830 Long.numberOfLeadingZeros: sum: -34, time: 3952 sum: -34, time: 3952 sum: -34, time: 4065 sum: -34, time: 4122 sum: -34, time: 4085 Long.numberOfTrailingZeros: sum: -2147483616, time: 2419 sum: -2147483616, time: 2407 sum: -2147483616, time: 3705 sum: -2147483616, time: 3788 sum: -2147483616, time: 3717 Numbers are sometimes slightly better, but not significantly.
04-05-2009

EVALUATION These numbers are from a T2 w/ 1.165 GHz. The SPARC implementations of the intrinsics need a hardware supported POPC instruction. 32-bit (Java code): $ gamma bits Integer.numberOfLeadingZeros: sum: -2, time: 86712 sum: -2, time: 86898 sum: -2, time: 87017 sum: -2, time: 87455 sum: -2, time: 86938 Integer.numberOfTrailingZeros: sum: -2147483648, time: 61786 sum: -2147483648, time: 61736 sum: -2147483648, time: 58972 sum: -2147483648, time: 58971 sum: -2147483648, time: 58971 Long.numberOfLeadingZeros: sum: -34, time: 145686 sum: -34, time: 145644 sum: -34, time: 99939 sum: -34, time: 99939 sum: -34, time: 99939 Long.numberOfTrailingZeros: sum: -2147483616, time: 121674 sum: -2147483616, time: 121630 sum: -2147483616, time: 81086 sum: -2147483616, time: 81087 sum: -2147483616, time: 81086 32-bit (intrinsic): $ gamma bits Integer.numberOfLeadingZeros: sum: -2, time: 43354 sum: -2, time: 43353 sum: -2, time: 43307 sum: -2, time: 43307 sum: -2, time: 43307 Integer.numberOfTrailingZeros: sum: -2147483648, time: 26652 sum: -2147483648, time: 26606 sum: -2147483648, time: 26606 sum: -2147483648, time: 26606 sum: -2147483648, time: 26606 Long.numberOfLeadingZeros: sum: -34, time: 90336 sum: -34, time: 90301 sum: -34, time: 64500 sum: -34, time: 64500 sum: -34, time: 64500 Long.numberOfTrailingZeros: sum: -2147483616, time: 79282 sum: -2147483616, time: 79244 sum: -2147483616, time: 42386 sum: -2147483616, time: 42386 sum: -2147483616, time: 42386 Speedup: 2.01, 2.22, 1.55, 1.91 64-bit uses the same instructions as 32-bit.
22-04-2009

EVALUATION These numbers are from an AMD Shanghai w/ 2.6Ghz: 32-bit (Java code): $ gamma bits Integer.numberOfLeadingZeros: sum: -2, time: 9649 sum: -2, time: 10700 sum: -2, time: 10140 sum: -2, time: 11156 sum: -2, time: 11156 Integer.numberOfTrailingZeros: sum: -2147483648, time: 10563 sum: -2147483648, time: 10555 sum: -2147483648, time: 13401 sum: -2147483648, time: 13616 sum: -2147483648, time: 13617 Long.numberOfLeadingZeros: sum: -34, time: 12758 sum: -34, time: 12750 sum: -34, time: 12128 sum: -34, time: 12123 sum: -34, time: 12402 Long.numberOfTrailingZeros: sum: -2147483616, time: 15583 sum: -2147483616, time: 16279 sum: -2147483616, time: 16420 sum: -2147483616, time: 18200 sum: -2147483616, time: 18201 32-bit (instrinsic w/o lzcnt instruction): $ gamma -XX:-UseCountLeadingZerosInstruction bits Integer.numberOfLeadingZeros: sum: -2, time: 5749 sum: -2, time: 5292 sum: -2, time: 5229 sum: -2, time: 5230 sum: -2, time: 5236 Integer.numberOfTrailingZeros: sum: -2147483648, time: 3944 sum: -2147483648, time: 3935 sum: -2147483648, time: 3780 sum: -2147483648, time: 3831 sum: -2147483648, time: 3832 Long.numberOfLeadingZeros: sum: -34, time: 11610 sum: -34, time: 11598 sum: -34, time: 12427 sum: -34, time: 12426 sum: -34, time: 12426 Long.numberOfTrailingZeros: sum: -2147483616, time: 6726 sum: -2147483616, time: 6760 sum: -2147483616, time: 7455 sum: -2147483616, time: 7559 sum: -2147483616, time: 7456 Speedup: 1.94, 3.55, 0.98, 2.44 32-bit (intrinsic w/ lzcnt): $ gamma bits Integer.numberOfLeadingZeros: sum: -2, time: 2325 sum: -2, time: 2028 sum: -2, time: 2071 sum: -2, time: 2071 sum: -2, time: 2071 ... Long.numberOfLeadingZeros: sum: -34, time: 6653 sum: -34, time: 6641 sum: -34, time: 8301 sum: -34, time: 8671 sum: -34, time: 8286 ... Speedup: 4.90, 1.46 Overall speedup: 4.90, 3.55, 1.46, 2.44 64-bit (Java code): $ gamma bits Integer.numberOfLeadingZeros: sum: -2, time: 9226 sum: -2, time: 9284 sum: -2, time: 10742 sum: -2, time: 12271 sum: -2, time: 12273 Integer.numberOfTrailingZeros: sum: -2147483648, time: 11102 sum: -2147483648, time: 11707 sum: -2147483648, time: 10873 sum: -2147483648, time: 10880 sum: -2147483648, time: 10873 Long.numberOfLeadingZeros: sum: -34, time: 10186 sum: -34, time: 10306 sum: -34, time: 11391 sum: -34, time: 12298 sum: -34, time: 12299 Long.numberOfTrailingZeros: sum: -2147483616, time: 14097 sum: -2147483616, time: 14780 sum: -2147483616, time: 15025 sum: -2147483616, time: 17561 sum: -2147483616, time: 17553 64-bit (instrinsic w/o lzcnt): $ gamma -XX:-UseCountLeadingZerosInstruction bits Integer.numberOfLeadingZeros: sum: -2, time: 5772 sum: -2, time: 5083 sum: -2, time: 4867 sum: -2, time: 4866 sum: -2, time: 4867 Integer.numberOfTrailingZeros: sum: -2147483648, time: 3591 sum: -2147483648, time: 3572 sum: -2147483648, time: 3521 sum: -2147483648, time: 3623 sum: -2147483648, time: 3739 Long.numberOfLeadingZeros: sum: -34, time: 5808 sum: -34, time: 5799 sum: -34, time: 6627 sum: -34, time: 6627 sum: -34, time: 6628 Long.numberOfTrailingZeros: sum: -2147483616, time: 4150 sum: -2147483616, time: 4142 sum: -2147483616, time: 5382 sum: -2147483616, time: 5383 sum: -2147483616, time: 6063 Speedup: 2.52, 3.09, 1.86, 3.26 64-bit (intrinsic w/ lzcnt): $ gamma bits Integer.numberOfLeadingZeros: sum: -2, time: 2320 sum: -2, time: 1942 sum: -2, time: 1812 sum: -2, time: 1812 sum: -2, time: 1812 ... Long.numberOfLeadingZeros: sum: -34, time: 2499 sum: -34, time: 2487 sum: -34, time: 3313 sum: -34, time: 3313 sum: -34, time: 3468 ... Speedup: 6.77, 3.71 Overall speedup: 6.77, 3.09, 3.71, 3.26
22-04-2009

EVALUATION On an Intel Nehalem w/ 2.93GHz I get these numbers: 32-bit (Java code): $ gamma bits Integer.numberOfLeadingZeros: sum: -2, time: 6347 sum: -2, time: 6674 sum: -2, time: 7442 sum: -2, time: 7429 sum: -2, time: 7471 Integer.numberOfTrailingZeros: sum: -2147483648, time: 8767 sum: -2147483648, time: 8881 sum: -2147483648, time: 8155 sum: -2147483648, time: 8159 sum: -2147483648, time: 8144 Long.numberOfLeadingZeros: sum: -34, time: 10080 sum: -34, time: 10075 sum: -34, time: 9703 sum: -34, time: 9702 sum: -34, time: 9702 Long.numberOfTrailingZeros: sum: -2147483616, time: 10944 sum: -2147483616, time: 10932 sum: -2147483616, time: 10870 sum: -2147483616, time: 10859 sum: -2147483616, time: 10868 32-bit (intrinsic): $ gamma bits Integer.numberOfLeadingZeros: sum: -2, time: 2759 sum: -2, time: 2367 sum: -2, time: 2334 sum: -2, time: 2334 sum: -2, time: 2334 Integer.numberOfTrailingZeros: sum: -2147483648, time: 2066 sum: -2147483648, time: 2059 sum: -2147483648, time: 2059 sum: -2147483648, time: 2059 sum: -2147483648, time: 2060 Long.numberOfLeadingZeros: sum: -34, time: 6623 sum: -34, time: 6610 sum: -34, time: 7170 sum: -34, time: 7172 sum: -34, time: 7162 Long.numberOfTrailingZeros: sum: -2147483616, time: 5180 sum: -2147483616, time: 5159 sum: -2147483616, time: 5714 sum: -2147483616, time: 5731 sum: -2147483616, time: 5724 That's a speedup of 3.18 (Integer.numberOfLeadingZeros), 3.96 (Integer.numberOfTrailingZeros), 1.36 (Long.numberOfLeadingZeros), and 1.90 (Long.numberOfTrailingZeros) respectively. 64-bit (Java code): $ gamma bits Integer.numberOfLeadingZeros: sum: -2, time: 6247 sum: -2, time: 7895 sum: -2, time: 7884 sum: -2, time: 7884 sum: -2, time: 7884 Integer.numberOfTrailingZeros: sum: -2147483648, time: 6642 sum: -2147483648, time: 6720 sum: -2147483648, time: 7033 sum: -2147483648, time: 7022 sum: -2147483648, time: 7022 Long.numberOfLeadingZeros: sum: -34, time: 8518 sum: -34, time: 8596 sum: -34, time: 8677 sum: -34, time: 8677 sum: -34, time: 8677 Long.numberOfTrailingZeros: sum: -2147483616, time: 8485 sum: -2147483616, time: 8478 sum: -2147483616, time: 7941 sum: -2147483616, time: 7944 sum: -2147483616, time: 7945 64-bit (intrinsic): $ gamma bits Integer.numberOfLeadingZeros: sum: -2, time: 2306 sum: -2, time: 2250 sum: -2, time: 2059 sum: -2, time: 2059 sum: -2, time: 2059 Integer.numberOfTrailingZeros: sum: -2147483648, time: 1883 sum: -2147483648, time: 1876 sum: -2147483648, time: 1876 sum: -2147483648, time: 1876 sum: -2147483648, time: 1876 Long.numberOfLeadingZeros: sum: -34, time: 3569 sum: -34, time: 3697 sum: -34, time: 4375 sum: -34, time: 4343 sum: -34, time: 4292 Long.numberOfTrailingZeros: sum: -2147483616, time: 2405 sum: -2147483616, time: 2400 sum: -2147483616, time: 3711 sum: -2147483616, time: 3667 sum: -2147483616, time: 3797 Speedup: 3.83, 3.74, 2.02, 2.17
20-04-2009

EVALUATION I used this small micro-benchmark for testing: public class bits { public static void main(String[] args) { Integer.highestOneBit(0); Integer.lowestOneBit(0); Integer.numberOfLeadingZeros(0); Integer.numberOfTrailingZeros(0); Long.numberOfLeadingZeros(0); Long.numberOfTrailingZeros(0); System.out.println("Integer.numberOfLeadingZeros:"); for (int i = 0; i < 5; i++) doleadi(); System.out.println("Integer.numberOfTrailingZeros:"); for (int i = 0; i < 5; i++) dotraili(); System.out.println("Long.numberOfLeadingZeros:"); for (int i = 0; i < 5; i++) doleadl(); System.out.println("Long.numberOfTrailingZeros:"); for (int i = 0; i < 5; i++) dotraill(); } static void doleadi() { long start = System.currentTimeMillis(); int sum = 0; for (int i = 0; i < Integer.MAX_VALUE; i++) { sum += lead(i); } long end = System.currentTimeMillis(); System.out.println("sum: " + sum + ", time: " + (end - start)); } static void dotraili() { long start = System.currentTimeMillis(); int sum = 0; for (int i = 0; i < Integer.MAX_VALUE; i++) { sum += trail(i); } long end = System.currentTimeMillis(); System.out.println("sum: " + sum + ", time: " + (end - start)); } static void doleadl() { long start = System.currentTimeMillis(); int sum = 0; for (long l = 0; l < Integer.MAX_VALUE; l++) { sum += lead(l); } long end = System.currentTimeMillis(); System.out.println("sum: " + sum + ", time: " + (end - start)); } static void dotraill() { long start = System.currentTimeMillis(); int sum = 0; for (long l = 0; l < Integer.MAX_VALUE; l++) { sum += trail(l); } long end = System.currentTimeMillis(); System.out.println("sum: " + sum + ", time: " + (end - start)); } static int high(int i) { return Integer.highestOneBit(i); } static int low(int i) { return Integer.lowestOneBit(i); } static int lead(int i) { return Integer.numberOfLeadingZeros(i); } static int trail(int i) { return Integer.numberOfTrailingZeros(i); } static int lead(long l) { return Long.numberOfLeadingZeros(l); } static int trail(long l) { return Long.numberOfTrailingZeros(l); } }
20-04-2009