JDK-6378821 : bitCount() should use POPC on SPARC processors and AMD+10h

Details
Type: Enhancement
Submit Date: 2006-01-30
Status: Closed
Updated Date: 2011-03-08
Project Name: JDK
Resolved Date: 2011-03-08
Component: hotspot
OS: solaris, windows_xp
Sub-Component: compiler
CPU: x86, sparc
Priority: P4
Resolution: Fixed
Affected Versions: 6, 7
Fixed Versions: hs15 (b04)

Description
bitCount() should use POPC on SPARC processors where POPC is implemented directly in hardware. (The existing bitCount() implementation comes from "Hacker's Delight" and is fairly fast.) Beware, however, that POPC is implemented by kernel-level trap-based emulation on some processors. In those environments we want to keep using the existing bitCount() implementation. isainfo (try "isainfo -x") should allow the JVM to identify which processors support POPC in hardware.
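
For reference, the software fallback mentioned above follows the Hacker's Delight bit-twiddling approach; a minimal sketch of that style of population count (the JDK's Integer.bitCount() has the same shape) looks like this:

static int popCount(int i) {
    // Fold the bits into progressively wider partial sums.
    i = i - ((i >>> 1) & 0x55555555);                 // 2-bit counts
    i = (i & 0x33333333) + ((i >>> 2) & 0x33333333);  // 4-bit counts
    i = (i + (i >>> 4)) & 0x0f0f0f0f;                 // 8-bit counts
    i = i + (i >>> 8);                                // fold bytes
    i = i + (i >>> 16);                               // fold halfwords
    return i & 0x3f;                                  // result is at most 32
}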

x86 processors also provide a popcnt instruction, available as part of SSE4a on AMD and SSE4.2 on Intel.

                                    

Comments
EVALUATION

A very simple micro-benchmark like this:

public class test {
    public static void main(String[] args) {
        int sum = 0;
        long start = System.currentTimeMillis();
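        // sum the bit counts of the first two billion non-negative ints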
        for (int i = 0; i < 2000 * 1000000; i++) {
            sum += Integer.bitCount(i);
        }
        long end = System.currentTimeMillis();
        System.out.println("sum: " + sum);
        System.out.println("time: " + (end - start));
    }
}

shows a 5x speedup on a Nehalem processor:

$ gamma -XX:-UsePopCountInstruction test
VM option '-UsePopCountInstruction'
sum: 629085184
time: 8132

$ gamma -XX:+UsePopCountInstruction test
VM option '+UsePopCountInstruction'
sum: 629085184
time: 1604

And with loop unrolling disabled, to get more accurate numbers:

$ gamma -XX:-UsePopCountInstruction -XX:LoopUnrollLimit=1 test
VM option '-UsePopCountInstruction'
VM option 'LoopUnrollLimit=1'
sum: 629085184
time: 8657

$ gamma -XX:+UsePopCountInstruction -XX:LoopUnrollLimit=1 test
VM option '+UsePopCountInstruction'
VM option 'LoopUnrollLimit=1'
sum: 629085184
time: 1458

It's interesting to see that with popcnt the tighter, non-unrolled loop is actually faster (1458 ms vs. 1604 ms).
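That works out to roughly a 5.1x speedup with unrolling (8132 / 1604) and 5.9x without (8657 / 1458).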
                                     
2009-02-19
PUBLIC COMMENTS

Just for the record, to show how much slower the kernel-level trap-based emulation on SPARC is (with 20 * 1000000 loop iterations):

$ gamma -XX:-UsePopCountInstruction test
VM option '-UsePopCountInstruction'
sum: 238869248
time: 1011

$ gamma -XX:+UsePopCountInstruction test
VM option '+UsePopCountInstruction'
sum: 238869248
time: 76985
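
That works out to roughly a 76x slowdown (76985 ms vs. 1011 ms) for the trap-emulated POPC, and this run uses only 1/100th of the iterations of the x86 runs above (20 million vs. 2 billion).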
                                     
2009-03-03
EVALUATION

The same numbers on a T2:

$ java -XX:-UsePopCountInstruction test
VM option '-UsePopCountInstruction'
sum: 629085184
time: 35676

$ java -XX:+UsePopCountInstruction test
VM option '+UsePopCountInstruction'
sum: 629085184
time: 20007

And without loop unrolling:

$ java -XX:-UsePopCountInstruction -XX:LoopUnrollLimit=1 test
VM option '-UsePopCountInstruction'
VM option 'LoopUnrollLimit=1'
sum: 629085184
time: 41509

$ java -XX:+UsePopCountInstruction -XX:LoopUnrollLimit=1 test
VM option '+UsePopCountInstruction'
VM option 'LoopUnrollLimit=1'
sum: 629085184
time: 29470

The speedup is 1.78x with unrolling and 1.41x without.
                                     
2009-03-03
PUBLIC COMMENTS

And the numbers for AMD Shanghai:

$ gamma -XX:-UsePopCountInstruction test
sum: 629085184
time: 8504

$ gamma -XX:+UsePopCountInstruction test
sum: 629085184
time: 1807

4.7x speedup.

$ gamma -XX:-UsePopCountInstruction -XX:LoopUnrollLimit=1 test
sum: 629085184
time: 9622

$ gamma -XX:+UsePopCountInstruction -XX:LoopUnrollLimit=1 test
sum: 629085184
time: 2577

3.73x speedup.
                                     
2009-03-12
EVALUATION

http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/c771b7f43bbf
                                     
2009-03-13