JDK-6812207 : Possible bug in a floating point (float) arithmetic in 32-bit server HotSpot
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 6u10
  • Priority: P3
  • Status: Closed
  • Resolution: Duplicate
  • OS: windows_vista
  • CPU: x86
  • Submitted: 2009-03-03
  • Updated: 2011-02-16
  • Resolved: 2009-03-12
Related Reports
Duplicate :  
Description
FULL PRODUCT VERSION :
java version "1.6.0_12"
Java(TM) SE Runtime Environment (build 1.6.0_12-b04)
Java HotSpot(TM) Client VM (build 11.2-b01, mixed mode, sharing)

FULL OS VERSION :
Microsoft Windows [Version 6.0.6001]

A DESCRIPTION OF THE PROBLEM :
There might be a bug in a floating point (float) arithmetic in 32-bit server HotSpot. Incorrect results appear for 2-D, single precision, not power-of-two sizes, real input, Fast Fourier Transform (http://piotr.wendykier.googlepages.com/jtransforms).
The error occurs only under the following conditions:
- Sun JVM (32-bit)
- "-server" flag
- JIT enabled
- single precision code

Here is a link to the forum, where this problem has been discussed: http://forums.sun.com/thread.jspa?threadID=5368823&tstart=0

THE PROBLEM WAS REPRODUCIBLE WITH -Xint FLAG: No

THE PROBLEM WAS REPRODUCIBLE WITH -server FLAG: Yes

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1. Download jtransforms-2.2.jar from http://piotr.wendykier.googlepages.com/jtransforms

2. Compile the attached source code:

javac -cp jtransforms-2.3.jar AccuracyCheckFloatFFT_2D.java

3. Run the code in the client mode (there should be no error):

java -cp .;jtransforms-2.3.jar AccuracyCheckFloatFFT_2D

4. Run the code in the server mode (there should be a few errors):

java -server -cp .;jtransforms-2.3.jar AccuracyCheckFloatFFT_2D


EXPECTED VERSUS ACTUAL BEHAVIOR :
Expected output from step 3:

Checking accuracy of 2D, single precision, real input FFTs
size =   2 x   2; OK
size =   3 x   3; OK
size =   4 x   4; OK
size =   5 x   5; OK
size =   6 x   6; OK
size =   7 x   7; OK
size =   8 x   8; OK
size =   9 x   9; OK
size =  10 x  10; OK
size =  11 x  11; OK
size =  12 x  12; OK
size =  13 x  13; OK
size =  16 x  16; OK
size =  32 x  32; OK
size =  64 x  64; OK
size = 100 x 100; OK
size = 120 x 120; OK
size = 128 x 128; OK
size = 256 x 256; OK
size = 310 x 310; OK
size = 511 x 511; OK

Expected output from step 4:

Checking accuracy of 2D, single precision, real input FFTs
size =   2 x   2; OK
size =   3 x   3; OK
size =   4 x   4; OK
size =   5 x   5; OK
size =   6 x   6; OK
size =   7 x   7; OK
size =   8 x   8; OK
size =   9 x   9; OK
size =  10 x  10; OK
size =  11 x  11; OK
size =  12 x  12; OK
size =  13 x  13; OK
size =  16 x  16; OK
size =  32 x  32; OK
size =  64 x  64; OK
size = 100 x 100; OK
size = 120 x 120; error = 0.230260
size = 128 x 128; OK
size = 256 x 256; OK
size = 310 x 310; error = 0.244690
size = 511 x 511; OK

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
import java.util.Random;

import edu.emory.mathcs.jtransforms.fft.FloatFFT_2D;
import edu.emory.mathcs.utils.ConcurrencyUtils;

/**
 * This is the source code that demonstrates a possible bug in Sun JVM (32-bit
 * server HotSpot). To reproduce the error, 32-bit JVM has to be used with
 * "-server" flag and JIT compiler enabled.
 *
 * @author Piotr Wendykier (###@###.###)
 *
 */
public class AccuracyCheckFloatFFT_2D {

    public static void main(String[] args) {
        int[] sizes = { 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 16, 32, 64, 100, 120, 128, 256, 310, 511 };
        ConcurrencyUtils.setNumberOfThreads(1); //use only sequential code
        System.out.println("Checking accuracy of 2D, single precision, real input FFTs");
        for (int i = 0; i < sizes.length; i++) {
            run(sizes[i], sizes[i]);
        }
    }

    /**
     * Checks the accuracy of 2D, single precision, real input FFTs
     *
     * @param rows
     * @param columns
     */
    private static void run(int rows, int columns) {
        float[][] a = new float[rows][2 * columns];
        Random rand = new Random(0);
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < columns; c++) {
                a[r][c] = rand.nextFloat();
            }
        }
        float[][] b = new float[rows][2 * columns];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < columns; c++) {
                b[r][2 * c] = a[r][c];
            }
        }

        FloatFFT_2D fft = new FloatFFT_2D(rows, columns);
        fft.realForwardFull(a);
        fft.complexInverse(a, true);
        double err = computeRMSE(a, b);
        double eps = 1e-6;
        if (err > eps) {
            System.out.println(String.format("size = %3d x %3d; error = %f", rows, columns, err));
        } else {
            System.out.println(String.format("size = %3d x %3d; OK", rows, columns));
        }
    }

    /**
     * Computes the Root Mean Square Error of two arrays
     *
     * @param a
     * @param b
     * @return
     */
    private static double computeRMSE(float[][] a, float[][] b) {
        if (a.length != b.length || a[0].length != b[0].length) {
            throw new IllegalArgumentException("Arrays are not of the same size.");
        }
        double rms = 0;
        double tmp;
        for (int r = 0; r < a.length; r++) {
            for (int c = 0; c < a[0].length; c++) {
                tmp = (a[r][c] - b[r][c]);
                rms += tmp * tmp;
            }
        }
        return Math.sqrt(rms / (a.length * a[0].length));
    }

}

---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
I have not found one yet.

Comments
EVALUATION I was wrong in last comment. One of the statement can be superwordized. After unrolling, part of the loop looks something like: ... a[i-1] = tmp; a[i-1] = a[i-2]; a[i-2] = tmp; ... We can use superword for the two stores of ... = tmp; The problem is to schedule the sandwitched store a[i-1] = a[i-2]; a[i-1] must be scheduled after the superword store because of the write-write dependence, and a[i-2] must be scheduled before the superword store due to anti dependence. The new memory surgery algorithm will handle this issue.
10-03-2009

EVALUATION Apparently, the loop could not be superwordized because dependence cycle exists. It seems the dependence graph is incomplete.
09-03-2009

PUBLIC COMMENTS I forgot to mention that the assert fails because the mem instruction is a vector instruction pointing to the same memory address: this: 1118 StoreF === 1147 1192 1137 1119 [[ 1116 1115 ]] @float[int:>=0]:exact+any *, idx=8; Memory: @float[int:>=0]:exact+any *, idx=8; !orig=812,739,258 !jvms: FloatFFT_1D::realForward @ bci:188 is_Vector(): 0, adr_type=135115820 mem: 1192 Store2F === 1147 1121 1137 1189 [[ 1119 1118 ]] @float[int:>=0]:exact+any *, idx=8; Memory: @float[int:>=0]:exact+any *, idx=8; !orig=[1120],[1048],[282],[774] !jvms: FloatFFT_1D::realForward @ bci:196 is_Vector(): 1, adr_type=135115820
04-03-2009

EVALUATION The two consecutive stores are because the method was written like that. The following method produces the wrong code (the loop body was taken from jtransforms): static void fcomp(float[] a) { int n = a.length; int offa = 0; for (int k = n - 1; k >= 2; k--) { int idx = offa + k; float tmp = a[idx]; a[idx] = a[idx - 1]; a[idx - 1] = tmp; } }
04-03-2009

EVALUATION Btw. the failing method is: edu.emory.mathcs.jtransforms.fft.FloatFFT_1D::realForward
04-03-2009

EVALUATION Turning superword off makes the problem go away. The following code is generated (the code is the end sequence of the loop): Without superword: 31e MOVSS XMM3a,[EBX + #12] ... 3ed MOVSS XMM1a,[EBX + #-48] 3f2 MOVSS [EBX + #-44],XMM1a 3f7 MOVSS [EBX + #-48],XMM3a 3fc MOVSS XMM0a,[EBX + #-52] 401 MOVSS [EBX + #-48],XMM0a 406 MOVSS [EBX + #-52],XMM3a (XXX is the value of XMM3a) Address: -52 -48 -44 1. write: [ ][ ][ -48 ] 2. write: [ ][ XXX ][ -48 ] 3. write: [ ][ -52 ][ -48 ] 4. write: [ XXX ][ -52 ][ -48 ] With superword: 300 MOVSS XMM2a,[EBX + #12] ... 317 PSHUFD XMM2a,XMM2a,0xe0 ! replicate2F ... 3b5 MOVSS XMM0a,[EBX + #-48] 3ba MOVSS [EBX + #-44],XMM0a 3bf MOVSS XMM1a,[EBX + #-52] 3c4 MOVSS [EBX + #-48],XMM1a 3c9 MOVQ [EBX + #-52],XMM2a ! packed2F (YYY is the value of XMM2a) Address: -52 -48 -44 1. write: [ ][ ][ -48 ] 2. write: [ ][ -52 ][ -48 ] 3. write: [ YYY ][ YYY ][ -48 ] It seems the Store2F is scheduled too late. Another question is why there are always two stores in a row into one location.
04-03-2009

PUBLIC COMMENTS I can verify the bug triggers on Solaris too and the assert still happens in HS15: # Internal Error (/export/home/twisti/hotspot-comp/hotspot/src/share/vm/opto/memnode.cpp:2047), pid=2760, tid=9 # Error: assert(Opcode() == mem->Opcode() || phase->C->get_alias_index(adr_type()) == Compile::AliasIdxRaw,"no mismatched stores, except on raw memory")
04-03-2009