JDK-8185029 : Incorrect result from program when JIT takes effect
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 8
  • Priority: P3
  • Status: Closed
  • Resolution: Duplicate
  • OS: linux
  • CPU: x86_64
  • Submitted: 2017-07-20
  • Updated: 2017-07-21
  • Resolved: 2017-07-21
Related Reports
Duplicate :  
Description
FULL PRODUCT VERSION :
java version "1.8.0_141"
Java(TM) SE Runtime Environment (build 1.8.0_141-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.141-b15, mixed mode)


FULL OS VERSION :
Linux centos7 3.10.0-514.16.1.el7.x86_64 #1 SMP Wed Apr 12 15:04:24 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

A DESCRIPTION OF THE PROBLEM :
The attached testcase fails when run in compiled mode on 64-bit Intel Linux using jdk1.8.0_141.

It does NOT fail when run with -Xint.

The testcase has taken significant effort to extract. We have a COBOL product (NTT DATA Enterprise COBOL, formerly Dell Enterprise COBOL) that translates COBOL to Java for execution and the problem originates from a customer COBOL program.

The slightly unusual coding patterns reflect our implementation of COBOL "packed decimal" numerics, in which a decimal number is represented in memory with two digits stored per byte (This also necessitates our use of 'Unsafe').
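
As a rough illustration of the packed decimal layout described above, the following sketch encodes and decodes a value in an ordinary heap byte array. The class name PackedDecimalSketch and its methods are hypothetical and are not part of the product or the attached testcase; the layout (two digits per byte, last byte holding one digit plus a sign nibble of 0x0c or 0x0d) is assumed from the ChunkImpl source in this report, and no Unsafe is involved.

```java
// Hypothetical illustration only: heap-based packed decimal, not the product code.
class PackedDecimalSketch {

    // Encode v into len bytes: two digits per byte, sign nibble last (0x0c +, 0x0d -).
    static byte[] pack(long v, int len) {
        byte[] out = new byte[len];
        int sign = (v < 0) ? 0x0d : 0x0c;
        v = Math.abs(v);
        // Last byte: least significant digit in the high nibble, sign in the low nibble.
        out[len - 1] = (byte) (((v % 10) << 4) | sign);
        v /= 10;
        // Remaining bytes, right to left: two decimal digits each.
        for (int i = len - 2; i >= 0; i--) {
            int pair = (int) (v % 100);
            out[i] = (byte) (((pair / 10) << 4) | (pair % 10));
            v /= 100;
        }
        return out;
    }

    // Decode a packed decimal field back into a long.
    static long unpack(byte[] in) {
        long ret = 0;
        for (int i = 0; i < in.length - 1; i++) {
            ret = ret * 100 + ((in[i] & 0xf0) >> 4) * 10 + (in[i] & 0x0f);
        }
        int last = in[in.length - 1];
        ret = ret * 10 + ((last & 0xf0) >> 4);
        return ((last & 0x0f) == 0x0d) ? -ret : ret;
    }

    // Render the bytes as lowercase hex, two characters per byte.
    static String hex(byte[] b) {
        StringBuilder sb = new StringBuilder();
        for (byte x : b) {
            sb.append(String.format("%02x", x & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] packed = pack(615479L, 6);
        System.out.println(hex(packed));    // 00000615479c
        System.out.println(unpack(packed)); // 615479
    }
}
```

A 6-byte field therefore holds 11 digits plus the sign, which is why the testcase passes a length of 6 for six-digit values.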

This *may* be related to an earlier issue we raised, 'JDK-8178047 : Aliasing problem with raw memory accesses'. Our problem is that *WE CANNOT TELL* if it's the same issue, and despite JDK-8178047 apparently being marked for back-port to JDK8, no such build is available.

This issue was found by a different customer of ours running a different application to that reported in JDK-8178047.

We have been unable to reproduce the problem under jdk-9u178 (which contains the fix to JDK-8178047) but we have no way of telling if that is coincidental.

This is causing very significant impact at a customer site (a major insurance company). It has taken several man-weeks of effort on their part to distil the problem down to a testcase they could report to us, and it has taken another week for me to produce the attached test.

The test simply performs a subtraction multiple times. It is calculating (615479 - 237892), which should give 377587.

Initially the correct answer is obtained, later after JIT kicks in, the (incorrect) answer 300587 is obtained.
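
To make the difference between the two answers concrete, the sketch below packs the correct result, the incorrect result, and the intermediate value 100 (which adj() writes to the same field first) and reports the byte offsets at which the two results differ. The class ResultBytesSketch is hypothetical and uses a plain int array in place of unsafe memory; the 6-byte layout is assumed from the ChunkImpl source in this report.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: compares the packed bytes of the right and wrong results.
class ResultBytesSketch {

    // Same 6-byte layout as the testcase: 11 digits, then a sign nibble (0x0c = positive).
    // Only non-negative values are handled here.
    static int[] pack6(long v) {
        int[] out = new int[6];
        out[5] = (int) ((v % 10) << 4) | 0x0c;
        v /= 10;
        for (int i = 4; i >= 0; i--) {
            int pair = (int) (v % 100);
            out[i] = ((pair / 10) << 4) | (pair % 10);
            v /= 100;
        }
        return out;
    }

    // Render the bytes as space-separated lowercase hex.
    static String hex(int[] b) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < b.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b[i]));
        }
        return sb.toString();
    }

    // Offsets at which the two byte sequences differ.
    static List<Integer> diff(int[] a, int[] b) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) {
                offsets.add(i);
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        int[] expected = pack6(377587L);
        int[] actual = pack6(300587L);
        int[] scratch = pack6(100L);
        System.out.println("expected 377587: " + hex(expected)); // 00 00 03 77 58 7c
        System.out.println("actual   300587: " + hex(actual));   // 00 00 03 00 58 7c
        System.out.println("scratch     100: " + hex(scratch));  // 00 00 00 00 10 0c
        System.out.println("differing offsets: " + diff(expected, actual)); // [3]
    }
}
```

Under these assumptions the two results differ in a single byte (offset 3, 0x77 versus 0x00), and 0x00 is also what the preceding write of 100 leaves at that offset. That is only an observation from the arithmetic; it does not by itself establish the cause.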

The calculations are done using variables stored in unsafe memory in COBOL packed decimal format. I can guarantee that the unsafe accesses are within valid bounds.

Please, please please can I plead for a fix to this issue in JDK 8 (and also a fix to JDK-8178047 if that proves to be a different problem) ?

(It's impossible to tell from what's externally visible whether you have any intention of actually back-porting JDK-8178047 to JDK8, or when, despite the fact that it was apparently marked for such back-porting.)

Finally, note that the problem is incredibly slippery - almost any change at all no matter how slight or apparently inconsequential will hide the problem.



THE PROBLEM WAS REPRODUCIBLE WITH -Xint FLAG: No

THE PROBLEM WAS REPRODUCIBLE WITH -server FLAG: Yes

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
$ javac REPRO.java Chunk.java ChunkImpl.java
$ java -server REPRO
377587
377587
300587
300587
300587
300587
300587
300587
300587
300587


EXPECTED VERSUS ACTUAL BEHAVIOR :
The result 377587 should be printed every time, as can be seen with -Xint:

$ java -Xint REPRO
377587
377587
377587
377587
377587
377587
377587
377587
377587
377587

ERROR MESSAGES/STACK TRACES THAT OCCUR :
No crash

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
// REPRO.java
public class REPRO {
  public static void main(String[] args) {
    REPRO r = new REPRO();
    r.go();
  }

  public void go() {
    ch.put_Long_PSI(82, 6, 237892L);
    ch.put_Long_PSI(88, 6, 615479L);
    int r_0;
    for (r_0 = 0; r_0 < 10; r_0++) {
      adj();
      System.out.println(ch.get_PSI_Long(149, 6));
    }
  }

  private void adj() {
    int r_0;
    for (r_0 = 0; r_0 < 500000; r_0++) {
      ch.put_Long_PSI(106, 6, 100L);
      ch.put_Long_PSI(106, 6, ((ch.get_PSI_Long(88, 6) - ch.get_PSI_Long(82, 6)) % 100000000000L));
      ch.put_Long_PSI(149, 6, ch.get_PSI_Long(106, 6));
    }
  }

  private final Chunk ch;

  public REPRO() {
    ch = new ChunkImpl(155);
  }
}



// Chunk.java
public interface Chunk {
   public void put_Long_PSI(int p, int l, long v);
   public long get_PSI_Long(int p, int l);
}


// ChunkImpl.java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class ChunkImpl implements Chunk {

   private static final Unsafe UNSAFE;

   private final long baseAddress;
   private final int length;

   static {
       Unsafe u = null;
       try {
          final Class<Unsafe> uc = Unsafe.class;
          final Field field = uc.getDeclaredField("theUnsafe");
          field.setAccessible(true);
          u = (Unsafe) field.get(uc);
       } catch (NoSuchFieldException | IllegalAccessException | RuntimeException ex) {
          ex.printStackTrace();
       }
       UNSAFE = u;
   }

   /**
    * Lookup table from natural integers to packed-decimal representations.
    */
   private static final byte[] I_TO_P = new byte[256];

   static {
      // Fill in the I_TO_P table by pre-calculating the values.
      for (int i = 0; i < 10; i++) {
         for (int j = 0; j < 10; j++) {
            final int packed = (i * 16) + j;
            final int binary = (i * 10) + j;
            I_TO_P[binary] = (byte) packed;
         }
      }
   }

   public void put_Long_PSI(int p, int l, long v) {
      final long a = baseAddress + p;

      final int sign;
      if (v < 0) {
         sign = 0x0d; // Negative
         v = Math.abs(v);
      } else {
         sign = 0x0c; // Positive
      }

      // Do the least significant digit and sign
      final int b = (int) (((v % 10) << 4) | sign);
      v /= 10;
      UNSAFE.putByte(a + l - 1, (byte) b);

      // Do any remaining digits
      for (int i = l - 2; i >= 0; i--) {
         // We use a lookup here to determine the byte value. Without
         // this, extra division is necessary which is expensive.
         // Note that the equivalent get method does NOT use this algorithm.
         final int intValue = (int) (v % 100L);
         v /= 100L;
         UNSAFE.putByte(a + i, I_TO_P[intValue]);
      }
   }


   public long get_PSI_Long(int p, int l) {
      long a = baseAddress + p;

      long ret = 0;
      for (int i = 0; i < l - 1; i++) {
         final int b = UNSAFE.getByte(a++);
         final int hi = ((b & 0xf0) >> 4);
         ret += hi;
         ret *= 10;
         final int lo = b & 0x0f;
         ret += lo;
         ret *= 10;
      }
      final int b = UNSAFE.getByte(a);
      final int hi = ((b & 0xf0) >> 4);
      ret += hi;

      // Sort out the sign
      if ((b & 0x0f) == 0x0d) {
         ret = -ret;
      }

      return ret;
   }

   public ChunkImpl(int length) {
      baseAddress = UNSAFE.allocateMemory(length);
      this.length = length;
   }


}

---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
None. The customer cannot move to Java 9 as it is pre-release, and there are also compatibility issues caused by the new module system.

This is causing our end customer and us (NTT DATA services) significant impact.


Comments
Code review for backport in progress. I confirmed that it fixes this testcase. I should be able to push it to 8u-dev on Monday.
21-07-2017

The problem was not fixed with JDK 9 b112 but is just hidden by changes to the Unsafe API (JDK-8149159). I verified that this *is* a duplicate of JDK-8178047 by building JDK 9 b111 and manually applying the fix.
21-07-2017

This issue is not a duplicate of JDK-8178047, which is slightly different. The results below confirm this:
8u131 - Fail
8u141 - Fail
9 ea b 111 - Fail
9 ea b 112 - Pass //Fixed here
9 ea b 178 - Pass
21-07-2017