JDK-8199920 : 140x slow-down using -Xcheck:jni and java.util.zip.DeflaterOutputStream
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util.jar
  • Affected Version: 9.0.4
  • Priority: P3
  • Status: Open
  • Resolution: Unresolved
  • OS: linux
  • CPU: x86_64
  • Submitted: 2018-03-21
  • Updated: 2018-04-09
Description
FULL PRODUCT VERSION :


A DESCRIPTION OF THE PROBLEM :
JDK-6311046 (https://bugs.openjdk.java.net/browse/JDK-6311046) added implicit copying of primitive arrays when running with -Xcheck:jni. In the code below, which uses a DeflaterOutputStream to compress an 8MB array, this copying causes a 140x slow-down.

The underlying issue is that the JNI code is called twice for every 1KB of input, and the input byte array is copied for each call because of the feature above. This yields GBs of copying and a slow-down that makes -Xcheck:jni unusable in a large number of situations.
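The scale of the copying can be estimated with a rough back-of-the-envelope calculation. The sketch below is illustrative only: it assumes exactly two JNI calls per 1KB (1024 bytes) of input and exactly one full copy of the 8MB input array per call, as described above. The class and method names are hypothetical, and the real copy volume under -Xcheck:jni may differ (for example, if the array is also copied back on release).

```java
// Rough estimate of the bytes copied under -Xcheck:jni for the reproducer.
// Assumptions (taken from the description, not measured):
//   - the JNI code is called twice for every 1024 bytes of input
//   - each call copies the whole input array once
final class CopyVolumeEstimate {

  // Number of JNI calls for an input of the given size.
  static long jniCalls(long inputBytes) {
    return 2 * (inputBytes / 1024);
  }

  // Total bytes copied if every call copies the full input array.
  static long bytesCopied(long inputBytes) {
    return jniCalls(inputBytes) * inputBytes;
  }

  public static void main(String[] args) {
    long inputBytes = 1L << 23; // 8MB, the array size used in the reproducer
    long calls = jniCalls(inputBytes);
    long copied = bytesCopied(inputBytes);
    System.out.println("JNI calls:    " + calls);
    System.out.println("Bytes copied: " + copied
        + " (" + (copied >> 30) + " GiB)");
  }
}
```

Under these assumptions, the copy volume grows quadratically with the input size, since both the number of calls and the size of each copy scale with the array length.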

A mailing list discussion of this exists here:
http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-March/051898.html


THE PROBLEM WAS REPRODUCIBLE WITH -Xint FLAG: Yes

THE PROBLEM WAS REPRODUCIBLE WITH -server FLAG: Yes

REGRESSION.  Last worked in version 7u80

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run the program below with and without -Xcheck:jni and observe that the run with -Xcheck:jni is more than 100x slower.

EXPECTED VERSUS ACTUAL BEHAVIOR :
Running with -Xcheck:jni shouldn't be orders of magnitude slower than running without it.

ERROR MESSAGES/STACK TRACES THAT OCCUR :
No error or crash occurs; this is a performance issue.

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Random;
import java.util.zip.DeflaterOutputStream;

public final class CheckJniTest {
 static void deflateBytesPerformance() throws IOException {
   byte[] inflated = new byte[1 << 23];
   new Random(71917367).nextBytes(inflated);
   ByteArrayOutputStream deflated = new ByteArrayOutputStream();
   try (DeflaterOutputStream dout = new DeflaterOutputStream(deflated)) {
     dout.write(inflated, 0, inflated.length);
   }
   if (8391174 != deflated.size()) {
     throw new AssertionError();
   }
 }

 public static void main(String[] args) throws IOException {
   int n = 5;
   if (args.length > 0) {
     n = Integer.parseInt(args[0]);
   }
   for (int i = 0; i < n; i++) {
     long startTime = System.currentTimeMillis();
     deflateBytesPerformance();
     long endTime = System.currentTimeMillis();
     System.out.println("Round " + i + " took " + (endTime - startTime) +
                        "ms");
   }
 }
}
---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
Changing the native code to use GetByteArrayRegion yields similar performance and removes a time-to-safepoint (TTSP) issue. The change is discussed in the mailing list thread linked above.
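
For callers who cannot patch the JDK, a Java-level mitigation may be possible. The sketch below is not the submitter's workaround and has not been measured under -Xcheck:jni. It rests on the assumption that the per-call copy is proportional to the length of the array handed to the stream, so feeding the stream small temporary arrays should shrink each copy. The class name, method names, and the 4KB chunk size are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

final class ChunkedDeflate {

  // Compresses src by writing it through small temporary arrays, so that
  // each write passes a short array rather than the whole input.
  // Assumption: a smaller array means a smaller -Xcheck:jni copy per call.
  static byte[] deflateInChunks(byte[] src, int chunkSize) throws IOException {
    if (chunkSize <= 0) {
      throw new IllegalArgumentException("chunkSize must be positive");
    }
    ByteArrayOutputStream deflated = new ByteArrayOutputStream();
    try (DeflaterOutputStream dout = new DeflaterOutputStream(deflated)) {
      int off = 0;
      while (off < src.length) {
        int len = Math.min(chunkSize, src.length - off);
        // Copy the slice so the stream never sees the full source array.
        byte[] chunk = Arrays.copyOfRange(src, off, off + len);
        dout.write(chunk, 0, chunk.length);
        off += len;
      }
    }
    return deflated.toByteArray();
  }

  // Decompresses data produced by deflateInChunks (used to check the result).
  static byte[] inflate(byte[] deflated) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (InflaterInputStream in =
             new InflaterInputStream(new ByteArrayInputStream(deflated))) {
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
    }
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] inflated = new byte[1 << 23];
    new Random(71917367).nextBytes(inflated);
    long startTime = System.currentTimeMillis();
    byte[] deflated = deflateInChunks(inflated, 4096);
    long endTime = System.currentTimeMillis();
    System.out.println("Chunked deflate took " + (endTime - startTime) + "ms");
    System.out.println("Round trip ok: "
        + Arrays.equals(inflated, inflate(deflated)));
  }
}
```

This trades one extra copy of the input, made in Java, for smaller arrays on each write. Whether it actually recovers the lost performance would need to be confirmed by timing it with and without -Xcheck:jni.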