FULL PRODUCT VERSION :
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
FULL OS VERSION :
Dev:
Linux jlent 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
EC2:
Linux ip-10-79-138-239 3.14.35-28.38.amzn1.x86_64 #1 SMP Wed Mar 11 22:50:37 UTC 2015
A DESCRIPTION OF THE PROBLEM :
I have created a simple program which allocates one 500M static object and then allocates many small objects. Some of the small objects references are cached. What I observe is long and increasing ParNewGC pauses. There are several odd variables:
1) Can not reproduce using a 1G heap. A 2G heaps (and larger) do trigger the issue.
2) If the size of the static object is (approximately) less than 500,000,00 bytes or greater than 540,000,000 bytes I can not reproduce this issue.
3) Not reproducible with Oracle 6. Is reproducible with OpenJDK 6, Oracle & OpenJDK 7, and Oracle & OpenJDK 8.
Problem manifested itself as a session timeout when starting Kafka with log compaction enabled. Kafka log compaction just happens to allocate a 500M static object at startup.
THE PROBLEM WAS REPRODUCIBLE WITH -Xint FLAG: Yes
THE PROBLEM WAS REPRODUCIBLE WITH -server FLAG: Yes
REGRESSION. Last worked in version 6u45
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1) Compile the program below.
2) Run it with the following JVM options:
-verbose:gc
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-Xmx2G
-Xms2G
3) Note the ParNew GC times
EXPECTED VERSUS ACTUAL BEHAVIOR :
0.512: [GC (Allocation Failure) 0.512: [ParNew: 272640K->722K(306688K), 0.0067778 secs] 784640K->512722K(2063104K), 0.0068382 secs] [Times: user=0.02 sys=0.01, real=0.01 secs]
... 15 removed ...
11.678: [GC (Allocation Failure) 11.678: [ParNew: 274869K->2229K(306688K), 1.8754075 secs] 790029K->517662K(2063104K), 1.8754565 secs] [Times: user=7.50 sys=0.00, real=1.87 secs]
13.591: [GC (Allocation Failure) 13.591: [ParNew: 274869K->2229K(306688K), 2.0347897 secs] 790302K->517934K(2063104K), 2.0348399 secs] [Times: user=8.13 sys=0.00, real=2.04 secs]
15.663: [GC (Allocation Failure) 15.663: [ParNew^C: 274869K->2229K(306688K), 2.2072114 secs] 790574K->518207K(2063104K), 2.2072618 secs] [Times: user=8.83 sys=0.00, real=2.21 secs]
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
import java.util.ArrayList;
public class LongParNewPause {
static byte[] bigStaticObject;
public static void main(String[] args) throws Exception {
int bigObjSize = args.length > 0 ? Integer.parseInt(args[0]) : 524288000;
int littleObjSize = args.length > 1 ? Integer.parseInt(args[1]) : 100;
int saveFraction = args.length > 2 ? Integer.parseInt(args[2]) : 1000;
bigStaticObject = new byte[bigObjSize];
ArrayList<byte[]> holder = new ArrayList<byte[]>();
int i = 0;
while (true) {
byte[] local = new byte[littleObjSize];
if (i++ % saveFraction == 0) {
holder.add(local);
}
}
}
}
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
Configure Kafka to use a log de-dupe buffer outside the "problem" range:
log.cleaner.dedupe.buffer.size=
The real problem is that this issue was very difficult to debug. Also I am concerned this is just a special case of a more generic issue.