I am using G1 GC along with JDK 8 to do Coherence analysis with very large heaps and am seeing an odd problem. The verbose GC logs contain fewer GC entries than what I can see happening with JVisualVM. When I close the running JVMs, additional entries get flushed to the log file. This seems to be a bug to me and hinders analysis since I do not wish to have to shut down JVMs simply to collect complete GC data.
Here's an example from a recent run, where *no* data was in the log file at the end of the run:
mwm@nit1-ld3:~$ ls -lrt coh1213/logs
total 102
-rw-r--r-- 1 mwm mwm 0 Nov 26 08:58 nit1-ld3.1.gc
-rw-r--r-- 1 mwm mwm 228 Nov 26 09:00 nit1-ld3.1.out
-rw-r--r-- 1 mwm mwm 49880 Nov 26 09:40 nit1-ld3.1.log
mwm@nit1-ld3:~$ jdk/bin/jps
50359 Jstatd
50357 DefaultCacheServer
50715 Jps
mwm@nit1-ld3:~$ pkill -f java
mwm@nit1-ld3:~$ jdk/bin/jps
50717 Jps
mwm@nit1-ld3:~$ ls -lrt coh1213/logs
total 336
-rw-r--r-- 1 mwm mwm 228 Nov 26 09:00 nit1-ld3.1.out
-rw-r--r-- 1 mwm mwm 50018 Nov 26 09:41 nit1-ld3.1.log
-rw-r--r-- 1 mwm mwm 119401 Nov 26 09:41 nit1-ld3.1.gc
All of the log files are in an attachment to the bug, explained at the bottom of the description section. Previous runs had some, but not all, of the output in the GC log file when the measurement phase completed.
Here is how one of the JVMs was started:
/export/home/mwm/jdk/bin/java -server -d64 -Xms192g -Xmx192g -XX:NewSize=10g -XX:MaxGCPauseMillis=800 -XX:InitiatingHeapOccupancyPercent=75
-XX:+AggressiveOpts -XX:+UseCompressedOops -XX:+UseG1GC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseLargePages -XX:+AlwaysPreTouch
-XX:-TieredCompilation -XX:-UseBiasedLocking -XX:+UnlockExperimentalVMOptions -XX:G1LogLevel=finest
-Dtangosol.coherence.cacheconfig=examples-cache-config.xml -Dtangosol.coherence.override=tangosol-coherence-override.xml -Dtangosol.coherence.mode=prod
-Dtangosol.coherence.distributed.localstorage=true -Dtangosol.coherence.distributed.transport.reliable=sdmb -Djava.net.preferIPv4Stack=true
-Dtangosol.coherence.grid.preprocess=true -Dtangosol.coherence.distributed.busOptimizedBackup=false -Dtangosol.coherence.distributed.asyncbackup=false
-Dtangosol.coherence.localhost=nit1-ld2-ib -Dtangosol.coherence.log=/export/home/mwm/coh1213/logs/nit1-ld2.1.log -server
-Xloggc:/export/home/mwm/coh1213/logs/nit1-ld2.1.gc -showversion
-cp /export/home/mwm/config:/export/home/mwm/coh1213/lib/coherence.jar:/export/home/mwm/faban/benchmarks/CoherenceBench/lib/CoherenceBench.jar
-Dtangosol.coherence.role=server com.tangosol.net.DefaultCacheServer > /export/home/mwm/coh1213/logs/nit1-ld2.1.out 2>&1
and here are some additional details about the test / measurement scenario:
I am running CoherenceBench using 2 x 192 GB heaps (one each on 2 T4 1
socket LDOMs with 256 GB RAM). I am using JDK8 u0 build 63:
Java HotSpot(TM) 64-Bit Server VM warning: Max heap size too large for Compressed Oops
java version "1.8.0-ea"
Java(TM) SE Runtime Environment (build 1.8.0-ea-b63)
Java HotSpot(TM) 64-Bit Server VM (build 25.0-b07, mixed mode)
Here are some of the benchmark details:
CoherenceBench using "complex" objects, ~8KB each in size
80 / 20 % mix of gets / puts
2 x 192 GB heaps (1 server on each of 2 LDOMs)
The cluster is preloaded with is preloaded with 6M entries --> 6M * 8KB == ~48 GB of primary data + ~48GB backup data
* so each heap has about ~48 GB of cache data, so is ~25% occupied
Run duration: 180 seconds "warmup", 300 seconds "steady state", 60 seconds "cooldown"
The preload phase (6M entries for about 50 GB data per JVM) of the workload takes just under 4 minutes, followed by 9 minutes of load being applied (as noted above).
JVisualVM on my laptop is running JDK 8 build 64, so should be as compatible as possible.
Here is the contents of the zip file that is attached to this bug. The *.gchisto files are output of Monica's GC post-processing script. run56E.gcpause.eden.parsed is my post processing of the "Eden" lines in the GC log, suitable for putting into a spreadsheet to plot heap growth per GC. *.out is the showversion output. The *.log files are the Coherence log output --- useful for correlating times with PrintGCDateStamps output.
Archive: cb.run55W.logs.zip
Length Date Time Name
--------- ---------- ----- ----
116030 11-26-2012 15:42 nit1-ld2.1.gc
968 11-26-2012 15:43 nit1-ld2.1.gchisto
973 11-26-2012 15:43 run56E.gcpause.eden.parsed
650 11-26-2012 16:00 nit1-ld3.screen.shot
119401 11-26-2012 15:42 nit1-ld3.1.gc
994 11-26-2012 15:43 nit1-ld3.1.gchisto
228 11-26-2012 15:18 nit1-ld2.1.out
228 11-26-2012 15:18 nit1-ld3.1.out
48919 11-26-2012 16:03 nit1-ld2.1.log
50018 11-26-2012 16:03 nit1-ld3.1.log
--------- -------
338409 10 files