On a customer test system we see increased fragmentation with 6u21 and later,
leading to promotion failures, which we could not observe with 6u20 and earlier.
I am attaching two log files that show the issue. Here is the interesting
section:
Statistics for IndexedFreeLists:
--------------------------------
Total Free Space: 475074
Max Chunk Size: 254
Number of Blocks: 8584
Av. Block Size: 55
free=105894392 frag=1,0000 <<<<<<<<<<<
Before GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 0
Max Chunk Size: 0
Number of Blocks: 0
Tree Height: 0
Statistics for IndexedFreeLists:
--------------------------------
Total Free Space: 0
Max Chunk Size: 0
Number of Blocks: 0
free=0 frag=0,0000
9421,892: [ParNew (promotion failed) <<<<<<<<<<<<<<<<<<
Desired survivor size 163840 bytes, new threshold 0 (max 0)
: 48512K->48512K(48832K), 0,2599337 secs]9422,152: [CMS (concurrent mode failure)size[2]
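For reference, the frag value printed by -XX:PrintFLSStatistics=1 is a fragmentation score between 0 (all free space in one contiguous chunk) and 1 (free space scattered across many small blocks); the comma in frag=1,0000 is just locale-dependent decimal formatting, i.e. 1.0000. Below is a minimal sketch of how such a score can be computed, assuming the usual form of one minus the sum of squared free-chunk sizes over the square of the total free space; the class name and example values are mine, with the block counts mirroring the indexed-free-list stats above.

// Illustrative only: a sketch of a fragmentation score of the form
// frag = 1 - (sum of squared free-chunk sizes) / (total free space)^2,
// which is 0 for one contiguous chunk and approaches 1 for many small chunks.
public class FlsFragSketch {
    static double frag(long[] freeChunkSizes) {
        double total = 0.0;
        double sumOfSquares = 0.0;
        for (long size : freeChunkSizes) {
            total += size;
            sumOfSquares += (double) size * (double) size;
        }
        return total > 0.0 ? 1.0 - sumOfSquares / (total * total) : 0.0;
    }

    public static void main(String[] args) {
        // All free space in a single chunk: no fragmentation.
        System.out.println(frag(new long[] { 475_074 }));   // 0.0
        // 8584 blocks averaging 55 units, as in the indexed-free-list stats above.
        long[] scattered = new long[8_584];
        java.util.Arrays.fill(scattered, 55);
        System.out.println(frag(scattered));                 // ~0.9999: essentially fully fragmented
    }
}

Read that way, frag=1,0000 in the excerpt above indicates that the free space is essentially completely scattered into small blocks; the indexed free lists alone hold 8584 blocks averaging 55 units, with a max chunk of only 254.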
A second customer reports an issue with 6u22, running with the following JVM options:
-XX:+CMSConcurrentMTEnabled -XX:CMSInitiatingOccupancyFraction=60 -XX:CMSMaxAbortablePrecleanTime=3600000 -XX:+CMSParallelRemarkEnabled -XX:+CMSParallelSurvivorRemarkEnabled -XX:+CMSScavengeBeforeRemark -XX:+HeapDumpOnOutOfMemoryError -XX:InitialHeapSize=32212254720 -XX:MaxHeapSize=32212254720 -XX:MaxNewSize=2147483648 -XX:MaxTenuringThreshold=4 -XX:NewSize=2147483648 -XX:ParallelCMSThreads=8 -XX:PermSize=67108864 -XX:+PrintCommandLineFlags -XX:PrintFLSStatistics=1 -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintTenuringDistribution -XX:+UseBiasedLocking -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseLargePages -XX:+UseMembar -XX:+UseNUMA -XX:+UseParNewGC
After 14 CMS cycles with 6u22, they report a promotion failure that happened when the tenured space had filled to only 15G. This is strange, as the heap usually fills to 20G before a CMS cycle is triggered.
Here is the pertinent section from the GC log:
2010-11-18T19:30:38.703-0600: [GC Before GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 2004318869
Max Chunk Size: 57061
Number of Blocks: 711866
Av. Block Size: 2815
Tree Height: 90
Before GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 1786368
Max Chunk Size: 1786368
Number of Blocks: 1
Av. Block Size: 1786368
Tree Height: 1
[ParNew (promotion failed)
Desired survivor size 107347968 bytes, new threshold 1 (max 4)
- age 1: 152613256 bytes, 152613256 total
: 1861587K->1887488K(1887488K), 1.7214606 secs][CMSCMS: Large block 0xfffffd797ec6d6a0
: 13801830K->2261237K(29360128K), 46.1976857 secs] 15542483K->2261237K(31247616K), [CMS Perm : 47619K->47461K(79492K)]After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: -826325716
Max Chunk Size: -826325716
Number of Blocks: 1
Av. Block Size: -826325716
Tree Height: 1
After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 0
Max Chunk Size: 0
Number of Blocks: 0
Tree Height: 0
, 47.9259930 secs] [Times: user=50.62 sys=0.02, real=47.93 secs]
Engineering reported that this almost looks like some kind of bug in CMS allocation, because there is plenty of free space (and, by comparison, not that much promotion) when the promotion failure occurs, although full GC logs would be needed before one could be confident of that assessment. It would therefore be worthwhile to investigate this closely and see why it is happening. I do not think this is a tuning issue, but something else.
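To put rough numbers on that observation, and assuming PrintFLSStatistics reports sizes in heap words (8 bytes on a 64-bit JVM), the pre-GC BinaryTreeDictionary stats above translate to about 16 GB of free space in the CMS generation with no contiguous chunk larger than roughly 450 KB, while the tenuring distribution shows only about 150 MB of age-1 survivor data to promote. A quick back-of-the-envelope check (values copied from the log, the unit conversion is my assumption):

// Back-of-the-envelope conversion of the pre-GC FLS statistics in the log above.
// Assumption: the statistics are reported in heap words, 8 bytes each on a 64-bit JVM.
public class PromotionFailureMath {
    public static void main(String[] args) {
        final long BYTES_PER_WORD = 8;            // assumed word size
        long totalFreeWords = 2_004_318_869L;     // "Total Free Space" before GC
        long maxChunkWords  = 57_061L;            // "Max Chunk Size" before GC
        long survivorBytes  = 152_613_256L;       // "age 1: 152613256 bytes" from the tenuring distribution

        System.out.printf("total free ~ %.1f GB%n", totalFreeWords * BYTES_PER_WORD / 1e9); // ~16.0 GB
        System.out.printf("max chunk  ~ %.0f KB%n", maxChunkWords  * BYTES_PER_WORD / 1e3); // ~456 KB
        System.out.printf("to promote ~ %.0f MB%n", survivorBytes / 1e6);                   // ~153 MB
    }
}

Those figures are also consistent with the occupancy shown in the GC line itself (13801830K used of 29360128K in the CMS generation), i.e. the failure happened with far more free space than promotion demand but no sufficiently large contiguous chunks, which supports the fragmentation reading rather than a simple lack of space.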
Also, in 6u21 we integrated 6631166, which is aimed at reducing fragmentation.
If at all possible, could you down-rev to 6u20 (i.e. pre-6u21) and see
whether the onset of fragmentation changes in any manner?
The customer repeated the test with 6u20: there were no promotion failures, even after 7000 cycles, and CMS cycles were always triggered after the tenured generation had filled to 20G.