The scenario is that when a CMS marking cycle starts, it not only blocks the application but is also very slow. I can easily reproduce this with jbb2005 on an 8-way SPARC with the following command-line parameters:
set heap_size = 64m
-XX:ParallelGCThreads=8
-XX:+DisableExplicitGC
-Xms${heap_size}
-Xmx${heap_size}
-XX:TLABSize=32k
-XX:+UseConcMarkSweepGC
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:PrintCMSStatistics=2
-XX:+CMSScavengeBeforeRemark
-XX:+PrintClassHistogram
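The flags above can be assembled into a launch script along these lines. This is a sketch: the SPECjbb2005 jar name and main class (jbb.jar, spec.jbb.JBBmain) are assumptions about the benchmark layout; the flags themselves are taken verbatim from the list above.

```shell
#!/bin/sh
# Reproduction sketch for the slow-CMS-marking scenario.
# jbb.jar / spec.jbb.JBBmain are assumed names for the benchmark launcher.
heap_size=64m
JVM_OPTS="-XX:ParallelGCThreads=8 \
-XX:+DisableExplicitGC \
-Xms${heap_size} -Xmx${heap_size} \
-XX:TLABSize=32k \
-XX:+UseConcMarkSweepGC \
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
-XX:PrintCMSStatistics=2 \
-XX:+CMSScavengeBeforeRemark \
-XX:+PrintClassHistogram"
# Print the full invocation rather than executing it, so the sketch
# works even without the benchmark installed:
echo java $JVM_OPTS -cp jbb.jar spec.jbb.JBBmain
```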
Here's the output:
Loading Warehouse 1...
1.550: [GC 1.550: [ParNew: 13184K->1600K(14784K), 0.0214458 secs] 13184K->3095K(63936K), 0.0217076 secs]
1.805: [GC 1.805: [ParNew: 14784K->1600K(14784K), 0.0301589 secs] 16279K->6038K(63936K), 0.0303027 secs]
2.082: [GC 2.082: [ParNew: 14784K->1600K(14784K), 0.0476443 secs] 19222K->11914K(63936K), 0.0477794 secs]
2.359: [GC 2.359: [ParNew: 14784K->1600K(14784K), 0.0633704 secs] 25098K->19846K(63936K), 0.0635313 secs]
2.642: [GC 2.643: [ParNew: 14784K->1600K(14784K), 0.0599469 secs] 33030K->27771K(63936K), 0.0600886 secs]
2.703: [GC [1 CMS-initial-mark: 26171K(49152K)] 28015K(63936K), 0.0132906 secs]
2.717: [CMS-concurrent-mark-start]
=> the application gets stuck here for a long time, but eventually resumes
Finished cms space scanning in 1th thread: 28.661 sec
Finished cms space scanning in 0th thread: 28.662 sec
Finished perm space scanning in 1th thread: 0.052 sec
Finished perm space scanning in 0th thread: 0.054 sec
Finished work stealing in 1th thread: 0.003 sec
Finished work stealing in 0th thread: 0.000 sec
31.433: [CMS-concurrent-mark: 28.338/28.716 secs]
(CMS-concurrent-mark yielded 591546 times)
31.433: [CMS-concurrent-preclean-start]
Notice the very long marking cycle (28.6 sec for a 64MB heap!) and the very high number of yields (almost 600,000).
For comparison, this is the normal behavior (with -XX:ParallelCMSThreads=1):
Loading Warehouse 1...
1.554: [GC 1.555: [ParNew: 13184K->1600K(14784K), 0.0255298 secs] 13184K->3094K(63936K), 0.0257375 secs]
1.815: [GC 1.815: [ParNew: 14784K->1600K(14784K), 0.0349030 secs] 16278K->6038K(63936K), 0.0350438 secs]
2.096: [GC 2.096: [ParNew: 14784K->1600K(14784K), 0.0644119 secs] 19222K->11914K(63936K), 0.0645546 secs]
2.383: [GC 2.383: [ParNew: 14784K->1600K(14784K), 0.0737211 secs] 25098K->19846K(63936K), 0.0738495 secs]
2.673: [GC 2.673: [ParNew: 14784K->1600K(14784K), 0.0755298 secs] 33030K->27771K(63936K), 0.0756629 secs]
2.749: [GC [1 CMS-initial-mark: 26171K(49152K)] 28015K(63936K), 0.0136178 secs]
2.763: [CMS-concurrent-mark-start]
3.107: [CMS-concurrent-mark: 0.329/0.344 secs]
(CMS-concurrent-mark yielded 6538 times)
3.107: [CMS-concurrent-preclean-start]
(cardTable: 971 cards, re-scanned 971 cards, 1 iterations)
3.116: [CMS-concurrent-preclean: 0.007/0.009 secs]
(CMS-concurrent-preclean yielded 225 times)
3.116: [CMS-concurrent-abortable-preclean-start]
(cardTable: 191 cards, re-scanned 191 cards, 1 iterations)
(cardTable: 2 cards, re-scanned 2 cards, 1 iterations)
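The comparison run differs from the slow one only in the number of concurrent marking threads. A minimal sketch of that delta (the jar name and main class are, again, assumptions about the benchmark layout):

```shell
#!/bin/sh
# Single concurrent-marking-thread comparison run; only this flag is
# changed relative to the slow reproduction.
CMS_THREADS="-XX:ParallelCMSThreads=1"
# Printed rather than executed so the sketch runs without the benchmark:
echo java $CMS_THREADS -XX:+UseConcMarkSweepGC -Xms64m -Xmx64m \
     -XX:+PrintGCDetails -cp jbb.jar spec.jbb.JBBmain
```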
I managed to hit this bug with SPECjvm98 too, using the following command-line parameters:
java -XX:+UseConcMarkSweepGC -XX:NewSize=8m -XX:+DisableExplicitGC -XX:+PrintGCDetails -Xms32m -Xmx32m -XX:CMSInitiatingOccupancyFraction=10 SpecApplication -g _201_compress _202_jess _209_db _213_javac _222_mpegaudio _227_mtrt _228_jack
Notice that the low CMSInitiatingOccupancyFraction starts a CMS cycle very early, to increase the probability of a CMS cycle taking place during class loading and, hence, during direct allocation.
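As a rough sanity check on why the cycle triggers so early, the raw threshold arithmetic works out as follows. This is only a sketch: the real CMS trigger applies padding and its own heuristics on top of the raw fraction, and the exact old-gen size depends on survivor-space layout.

```shell
#!/bin/sh
# Approximate old gen: 32m total heap minus the 8m young gen
# (survivor-space layout ignored for this back-of-envelope check).
old_gen_kb=$(( (32 - 8) * 1024 ))
# With -XX:CMSInitiatingOccupancyFraction=10, a cycle can start
# at roughly 10% old-gen occupancy:
threshold_kb=$(( old_gen_kb * 10 / 100 ))
echo "CMS cycle may start once the old gen exceeds ~${threshold_kb}K"
```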
This is what I get with the original JVM:
[CMS-concurrent-mark: 2.613/2.647 secs]
(CMS-concurrent-mark yielded 53247 times)
...and this is the output with the fix:
[CMS-concurrent-mark: 0.011/0.096 secs]
(CMS-concurrent-mark yielded 9 times)
Again, this is very reproducible.