In many cases the pause time predictions for mixed collections are off by a lot, causing missed pause time goals.
Further, the young gen size chosen for the next pause is very small for a few collections.
This can be seen e.g. in specjbb2013, but also in loads like CRM Fuse.
The problem seems to be wrong predictions (and slow adaptation to changed conditions).
A particularly problematic point is the prediction of the remembered set (RS) length, which has a big impact on pause time.
The attached predication.png figure shows the problem: at first, during young-only GCs, the actual RS length and the prediction match well; at t_1, however, mixed GCs cause a large change in the actual RS length (depending on the application it can be a real step function, as in the figure). The prediction does not follow immediately (in fact it follows very slowly), so the eden size is not adjusted as it should be.
During mixed GCs, the RS length prediction slowly approaches the actual value, but often never reaches it before the mixed GCs are done (e.g. at t_2), i.e. the pause time goal is never met.
After the mixed GCs, the actual RS length goes down again, but the prediction takes some time to adapt to the change. So the young gen size is kept small for some time after the mixed GCs too (losing more throughput).
Since RS length prediction seems to be the main problem in these cases, try adding separate predictions for young-only and mixed GCs.
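
To illustrate the idea, here is a minimal simulation sketch, not G1 code. It assumes a plain exponentially decaying average as the predictor (the real predictor is more involved), a made-up weight of 0.3, and synthetic step-function RS lengths of 1000 for young-only and 5000 for mixed GCs. All names (DecayingAverage, total_abs_error) are hypothetical. It compares one shared predictor against one predictor per GC type.

```python
class DecayingAverage:
    """Exponentially decaying average. alpha is an assumed weight, not G1's."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha
        self.value = None  # no sample seen yet

    def predict(self):
        return self.value

    def add(self, sample):
        if self.value is None:
            self.value = float(sample)
        else:
            # Move a fraction alpha of the way towards the new sample.
            self.value += self.alpha * (sample - self.value)


def total_abs_error(samples, split):
    """Sum of |prediction - actual| over all GCs.

    split=False: young-only and mixed GCs share one predictor.
    split=True:  each GC type gets its own predictor.
    """
    young = DecayingAverage()
    mixed = DecayingAverage() if split else young
    predictors = {"young": young, "mixed": mixed}
    error = 0.0
    for kind, rs_length in samples:
        predictor = predictors[kind]
        guess = predictor.predict()
        if guess is None:
            # A predictor without samples falls back to the young one.
            guess = young.predict()
        if guess is not None:
            error += abs(guess - rs_length)
        predictor.add(rs_length)
    return error


# Synthetic step function: young-only phase, mixed phase, young-only phase.
samples = [("young", 1000)] * 10 + [("mixed", 5000)] * 8 + [("young", 1000)] * 10

single_error = total_abs_error(samples, split=False)
split_error = total_abs_error(samples, split=True)
print("shared predictor error:", single_error)
print("split predictor error: ", split_error)
```

In this toy setup the split variant only mispredicts the very first mixed GC, while the shared predictor lags behind both the step up at the start of the mixed phase and the step down after it. Real RS lengths are noisier, so the gain in practice will be smaller.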