With the current scheme for reducing initial card marks (-XX:+ReduceInitialCardMarks), G1 marks entire objArrays as dirty: the current code pushes and enqueues the whole array as a sequence of individual cards.
This wastes a lot of time and space at every stage: the array has to be split into a sequence of cards, which costs both time and memory, and every card then incurs the usual per-card overhead (finding the object start, etc.). It also adds latency to the GC pause, because the most recent "deferred card marks" have to be handled there.
Possible options:
1) Improve the deferred card mark algorithm so that it has less overhead, or so that the overhead is not always deferred until the start of the GC pause; one option would be to disable the mechanism for (large) objArrays.
2) Improve the mechanism for invalidating large areas of memory, which at the moment is very time and space intensive. Note that even with option 1), this may be interesting for cases where these large numbers of cards are not generated by the deferred card marking algorithm.
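
To illustrate the difference that option 2) is aiming at, here is a minimal standalone sketch, not HotSpot code. It contrasts enqueuing one entry per card with dirtying the same cards in bulk and recording a single range. Everything in it is an assumption made for illustration: the flat card table over a heap starting at address 0, the 512-byte card size, and the names CardRange, enqueue_per_card and enqueue_as_range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical model, not HotSpot code: a flat card table covering a heap
// that starts at address 0, with an assumed card size of 512 bytes.
constexpr std::size_t kCardShift = 9;
constexpr std::uint8_t kClean = 0;
constexpr std::uint8_t kDirty = 1;

// One entry describing a contiguous run of dirty cards (both ends inclusive).
struct CardRange {
    std::size_t first_card;
    std::size_t last_card;
};

inline std::size_t card_index(std::size_t addr) { return addr >> kCardShift; }

// Per-card scheme: one queue entry for every card covered by [start, end).
// Returns the number of queue entries added.
std::size_t enqueue_per_card(std::vector<std::uint8_t>& table,
                             std::vector<std::size_t>& queue,
                             std::size_t start, std::size_t end) {
    if (start >= end) return 0;
    assert(card_index(end - 1) < table.size());
    std::size_t added = 0;
    for (std::size_t c = card_index(start); c <= card_index(end - 1); ++c) {
        if (table[c] != kDirty) {
            table[c] = kDirty;
            queue.push_back(c);
            ++added;
        }
    }
    return added;
}

// Range scheme: dirty all covered cards with one memset and record a single
// range entry, independent of the size of the area.
// Returns the number of range entries added.
std::size_t enqueue_as_range(std::vector<std::uint8_t>& table,
                             std::vector<CardRange>& ranges,
                             std::size_t start, std::size_t end) {
    if (start >= end) return 0;
    const std::size_t first = card_index(start);
    const std::size_t last = card_index(end - 1);
    assert(last < table.size());
    std::memset(table.data() + first, kDirty, last - first + 1);
    ranges.push_back({first, last});
    return 1;
}
```

Under these assumptions, a card-aligned 64 MiB objArray would produce 131072 queue entries in the per-card scheme, but a single range entry in the range scheme. Whoever later processes a range would also need to find the object start only once per range rather than once per card.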