The remembered sets' trade-off between iteration time and memory footprint is currently handled by coarsening bitmaps when the available bitmaps run out. This model is rather crude, and use cases exist where excessive coarsening leads to a lot of unnecessary card scanning.
A smoother model for lowering the precision when remembered sets become saturated would be to use a trie data structure.
A remembered set would then consist of a predetermined number of trie nodes, and when the nodes run out, a single node could be coarsened: its parent would treat the coarsened child as "present and 100% saturated". This would be a smoother coarsening model, hopefully resulting in lower remembered set scan times.
The initial suggestion is to use branching factors of 5, 5 and 6 bits, adding up to the 16 bits that cover the current maximum number of cards in a region.
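Below is a rough, non-HotSpot sketch (in C++) of what such a trie could look like, with illustrative names only: three levels of 5, 5 and 6 index bits cover a 16-bit card index, nodes come from a fixed pool, and when the pool is exhausted the parent simply marks the missing child as coarsened.

#include <cstdint>
#include <vector>

class CardTrie {
  // Non-leaf nodes have 32 children (5 index bits); leaves hold a 64-bit
  // card bitmap (6 index bits).
  struct Node {
    uint32_t saturated_mask;   // children treated as "present and 100% saturated"
    int32_t  child[32];        // indices into the node pool, -1 if absent
    uint64_t leaf_bits;        // card bitmap, used only at the leaf level
    Node() : saturated_mask(0), leaf_bits(0) {
      for (int i = 0; i < 32; i++) child[i] = -1;
    }
  };

  std::vector<Node> _pool;     // fixed node budget, allocated up front
  size_t _next;                // next free slot; slot 0 is the root

  // Returns the child's pool index, or -1 if the child is (now) coarsened.
  int32_t get_or_add_child(Node& parent, unsigned idx) {
    if (parent.saturated_mask & (1u << idx)) return -1;   // already coarse
    if (parent.child[idx] >= 0) return parent.child[idx];
    if (_next == _pool.size()) {                          // out of nodes:
      parent.saturated_mask |= (1u << idx);               // coarsen this child
      return -1;
    }
    parent.child[idx] = (int32_t)_next++;
    return parent.child[idx];
  }

 public:
  explicit CardTrie(size_t max_nodes) : _pool(max_nodes), _next(1) {}

  // Record a card index; precision may be lost to coarsening, never the card.
  void add_card(uint16_t card) {
    unsigned i0 = (card >> 11) & 0x1F;    // top 5 bits
    unsigned i1 = (card >>  6) & 0x1F;    // middle 5 bits
    unsigned i2 =  card        & 0x3F;    // bottom 6 bits
    int32_t mid = get_or_add_child(_pool[0], i0);
    if (mid < 0) return;                  // whole subtree counts as saturated
    int32_t leaf = get_or_add_child(_pool[mid], i1);
    if (leaf < 0) return;
    _pool[leaf].leaf_bits |= (uint64_t)1 << i2;
  }

  // True if the card must be scanned (exact hit or coarsened subtree).
  bool contains(uint16_t card) const {
    unsigned i0 = (card >> 11) & 0x1F;
    unsigned i1 = (card >>  6) & 0x1F;
    unsigned i2 =  card        & 0x3F;
    const Node& root = _pool[0];
    if (root.saturated_mask & (1u << i0)) return true;
    if (root.child[i0] < 0) return false;
    const Node& mid = _pool[root.child[i0]];
    if (mid.saturated_mask & (1u << i1)) return true;
    if (mid.child[i1] < 0) return false;
    return (_pool[mid.child[i1]].leaf_bits >> i2) & 1;
  }
};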
Another benefit would be that remembered sets would have a completely deterministic size and could have their memory allocated up front. The remembered set size could also be made configurable, allowing fine-tuning to application requirements.
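Continuing the sketch above, a per-remembered-set node budget (the constant below is a hypothetical tuning knob, not an existing flag) fixes the footprint up front and independently of mutator behaviour:

#include <cstdio>

int main() {
  const size_t nodes_per_remset = 256;   // hypothetical configurable budget
  CardTrie remset(nodes_per_remset);     // entire footprint allocated up front
  remset.add_card(12345);
  remset.add_card(12346);
  printf("12345 present: %d\n", remset.contains(12345));
  printf("54321 present: %d\n", remset.contains(54321));
  return 0;
}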
Worth considering: keeping the sparsePRT for regions with very few entries, since the trie overhead for sparse insertions could be too large.
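A hedged sketch of that hybrid, again with made-up names rather than the actual sparsePRT code: a tiny inline card list plays the sparse role and is promoted to the trie only on overflow.

#include <algorithm>
#include <array>
#include <cstdint>

class HybridRemSet {
  std::array<uint16_t, 8> _sparse;   // small inline card list (sparsePRT role)
  size_t _len;
  CardTrie* _trie;                   // created only when the list overflows

 public:
  HybridRemSet() : _len(0), _trie(nullptr) {}
  ~HybridRemSet() { delete _trie; }

  void add_card(uint16_t card) {
    if (_trie == nullptr) {
      uint16_t* end = _sparse.data() + _len;
      if (std::find(_sparse.data(), end, card) != end) return;  // already there
      if (_len < _sparse.size()) { _sparse[_len++] = card; return; }
      _trie = new CardTrie(256);          // overflow: promote to the trie
      for (size_t i = 0; i < _len; i++) _trie->add_card(_sparse[i]);
    }
    _trie->add_card(card);
  }

  bool contains(uint16_t card) const {
    if (_trie != nullptr) return _trie->contains(card);
    const uint16_t* end = _sparse.data() + _len;
    return std::find(_sparse.data(), end, card) != end;
  }
};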