JDK-6725714 : par compact - add a table to speed up bitmap searches
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: hs14
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2008-07-15
  • Updated: 2013-11-15
  • Resolved: 2013-05-31
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 7 JDK 8 Other
7u25Fixed 8Fixed hs24Fixed
Related Reports
Relates :  
Relates :  
Description
The parallel compacting collector (par compact, or par old) uses a table to track data about contiguous regions of the gc heap such as how much data is live in the region, whether a live object crosses onto the region, etc.  Each region is currently fairly small to limit the portion of the marking bitmap that must be searched to compute the new (after compaction) location of an object.  Regions could be made significantly larger (100x or more) by adding a very small table which holds data about object starts.  Making regions larger would shrink the size of the data used by par compact, would translate directly into speed ups of phases that operate on regions, and may also speed up pointer updates by reducing the length of bitmap searches.

Comments
The implementation under development increases the region size to 64K words, which allows the block table elements to be unsigned short. This reduces the size of the main data structures used for compaction (the sum of region table size + block table size) by ~74%. Since the data structures are sized based on the java heap size, the reduction amounts to ~1.1% of the heap size.
23-05-2013

EVALUATION The additional table, likely to be called the Block table, would have one entry for each heap Region (formerly called a Chunk; see 6725697). Each entry would hold a non-negative integer which is the amount of data in the Region that is to the left of the first object that starts in the Block. When updating pointers, the length of the bitmap that would have to be searched is then determined by the Block size (likely 128 words or 256 words), instead of the Region size. This allows regions to become much larger, reducing data overhead since each Region table entry contains 5 or 6 word-sized fields. For a Region size of up to 64K, the Block table can be an unsigned short (16 bits). The problem is filling in the new table. A single-threaded pass over the bitmap is simple, but would take too long and limit scalability. A parallel pass over the bitmap would be significantly better, but has the downside of introducing an additional barrier sync and would also spend time updating portions of the block table that would never be used (because the region contains a large array and so no object starts within the region). Filling the table lazily during compaction implicitly parallelizes it and avoids filling unused sections, but requires synchronization when filling sections of the table or runs the risk of some sections being filled more than once; lazy filling also adds a branch to the very hot pointer update path.
15-07-2008