Bug ID: JDK-8272083 G1: Move precise BOT updates in evacuation to concurrent phase

Type: Enhancement
Component: hotspot
Sub-Component: gc

Priority: P4
Status: Closed
Resolution: Won't Fix

Submitted: 2021-08-06
Updated: 2022-01-22
Resolved: 2022-01-22

Created on behalf of yude.lyd@alibaba-inc.com
----
When we call block_start(addr), we ask the block offset table (BOT) to give us the starting address of the block that covers 'addr'.
I think the BOT does not handle queries efficiently when we call block_start(addr) sequentially, that is, multiple times
with ascending addresses that are close to each other.


For example, for two addresses addr1 and addr2, we might have the following heap layout:


| q     | addr1          | addr2
v       v                v
------------------------------------------
|       |                |
------------------------------------------
^
| gc alloc block ------------------- |


It's possible that the BOT records only the large GC-allocated blocks (such as PLABs) but not the individual objects in them.
So when we call block_start() with either addr1 or addr2, the BOT lookup will return q.
The BOT has the ability to fix itself. When we call block_start(addr1), it will fix the entries from q to addr1, walking
over all of the real objects in that range and updating the entries. After that, the BOT will no longer return the
GC-allocated block address q for that range, but the real objects' addresses.
So calling block_start(addr1) and block_start(addr2) in different orders results in different amounts of work:

Case 1: {
call block_start(addr1)

block_start returns q

enter slow path {
  fix bot from q to addr1
  return head_of(addr1)
}

call block_start(addr2)

block_start returns q

enter slow path {
  fix bot from q to addr2
  return head_of(addr2)
}
}

Case 2: {
call block_start(addr2)

block_start returns q

enter slow path {
  fix bot from q to addr2
  return head_of(addr2)
}

call block_start(addr1)

will not enter the slow path because the range from q to addr1 is already fixed
return head_of(addr1)
}
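The two cases can be reproduced with a minimal Python sketch of a self-fixing BOT. This is a toy model under stated assumptions, not HotSpot code: `ToyBot`, `CARD_WORDS`, `size_at` and `run` are hypothetical names, cards are only 4 words, each entry stores an absolute block start (real G1 entries store offsets rather than addresses), and the heap is one GC-allocated block starting at q = 0 and filled with 2-word objects. `steps` counts the objects visited in the slow path.

```python
CARD_WORDS = 4  # toy card size in words (assumption; far smaller than a real G1 card)


class ToyBot:
    """Toy block offset table with lazy self-fixing (a simplified model, not HotSpot code).

    Each card has one entry holding the start of a block that covers the card's
    first word. Initially every entry points at word 0, the start q of a single
    GC-allocated block, so the table knows nothing about the objects inside it.
    """

    def __init__(self, obj_sizes):
        self.size_at = {}  # object start -> object size in words
        pos = 0
        for size in obj_sizes:
            self.size_at[pos] = size
            pos += size
        self.bot = [0] * ((pos + CARD_WORDS - 1) // CARD_WORDS)  # all cards -> q = 0
        self.slow_calls = 0
        self.steps = 0  # objects visited while walking in the slow path

    def block_start(self, addr):
        q = self.bot[addr // CARD_WORDS]
        n = q + self.size_at[q]
        if n > addr:  # fast path: the object at q already covers addr
            return q
        self.slow_calls += 1
        while n <= addr:  # slow path: walk object by object towards addr
            q = n
            n = q + self.size_at[q]
            self.steps += 1
            self._fix(q, n)
        return q

    def _fix(self, start, end):
        # Record `start` for every card whose first word lies in [start, end).
        card = (start + CARD_WORDS - 1) // CARD_WORDS
        while card < len(self.bot) and card * CARD_WORDS < end:
            self.bot[card] = start
            card += 1


def run(first, second):
    """Two queries on a fresh table: one 64-word block of 2-word objects at q = 0."""
    bot = ToyBot([2] * 32)
    bot.block_start(first)
    bot.block_start(second)
    return bot.slow_calls, bot.steps


addr1, addr2 = 20, 40
print("case 1:", run(addr1, addr2))  # case 1: (2, 30)
print("case 2:", run(addr2, addr1))  # case 2: (1, 20)
```

With addr1 = 20 and addr2 = 40, Case 1 costs 30 steps and Case 2 costs 20 in this model; the 10 extra steps are the repeated walk from q to addr1.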


The difference between Case 1 and Case 2 is that Case 1 traverses the range from q to addr1 twice.
This affects G1ScanHRForRegionClosure::scan_heap_roots and potentially any future code that walks the card table for dirty oops in
ascending address order. It won't matter much if the addresses are sparse enough. But if, say, we run with -XX:G1ConcRefinementGreenZone=1000000
(so that concurrent refinement leaves more dirty cards for the pause to scan) and add logging to forward_to_block_containing_addr_slow(),
we will find several log entries with the same q.

The original issue has been resolved by updating the BOT when allocating into a PLAB during evacuation (JDK-8276098). This bug ID has been repurposed to discuss how moving these updates out of the safepoint (that is, doing them concurrently) might further reduce pause time. However, I haven't been able to find strong data showing that there would be an improvement, and the added complexity discourages us from doing concurrent updates. I'm going to close this shortly if there is no objection.

19-01-2022
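The fix mentioned above (JDK-8276098) updates the BOT as objects are allocated into a PLAB during evacuation, so later queries do not need the long walk from q. A minimal Python sketch of that idea follows, as an illustration only: `PreciseBot`, `record_object` and `ToyPlab` are hypothetical names, not HotSpot's API, cards are only 4 words, and each entry stores an absolute object start.

```python
CARD_WORDS = 4  # toy card size in words (assumption)


class PreciseBot:
    """Toy BOT kept precise at allocation time (illustration only, not HotSpot code)."""

    def __init__(self, heap_words):
        self.bot = [0] * ((heap_words + CARD_WORDS - 1) // CARD_WORDS)
        self.size_at = {}  # object start -> object size in words
        self.steps = 0     # objects skipped while searching inside a card

    def record_object(self, start, size):
        # Point every card whose first word lies in [start, start + size) at this object.
        self.size_at[start] = size
        card = (start + CARD_WORDS - 1) // CARD_WORDS
        while card < len(self.bot) and card * CARD_WORDS < start + size:
            self.bot[card] = start
            card += 1

    def block_start(self, addr):
        # Entries are precise, so the walk never leaves addr's card.
        q = self.bot[addr // CARD_WORDS]
        while q + self.size_at[q] <= addr:
            q += self.size_at[q]
            self.steps += 1
        return q


class ToyPlab:
    """Hypothetical bump-pointer PLAB that reports every allocation to the BOT."""

    def __init__(self, bot, start, words):
        self.bot = bot
        self.top = start
        self.end = start + words

    def allocate(self, size):
        if self.top + size > self.end:
            return None  # PLAB exhausted
        obj = self.top
        self.top += size
        self.bot.record_object(obj, size)  # the extra per-object work done in the pause
        return obj


bot = PreciseBot(64)
plab = ToyPlab(bot, 0, 64)
for _ in range(32):
    plab.allocate(2)
print(bot.block_start(20), bot.block_start(40), bot.steps)  # 20 40 0
```

In this model every query stays inside one card, but each allocation pays for the BOT update, and that work happens inside the evacuation pause. Moving it to a concurrent phase is the option this issue discussed.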

Blocks :	JDK-8276229 - Stop allowing implicit updates in G1BlockOffsetTable
Relates :	JDK-8276098 - Do precise BOT updates in G1 evacuation phase