During an upgrade from JDK11 to JDK15 (and JDK17) we experience an increase in garbage collection latency from approximately 10-15ms at p99.9 to approximately 45-50ms. Logs showed the object copy phase consuming most of that time.
Enabling task queue stats on the build and increasing the log level showed that one evacuation thread is doing far more work than the other threads.
For example, in this cycle we see one thread spend 67.2ms on Object Copy, while the others spend far less time:
{code}
148905 [2021-12-06T22:12:38.338+0000][debug][gc,phases        ] GC(581)     Object Copy (ms):              Min: 13.3, Avg: 16.6, Max: 67.2, Diff: 53.8, Sum: 546.7, Workers: 33
148906 [2021-12-06T22:12:38.338+0000][trace][gc,phases,task   ] GC(581)                                    14.8 15.4 15.3 16.0 14.7 14.6 16.6 15.3 16.3 15.2 67.2 13.8 14.1 16.5 14.5 14.3 14.1 14.9 15.3 15.8 13.3 15.6 15.0 15.6 14.7 15.6 14.6 14.3 15.1 15.2 14.4 13.5 14.9
{/code}
Looking back, we see a pattern in these cycles where the thread doing most of the work scans only a 2 or 3 "blocks" (here, just 2):
{code}
148899 [2021-12-06T22:12:38.338+0000][debug][gc,phases        ] GC(581)       Scanned Blocks:                Min: 2, Avg: 1203.9, Max: 1875, Diff: 1873, Sum: 39730, Workers: 33
148900 [2021-12-06T22:12:38.338+0000][trace][gc,phases,task   ] GC(581)                                       949  1838  1397  1548  1450  1875  821  1067  1312  1463  2  1282  1319  38  1177  1199  819  1170  897  1343  1860  1070  1059  1552  1217  1296  1068  1092  1645  1166  1002  1140  1597
{/code}
And in the task queue stats this thread performs an order of magnitude more operations on the task queue:
{code}
149005 [2021-12-06T22:12:38.339+0000][trace][gc,task,stats    ] GC(581)   8      51757      51143       9173     121241       9379          0          0
149006 [2021-12-06T22:12:38.339+0000][trace][gc,task,stats    ] GC(581)   9      37328      36045       6291     113112       8332          0          0
149007 [2021-12-06T22:12:38.339+0000][trace][gc,task,stats    ] GC(581)  10     350079      77644          0        132          0     267666     226166
149008 [2021-12-06T22:12:38.339+0000][trace][gc,task,stats    ] GC(581)  11      38889      37932       7100     110650       7940          0          0
149009 [2021-12-06T22:12:38.339+0000][trace][gc,task,stats    ] GC(581)  12      53762      53174       9375     110255       8637          0          0
{/code}
We traced the origin of this behavior to: https://bugs.openjdk.java.net/browse/JDK-8213108
We tried changing `ParGCArrayScanChunk`, but to no avail. Introducing a flag to override the ergonomics that select chunks per region was effective.