During class unloading, pending ICBuffer elements are put into a global pending queue, to be released later during a cleanup safepoint.
This is a simple linked list which is protected by the global InlineCacheBuffer_lock.
During code cache cleaning, acquiring this lock accounts for ~40% of the total time spent clearing the IC call sites when unlinking is done in parallel (e.g. in G1, Shenandoah).
(Clearing IC call sites remains the largest contributor to the time spent, even after the change.)
A simple solution is to make the enqueuing lock-free: cleaning the list is always done in the separate Cleanup safepoint, so enqueuing threads only ever have to coordinate with each other.
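
To illustrate the idea, here is a minimal sketch of a lock-free enqueue onto a singly linked list, written with standard C++ std::atomic rather than HotSpot's own Atomic API. The names PendingNode, pending_head, enqueue_pending, release_pending and run_demo are hypothetical and are not taken from the HotSpot sources; the sketch assumes, as described above, that the list is only drained while no thread is enqueuing.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a pending ICBuffer element (not HotSpot's type).
struct PendingNode {
    int payload;
    PendingNode* next;
};

// Head of the global pending list; nullptr means the list is empty.
static std::atomic<PendingNode*> pending_head{nullptr};

// Lock-free enqueue: push the node onto the list head with a CAS loop.
// On CAS failure, old_head is refreshed with the current head and we retry.
void enqueue_pending(PendingNode* node) {
    PendingNode* old_head = pending_head.load(std::memory_order_relaxed);
    do {
        node->next = old_head;
    } while (!pending_head.compare_exchange_weak(
        old_head, node,
        std::memory_order_release, std::memory_order_relaxed));
}

// Cleanup step: detach the whole list with one exchange, then free the nodes.
// Assumed to run only when no thread is enqueuing (the cleanup safepoint
// in the description above). Returns the number of nodes released.
std::size_t release_pending() {
    PendingNode* node = pending_head.exchange(nullptr, std::memory_order_acquire);
    std::size_t released = 0;
    while (node != nullptr) {
        PendingNode* next = node->next;
        delete node;
        ++released;
        node = next;
    }
    return released;
}

// Demo: several threads enqueue concurrently, then a single thread cleans up.
std::size_t run_demo(int num_threads, int per_thread) {
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                enqueue_pending(new PendingNode{i, nullptr});
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return release_pending();
}
```

Because nodes are only removed in the separate cleanup step, the push loop never races with a concurrent pop, so the usual ABA hazard of lock-free stacks does not arise in this sketch. Calling run_demo(4, 1000) should release all 4000 nodes enqueued by the four threads.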