Since JDK-8322732, ForkJoinPool.runWorker makes more calls to Unsafe.loadFence. On architectures with weak memory models, a load fence is a memory barrier that requires core-to-core traffic, so some test cases regress.
The attached files are flame graphs of SPECjbb2015 running in MultiJVM mode. Compared with flamegraph-before-8322732.svg, runWorker itself consumes noticeably more CPU cycles in flamegraph-since-8322732.svg.
The attached patch is a simple workaround for this case: it changes two continue statements to break statements on the paths where the current thread fails to steal work, so the loop runs fewer iterations and calls Unsafe.loadFence less often. With the patch, the CPU cycles drop substantially, as shown in flamegraph-patched-8322732.svg. A thorough fix may require more work.
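
To illustrate why swapping continue for break reduces the fence count, here is a toy model. It is not the ForkJoinPool source and not the attached patch. The class StealFenceSketch, the method scan, and the failedAttempts array are all hypothetical. VarHandle.acquireFence() stands in for the internal Unsafe.loadFence call, and the sketch assumes one fence per steal attempt.

```java
import java.lang.invoke.VarHandle;

// Toy model only: NOT the ForkJoinPool source. All names here are hypothetical.
class StealFenceSketch {

    /**
     * Scans the queues in order and returns how many fences were issued.
     * failedAttempts[q] is the number of steal attempts on queue q that fail
     * before one succeeds (an assumed stand-in for contention).
     * retryOnFailure = true models 'continue' (retry the same queue);
     * retryOnFailure = false models 'break' (give up on this queue, move on).
     */
    static int scan(int[] failedAttempts, boolean retryOnFailure) {
        int[] remaining = failedAttempts.clone(); // do not mutate the caller's array
        int fences = 0;
        for (int q = 0; q < remaining.length; q++) {
            for (;;) {
                VarHandle.acquireFence(); // assumed: one load fence per steal attempt
                fences++;
                if (remaining[q] == 0) {
                    return fences;        // steal succeeded
                }
                remaining[q]--;           // this attempt failed
                if (retryOnFailure) {
                    continue;             // original behaviour: loop again, fence again
                }
                break;                    // workaround behaviour: leave the retry loop
            }
        }
        return fences;                    // nothing stolen
    }

    public static void main(String[] args) {
        int[] contention = {5, 5, 0};     // made-up contention pattern
        System.out.println("continue: " + scan(contention, true));
        System.out.println("break:    " + scan(contention, false));
    }
}
```

With this made-up contention pattern, the continue variant issues six fences and the break variant three. The real effect depends on the actual loop structure in runWorker and on the workload, so the flame graphs remain the evidence for the regression.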