I was surprised to see that CTW times have regressed considerably in recent JDKs. For example, the Linux x86_64 fastdebug time is extremely bad:
$ time CONF=linux-x86_64-server-fastdebug make test TEST="applications/ctw/modules/java_base.java"
real 26m13s
user 71m22s
sys 185m55s
For comparison, the same run on current 17u-dev takes:
real 2m24s
user 7m32s
sys 0m18s
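To make the scale of the regression concrete, here is a small sketch that only does arithmetic on the `time` values quoted above. It derives the wall-clock slowdown against 17u-dev, the average number of CPUs kept busy, and the share of CPU time spent in the kernel. The class and method names are illustrative only.

```java
import java.util.Locale;

/** Arithmetic on the `time` output quoted above; no new measurements. */
public final class CtwTimings {

    /** Parses a duration such as "26m13s" or "0m18s" into seconds. */
    static int seconds(String mmss) {
        int m = mmss.indexOf('m');
        int minutes = Integer.parseInt(mmss.substring(0, m));
        int secs = Integer.parseInt(mmss.substring(m + 1, mmss.length() - 1));
        return minutes * 60 + secs;
    }

    public static void main(String[] args) {
        int real = seconds("26m13s");        // current JDK, fastdebug
        int user = seconds("71m22s");
        int sys = seconds("185m55s");
        int realBaseline = seconds("2m24s"); // 17u-dev

        double slowdown = (double) real / realBaseline; // wall-clock regression
        double busyCpus = (double) (user + sys) / real; // average CPUs kept busy
        double sysShare = (double) sys / (user + sys);  // fraction of CPU time in kernel

        // Locale.ROOT keeps the decimal separator a '.' regardless of platform locale.
        System.out.println(String.format(Locale.ROOT, "slowdown %.1fx, %.1f CPUs busy, %.0f%% sys",
                slowdown, busyCpus, sysShare * 100));
        // prints "slowdown 10.9x, 9.8 CPUs busy, 72% sys"
    }
}
```

In other words, the run is roughly 11x slower in wall-clock time while keeping about ten CPUs busy, and nearly three quarters of that CPU time is spent in the kernel.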
Brief profiling shows the code spends a lot of time spinning while waiting for deoptimization here: https://github.com/openjdk/jdk/blob/1358850aa63a2874031ca33eba278432fd09d6ab/src/hotspot/share/runtime/deoptimization.cpp#L193-L195. This code was added by JDK-8300926 in JDK 21. The CTW runner deoptimizes methods frequently so that newer versions of those methods can be compiled.
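Why a spin wait hurts so much under many threads can be seen in a simplified Java analogy. This is not the HotSpot code at the linked lines. It assumes, based only on the profile, that waiters loop on a yield while another thread finishes deoptimizing. `spinWait` re-enters the scheduler on every iteration. On HotSpot/Linux, `Thread.yield()` typically maps to `sched_yield`, so that time shows up as "sys". `blockingWait` sleeps until it is released. All names here are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical analogy: yield-spinning vs. blocking while another thread is busy. */
public final class SpinVsBlock {

    /** Busy-waits by yielding until `busy` clears; returns how many yields it took. */
    static long spinWait(AtomicBoolean busy) {
        long yields = 0;
        while (busy.get()) {
            Thread.yield(); // each call is a trip into the OS scheduler
            yields++;
        }
        return yields;
    }

    /** Blocks on a latch: the waiter sleeps until released and burns no CPU meanwhile. */
    static void blockingWait(CountDownLatch done) throws InterruptedException {
        done.await();
    }

    public static void main(String[] args) throws Exception {
        AtomicBoolean busy = new AtomicBoolean(true);
        CountDownLatch done = new CountDownLatch(1);
        AtomicLong yields = new AtomicLong();

        Thread spinner = new Thread(() -> yields.set(spinWait(busy)));
        Thread blocker = new Thread(() -> {
            try {
                blockingWait(done);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        spinner.start();
        blocker.start();

        Thread.sleep(200); // stand-in for a long operation (e.g. deoptimization) in another thread
        busy.set(false);
        done.countDown();
        spinner.join();
        blocker.join();

        System.out.println("spinner yielded " + yields.get() + " times; blocker woke once");
    }
}
```

With N such spinners, the kernel time grows roughly with N while no useful work gets done. That pattern would be consistent with the "sys" numbers above.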
JDK-8300926 likely also explains why the "sys" time is so high: the CTW runner executes multiple threads, and most of them spend their time yielding in the kernel. Adding -XX:ActiveProcessorCount=1 avoids this:
$ time CONF=linux-x86_64-server-fastdebug make test TEST="applications/ctw/modules/java_base.java" TEST_VM_OPTS="-XX:ActiveProcessorCount=1"
real 31m42s
user 32m31s
sys 0m17s
If we cannot find a way to improve JDK-8300926, maybe we should run CTW with 1 thread by default and rely on external parallelism, such as running several CTW tests at once, to use resources better.
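`-XX:ActiveProcessorCount` overrides the value the JVM returns from `Runtime.getRuntime().availableProcessors()`. The sketch below assumes, without having checked the runner's source, that the CTW runner sizes its compile thread pool from that value. It contrasts that sizing with the proposal above: default to one worker and opt into more explicitly. The system property name `ctw.threads` and the class name are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical worker-count selection for a CTW-like runner. */
public final class CtwThreads {

    /**
     * Unset or empty: 1 worker, leaving parallelism to the harness running several JVMs.
     * "auto": one worker per available CPU, which honors -XX:ActiveProcessorCount.
     * A number: exactly that many workers (at least 1).
     */
    static int workerCount(String setting, int availableCpus) {
        if (setting == null || setting.isEmpty()) {
            return 1;
        }
        if (setting.equals("auto")) {
            return Math.max(1, availableCpus);
        }
        return Math.max(1, Integer.parseInt(setting));
    }

    public static void main(String[] args) throws InterruptedException {
        int cpus = Runtime.getRuntime().availableProcessors();
        int workers = workerCount(System.getProperty("ctw.threads"), cpus); // hypothetical property
        System.out.println("availableProcessors=" + cpus + ", workers=" + workers);

        ExecutorService pool = Executors.newFixedThreadPool(workers);
        // ... submit one compile task per class here ...
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

Running this with `-XX:ActiveProcessorCount=1 -Dctw.threads=auto` would print `availableProcessors=1, workers=1`. That is the effect the flag had on the runner above, if the assumption about pool sizing holds.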
Note that the "real" time is still bad, and is in fact worse than the multi-threaded run (31m42s vs. 26m13s). It may be related to JDK-8290025, which removed the sweeper, so the JVM now relies on GC to unload methods promptly. We might be hitting the same issue we fixed earlier, when the sweeper was still present (JDK-8238247). Perhaps CTW needs to call GC explicitly now?
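If explicit GC turns out to help, one shape it could take is requesting a collection every N compiled classes. That would give the GC regular chances to unload methods that were made not entrant by deoptimization. The sketch below is hypothetical throughout: `compileAll`, the `gcInterval` value, and the injected `gc` action are illustrative names. In the real runner, the GC call might be a WhiteBox full GC rather than `System.gc()`. Whether a collection actually reclaims those methods promptly enough is exactly what would need measuring.

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical CTW-style loop that periodically requests a GC. */
public final class PeriodicGc {

    /**
     * Calls `compile` for every class name and runs `gc` after every
     * `gcInterval` classes. Returns the number of GC requests made.
     */
    static int compileAll(List<String> classes, Consumer<String> compile, int gcInterval, Runnable gc) {
        if (gcInterval <= 0) {
            throw new IllegalArgumentException("gcInterval must be positive");
        }
        int gcs = 0;
        int done = 0;
        for (String name : classes) {
            compile.accept(name);
            if (++done % gcInterval == 0) {
                gc.run(); // chance to unload methods made not entrant so far
                gcs++;
            }
        }
        return gcs;
    }

    public static void main(String[] args) {
        List<String> classes = List.of("java.lang.Object", "java.lang.String", "java.util.HashMap");
        int gcs = compileAll(classes, name -> System.out.println("compile " + name), 2, System::gc);
        System.out.println("GC requests: " + gcs); // prints "GC requests: 1"
    }
}
```

Injecting the GC action as a `Runnable` keeps the loop testable and makes it easy to swap in whichever GC trigger the runner actually has available.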