The current ReferenceProcessor implementation makes several passes over several lists of references, in order of their weakness.
In MT mode, it submits the processing task to the supplied executor, in the hope that parallelism will be beneficial. However, in many cases the lists of references are empty, or the number of actual references is zero (e.g. after CMS-style pre-cleaning), or the number of references falls to zero in the following phases.
As a result, RP starts the executor task even when we already know there is no work. We then waste time waiting for threads to wake up, discover there is no work, rendezvous, and terminate.
It makes sense to check the amount of work before submitting anything to the executor, for example, with:
http://cr.openjdk.java.net/~shade/8181214/webrev.02/
This saves 0.5-4 ms of pause time for low-latency GCs like Shenandoah.
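
For illustration only, the idea of checking the amount of work before handing a phase to the executor could be sketched as below. This is not the code from the webrev above: DiscoveredList, total_count, and process_phase are hypothetical names made up for this sketch, and the executor is modeled as a plain callback rather than the real HotSpot executor interface.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for one per-worker list of discovered references.
// Only the length matters for this sketch.
struct DiscoveredList {
    std::size_t length = 0;
};

// Sum the number of references across all per-worker lists for one phase.
std::size_t total_count(const std::vector<DiscoveredList>& lists) {
    std::size_t total = 0;
    for (const DiscoveredList& list : lists) {
        total += list.length;
    }
    return total;
}

// Run one processing phase. The (assumed) executor is represented by a
// callback. Returns true if the task was submitted, false if it was skipped
// because there was nothing to process.
bool process_phase(const std::vector<DiscoveredList>& lists,
                   const std::function<void()>& submit_to_executor) {
    if (total_count(lists) == 0) {
        // No references left: avoid waking worker threads just so they can
        // discover there is no work, rendezvous, and terminate.
        return false;
    }
    submit_to_executor();
    return true;
}
```

The check costs one pass over the list headers on the calling thread, which should be small next to a wake-up and rendezvous of the worker threads. It would have to be repeated before each phase, since the counts can drop to zero between phases.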