(synopsis needs to be updated once we discover what is wrong exactly)
Originally found here:
  http://stackoverflow.com/questions/30392753/forkjoinpool-phaser-and-managed-blocking-to-what-extent-do-they-works-against
The sample code:
import java.util.concurrent.*;
import java.util.concurrent.atomic.*;
public class TestForkJoinPool {
    final static ExecutorService pool = Executors.newWorkStealingPool(8);
    private static volatile long consumedCPU = System.nanoTime();
    private static final AtomicInteger counter = new AtomicInteger();
    public static void main(String[] args) throws InterruptedException {
        final int numParties = 100;
        final Phaser p = new Phaser(1);
        final Runnable r = () -> {
            int idx = counter.incrementAndGet();
            System.out.println(idx + " arrived at register");
            p.register();
            System.out.println(idx + " arrived at awaitAdvance");
            p.arriveAndAwaitAdvance();
            System.out.println(idx + " arrived at deregister");
            p.arriveAndDeregister();
        };
        for (int i = 0; i < numParties; ++i) {
            consumeCPU(1000000);
            pool.submit(r);
        }
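        // Spin until every submitted task is blocked in arriveAndAwaitAdvance().
        // main never arrives itself, so the phase never advances; the test
        // "finishes" when main sees numParties arrivals and returns.
        while (p.getArrivedParties() != numParties) {}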
    }
    static void consumeCPU(long tokens) {
        // Taken from JMH blackhole
        long t = consumedCPU;
        for (long i = tokens; i > 0; i--) {
            t += (t * 0x5DEECE66DL + 0xBL + i) & (0xFFFFFFFFFFFFL);
        }
        if (t == 42) {
            consumedCPU += t;
        }
    }
}
8u20 finishes fine.
8u40 gets stuck.
8u40 + jsr166 jar (2015-05-22) bootclasspathed gets stuck.
8u20 + jsr166 jar (2015-05-22) bootclasspathed gets stuck.
This points to a change in java.util.concurrent made somewhere between 8u20 and 8u40, since dropping the newer jsr166 classes onto 8u20 is enough to make it hang as well.
Note that current jsr166 already contains the fix for JDK-8078490, so it does look like a different issue.
It would seem that the pre-8u40 FJP creates more threads to handle external submissions while workers are blocked on the Phaser.
The post-8u40 FJP seems to create only ten threads on my test rig, and then the test gets stuck.
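One way to confirm the thread-count observation is to sample the pool's own counters while the test is stuck. The following is only a sketch: PoolSnapshot and snapshot() are hypothetical names, not part of the original test, and it constructs a ForkJoinPool directly rather than going through Executors.newWorkStealingPool so that the ForkJoinPool accessors are available without a cast.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of the original reproducer.
public class PoolSnapshot {
    // Formats the counters relevant to the "how many threads were created" question.
    static String snapshot(ForkJoinPool pool) {
        return "parallelism=" + pool.getParallelism()
             + " poolSize=" + pool.getPoolSize()
             + " active=" + pool.getActiveThreadCount()
             + " running=" + pool.getRunningThreadCount()
             + " queuedSubmissions=" + pool.getQueuedSubmissionCount();
    }

    public static void main(String[] args) throws InterruptedException {
        ForkJoinPool pool = new ForkJoinPool(8);
        // Trivial tasks only; this demo does not block on a Phaser.
        for (int i = 0; i < 4; i++) {
            pool.submit(() -> { });
        }
        System.out.println(snapshot(pool));
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

In the reproducer, calling snapshot() from inside the spin loop in main (for example once per second) should show whether poolSize stops growing while queuedSubmissions stays above zero, which is what the hang would suggest.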