JDK-8315740 : ForkJoinPool starvation in commonPool/newWorkStealingPool since java 17
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util.concurrent
  • Affected Version: 17,20,21
  • Priority: P3
  • Status: Closed
  • Resolution: Won't Fix
  • OS: generic
  • CPU: generic
  • Submitted: 2023-09-05
  • Updated: 2023-11-14
  • Resolved: 2023-11-14
Related Reports
Relates :  
Description
ADDITIONAL SYSTEM INFORMATION :
It is OS independent as far as I know.

A DESCRIPTION OF THE PROBLEM :
When calling execute on a ForkJoinPool from a Runnable which is running inside the pool it is submitted to a local queue. The behavior changed in java 17, where locally submitted work items can starve global work items completely, and prevent them from ever running when all threads are busy. An example where this happens is when implementing actors which submit new messages to their executor, an actor which submits new requests as soon as receiving a reply can prevent all other actors from ever running by taking over all available threads.

As far as I could trace in java 16 there was a fairness bound on the amount of work processed from a local queue:

https://github.com/openjdk/jdk16u/blame/master/src/java.base/share/classes/java/util/concurrent/ForkJoinPool.java#L1012

However in java 17 there is no bound, and as long as there's one more task in the local queue thread spins without ever checking other queues:

https://github.com/openjdk/jdk17u/blame/master/src/java.base/share/classes/java/util/concurrent/ForkJoinPool.java#L1179

I have checked the latest available java 21 early access builds and this bug is still present. The problem does not reproduce when using a fixed thread pool, only with different variants of ForkJoinPool.

REGRESSION : Last worked in version 16

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1. Compile the provided MadSquirrel.java.
2. Run java MadSquirrel --squirrels 16 (where 16 is e.g. twice as much as the number of cores), you may also add --work-stealing-pool 8 arguments to specify the fixed number of threads when your cpu has too many cores
3. Observe the run stats printed every second, where each value after "Runs per second" corresponds to the number of times a particular squirrel executed that second.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
All squirrels should run periodically, there should be columns of "0".
ACTUAL -
Starting from java 17 I only ever observe at most the number of cores of squirrels running, others are starved and never given a chance to run after submission. For example:

% JAVA_HOME=~/tmp/jdk-17.0.9+5/Contents/Home java MadSquirrel --squirrels 16 --work-stealing-pool 8
Runs per second: 21288834, 28629451, 14856813, 15717559, 17508366, 25000050, 25665850, 26995270, 0, 0, 0, 0, 0, 0, 0, 0, total = 175662193
Runs per second: 19274901, 44231828, 16977416, 16935325, 6395690, 45072388, 46884795, 25108702, 0, 0, 0, 0, 0, 0, 0, 0, total = 220881045
Runs per second: 16874261, 42383196, 16365358, 16377313, 6774456, 41708803, 41877243, 24265580, 0, 0, 0, 0, 0, 0, 0, 0, total = 206626210
...

The same behavior with the latest java 21:

% JAVA_HOME=~/tmp/jdk-21+35/Contents/Home java MadSquirrel --squirrels 16 --work-stealing-pool 8
Runs per second: 27476990, 34773738, 30294322, 16072980, 15731359, 31293077, 30014221, 34929121, 0, 0, 0, 0, 0, 0, 0, 0, total = 220585808
Runs per second: 44260204, 45125972, 45126419, 12301768, 12364036, 43841939, 44674170, 45869310, 0, 0, 0, 0, 0, 0, 0, 0, total = 293563818
Runs per second: 43018632, 42239571, 43919551, 12150496, 12267957, 42320011, 41783814, 43268962, 0, 0, 0, 0, 0, 0, 0, 0, total = 280968994
...

---------- BEGIN SOURCE ----------
import java.util.ArrayList;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.atomic.AtomicInteger;

public class MadSquirrel implements Runnable {
    private final Executor executor;
    private final AtomicInteger counter = new AtomicInteger(0);

    public MadSquirrel(Executor executor) {
        this.executor = executor;
    }

    public int getCount() {
        return counter.getAndSet(0);
    }

    @Override
    public void run() {
        counter.incrementAndGet();
        executor.execute(this);
    }

    public static void main(String[] args) throws InterruptedException {
        int nSquirrels = 1;
        Executor executor = ForkJoinPool.commonPool();
        for (int i = 0; i < args.length; ++i) {
            switch (args[i]) {
                case "--work-stealing-pool": {
                    int threads = Integer.parseInt(args[++i]);
                    executor = Executors.newWorkStealingPool(threads);
                    break;
                }
                case "--fixed-thread-pool": {
                    int threads = Integer.parseInt(args[++i]);
                    executor = Executors.newFixedThreadPool(threads);
                    break;
                }
                case "--squirrels": {
                    nSquirrels = Integer.parseInt(args[++i]);
                    break;
                }
                default: {
                    System.err.printf("Unsupported argument: %s\n", args[i]);
                    System.exit(1);
                }
            }
        }

        var squirrels = new ArrayList<MadSquirrel>();
        for (int i = 0; i < nSquirrels; ++i) {
            var squirrel = new MadSquirrel(executor);
            squirrels.add(squirrel);
            executor.execute(squirrel);
        }

        while (true) {
            Thread.sleep(1000);
            var sb = new StringBuilder();
            int total = 0;
            for (int i = 0; i < squirrels.size(); ++i) {
                if (i != 0) {
                    sb.append(", ");
                }
                int count = squirrels.get(i).getCount();
                sb.append(count);
                total += count;
            }
            System.out.printf("Runs per second: %s, total = %d\n", sb.toString(), total);
        }
    }
}

---------- END SOURCE ----------

FREQUENCY : always



Comments
I've confirmed that using an Executor whose `execute(Runnable r)` calls `ForkJoinPool.externalSubmit(ForkJoinTask.adapt(r))` achieves the desired "fairness": ``` case "--fork-join-externalsubmit": { int threads = Integer.parseInt(args[++i]); var pool = new ForkJoinPool(threads, ForkJoinPool.defaultForkJoinWorkerThreadFactory, null, true); executor = task -> pool.externalSubmit(ForkJoinTask.adapt(task)); break; } ``` ``` java MadSquirrel --squirrels 20 --fork-join-externalsubmit 10 Runs per second: 1950813, 26484034, 2941187, 2940493, 1951152, 24442016, 2940472, 2940922, 1951605, 25275126, 10272354, 22235531, 27480808, 1954165, 1954500, 22612098, 2944212, 2944232, 22283414, 1954014, total = 210453148 Runs per second: 3171805, 14401737, 1246505, 1246505, 3171989, 8985171, 1246460, 1246461, 3171946, 9243859, 22748018, 22179720, 8950316, 3169185, 3169184, 12235240, 1242861, 1242860, 19986538, 3169178, total = 145225538 Runs per second: 2947505, 17408343, 1078550, 1078554, 2954043, 5759816, 1078565, 1078567, 2954109, 6094333, 18308031, 18602744, 9357414, 2955398, 2955404, 12854512, 1079283, 1079285, 18690607, 2958126, total = 131273189 Runs per second: 3221390, 14600618, 1004861, 1004859, 3215126, 6110517, 1004913, 1004912, 3215343, 5312783, 15397929, 19857362, 5768874, 3214214, 3215576, 11758141, 1004710, 1004710, 19125516, 3212944, total = 123255298 Runs per second: 3508079, 14275905, 927952, 927950, 3507726, 5922300, 927882, 927880, 3507469, 7655308, 19765969, 19201687, 8343355, 3507975, 3506608, 11845520, 927443, 927441, 20409728, 3506521, total = 134030698 Runs per second: 3381358, 16394933, 938389, 938389, 3381276, 6107540, 938621, 938621, 3381811, 6654783, 17822243, 19731812, 7497809, 3381160, 3381161, 14305078, 938551, 938551, 17930980, 3381172, total = 132364238 Runs per second: 2650763, 14468504, 973730, 973731, 2650724, 5934123, 973500, 973507, 2650157, 7553126, 18272893, 20126955, 6127118, 2650137, 2650135, 14963756, 973501, 973501, 19911101, 2650119, total = 129101081 Runs per second: 3116438, 13155007, 899638, 899635, 3116613, 6050502, 899632, 899624, 3116665, 6299973, 19239222, 16704649, 7169992, 3116717, 3117280, 13474028, 899663, 899664, 19580334, 3117413, total = 125772689 Runs per second: 3278807, 16743468, 993863, 993867, 3278810, 5982101, 993873, 993876, 3279972, 5736165, 16225781, 20860973, 5427348, 3279932, 3279370, 14719202, 994417, 994416, 18415016, 3279244, total = 129750501 Runs per second: 2607012, 14816890, 943412, 943409, 2606833, 5925504, 943512, 943510, 2605620, 5936487, 20339824, 20009976, 6734920, 2605608, 2605606, 11387450, 942930, 942929, 20850353, 2605600, total = 127297385 Runs per second: 3656319, 14964419, 919513, 919514, 3662194, 6120975, 919409, 919409, 3662203, 5589675, 20719701, 19136639, 5850021, 3662213, 3662215, 14011076, 919411, 919412, 15914426, 3662225, total = 129790969 Runs per second: 3314695, 15772955, 930591, 930590, 3308920, 5744762, 930588, 930588, 3308917, 6677237, 20475978, 21862172, 6337645, 3308922, 3308921, 13773912, 930588, 930587, 19934671, 3308916, total = 136022155 ```
14-11-2023

One thing which could be tried in this case, as a user, would be to explicitly do a ForkJoinPool.externalSubmit instead of when-running-the-task-issue-execute(this) to have a different "fairness profile" during execution (also probably with a different performance profile) https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/util/concurrent/ForkJoinPool.html#externalSubmit(java.util.concurrent.ForkJoinTask) If you try that out, report back if that solved your use-case.
18-09-2023

ForkJoinPool has never specified any global fairness guarantees, and versions have varied in behavior under unbounded generation of local tasks, unless set up to have more threads (parallelism) than actors. ForkJoin programs may rely on strictly local-first execution of recursive subtasks, bit cannot rely on this not being the case. Upon realizing that there is no universally correct upper bound, some mechanics in some versions to limit this were removed.
08-09-2023

The observations on Windows 11: JDK 17ea+4: Passed, no column of "0" JDK 17ea+5: Failed, observed columns of "0" JDK 20: Failed. JDK 21ea+33: Failed.
06-09-2023