JDK-8330017 : ForkJoinPool stops executing tasks due to ctl field Release Count (RC) overflow
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util.concurrent
  • Affected Version: 11,17
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • OS: generic
  • CPU: generic
  • Submitted: 2024-04-05
  • Updated: 2025-06-12
Description
ADDITIONAL SYSTEM INFORMATION :
Running on Linux x64, JDK 17.0.10

A DESCRIPTION OF THE PROBLEM :
The RC part of the ctl field keeps decreasing until it reaches the minimum signed 16-bit value (-32768). On the next decrement the value wraps around to +32767 (equal to ForkJoinPool.MAX_CAP), and the pool then stops executing tasks.
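
The wrap-around can be illustrated in isolation. The sketch below is not taken from ForkJoinPool; it only assumes that RC occupies the top 16 bits of the 64-bit ctl word, as the decoding in the reproducer suggests. The class name RcWrapDemo and its helpers are hypothetical.

```java
// Sketch only: models RC as the top 16 bits of a 64-bit ctl word.
class RcWrapDemo {
    static final int RC_SHIFT = 48;              // assumed position of the RC field
    static final long RC_UNIT = 1L << RC_SHIFT;  // one step of RC

    // Extract RC as a signed 16-bit value.
    static int rc(long ctl) {
        return (short) (ctl >>> RC_SHIFT);
    }

    // Decrement RC by one; long arithmetic wraps silently on overflow.
    static long decrementRc(long ctl) {
        return ctl - RC_UNIT;
    }

    public static void main(String[] args) {
        long ctl = ((long) Short.MIN_VALUE) << RC_SHIFT;  // RC = -32768
        System.out.println("before: RC=" + rc(ctl) + ", ctl negative: " + (ctl < 0));
        ctl = decrementRc(ctl);
        System.out.println("after:  RC=" + rc(ctl) + ", ctl negative: " + (ctl < 0));
    }
}
```

Because RC sits in the sign-carrying bits, the whole ctl value changes sign together with RC, which matches the jump from a negative to a positive CTL in the example output.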

I saw this issue in an application that had been running for two to three months.
When it happens, the threads that are waiting for the result of a CompletableFuture.join() are blocked forever, because the future never completes.

Cannot reproduce the issue with this test in Java >= 19.0.2.
I think the issue was indirectly fixed in this ticket https://bugs.openjdk.org/browse/JDK-8277090
because the ctl RC field definition changed from:
RC: Number of released (unqueued) workers minus target parallelism
to
RC: Number of released (unqueued) workers

Since RC is not the result of a subtraction anymore, it shouldn't become negative.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run the provided script with the command:
java --add-opens java.base/java.util.concurrent=ALL-UNNAMED FJPOverflow

The key to this test is to force as many pool resizes as possible, so I set a low keep-alive time for the threads.
Without this trick, it takes a long time to reproduce the issue.

As long as the program keeps printing the string "If you see this, FJP is executing tasks", the ForkJoinPool is correctly executing submitted tasks. Once RC overflows to 32767, the string is never printed again.
It takes about 1 hour to reach this condition naturally.
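
The check that the reproducer leaves to the reader (watching for the printed string) can also be done programmatically. The following is a minimal sketch, assuming a no-op probe task and a caller-chosen timeout; PoolProbe and isResponsive are hypothetical names, not part of the reproducer.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch only: submits a no-op task and reports whether it ran within the timeout.
class PoolProbe {
    static boolean isResponsive(ForkJoinPool pool, long timeoutMs) {
        ForkJoinTask<?> probe = pool.submit(() -> { });
        try {
            probe.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;                         // the pool executed the probe
        } catch (TimeoutException e) {
            probe.cancel(false);                 // the probe never ran in time
            return false;
        } catch (ExecutionException e) {
            return true;                         // the probe ran, even though it failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve the interrupt status
            return false;
        }
    }

    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool(2);
        System.out.println("responsive: " + isResponsive(pool, 1000));
        pool.shutdown();
    }
}
```

A timeout only indicates that the probe did not run in time; a heavily loaded pool could also miss a short deadline, so the timeout should be generous.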

Example output around the point where the pool stops executing tasks:
CTL=(-9222527629104513024), RC=(10000000 00000010 , -32766), TC=(11111111 11111100 , -4), SS=(00000000 00000000 , 0), ID=(00000000 00000000 , 0)
If you see this, FJP is executing tasks
CTL=(-9222527624809545728), RC=(10000000 00000010 , -32766), TC=(11111111 11111101 , -3), SS=(00000000 00000000 , 0), ID=(00000000 00000000 , 0)
If you see this, FJP is executing tasks
CTL=(9223372019674972185), RC=(01111111 11111111 , 32767), TC=(11111111 11111100 , -4), SS=(00000000 00000001 , 1), ID=(00000000 00011001 , 25)
CTL=(9223372019674972185), RC=(01111111 11111111 , 32767), TC=(11111111 11111100 , -4), SS=(00000000 00000001 , 1), ID=(00000000 00011001 , 25)
CTL=(9223372019674972185), RC=(01111111 11111111 , 32767), TC=(11111111 11111100 , -4), SS=(00000000 00000001 , 1), ID=(00000000 00011001 , 25)

Not recommended:
With the program argument -c, the ctl value is set to -9222809108376190976L (RC=-32767) using reflection to speed up the issue reproduction. I added this condition to perform some tests. Run the program without the -c argument to reproduce the issue naturally.
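
For reference, the forced ctl constant can be derived from its fields rather than written as a literal. This is a sketch assuming the 16-bit RC | TC | SS | ID layout shown in the reproducer's comments, with SS and ID left at zero; CtlFields and its methods are hypothetical helpers.

```java
// Sketch only: composes and decodes a ctl-like word with 16-bit RC and TC fields.
class CtlFields {
    // Pack signed 16-bit rc and tc into bits 48-63 and 32-47; low 32 bits stay zero.
    static long compose(int rc, int tc) {
        return ((long) (rc & 0xffff) << 48) | ((long) (tc & 0xffff) << 32);
    }

    static int rc(long ctl) {
        return (short) (ctl >>> 48);
    }

    static int tc(long ctl) {
        return (short) (ctl >>> 32);
    }

    public static void main(String[] args) {
        long ctl = compose(-32767, -5);
        System.out.println(ctl);  // prints -9222809108376190976
        System.out.println("RC=" + rc(ctl) + ", TC=" + tc(ctl));
    }
}
```

The same helper reproduces the other constants in the reproducer, for example compose(-1, -5) for RC_NEG_1.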

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
the RC value should not continue to decrease and should not overflow
ACTUAL -
the RC value keeps decreasing until -32768, and then overflows to +32767

---------- BEGIN SOURCE ----------
import java.lang.reflect.Field;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;

public class FJPOverflow {
    public static final int TASKS_PER_ITERATION = 100;
    public static final int TASK_DURATION_MS = 5;
    public static final int ITERATION_DELAY_MS = 100; // Try increasing this value if RC is not decreasing
    public static final int THREAD_KEEP_ALIVE_MS = 10; // Low keep-alive time to trigger frequent pool resizes
    public static final int MAX_CAP = 0x7fff; //(32767) Same as ForkJoinPool.MAX_CAP

    //    RC=-32767   |        TC       |        SS       |        ID
    //10000000 00000001 11111111 11111011 00000000 00000000 00000000 00000000
    public static final long RC_NEG_32767 = -9222809108376190976L;

    //    RC=-32000   |        TC       |        SS       |        ID
    //10000011 00000000 11111111 11111011 00000000 00000000 00000000 00000000
    public static final long RC_NEG_32000 = -9006917801239117824L;

    //      RC=-1     |        TC       |        SS       |        ID
    //11111111 11111111 11111111 11111011 00000000 00000000 00000000 00000000
    public static final long RC_NEG_1 = -21474836480L;

    private static final ForkJoinPool fjp = new ForkJoinPool(
            Runtime.getRuntime().availableProcessors(),
            ForkJoinPool.defaultForkJoinWorkerThreadFactory,
            null,
            false,
            0,
            MAX_CAP,
            1,
            null,
            THREAD_KEEP_ALIVE_MS,
            TimeUnit.MILLISECONDS
    );

    public static void main(String[] args) throws InterruptedException {
        var options = new Options(args);

        var iterationDelay = ITERATION_DELAY_MS;
        if(options.forceCtl) {
            iterationDelay = 1000;
            setCtl(RC_NEG_32767);
        }

        while(true) {
            runTasks();
            printCtl();
            runAsync(() -> System.out.println("If you see this, FJP is executing tasks"));
            Thread.sleep(iterationDelay);
        }
    }

    private static void setCtl(long value) {
        try {
            Field field = fjp.getClass().getDeclaredField("ctl");
            field.setAccessible(true);
            field.setLong(fjp, value);
        } catch (IllegalAccessException | NoSuchFieldException e) {
            throw new RuntimeException(e);
        }
    }

    private static void runTasks() {
        for (int i = 0; i < TASKS_PER_ITERATION; i++) {
            runAsync(FJPOverflow::sleepCallback);
        }
    }

    private static void runAsync(Runnable block) {
        fjp.execute(block);
    }

    private static void sleepCallback() {
        try {
            Thread.sleep(TASK_DURATION_MS);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }

    private static void printCtl() {
        try {
            Field field = fjp.getClass().getDeclaredField("ctl");
            field.setAccessible(true);
            long value = (long) field.get(fjp);
            System.out.println(ctlAsBinary(value));
        } catch (NoSuchFieldException | IllegalAccessException e) {
            e.printStackTrace();
        }
    }

    private static String ctlAsBinary(long value) {
        String binaryCtl = longToBinary(value);
        String binaryRc = binaryCtl.substring(0, 16);
        String binaryTc = binaryCtl.substring(16, 32);
        String binarySs = binaryCtl.substring(32, 48);
        String binaryId = binaryCtl.substring(48, 64);

        return "CTL=(" + value + "), " +
                "RC=(" + prettifyBinary(binaryRc) + ", " + binaryToInt(binaryRc) + "), " +
                "TC=(" + prettifyBinary(binaryTc) + ", " + binaryToInt(binaryTc) + "), " +
                "SS=(" + prettifyBinary(binarySs) + ", " + binaryToInt(binarySs) + "), " +
                "ID=(" + prettifyBinary(binaryId) + ", " + binaryToInt(binaryId) + ")";
    }

    private static String longToBinary(long value) {
        // Long.toBinaryString already returns the 64-bit two's-complement form
        // for negative values, so only non-negative values need left padding.
        return padLeftZeros(Long.toBinaryString(value), 64);
    }

    private static String padLeftZeros(String inputString, int length) {
        if (inputString.length() >= length) {
            return inputString;
        }
        StringBuilder sb = new StringBuilder();
        while (sb.length() < length - inputString.length()) {
            sb.append('0');
        }
        sb.append(inputString);

        return sb.toString();
    }

    private static int binaryToInt(String binary) {
        var isNegative = binary.charAt(0) == '1';
        if (!isNegative) {
            return Integer.parseInt(binary, 2);
        }

        StringBuilder bitsInvertedBinary = new StringBuilder();
        for(int i=0; i < binary.length(); i++) {
            bitsInvertedBinary.append(binary.charAt(i) == '0' ? '1' : '0');
        }

        return -(Integer.parseInt(bitsInvertedBinary.toString(), 2) + 1);
    }

    private static String prettifyBinary(String binary) {
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < binary.length(); i += 8) {
            result.append(binary, i, Math.min(i + 8, binary.length())).append(" ");
        }
        return result.toString();
    }

    protected static class Options {
        private boolean forceCtl = false;

        Options(String[] args) {
            for (String arg : args) {
                if (arg.equals("-c")) {
                    forceCtl = true;
                }
            }
        }
    }
}

---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
Restart the application


Comments
As there are corresponding open backport issues, this one can be closed as a duplicate. However, it is not yet fixed in 11 and 17.
12-06-2025

[~jjose] For my understanding—can this be closed as fixed?
12-06-2025

Done. Sorry for the delay.
19-03-2025

Thanks. Even though not necessary there, the upcoming https://github.com/DougLea/jdk/tree/JDK-8319447 also includes this.
14-03-2025

Thanks Doug. Could you help with the review https://github.com/openjdk/jdk/pull/24034? The idea is to have the bug fix also in update releases (11, 17, 21 for now, 24 may be considered as well).
14-03-2025

I filed https://bugs.openjdk.org/browse/JDK-8351933 to fix only 'c - TC_UNIT' masking with backports to all affected LTS releases. It would be nice to have the fix in upcoming update versions.
13-03-2025

Dmitry: Yes, that's the one causing problems. There are a couple of others that would be clearer and safer wrt future changes, but not strictly necessary.
03-03-2025

In case of 17u it can be like
```
- compareAndSetCtl(c, ((UC_MASK & (c - TC_UNIT)) |
+ compareAndSetCtl(c, ((c & RC_MASK) | ((c - TC_UNIT) & TC_MASK) |
```
(which seems to work)
03-03-2025
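
The effect of the masking change proposed for 17u in the comment above can be shown with plain long arithmetic. This is a sketch, not ForkJoinPool code: the constant names follow the comments, their values assume RC in bits 48-63 and TC in bits 32-47, and the class and method names are hypothetical.

```java
// Sketch only: compares the unmasked and the separately masked TC decrement.
class TcMaskDemo {
    static final long SP_MASK = 0xffffffffL;    // low 32 bits (SS and ID)
    static final long UC_MASK = ~SP_MASK;       // high 32 bits (RC and TC)
    static final long RC_MASK = 0xffffL << 48;  // assumed RC position
    static final long TC_MASK = 0xffffL << 32;  // assumed TC position
    static final long TC_UNIT = 1L << 32;

    static int rc(long c) { return (short) (c >>> 48); }
    static int tc(long c) { return (short) (c >>> 32); }

    // Old form: RC and TC are masked together, so a borrow out of TC reaches RC.
    static long decrementTcUnmasked(long c) {
        return UC_MASK & (c - TC_UNIT);
    }

    // Proposed form: RC is copied unchanged and only the TC bits are recomputed.
    static long decrementTcMasked(long c) {
        return (c & RC_MASK) | ((c - TC_UNIT) & TC_MASK);
    }

    public static void main(String[] args) {
        long c = (-5L << 48) & RC_MASK;  // RC = -5, TC = 0
        long a = decrementTcUnmasked(c);
        long b = decrementTcMasked(c);
        System.out.println("unmasked: RC=" + rc(a) + ", TC=" + tc(a));  // RC=-6, TC=-1
        System.out.println("masked:   RC=" + rc(b) + ", TC=" + tc(b));  // RC=-5, TC=-1
    }
}
```

With TC at zero, the unmasked form lets the borrow decrement RC as a side effect, while the separately masked form leaves RC untouched.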

Yes, sorry; there were a few constructions in which, initially, total counts could not over/underflow, but could in later updates up through jdk17. Best to uniformly separately mask rc and tc. I'll put out a patch soon.
03-03-2025

Another suspicious part is not masking '(c - TC_UNIT)' separately. It is present both in mainline:
'cas with (w.stackPred & LMASK) | (UMASK & (c - TC_UNIT))'
https://github.com/openjdk/jdk/blob/e43960a0170bf29b28ff4733e1c8c927947fb0bb/src/java.base/share/classes/java/util/concurrent/ForkJoinPool.java#L2080
and update releases:
https://github.com/openjdk/jdk17u-dev/blob/fd353e38a820eed00b1a5f28e892a4d6baa3f1d1/src/java.base/share/classes/java/util/concurrent/ForkJoinPool.java#L1728
but on different code paths. Compare with
'cas with ((RC_MASK & (c - RC_UNIT)) | (TC_MASK & (c - TC_UNIT)) | (SP_MASK & c))'
https://github.com/openjdk/jdk17u-dev/blob/fd353e38a820eed00b1a5f28e892a4d6baa3f1d1/src/java.base/share/classes/java/util/concurrent/ForkJoinPool.java#L1560
26-02-2025

> I think the issue was indirectly fixed in this ticket
> https://bugs.openjdk.org/browse/JDK-8277090
> because the ctl RC field definition changed from:
> RC: Number of released (unqueued) workers minus target parallelism
> to
> RC: Number of released (unqueued) workers
> Since RC is not the result of a subtraction anymore, it shouldn't become negative.
Yes, this seems to fix the overflow
```
+ @jdk.internal.vm.annotation.Contended("fjpctl") // colocate
+ int parallelism; // target number of workers
```
https://github.com/openjdk/jdk/commit/00e6c63cd12e3f92d0c1d007aab4f74915616ffb#diff-e398beb49cd8d3e6c2f3a8ca8eee97172c57d7f88f3ccd8a3c704632cab32f5fR1520
and related changes.
It looks possible to split the 'ctl' field in 11u and 17u as well. That should not require any spec changes (in general, FJP from 17 to 19 is mostly extension except the clarification that values of ThreadLocals are not preserved across tasks), and dynamic 'setParallelism' behavior must not be introduced. 'Implementation Overview' is to be corrected as it describes the fields structure etc. The rest of the change would be a mechanical adjustment in a [few] dozen places. The 2 fields are co-located so the footprint increase is not significant. Field updates for the two fields will be similar to the current mainline, but the performance impact should of course be measured, although some minor slowdown may be a reasonable tradeoff for fixing the overflow.
26-02-2025

This bug can be reproduced with JDK 11.0.21 as well and it took almost 2 hours to observe the change.
11-04-2024

The test case was reproducible on JDK 17.0.10 on Windows 11. But it takes around 1 hour to start observing the change in overflow.
10-04-2024