JDK-8346175 : [21u] juc cannot be woken up when using virtual thread
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util.concurrent
  • Affected Version: 21
  • Priority: P3
  • Status: Open
  • Resolution: Unresolved
  • OS: linux
  • CPU: x86
  • Submitted: 2024-12-13
  • Updated: 2025-03-27
Description
When using virtual threads, a java.util.concurrent (juc) unpark may never take effect, and the whole application then waits forever.

I have constructed a simple testcase to reproduce the situation.
I put the two files in test/jdk/java/lang/Thread/virtual/ and ran them with jtreg. The failure rate is 13/30.

I think there may be a problem in ForkJoinPool.java.
`signalWork` may do nothing when (c >>> RC_SHIFT) >= pc. (In the testcase, 29 threads are pinned and one is free to do work.)
The one active thread, which is still in the first lines of `awaitWork` (before it changes the ctl), is therefore not signaled and goes on to park.
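
The window described above can be illustrated with a simplified model of the check. This is only a sketch, not the JDK source: it assumes that RC_SHIFT is 48 and that the count compared against pc sits in the top 16 bits of ctl. The class and method names (`SignalCheckSketch`, `ctlWithReleased`, `wouldSignal`) are hypothetical and exist only for this illustration.

```java
public class SignalCheckSketch {
    // Assumption: mirrors the ForkJoinPool constant of the same name.
    static final int RC_SHIFT = 48;

    // Hypothetical helper: builds a ctl value whose top 16 bits hold the
    // given count; all lower bits are left at zero for simplicity.
    static long ctlWithReleased(int released) {
        return ((long) released) << RC_SHIFT;
    }

    // Models only the guard quoted in the description:
    // no signal is attempted when (c >>> RC_SHIFT) >= pc.
    static boolean wouldSignal(long c, int pc) {
        return !((c >>> RC_SHIFT) >= pc);
    }

    public static void main(String[] args) {
        int pc = 30; // 29 pinned workers plus one free worker, as in the testcase

        // The free worker has entered awaitWork but has not changed ctl yet,
        // so all 30 workers still count and the guard skips the signal.
        long beforeCtlChange = ctlWithReleased(30);
        System.out.println(wouldSignal(beforeCtlChange, pc)); // false

        // Once the free worker has changed ctl, the guard would allow a
        // signal, but in the reported race the signalWork call already returned.
        long afterCtlChange = ctlWithReleased(29);
        System.out.println(wouldSignal(afterCtlChange, pc)); // true
    }
}
```

The model only shows why a `signalWork` call that lands before the ctl change is a no-op. It does not reproduce the hang itself, which also depends on the 29 pinned workers in the testcase.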
Comments
This problem was fixed by JDK-8288899 in mainline. That is a big change that includes API changes, so it was suggested that I extract only the core changes needed to fix this problem.
21-03-2025

[~jwtang] Does this problem exist in mainline? If so it should be fixed there first and then backported as needed. If it does not exist in mainline then what changes addressed it? Those should be backported if practical. Even if a specific fix is needed for 21u you need to follow the approval process: https://wiki.openjdk.org/display/JDKUpdates/JDK+21u
18-03-2025

Could anyone approve this issue? I have submitted a fix to jdk21u-dev.
18-03-2025

A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk21u-dev/pull/1455 Date: 2025-03-06 10:34:06 +0000
06-03-2025

There have been a couple of issues with missed signals, resulting in FJP underutilization and/or deadlocks like the one in this report, where all but one worker is pinned by a native frame. There have been several significant updates since JDK 21. The tests in mainline that catch this issue when doing FJP updates are:
test/jdk/java/util/concurrent/forkjoin/Starvation.java
test/jdk/java/lang/Thread/virtual/Starvation.java
17-12-2024

Apologies, I misread the test code.
17-12-2024

Hi, but I don't think this is the same as the problem related to object monitors (om). This case doesn't use any `synchronized`. I think the problem is in the `signalWork` logic in FJP. I found the problem in our real application and extracted it into this testcase. You can see that "ForkJoinPool-1-worker-30" is parked but the "main" vthread is not scheduled on it.
17-12-2024

I have written a new jcmd tool that prints more details about the threads. I put the result in the jcmd.log file. It provides more details about the state of all the vthreads.
17-12-2024

This appears to be the well-known limitation of pinning for object monitors leading to deadlock. As of JDK 24, use of object monitors no longer pins, so the problem is avoided.
17-12-2024

Hi, the Java version is 21.0.7-internal. The commit is 990859cc32776e2d794de539190c9ccced1dfcd9. I have also provided a jtreg output .jtr file, which contains the state of all the threads.
17-12-2024

Hello [~jwtang],
> In the testcase, 29 threads are pinned and one is free for doing some work.
If you can reproduce this, can you generate 2-3 thread dumps of when this "hang" happens? Please use "jcmd <pid> Thread.dump_to_file -format=json <somefile.json>" as the command to generate those thread dumps (the output of that command will tell you where the generated file is located). Please attach those files to this issue. Perhaps that will have some hints?
Also, what's the output from "java -version" against the JDK which you use for testing this?
16-12-2024

Seconding [~alanb], I am not able to reproduce the issue with JDK 24/25 builds either (using x86 linux).
16-12-2024

I'm unable to duplicate this with JDK 24/25 builds. The most recent issue with missed signals was JDK-8345294.
16-12-2024