JDK-8355026 : ForkJoinPool threads stall for 10 minutes when running MultiJVM modes of SPECjbb2015
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util.concurrent
  • Affected Version: 24,25
  • Priority: P4
  • Status: Closed
  • Resolution: Cannot Reproduce
  • OS: generic
  • CPU: generic
  • Submitted: 2025-04-18
  • Updated: 2025-05-27
  • Resolved: 2025-05-27
Description
Since JDK-8344773, an occasional stall occurs when running some MultiJVM modes of SPECjbb2015. The commit for JDK-8344773 increased the array of WorkQueue from 2^6 to 2^9 for shared queues. Reverting that change, as in 0001-Original-size-of-queue-capacity.patch, makes the stall disappear. However, breaking out of the awaitWork loop for shared queues, as in 0001-Prevent-parking-for-shared-work-queues.patch, does not help. The relation between the capacity increase and the stall is therefore unclear.
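For anyone trying to narrow this down outside SPECjbb2015, a minimal probe along the following lines may help. It is only a sketch, not the reporter's reproducer: the class name ExternalSubmitProbe, the task count, and the timeout are all assumptions made for illustration. It submits trivial tasks from a non-worker thread, which is the path that goes through the pool's externally submitted (shared) queues, and counts how many complete within a deadline.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical probe; not part of the JDK or of SPECjbb2015.
public class ExternalSubmitProbe {

    // Submits `count` trivial tasks from the calling (non-worker) thread and
    // returns how many of them completed within `timeoutMs` milliseconds each.
    static int completedWithin(ForkJoinPool pool, int count, long timeoutMs) {
        List<ForkJoinTask<Integer>> tasks = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            final int n = i;
            Callable<Integer> work = () -> n * 2;
            tasks.add(pool.submit(work)); // external submission
        }
        int completed = 0;
        for (ForkJoinTask<Integer> task : tasks) {
            try {
                task.get(timeoutMs, TimeUnit.MILLISECONDS);
                completed++;
            } catch (TimeoutException | ExecutionException e) {
                // Not counted: the task either timed out (a possible stall) or failed.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return completed;
    }

    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool(4);
        int total = 1000; // arbitrary; chosen only to exceed 2^9 submissions
        int done = completedWithin(pool, total, 5_000);
        System.out.println(done + " of " + total + " tasks completed in time");
        pool.shutdown();
    }
}
```

A count lower than the number of submitted tasks would only indicate that some tasks missed the deadline; it does not by itself identify the cause.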
Comments
Possibly fixed by JDK-8319447.
27-05-2025

Hi all, two things need to be clarified: 1. The length of the stall is tied to the SPECjbb2015 parameter "specjbb.comm.connect.timeouts". So, if I read the code correctly, the work queues involved may not be the shared queues. 2. The stall has disappeared since JDK-8319447, although I'm not sure whether JDK-8319447 addressed the issue discussed here. The command line to reproduce the issue can be shared offline or via email if it has not been already.
16-05-2025

Sorry for replying late. I'm verifying whether the problem manifests after JDK-8353659.
06-05-2025

https://bugs.openjdk.org/browse/JDK-8354347 "Increase the default padding size for aarch64 in JDK code" might affect the time to access cache lines that are falsely shared, but it should not affect correctness. The padding was to adjacent cache lines, putting a second field on a cache line that might be pre-fetched along with a first field. That would delay an exclusive access to the second field, but not prevent the exclusive access. The change for JDK-8354347 is to put an unused cache line between the contended fields. The change mirrors the x86 C++ code, and the padding for contended fields in Java code.
02-05-2025

The only impact of the reported experimental changes should be to vary the point at which the first resize occurs, which shouldn't impact correctness. So I suspect that this is due to Ampere-specific VM support. Could someone recheck this in light of other recent changes on this front, including https://bugs.openjdk.org/browse/JDK-8354347?
02-05-2025
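The point in the comment above, that the initial size only moves where the first resize happens, can be illustrated with a toy model. This is a sketch under stated assumptions: GrowableQueue is a hypothetical array-doubling queue written for this note, not ForkJoinPool.WorkQueue, whose real implementation is a circular array with its own growth logic.

```java
import java.util.Arrays;

// Toy model only; names and growth policy are assumptions, not JDK code.
public class FirstResizePoint {

    // Hypothetical queue that doubles its backing array when it is full.
    static final class GrowableQueue {
        private Object[] array;
        private int size;
        private int resizes;

        GrowableQueue(int initialCapacity) {
            array = new Object[initialCapacity];
        }

        void push(Object task) {
            if (size == array.length) {
                array = Arrays.copyOf(array, array.length * 2);
                resizes++;
            }
            array[size++] = task;
        }

        int resizes() {
            return resizes;
        }
    }

    // Returns the 1-based index of the push that triggers the first resize
    // when nothing is ever removed from the queue.
    static int pushThatTriggersFirstResize(int initialCapacity) {
        GrowableQueue queue = new GrowableQueue(initialCapacity);
        int pushes = 0;
        while (queue.resizes() == 0) {
            queue.push(Boolean.TRUE);
            pushes++;
        }
        return pushes;
    }

    public static void main(String[] args) {
        System.out.println("capacity 2^6: " + pushThatTriggersFirstResize(1 << 6));
        System.out.println("capacity 2^9: " + pushThatTriggersFirstResize(1 << 9));
    }
}
```

In this model the first resize is triggered by the 65th push with an initial capacity of 2^6 and by the 513th push with 2^9. The stored contents are the same either way, which is why a capacity change alone would not be expected to affect correctness.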

[~dl] Yeah, the initial sizes shouldn't affect correctness. I'm curious to know if the problem manifests post https://github.com/openjdk/jdk/commit/402103331bcdb1055f89c938fdd6b1df772993b6
02-05-2025

[~lliu] Would it be possible to say something about the machine/test environment? Is this an Ampere system?
18-04-2025

Hi [~dl], could you please take a look? I can help to verify if your patches solve the problem.
18-04-2025