JDK-8331999 : BasicDirectoryModel/LoaderThreadCount.java frequently fails on Windows in CI
  • Type: Bug
  • Component: client-libs
  • Sub-Component: javax.swing
  • Affected Version: 23
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • OS: windows
  • CPU: x86_64,aarch64
  • Submitted: 2024-05-09
  • Updated: 2024-11-12
  • Resolved: 2024-05-09
Version table:
  JDK 11: 11.0.25-oracle (Fixed)
  JDK 17: 17.0.13-oracle (Fixed)
  JDK 21: 21.0.5-oracle (Fixed)
  JDK 23: 23 b23 (Fixed)
  JDK 8: 8u431 (Fixed)
Description
After JDK-8331142 was integrated, the test/jdk/javax/swing/plaf/basic/BasicDirectoryModel/LoaderThreadCount.java test frequently fails on Windows in CI.

All the failures found on Windows look the same:

Number of snapshots: 20
Number of snapshots where number of loader threads:
  = 1: 19
  = 2: 0
  > 2: 1
Exception in Test Runner: class java.lang.RuntimeException: Detected 1 snapshots with several loading threads
java.lang.RuntimeException: Detected 1 snapshots with several loading threads
        at LoaderThreadCount.runTest(LoaderThreadCount.java:168)
        at LoaderThreadCount.wrapper(LoaderThreadCount.java:108)
        at java.base/java.lang.Thread.run(Thread.java:1575)


Some tolerance is needed. For example, fail the test only if the number of snapshots with more than 2 loading threads is above 2 or 3. Without the fix, the number of such snapshots is much higher: usually 20 out of 20, and the lowest I've seen in recent days is 16.

A fraction of the number of (valid) snapshots could be used, for instance SNAPSHOTS / 2 or loaderCount.size() / 2; both usually give 10 on Windows.
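
The tolerance proposed above can be sketched as follows. This is a minimal illustration, not the code of the actual fix: the class SnapshotTolerance and its methods are hypothetical, and loaderCounts stands for the per-snapshot loader thread counts that the test collects.

```java
import java.util.List;

/** Hypothetical helper; the names are illustrative and not taken from the test. */
final class SnapshotTolerance {

    /** Counts the snapshots in which more than two loader threads were seen. */
    static long countExcessive(List<Integer> loaderCounts) {
        return loaderCounts.stream().filter(n -> n > 2).count();
    }

    /**
     * Fails only when the snapshots with more than two loader threads
     * exceed half of the valid snapshots (assumed threshold).
     */
    static void check(List<Integer> loaderCounts) {
        long excessive = countExcessive(loaderCounts);
        int limit = loaderCounts.size() / 2;
        if (excessive > limit) {
            throw new RuntimeException("Detected " + excessive
                    + " snapshots with several loading threads");
        }
    }
}
```

With 20 valid snapshots the limit is 10, so the single outlier shown in the failure output would pass, while a run without the fix (16 to 20 such snapshots) would still fail.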
Comments
Fix request [17u] I'm backporting this for parity with 17.0.13-oracle. No risk, only tests change. To be backported together with JDK-8331495 and JDK-8331142. Since https://bugs.openjdk.org/browse/JDK-8333880 is not fixed, this test may still fail. SAP nightly testing passed.
06-11-2024

[17u] I'll also backport JDK-8337810, which problem-lists the test. Then we have the test coverage on other platforms, and no failures on Windows.
06-11-2024

A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk17u-dev/pull/2605 Date: 2024-06-18 08:02:55 +0000
31-10-2024

Fix request [21u] I'm backporting this for parity with 21.0.5-oracle. No risk, only a test change. Clean backport. The test passes. SAP nightly testing passed.
19-06-2024

A pull request was submitted for review. URL: https://git.openjdk.org/jdk21u-dev/pull/741 Date: 2024-06-18 08:16:17 +0000
18-06-2024

I created https://bugs.openjdk.org/browse/JDK-8333880. Thanks for your help!
10-06-2024

[~azeller] It'll take some time… The patch may also resolve the problem on Linux and macOS. Along those lines, I've been thinking about using a custom FileSystemView; it may simplify the test and make it more stable. Currently the test relies on enumerating live threads, whereas a custom FileSystemView could track the threads directly by recording on which thread its listFiles method is called. Could you create a new bug for the Windows failure that you see? Assign it to me directly, please.
10-06-2024
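
The custom FileSystemView idea from the comment above could look roughly like the sketch below. It is not the patch discussed in this thread. TrackingFileSystemView and listingThreads() are hypothetical names, and the sketch assumes that the listing method to intercept is FileSystemView.getFiles(File, boolean).

```java
import java.io.File;
import java.io.IOException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import javax.swing.filechooser.FileSystemView;

/** Hypothetical view that records every thread that lists files through it. */
class TrackingFileSystemView extends FileSystemView {
    // Thread-safe set, since listings may come from several loader threads.
    private final Set<Thread> listingThreads = ConcurrentHashMap.newKeySet();

    @Override
    public File[] getFiles(File dir, boolean useFileHiding) {
        // Record the caller before delegating to the default listing.
        listingThreads.add(Thread.currentThread());
        return super.getFiles(dir, useFileHiding);
    }

    @Override
    public File createNewFolder(File containingDir) throws IOException {
        // The only abstract method of FileSystemView; not needed for tracking.
        throw new IOException("Not supported in this sketch");
    }

    /** Returns an immutable copy of the threads seen so far. */
    Set<Thread> listingThreads() {
        return Set.copyOf(listingThreads);
    }
}
```

A test could pass such a view to the JFileChooser(FileSystemView) constructor and inspect the recorded threads instead of enumerating all live threads.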

[~aivanov] That sounds like a good idea. Do you think you could provide a patch that I can put in our CI?
10-06-2024

[~azeller] Thank you for the details. It's a high failure rate. The problem is that I cannot reproduce the failure myself. What I can think of is that I also need to verify the interrupted status of the thread. If the background thread is already interrupted but is still running, it's not a failure condition… exiting from an interrupted thread could take quite a while, especially on a system under heavy load. Unless something odd is happening, such that the fix for JDK-8325179 isn't working.
06-06-2024
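
The interrupted-status check mentioned in the comment above could be sketched like this. LoaderThreadFilter and countActive are hypothetical names, and how the test selects its candidate threads is left out. The check is only a heuristic: once a thread throws InterruptedException, its interrupted status is cleared again.

```java
import java.util.List;

/** Hypothetical filter; not taken from the actual test. */
final class LoaderThreadFilter {

    /**
     * Counts candidate threads that are alive and have not been interrupted.
     * A thread that is interrupted but still running is treated as already
     * cancelled, so it does not count as an extra loader thread.
     */
    static long countActive(List<Thread> candidates) {
        return candidates.stream()
                .filter(Thread::isAlive)
                .filter(t -> !t.isInterrupted())
                .count();
    }
}
```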

I have seen it 30 times now since 29 May. I only see it in head, not in any older codeline.
06-06-2024

I can still see the failure on Windows in our nightly CI tests. It occurs in about 60% of our test runs. A typical output would be:
Number of snapshots: 20
Number of snapshots where number of loader threads:
  = 1: 4
  = 2: 4
  > 2: 12
Duration: 17,467
The highest failure value I have seen was 13, but I only looked at 10 of our failed runs of the last weeks. Our tests run headless and with concurrency equal to the number of CPUs.
05-06-2024

[~azeller] Is it in mainline? We have only 20 failures of this test in our CI. There have been no failures since 9 May when I resolved this bug. In Oracle CI, this test is also run headless; headless tests are run concurrently, but I'm not sure about the number of concurrent tests. This test creates 6 threads. Seeing such a large number of threads is still weird. If the background thread reaches listing files, this operation is performed on the COM thread and the background thread is blocked, thus interrupting the background thread will throw InterruptedException and the thread will exit. It will take some time, yet it shouldn't take too long… perhaps, the heavy load affects this small part…
05-06-2024
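
The behaviour described in the comment above (a background thread blocked on work done by another thread exits with InterruptedException when interrupted) can be demonstrated with a small standalone sketch. It does not use the Swing or COM code paths: a single-thread executor stands in for the COM thread, and BlockedLoaderDemo is a hypothetical name.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical demo; the executor only stands in for the COM thread. */
final class BlockedLoaderDemo {

    /** Returns true if the blocked worker exited via InterruptedException. */
    static boolean interruptBlockedWorker() throws InterruptedException {
        ExecutorService comLike = Executors.newSingleThreadExecutor();
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch submitted = new CountDownLatch(1);
        AtomicBoolean sawInterrupt = new AtomicBoolean();
        try {
            Thread worker = new Thread(() -> {
                // The "listing" never completes on its own, so get() blocks.
                Future<?> listing = comLike.submit(() -> {
                    release.await();
                    return null;
                });
                submitted.countDown();
                try {
                    listing.get();
                } catch (InterruptedException e) {
                    sawInterrupt.set(true); // the worker exits here
                } catch (ExecutionException e) {
                    // Not expected in this sketch.
                }
            });
            worker.start();
            submitted.await();
            worker.interrupt();
            worker.join();
            return sawInterrupt.get();
        } finally {
            release.countDown();
            comLike.shutdownNow();
        }
    }
}
```

The worker returns as soon as the interrupt is delivered, although on a heavily loaded machine the time until it is actually scheduled and exits can vary.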

Changeset: ffbdfffb Author: Alexey Ivanov <aivanov@openjdk.org> Date: 2024-05-09 18:01:27 +0000 URL: https://git.openjdk.org/jdk/commit/ffbdfffbc702253f32fa45dc1855b663c72074a6
09-05-2024

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/19156 Date: 2024-05-09 13:01:57 +0000
09-05-2024