JDK-8322484 : 22-b26 Regression in J2dBench-bimg_misc-G1 (and more) on Windows-x64 and macOS-x64
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 22
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • OS: linux,os_x,windows
  • CPU: x86_64
  • Submitted: 2023-12-19
  • Updated: 2024-07-03
  • Resolved: 2024-01-29
Fix versions:
  • JDK 22: 22.0.2 (Fixed)
  • JDK 23: 23 b08 (Fixed)
Description
Integration of JDK-8318706 into 22-b26 has regressed J2dBench-bimg_misc-G1 on Windows-x64 and macOS-x64 by about 1%. Linux-aarch64 also shows the regression, but to a smaller degree.

Regression was isolated by measuring CI builds.

Additional benchmarks affected:
1% J2dBench-bimg_imageio-G1 on Linux-x64
2% J2dBench-vimg_images_opq-G1 on Windows-x64
2% J2dBench-vimg_shapes_gradient-G1 on Windows-x64
2% J2dBench-vimg_shapes_solid-G1 on Windows-x64
9% J2dBench-vimg_copyarea-G1 on Windows-x64

Comments
A pull request was submitted for review. URL: https://git.openjdk.org/jdk22u/pull/36 Date: 2024-01-30 09:13:24 +0000
08-02-2024

jdk22u fix request:
Reason: performance regression for any application that uses Get/ReleasePrimitiveArrayCritical. The reason why it only shows up in these tests on Windows is that j2dbench is the only application in our perf testing that does that.
Change: Implements a per-thread cache that reduces the overhead of Get/ReleasePrimitiveArrayCritical methods.
Risk estimate: low due to fairly high test coverage testing the affected code
Test coverage: tier1-7 in jdk-jdk repo, no issues in jdk-jdk since initial push. (Also ran tier1-7 in jdk22 repo with no issues)
06-02-2024
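
To illustrate the idea behind the per-thread cache mentioned in the fix request above, here is a minimal sketch. It is not the code from the changeset: the names RegionTable and PinCache, the signed delta, and the flush policy are all assumptions made for illustration. The point it shows is that repeated pin/unpin pairs on the same region can be absorbed by thread-local state, so the shared atomic counter is only touched when the thread switches region or is asked to flush.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Shared state: one atomic pin count per heap region (hypothetical layout).
class RegionTable {
public:
    explicit RegionTable(std::size_t n) : counts_(n) {}

    // Publish a net pin delta for one region with a single atomic operation.
    void add(std::size_t region, std::int64_t delta) {
        assert(region < counts_.size());
        counts_[region].fetch_add(delta, std::memory_order_relaxed);
    }

    std::int64_t get(std::size_t region) const {
        assert(region < counts_.size());
        return counts_[region].load(std::memory_order_relaxed);
    }

private:
    std::vector<std::atomic<std::int64_t>> counts_;
};

// Per-thread cache (hypothetical): remembers the last region touched and a
// net pin delta that has not yet been published to the shared table.
class PinCache {
public:
    explicit PinCache(RegionTable& table) : table_(table) {}

    void pin(std::size_t region)   { adjust(region, +1); }
    void unpin(std::size_t region) { adjust(region, -1); }

    // Publish the pending delta. In this sketch a collector would have to
    // flush every thread's cache before it reads the shared counts.
    void flush() {
        if (delta_ != 0) {
            table_.add(region_, delta_);
            delta_ = 0;
        }
    }

private:
    void adjust(std::size_t region, std::int64_t d) {
        if (region != region_) {
            flush();            // switching region publishes the old delta
            region_ = region;
        }
        delta_ += d;            // plain, non-atomic update on the fast path
    }

    RegionTable& table_;
    std::size_t region_ = 0;
    std::int64_t delta_ = 0;
};
```

In this sketch each thread would own one PinCache (for example as a thread_local object). A tight loop that pins and unpins objects in the same region then performs no atomic operations at all until the next flush.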

jdk22 defer request Reason: Asking to defer this issue out of jdk22 after getting a rejection on the jdk22 integration request as the risk/reward ratio is too high.
30-01-2024

Fix request for JDK 22 GA rejected. The changes are not simple, and they fix only a small performance regression on Windows in some corner cases. I would suggest giving more time for testing in mainline rather than limiting this to the JDK 22 GA timeline. The JDK 22u repo is already open, and you can push the fix there when ready.
29-01-2024

Fix request:
Reason: causes up to 9% performance regression after implementation of JEP 423 (https://bugs.openjdk.org/browse/JDK-8318706) for applications that perform lots of native access via Get/ReleasePrimitiveArrayCritical (read: at least a few 100k/s, as for some graphics APIs). The issue affects all platforms; the reason why the regression only shows up on Windows is that on other platforms the OS API calls do not do any Get/ReleasePrimitiveArrayCritical accesses.
Change: Implements a per-thread cache that reduces the overhead of Get/ReleasePrimitiveArrayCritical methods.
Risk estimate: low due to fairly high test coverage testing the affected code
Test coverage: tier1-7 in jdk-jdk repo, same in jdk22 repo (tier5-7 still running at time of writing). Given that there is some time left in the rdp2 process, may want to wait for a few more days of baking in mainline.
29-01-2024

A pull request was submitted for review. URL: https://git.openjdk.org/jdk22/pull/99 Date: 2024-01-29 09:26:18 +0000
29-01-2024

Changeset: 0d5f5e15 Author: Thomas Schatzl <tschatzl@openjdk.org> Date: 2024-01-29 08:36:51 +0000 URL: https://git.openjdk.org/jdk/commit/0d5f5e15d43f94a79c6133baecd5af217365d176
29-01-2024

The reason why only Windows is affected is that Atomic::add implementations for Windows ignore the memory order argument, always doing a full barrier. That matters in this case where the call has memory_order_relaxed.
25-01-2024
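
As a rough illustration of what the memory order argument in the comment above expresses, the sketch below uses standard C++ std::atomic rather than HotSpot's own Atomic class, and RegionPinCount is a hypothetical type, not a HotSpot data structure. A relaxed add requests only atomicity of the update, while a sequentially consistent add also requests ordering with surrounding memory accesses. Whether the two compile to different instructions depends on the target architecture and compiler.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical per-region pin counter, for illustration only.
struct RegionPinCount {
    std::atomic<std::size_t> count{0};

    // Relaxed: only the increment itself is atomic; no ordering is requested
    // with respect to other loads and stores.
    void pin_relaxed() { count.fetch_add(1, std::memory_order_relaxed); }

    // Sequentially consistent: the strongest ordering std::atomic offers.
    // An implementation that ignores the order argument behaves like this
    // variant on every call.
    void pin_full_barrier() { count.fetch_add(1, std::memory_order_seq_cst); }

    void unpin_relaxed() {
        assert(current() > 0);  // sanity check; meaningful single-threaded only
        count.fetch_sub(1, std::memory_order_relaxed);
    }

    std::size_t current() const { return count.load(std::memory_order_relaxed); }
};
```

Both variants produce the same counter value; the difference lies only in the ordering guarantees requested, and therefore potentially in cost.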

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/17552 Date: 2024-01-24 12:38:09 +0000
25-01-2024

In particular, the copyarea benchmark does nothing but lock, perform a short graphics operation, and unlock, hence the large regression (see the numbers above for that benchmark). A special build that skips the actual atomic lock/unlock (possible here because there are no garbage collections between complete iterations) brings performance back to the previous level.
11-01-2024

The problem seems to stem from the increased duration of the pin/unpin operations, most likely caused mainly by the additional atomic operations, two per locked object. Within the 18s the benchmark runs, around 142M objects are locked and unlocked (in total 184M additional atomic operations). Investigating why only Windows is affected, and options for mitigating the impact.
11-01-2024