JDK-8339730 : Windows regression after removing ObjectMonitor Responsible
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: windows
  • CPU: x86
  • Submitted: 2024-09-09
  • Updated: 2024-11-11
  • Resolved: 2024-10-24
JDK 24
24 b22 Fixed
Related Reports
Duplicate :  
Relates :  
Description
We are seeing a quite substantial regression on Windows after removing the ObjectMonitor responsible thread, which needs to be investigated further.

DaCapo23-spring-large    -40.99%
DaCapo23-tomcat-large  -104.48%

Comments
[~swesonga] We use timeBeginPeriod to enable a higher-rate timer interrupt, which gives the higher resolution on regular timed-wait objects.
11-11-2024

[~fbredberg] [~coleenp] [~dholmes] I received feedback that it is not possible to globally disable high-resolution timers on Windows, so it is not yet clear what the key difference is between your configuration that shows the regression and one that does not. In addition, as far as I can tell, OpenJDK does not appear to be creating high-resolution timers: I was informed that calling CreateWaitableTimerExW with the CREATE_WAITABLE_TIMER_HIGH_RESOLUTION flag is the way to do this.
08-11-2024

Changeset: 2c31c8ee Branch: master Author: Fredrik Bredberg <fbredberg@openjdk.org> Date: 2024-10-24 09:51:24 +0000 URL: https://git.openjdk.org/jdk/commit/2c31c8eeb42188ad6fd15eca50db4342cd791fb2
24-10-2024

We have now run several tests on different Windows machines in our performance lab. We see no regression if we apply the proposed PR in which we add calls to enable hi res timer resolution around the call to WaitForSingleObject when doing infinite parking.
24-10-2024

[~swesonga] [~dholmes] Have you seen whether Fredrik's patch with no options causes performance regressions when the Windows system is configured with the high-res timer enabled by default?
17-10-2024

I am working on getting a setup that can replicate this behavior. The PR can be merged in the meantime since the issue is well understood; I can follow up with additional questions if I still cannot replicate the issue on a validated configuration.
15-10-2024

[~swesonga] As I wrote earlier:
> If you run on machines where the high-res timers are always enabled (my laptop is like this for example) then you will not see any performance issue.
15-10-2024

[~swesonga] Thank you for looking into this. I would suggest running the tests again without "-XX:+ForceTimeHighResolution", on a machine which doesn't always have the high-res timer enabled.
10-10-2024

Hi [~fbredberg], I have not seen the large regression percentages mentioned in this issue. I used a 64-vCPU Dldsv5 VM: https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/general-purpose/dldsv5-series

Here is a summary of the tomcat numbers I got from 10 invocations of "%JAVA_HOME%/bin/java -jar dacapo-23.11-chopin.jar --size large --iterations 3 tomcat":

  • Source Commit 8340874: Open source some of the AWT Geometry/Button tests
    https://github.com/openjdk/jdk/commit/e19c7d80f722395583fbdb4cc10dc9051c8602f2
    average of 230541 ms, and an average of 220953 ms with -XX:+ForceTimeHighResolution
  • Source Commit 8320318: ObjectMonitor Responsible thread
    https://github.com/openjdk/jdk/commit/180affc5718c9bf2f009d6a7aa129cc36335384a
    average of 220657 ms, and an average of 220823 ms with -XX:+ForceTimeHighResolution

From 10 invocations of "%JAVA_HOME%/bin/java -jar dacapo-23.11-chopin.jar --size large --iterations 5 spring":

  • Source Commit 8340874: Open source some of the AWT Geometry/Button tests
    https://github.com/openjdk/jdk/commit/e19c7d80f722395583fbdb4cc10dc9051c8602f2
    average of 40400 ms, and an average of 40023 ms with -XX:+ForceTimeHighResolution
  • Source Commit 8320318: ObjectMonitor Responsible thread
    https://github.com/openjdk/jdk/commit/180affc5718c9bf2f009d6a7aa129cc36335384a
    average of 37132 ms, and an average of 36947 ms with -XX:+ForceTimeHighResolution

Is there a specific environment setup step required to get the large regressions?
09-10-2024
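
The averages in the comment above can be compared with a small helper. This is only a sketch: the issue does not say how its own percentages were derived, so `percent_change` below is an assumed, plain relative change in elapsed time, and the function and struct names are hypothetical. With this convention a negative value means the second run was faster.

```cpp
#include <cstdio>

// Relative change in elapsed time, in percent.
// Negative means "after" took less time than "before".
// Assumption: this is NOT necessarily the formula used for the
// percentages in the issue description.
double percent_change(double before_ms, double after_ms) {
  return (after_ms - before_ms) / before_ms * 100.0;
}

// Averages (ms) copied from the comment: default run and run with
// -XX:+ForceTimeHighResolution, for each source commit.
struct Averages {
  const char* label;
  double default_ms;
  double forced_ms;
};

const Averages tomcat_before = {"tomcat @ 8340874", 230541.0, 220953.0};
const Averages tomcat_after  = {"tomcat @ 8320318", 220657.0, 220823.0};
const Averages spring_before = {"spring @ 8340874", 40400.0, 40023.0};
const Averages spring_after  = {"spring @ 8320318", 37132.0, 36947.0};

// Prints the change between two commits for the default configuration.
void print_change(const Averages& before, const Averages& after) {
  std::printf("%s -> %s: %.2f%%\n", before.label, after.label,
              percent_change(before.default_ms, after.default_ms));
}
```

Calling `print_change(tomcat_before, tomcat_after)` and `print_change(spring_before, spring_after)` from a `main` shows that, on this VM, the later commit was not slower in either benchmark, which matches the comment's statement that the large regression was not reproduced.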

A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk/pull/21357 Date: 2024-10-04 13:19:49 +0000
04-10-2024

In order to mitigate the regression seen on Windows I added calls to enable hi res timer resolution around the call to WaitForSingleObject, when doing infinite parking. Got some good results, like these:

DaCapo23-tomcat-large    87.11%
DaCapo23-spring-large    38.25%
Renaissance-Reactors     10.34%
DaCapo-h2-large          10.00%
04-10-2024
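
The mitigation described in the comment above can be sketched roughly as follows. This is not the code from the pull request: `ScopedTimerResolution`, `with_high_res_timer` and `park_forever` are hypothetical names, and the 1 ms period is an assumption. It only illustrates the idea of bracketing an infinite `WaitForSingleObject` with `timeBeginPeriod`/`timeEndPeriod`. On non-Windows builds the guard compiles to a no-op.

```cpp
#include <utility>

#ifdef _WIN32
#include <windows.h>
#include <mmsystem.h>  // timeBeginPeriod / timeEndPeriod (link with winmm.lib)
#endif

// Hypothetical RAII guard: requests a 1 ms timer resolution for the
// lifetime of the object and releases the request on destruction.
class ScopedTimerResolution {
 public:
  ScopedTimerResolution() {
#ifdef _WIN32
    _raised = (timeBeginPeriod(1) == TIMERR_NOERROR);
#endif
  }
  ~ScopedTimerResolution() {
#ifdef _WIN32
    if (_raised) timeEndPeriod(1);  // must match the timeBeginPeriod value
#endif
  }
  ScopedTimerResolution(const ScopedTimerResolution&) = delete;
  ScopedTimerResolution& operator=(const ScopedTimerResolution&) = delete;

  bool raised() const { return _raised; }

 private:
  bool _raised = false;
};

// Runs wait() while the resolution request is active and returns its result.
template <typename Wait>
auto with_high_res_timer(Wait&& wait) -> decltype(wait()) {
  ScopedTimerResolution scope;
  return std::forward<Wait>(wait)();
}

#ifdef _WIN32
// Infinite park on an event handle, bracketed by the resolution calls.
inline DWORD park_forever(HANDLE event) {
  return with_high_res_timer(
      [event] { return WaitForSingleObject(event, INFINITE); });
}
#endif
```

Using a scope guard keeps the begin/end calls paired even if the wait returns early; whether the real change is structured this way is not stated in the issue.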

[~swesonga] I checked with our performance team and got this answer: We are measuring the time by the "PASSED in 58861 msec" line, which is the same way it was reported in the older versions of DaCapo.
02-10-2024

[~fbredberg] Sorry, this isn't yet clear to me: what exactly are these regression percentages referring to in https://bugs.openjdk.org/browse/JDK-8320318?

DaCapo23-spring-large    24.20%    6.88%    -40.29%

The output from DaCapo ends with something like this, so I'm wondering if these percentages are changes in tail latencies or the "PASSED in" times?

===== DaCapo 23.11-chopin spring PASSED in 58861 msec =====
===== DaCapo simple tail latency: 50% 15798 usec, 90% 22208 usec, 99% 36093 usec, 99.9% 42770 usec, 99.99% 56573 usec, max 96122 usec, measured over 131072 events =====
===== DaCapo metered tail latency: 50% 15798 usec, 90% 22208 usec, 99% 36093 usec, 99.9% 42770 usec, 99.99% 56573 usec, max 96122 usec, measured over 131072 events =====
01-10-2024

Just to be clear, the same kind of regression can be seen on mainline by disabling high-resolution timers (counter-intuitively, this can be achieved by setting -XX:+ForceTimeHighResolution). So this is not a direct regression per se. If you run on machines where the high-res timers are always enabled (my laptop is like this for example) then you will not see any performance issue.
09-09-2024

An observation: After removing the concept of the Responsible thread, there is no longer any need to do timed parking. So we started to zoom in on timed vs infinite parking on Windows, and by going back to timed parking (with a very long timeout and only on Windows) the regression went away.

    // park self
    NOT_WINDOWS(current->_ParkEvent->park();)
    WINDOWS_ONLY(current->_ParkEvent->park((jlong) 0x10000000);)

This somehow seems to be related to the enabling of high resolution timers. Because if we go back to:

    // park self
    current->_ParkEvent->park();

and add calls to enable hi res timer resolution around the call to WaitForSingleObject when doing infinite parking, that also removes the regression.
09-09-2024
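
To make the timed vs infinite parking distinction concrete, here is a portable sketch built on `std::condition_variable`. `SimpleParkEvent` is a hypothetical stand-in and not HotSpot's ParkEvent; it only shows the two call shapes from the snippet above. Assuming the timeout is in milliseconds, 0x10000000 is 268435456 ms, roughly three days, so the timed variant behaves like an infinite wait in practice while still taking the timed code path.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical, simplified park/unpark event (not HotSpot's ParkEvent).
class SimpleParkEvent {
 public:
  // Infinite park: blocks until unpark() has been called.
  void park() {
    std::unique_lock<std::mutex> lock(_mutex);
    _cv.wait(lock, [this] { return _signaled; });
    _signaled = false;  // consume the permit
  }

  // Timed park: returns true if unparked, false if the timeout expired.
  bool park(long long millis) {
    std::unique_lock<std::mutex> lock(_mutex);
    bool woke = _cv.wait_for(lock, std::chrono::milliseconds(millis),
                             [this] { return _signaled; });
    if (woke) _signaled = false;  // consume the permit
    return woke;
  }

  // Makes one permit available and wakes a parked thread, if any.
  void unpark() {
    {
      std::lock_guard<std::mutex> lock(_mutex);
      _signaled = true;
    }
    _cv.notify_one();
  }

 private:
  std::mutex _mutex;
  std::condition_variable _cv;
  bool _signaled = false;
};

// The "very long timeout" from the workaround, assumed to be milliseconds.
const long long kVeryLongParkMillis = 0x10000000LL;
```

A caller would use `event.park()` for the infinite variant and `event.park(kVeryLongParkMillis)` for the timed one; on Windows the two map to different wait behavior, which is what the observation above is probing.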