JDK-8273453 : Systematic investigation regarding latency for os::elapsed_counter() vs rdtsc()
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: jfr
  • Affected Version: 12
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • CPU: x86
  • Submitted: 2021-09-07
  • Updated: 2021-09-27

It may be a good time to again systematically investigate the latency of os::elapsed_counter().

Historically, the counter sources exposed by the operating systems (to which os::elapsed_counter() maps) have been too slow for JFR purposes.
This is why JFR still uses raw rdtsc(), with an awareness of the problems that entails. We have not actively tracked how the OS-exposed counters have evolved, but there are indications that access latencies have improved on certain platforms (OS/hardware combinations) in recent years.

We should re-investigate the actual overhead of using os::elapsed_counter() in comparison to rdtsc().
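A rough native sanity check could precede the proper JMH work. The following is a minimal C++ sketch, not the JFR benchmark itself, that times back-to-back reads of both counter sources on one thread.

It assumes Linux, and treats clock_gettime(CLOCK_MONOTONIC) as a stand-in for what os::elapsed_counter() ultimately calls there. The raw counter is read with the GCC/Clang intrinsic __rdtsc() from <x86intrin.h>. The helper names (os_counter, raw_tsc, ns_per_call, run_comparison) are hypothetical.

The sketch only captures the cost of uncontended calls. It says nothing about cross-core ordering, which is one of the known problems with raw rdtsc().

```cpp
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <time.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#define HAVE_RDTSC 1
#endif

// Assumption: on Linux, os::elapsed_counter() bottoms out in
// clock_gettime(CLOCK_MONOTONIC), so this call stands in for it.
static inline uint64_t os_counter() {
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return static_cast<uint64_t>(ts.tv_sec) * 1000000000ULL +
         static_cast<uint64_t>(ts.tv_nsec);
}

#ifdef HAVE_RDTSC
// Raw, unserialized time-stamp counter read (x86 only).
static inline uint64_t raw_tsc() { return __rdtsc(); }
#endif

// Average wall-clock cost per call, in nanoseconds, over `iters` calls.
template <typename F>
double ns_per_call(F read_counter, int iters) {
  volatile uint64_t sink = 0;  // keeps the compiler from discarding the reads
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iters; ++i) sink = sink + read_counter();
  auto stop = std::chrono::steady_clock::now();
  (void)sink;
  return std::chrono::duration<double, std::nano>(stop - start).count() / iters;
}

// Prints the per-call cost of each counter source available on this build.
void run_comparison(int iters = 10000000) {
  std::printf("clock_gettime(CLOCK_MONOTONIC): %.2f ns/call\n",
              ns_per_call(os_counter, iters));
#ifdef HAVE_RDTSC
  std::printf("rdtsc:                          %.2f ns/call\n",
              ns_per_call(raw_tsc, iters));
#endif
}
```

Calling run_comparison() prints one line per source. The numbers will vary with CPU, kernel clocksource and virtualization, which is why the study needs a range of hardware/OS combinations.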

The on/off capability already exists in the code. It can be toggled with the following experimental flag, which requires -XX:+UnlockExperimentalVMOptions:

  experimental(bool, UseFastUnorderedTimeStamps, false,
          "Use platform unstable time where supported for timestamps only")

This study requires targeted tests (JFR JMH benchmarks) and access to modern hardware/OS combinations.

In the best case, the latency will be within acceptable bounds (to be determined), in which case we could hopefully abandon rdtsc().

The related issue JDK-8185891 links to a third-party write-up investigating the performance of time access: https://pzemtsov.github.io/2017/07/23/the-slow-currenttimemillis.html

[copied here from JDK-8211240, from [~dholmes]] To add to the "fun": https://cpufun.substack.com/p/fun-with-timers-and-cpuid

[copied here from JDK-8211240, from [~kbarrett]] The Rdtsc code relies on the frequency in the brand string when it is available. Apparently that's wrong. The "Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3B: System Programming Guide, Part 2" says: "On certain processors, the TSC frequency may not be the same as the frequency in the brand string." And the fallback bogomips-style frequency calculation, used when the brand string frequency isn't available, can occasionally be catastrophically wrong.
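The Intel SDM also documents a more direct source than the brand string. On processors that enumerate CPUID leaf 0x15, EAX and EBX give the denominator and numerator of the TSC/core-crystal-clock ratio, and ECX gives the nominal crystal frequency in Hz. That makes the TSC frequency ECX × EBX / EAX.

The C++ sketch below illustrates that cross-check only. It is not what the Rdtsc code does. It assumes GCC or Clang (for <cpuid.h> and __get_cpuid), and the function names are hypothetical. Some processors and hypervisors report 0 in one of these registers; the sketch then returns 0 rather than guessing.

```cpp
#include <cstdint>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

// CPUID leaf 0x15: EAX = ratio denominator, EBX = ratio numerator,
// ECX = nominal core crystal clock in Hz. Zero means "not enumerated".
// Returns the TSC frequency in Hz, or 0 if it cannot be derived.
uint64_t tsc_hz_from_leaf15(uint32_t eax, uint32_t ebx, uint32_t ecx) {
  if (eax == 0 || ebx == 0 || ecx == 0) return 0;
  return static_cast<uint64_t>(ecx) * ebx / eax;
}

// Queries the running CPU; returns 0 if leaf 0x15 is unsupported or
// incomplete, or on non-x86 builds.
uint64_t query_tsc_hz() {
#if defined(__x86_64__) || defined(__i386__)
  unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
  if (__get_cpuid(0x15, &eax, &ebx, &ecx, &edx) == 0) return 0;  // leaf not supported
  return tsc_hz_from_leaf15(eax, ebx, ecx);
#else
  return 0;
#endif
}
```

For example, a 24 MHz crystal with a 200/2 ratio gives 2.4 GHz. Comparing query_tsc_hz() with the brand-string value on test machines would show how often the two disagree.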

[copied here from JDK-8211240, from [~kbarrett]] https://docs.microsoft.com/en-us/windows/win32/sysinfo/acquiring-high-resolution-time-stamps says that, on Windows 8 and Windows Server 2012, QueryPerformanceCounter now uses the TSC, because the required hardware infrastructure and OS support have improved to the point where that can be done reliably. Windows 7 and Windows Server 2008 try to use the TSC but fall back to something slower if it is found to be unsuitable. Linux has been doing something similar with clock_gettime for some time; the selected clocksource can be found under /sys/devices/system/clocksource.

[copied here from JDK-8211240, from [~dholmes]] Yes, this fits with my/our general guidance: use the OS timing facilities and assume the OS knows best how to handle these issues. Note that for Windows it states: "The TSC synchronization algorithm was significantly improved to better accommodate large systems with many processors." This suggests that accessing the TSC through QPC, rather than in raw form, is the only safe and valid way to use it.
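On Linux, the clocksource selected by the kernel is exposed in sysfs, typically as current_clocksource under /sys/devices/system/clocksource/clocksource0. Recording it alongside each benchmark run would make the latency numbers easier to interpret. The cost of the same OS call can differ considerably depending on which clocksource backs it.

A minimal C++ sketch follows; the helper names are hypothetical.

```cpp
#include <fstream>
#include <string>

// Reads the first line of a sysfs attribute, with trailing whitespace
// stripped. Returns an empty string if the file cannot be read.
std::string read_sysfs_line(const std::string& path) {
  std::ifstream in(path);
  std::string line;
  if (!in || !std::getline(in, line)) return "";
  const auto end = line.find_last_not_of(" \t\r\n");
  return end == std::string::npos ? "" : line.substr(0, end + 1);
}

// Name of the clocksource currently used by the kernel (e.g. "tsc"),
// or an empty string if it is not available (non-Linux, no sysfs).
std::string current_clocksource() {
  return read_sysfs_line(
      "/sys/devices/system/clocksource/clocksource0/current_clocksource");
}
```

The sibling attribute available_clocksource in the same directory lists the alternatives the kernel considered usable on that machine.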