JDK-8006942 : Use CLOCK_MONOTONIC_RAW for nanoTime if available on Linux
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 9
  • Priority: P4
  • Status: Closed
  • Resolution: Won't Fix
  • OS: linux
  • Submitted: 2013-01-25
  • Updated: 2016-12-19
  • Resolved: 2015-07-14
On 25/01/2013 5:42 PM, liang xie wrote:
> "CLOCK_MONOTONIC_RAW" is available from 2.6.28+; it's a better choice than
> CLOCK_MONOTONIC when NTP slewing occurs. This is important for some
> applications, e.g. ZooKeeper. Please see ZOOKEEPER-1616 for details.
> diff -r 8389681cd7b1 src/os/linux/vm/os_linux.cpp
> --- a/src/os/linux/vm/os_linux.cpp    Tue Nov 15 16:44:09 2011 -0800
> +++ b/src/os/linux/vm/os_linux.cpp    Fri Jan 25 15:14:55 2013 +0800
> @@ -1444,7 +1444,11 @@
>   jlong os::javaTimeNanos() {
>     if (Linux::supports_monotonic_clock()) {
>       struct timespec tp;
> +  #ifdef CLOCK_MONOTONIC_RAW
> +    int status = Linux::clock_gettime(CLOCK_MONOTONIC_RAW,&tp);
> +  #else
>       int status = Linux::clock_gettime(CLOCK_MONOTONIC,&tp);
> +  #endif
>       assert(status == 0, "gettime error");
>       jlong result = jlong(tp.tv_sec) * (1000 * 1000 * 1000) + jlong(tp.tv_nsec);
>       return result;

Unfortunately, today our primary Linux build platform is still at 2.6.27, so this would have to wait until we officially update that.

But we also need to be able to run on earlier versions, so this would have to involve a dynamic runtime check, not a simple compile-time check.

After giving this careful thought and consulting with others involved in the VM timing APIs and implementation, I am closing this as "will not fix" - though "Not an issue" could also have been chosen.

Having been bitten by various time-jumping bugs in the past due to NTP (and other) adjustments, my initial reaction was that doing any kind of adjustment to CLOCK_MONOTONIC was "crazy" and that obviously we should prefer the unadjusted CLOCK_MONOTONIC_RAW. However, after looking into this more carefully, I agree with Martin's comment that the NTP slew that is applied is generally desirable to keep the reported elapsed times in line with actual elapsed time. This slew simply adjusts the rate at which the reported time is updated to account for hardware that runs faster or slower than true time (which is pretty much all hardware). It does not introduce observable jumps in the time reported and should not be observable to the end user.

The ZooKeeper issue referred to above actually involved the use of currentTimeMillis, for which NTP adjustment is abrupt and disruptive. It also isn't clear that the NTP slew applied to CLOCK_MONOTONIC would be impacted to the same extent given a misbehaving NTP server.

Finally, as the single use of CLOCK_MONOTONIC_RAW would be for System.nanoTime, it would introduce a third active clock in the system, and make it impossible to use the same clock to measure elapsed time as is used to perform timed-waits etc. (although I doubt the difference would be readily observable). So I find there is no compelling reason to switch to CLOCK_MONOTONIC_RAW.
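
The rate adjustment described in this comment can be observed by timing the same interval on both clocks. The following is a minimal sketch, assuming Linux with a kernel that provides CLOCK_MONOTONIC_RAW (2.6.28+). The names `read_ns`, `SlewSample` and `measure_slew` are hypothetical helpers for illustration and are not part of HotSpot.

```cpp
#include <cassert>
#include <cstdint>
#include <time.h>

// Read the given clock as a single nanosecond count.
static int64_t read_ns(clockid_t clock) {
  struct timespec tp;
  int status = clock_gettime(clock, &tp);
  assert(status == 0 && "gettime error");
  (void) status;  // silence unused-variable warnings when NDEBUG is set
  return int64_t(tp.tv_sec) * (1000 * 1000 * 1000) + int64_t(tp.tv_nsec);
}

// Elapsed time for one interval as seen by each clock.
struct SlewSample {
  int64_t mono_elapsed_ns;  // CLOCK_MONOTONIC (subject to NTP slew)
  int64_t raw_elapsed_ns;   // CLOCK_MONOTONIC_RAW (no NTP adjustment)
};

// Sleep for roughly sleep_ms and report the elapsed time on both clocks.
static SlewSample measure_slew(long sleep_ms) {
  int64_t mono_start = read_ns(CLOCK_MONOTONIC);
  int64_t raw_start  = read_ns(CLOCK_MONOTONIC_RAW);
  struct timespec req = { sleep_ms / 1000, (sleep_ms % 1000) * 1000000L };
  nanosleep(&req, nullptr);
  int64_t mono_end = read_ns(CLOCK_MONOTONIC);
  int64_t raw_end  = read_ns(CLOCK_MONOTONIC_RAW);
  return { mono_end - mono_start, raw_end - raw_start };
}
```

Because the two clocks are not read at exactly the same instant, a single short sample is dominated by scheduling jitter. Any difference attributable to slew should only become visible over long intervals.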

I'm hardly the expert on any of this, but:
- it's not obvious to me that switching to CLOCK_MONOTONIC_RAW is progress. In theory, CLOCK_MONOTONIC is exactly what we need for System.nanoTime - the most precise measurement available of elapsed time duration.
- it's counterintuitive that CLOCK_MONOTONIC_RAW could be slower - it's supposed to deliver the raw hardware counter. Is our mental model wrong? I imagine clock_gettime with CLOCK_MONOTONIC does not in fact access the raw hardware counter every time it is called.
- do we want ntp adjustments while measuring elapsed time? Not if ntp itself is making adjustments to "slew" the current system time to the authoritative time. But YES for ongoing adjustments that need to be made to account for differences between the hardware device and authoritative source. If the computer is dropped in liquid nitrogen and the hardware slows down, maybe NTP will notice and adjust the adjustments, which is what we want!?

On my system:
CLOCK_MONOTONIC_RAW: 1000000 calls took 84204263 nanos (84 ns/call)
CLOCK_MONOTONIC    : 1000000 calls took 21278984 nanos (21 ns/call)
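
The program that produced these numbers is not included in the report. A measurement of this kind could be sketched as follows, assuming Linux and a kernel with CLOCK_MONOTONIC_RAW. The function names `time_clock_calls` and `report` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <time.h>

// Time `calls` back-to-back clock_gettime calls on `clock`.
// Returns the total elapsed nanoseconds, measured with CLOCK_MONOTONIC.
static int64_t time_clock_calls(clockid_t clock, int calls) {
  struct timespec start, end, tp;
  int status = clock_gettime(CLOCK_MONOTONIC, &start);
  assert(status == 0 && "gettime error");
  for (int i = 0; i < calls; i++) {
    status = clock_gettime(clock, &tp);
    assert(status == 0 && "gettime error");
  }
  status = clock_gettime(CLOCK_MONOTONIC, &end);
  assert(status == 0 && "gettime error");
  (void) status;  // silence unused-variable warnings when NDEBUG is set
  return int64_t(end.tv_sec - start.tv_sec) * 1000000000LL
         + (end.tv_nsec - start.tv_nsec);
}

// Print one result line in the same shape as the figures quoted above.
static void report(const char* name, clockid_t clock, int calls) {
  int64_t total = time_clock_calls(clock, calls);
  printf("%-19s: %d calls took %lld nanos (%lld ns/call)\n",
         name, calls, (long long) total, (long long) (total / calls));
}
```

Calling `report("CLOCK_MONOTONIC_RAW", CLOCK_MONOTONIC_RAW, 1000000)` and then the same for CLOCK_MONOTONIC gives a comparable pair of lines. The ratio between them is likely to depend on the kernel version and clock source, so the figures quoted here should not be assumed to hold elsewhere.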

CLOCK_MONOTONIC_RAW is not yet available on our official build platform (OEL5.5), so we have to allow for that in the build. I also found that CLOCK_MONOTONIC_RAW is not supported for use with pthread_cond_t condition variables, so we will have to continue to use CLOCK_MONOTONIC for that case.

This mixing of the two clocks should not be an issue, as we only use them for elapsed/relative time and never need to compare their actual values (which on my test system differ by ~3 seconds!). In doing so we do assume that the two clocks advance at the same rate, so that, for example, we can use nanoTime (CLOCK_MONOTONIC_RAW) to see whether a relative timed-wait (Object.wait/Condition.await/LockSupport.park) blocks for the expected time. Of course it also means that we will now be able to detect an early return from such a timed-wait if CLOCK_MONOTONIC was adjusted forwards (it is never adjusted backwards).

The glibc bug for lack of CLOCK_MONOTONIC_RAW support is:
https://bugzilla.redhat.com/show_bug.cgi?id=879128 (filed Nov 2012)
Unfortunately it has been closed as a duplicate of the non-public:
https://bugzilla.redhat.com/show_bug.cgi?id=879129
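
The condition-variable limitation mentioned in this comment can be probed directly through the POSIX attribute API. This is a minimal sketch, assuming Linux with pthreads. `try_condattr_clock` is a hypothetical helper name.

```cpp
#include <pthread.h>
#include <time.h>

// Try to configure a condition-variable attribute object to use `clock`.
// Returns 0 on success, or an errno-style error code if the C library
// rejects the clock for use with condition variables.
static int try_condattr_clock(clockid_t clock) {
  pthread_condattr_t attr;
  int status = pthread_condattr_init(&attr);
  if (status != 0) {
    return status;
  }
  status = pthread_condattr_setclock(&attr, clock);
  pthread_condattr_destroy(&attr);
  return status;
}
```

`try_condattr_clock(CLOCK_MONOTONIC)` should return 0. Going by the comment above, `try_condattr_clock(CLOCK_MONOTONIC_RAW)` would be expected to return a non-zero error on the glibc versions discussed here, though other C libraries may behave differently.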

Another unfortunate aspect here is that accessing CLOCK_MONOTONIC_RAW is at least 4x slower than CLOCK_MONOTONIC - the reason for this seems a surprising mystery. There is an excellent article on Linux clock sources etc. here: http://btorpey.github.io/blog/2014/02/18/clock-sources-in-linux/ Dave Dice had also reported similar findings a year earlier. So are we concerned that changing the clock used for nanoTime might slow it down by a factor of 4 or worse?

Investigating availability of CLOCK_MONOTONIC_RAW on our build and test systems.

Actually, this should be doable now, similar to how we handle other issues where the build platform may not have the desired support but the runtime platform does. We would have to #define CLOCK_MONOTONIC_RAW as per the Linux headers and check at runtime to see whether it is available.
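
The approach outlined in this comment could look roughly like the sketch below. It is not the HotSpot implementation. `select_nanotime_clock` and `nano_time` are hypothetical names, and the fallback value 4 is taken from the Linux kernel's definition of CLOCK_MONOTONIC_RAW.

```cpp
#include <cassert>
#include <time.h>

// Fallback for build platforms whose headers predate the clock.
// The value 4 matches the Linux kernel's definition.
#ifndef CLOCK_MONOTONIC_RAW
#define CLOCK_MONOTONIC_RAW 4
#endif

// Probe whether the running kernel accepts CLOCK_MONOTONIC_RAW.
// Kernels without support fail the call, so fall back to CLOCK_MONOTONIC.
static clockid_t select_nanotime_clock() {
  struct timespec tp;
  if (clock_gettime(CLOCK_MONOTONIC_RAW, &tp) == 0) {
    return CLOCK_MONOTONIC_RAW;
  }
  return CLOCK_MONOTONIC;
}

// Return the selected clock's reading in nanoseconds.
// The probe runs once, on first use.
static long long nano_time() {
  static const clockid_t clock = select_nanotime_clock();
  struct timespec tp;
  int status = clock_gettime(clock, &tp);
  assert(status == 0 && "gettime error");
  (void) status;  // silence unused-variable warnings when NDEBUG is set
  return (long long) tp.tv_sec * 1000000000LL + tp.tv_nsec;
}
```

Doing the probe once and caching the result keeps the check off the hot path, which matters given the per-call cost discussed in the other comments.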