JDK-8185891 : System.currentTimeMillis() is slow on Linux, especially with the HPET time source
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 8,9,10
  • Priority: P4
  • Status: Closed
  • Resolution: Won't Fix
  • OS: linux
  • CPU: generic
  • Submitted: 2017-08-05
  • Updated: 2019-01-22
  • Resolved: 2019-01-22
Related Reports
Relates :  
See http://pzemtsov.github.io/2017/07/23/the-slow-currenttimemillis.html

System.currentTimeMillis() takes about 650 ns on Linux with the HPET time source, compared to 4 ns on Windows. 

The Linux implementation of currentTimeMillis() calls gettimeofday(), which has higher precision than currentTimeMillis() needs. clock_gettime() with a COARSE clock would be much faster, which would help a lot with the HPET time source, but also with the TSC timer.

If System.currentTimeMillis() is called in a time-sensitive manner, the slower behaviour on Linux can be damaging, especially if the HPET timer happens to be in use. A faster implementation would be beneficial with no drawback, since the granularity of the result is 1 ms.

---------- BEGIN SOURCE ----------
See http://pzemtsov.github.io/2017/07/23/the-slow-currenttimemillis.html for extensive tests and analysis.
---------- END SOURCE ----------

Runtime Triage: This is not on our current list of priorities. We will consider this feature if we receive additional customer requirements.

This isn't a bug, so I changed the issue to an enhancement request and lowered the priority to P4. The impact is Low IMHO: at 650 ns we're talking about 0.065% overhead, hardly significant in the kind of code currentTimeMillis() is intended for, and it isn't intended for high-frequency timing loops.

Congratulations on a very good write-up; you uncovered all the necessary gory details. Windows is blindingly fast because you just read a global variable that is updated asynchronously. Linux could have done something similar, but instead has a very complex clock/timer subsystem. The TSC is a faster time source than HPET or the ACPI PM timer, but has been very problematic. While raw hardware now often supports a frequency-invariant TSC, there can still be issues with synchronizing the TSC across cores/processors, and even worse, virtualized environments often break TSC emulation (adding back all the old problems that had been fixed). This is not a game that the JVM wants to be involved in at all! We rely on the OS APIs for these things and let the OS choose what it deems best (HPET, TSC, ACPI PM timer, etc.).

The use of gettimeofday for the millisecond time-of-day timer is historical: for a long time it was all there was, and even where clock_gettime(CLOCK_REALTIME) was available it was something we had to check for dynamically at runtime, so there was no real motivation or incentive to switch away from gettimeofday. Even with JDK 10 we still have to account for older POSIX systems which may not have these APIs and/or don't support the required CLOCK_* variants.

That said, if clock_gettime(CLOCK_REALTIME_COARSE) is significantly better than gettimeofday then we could look into using it. Though one would hope that if there is a significant difference in performance, and they both provide the same "time line", then Linux would ensure they function the same.
There is another excellent article on Linux clocks etc. at http://btorpey.github.io/blog/2014/02/18/clock-sources-in-linux/, including some benchmarking code. Here are the results of that code (with the addition of gettimeofday) on my system: "Ubuntu 12.04 LTS", Linux xxxx 3.2.0-24-generic #39-Ubuntu SMP Mon May 21 16:52:17 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

+ ./clocks
clock                    res (ns)           secs        nsecs
gettimeofday                1,000  1,502,087,989  149,067,000
CLOCK_REALTIME                  1  1,502,087,989  149,112,824
CLOCK_REALTIME_COARSE   4,000,250  1,502,087,989  145,767,803
CLOCK_MONOTONIC                 1     18,960,460  713,125,658
CLOCK_MONOTONIC_RAW             1     18,959,370  211,892,996
CLOCK_MONOTONIC_COARSE  4,000,250     18,960,460  709,741,445

+ taskset -c 1 ./ClockBench
Method                  samples     min      max     avg  median   stdev
gettimeofday                255    0.00  1000.00   50.98  500.00  219.96
CLOCK_REALTIME              255   39.00    73.00   41.48   56.00    2.47
CLOCK_REALTIME_COARSE       255    0.00     0.00    0.00    0.00    0.00
CLOCK_MONOTONIC             255   42.00    76.00   42.74   59.00    2.14
CLOCK_MONOTONIC_RAW         255  145.00   178.00  147.39  161.50    2.32
CLOCK_MONOTONIC_COARSE      255    0.00     0.00    0.00    0.00    0.00
cpuid+rdtsc                 255   96.00   104.00  103.00  100.00    2.65
rdtscp                      255   32.00    40.00   34.01   36.00    3.47
rdtsc                       255   24.00    24.00   24.00   24.00    0.00
Using CPU frequency = 1.000000

ClockBench.java:
Method                  samples     min      max     avg  median   stdev
System.nanoTime             255  265.00   283.00  267.47  274.00    4.05
CLOCK_REALTIME              255  268.00   286.00  268.87  277.00    1.34
cpuid+rdtsc                 255  120.00   176.00  123.26  148.00    5.10
rdtscp                      255   56.00   112.00   58.16   84.00    4.81
rdtsc                       255   40.00    88.00   41.19   64.00    3.96
Using CPU frequency = 1.000000

So something is not right with the _COARSE variants on my machine! I didn't look too deeply into the methodology of the benchmark.

This is a very good report and contains a good source of information (http://pzemtsov.github.io/2017/07/23/the-slow-currenttimemillis.html) demonstrating the issue. I took a small test case (attached) and executed it on Windows and Linux under JDK 9:

Windows: Sum = 2634114345848286594; time = 1483;  or  14.83 ns / iter
Linux:   Sum = 2634106325227250724; time = 15164; or 151.64 ns / iter

The result shows Linux taking about 10 times longer than Windows.