Bug ID: JDK-6313903 Thread.sleep(3) might wake up immediately on windows

Type: Bug
Component: hotspot
Sub-Component: runtime
Affected Version: hs13,6

Priority: P4
Status: Resolved
Resolution: Fixed
OS: windows_2000,windows_2003
CPU: x86

Submitted: 2005-08-22
Updated: 2020-08-27
Resolved: 2019-09-04

Versions (Unresolved/Resolved/Fixed)

JDK 14: 14 b13 (Fixed)

Related Reports

Blocks :	JDK-8230424 - Use platform independent code for Thread.interrupt support
Blocks :	JDK-8230423 - Move os::sleep to JavaThread::sleep
Relates :	JDK-5005837 - rework win32 timebeginperiod usage
Relates :	JDK-6824583 - regtest TimeUnit/Basic.java fails intermittently on Windows - again
Relates :	JDK-6498581 - ThreadInterruptTest3 produces wrong output on Windows
Relates :	JDK-8252200 - Thread::sleep(1) time is longer than 1ms in Java 14 on Windows 10
Relates :	JDK-6435126 - ForceTimeHighResolution switch doesn't operate as intended
Relates :	JDK-5068368 - (thread) Thread.sleep should say "at least as long" and implement this guarantee
Relates :	JDK-8229516 - Thread.isInterrupted() always returns false after thread termination

Description

This program:
---------------------------------------------------
public class uu {
    public static void main(String[] args) throws Throwable {
	long t0 = System.currentTimeMillis();
	java.util.concurrent.TimeUnit.MILLISECONDS.sleep(3);
	System.out.println(System.currentTimeMillis()-t0);
    }
}
---------------------------------------------------
should print a number no less than 3, but on windows-amd64 and,
more rarely, on windows-i586, it often prints 0.
Probably a hotspot bug (but it could conceivably be core libraries).

This causes j2se regtest test/java/util/concurrent/TimeUnit/Basic.java to fail.
Doug Lea asked,

What happens if you instead just call Thread.sleep(3)?

Good question.

-----------------------------------------------------
public class uu {
    public static void main(String[] args) throws Throwable {
	if (args.length == 1 && args[0].equals("TimeUnit")) {
	    long t0 = System.currentTimeMillis();
	    java.util.concurrent.TimeUnit.MILLISECONDS.sleep(3);
	    System.out.println(System.currentTimeMillis()-t0);
	} else if (args.length == 1 && args[0].equals("Thread")) {
	    long t0 = System.currentTimeMillis();
	    Thread.sleep(3);
	    System.out.println(System.currentTimeMillis()-t0);
	}
    }
}
-----------------------------------------------------
The above program, when invoked with either argument "Thread" or "TimeUnit"
prints "0" on windows-amd64 most of the time.
So this is definitely not a TimeUnit bug.

Comments

URL: https://hg.openjdk.java.net/jdk/jdk/rev/849acc346a1d User: dholmes Date: 2019-09-04 03:43:40 +0000
04-09-2019
David, thanks for continuing to fight the various OSes on this. It's part of Java's value to users that we the openjdk engineers endure this sort of pain! I'll add a link here to Aleksey's "Nanotrusting the Nanotime": https://shipilev.net/blog/2014/nanotrusting-nanotime/
02-09-2019
The final fix for this is just to use the POSIX version and not try to account for the "extra" tick on Windows. The change also affects the setting of the timer tick when Object.wait is used. In theory with a 15ms tick a wait between 1 and 15ms may return anywhere between 0 and 15ms. In practice I observed the opposite on our test systems - the wait() actually consistently returns late:

Testing wait
Expected 1ms, actual 15.14ms
Expected 1ms, actual 15.19ms
Expected 1ms, actual 14.92ms
Expected 1ms, actual 14.59ms
Expected 1ms, actual 15.23ms
Expected 2ms, actual 15.19ms
Expected 2ms, actual 15.29ms
Expected 2ms, actual 15.45ms
Expected 2ms, actual 15.23ms
Expected 2ms, actual 15.31ms
Expected 5ms, actual 15.18ms
Expected 5ms, actual 15.36ms
Expected 5ms, actual 15.25ms
Expected 5ms, actual 15.14ms
Expected 5ms, actual 15.07ms
Expected 10ms, actual 15.32ms
Expected 10ms, actual 15.23ms
Expected 10ms, actual 15.23ms
Expected 10ms, actual 15.37ms
Expected 10ms, actual 15.24ms
Expected 11ms, actual 15.05ms
Expected 11ms, actual 15.33ms
Expected 11ms, actual 15.36ms
Expected 11ms, actual 15.26ms
Expected 11ms, actual 14.83ms
Expected 17ms, actual 30.37ms
Expected 17ms, actual 30.86ms
Expected 17ms, actual 31.19ms
Expected 17ms, actual 30.87ms
Expected 17ms, actual 31.04ms
Expected 25ms, actual 30.91ms
Expected 25ms, actual 31.02ms
Expected 25ms, actual 31.05ms
Expected 25ms, actual 30.98ms
Expected 25ms, actual 31.15ms
Expected 50ms, actual 62.26ms
Expected 50ms, actual 62.15ms
Expected 50ms, actual 62.41ms
Expected 50ms, actual 62.26ms
Expected 50ms, actual 62.16ms

then with the fix we wake up more on time:

Testing wait
Expected 1ms, actual 2.36ms
Expected 1ms, actual 2.32ms
Expected 1ms, actual 2.26ms
Expected 1ms, actual 2.27ms
Expected 1ms, actual 2.32ms
Expected 2ms, actual 3.31ms
Expected 2ms, actual 3.31ms
Expected 2ms, actual 3.32ms
Expected 2ms, actual 3.32ms
Expected 2ms, actual 3.36ms
Expected 5ms, actual 6.31ms
Expected 5ms, actual 6.32ms
Expected 5ms, actual 6.31ms
Expected 5ms, actual 6.27ms
Expected 5ms, actual 6.32ms
Expected 10ms, actual 15.34ms
Expected 10ms, actual 15.36ms
Expected 10ms, actual 15.25ms
Expected 10ms, actual 15.37ms
Expected 10ms, actual 15.28ms
Expected 11ms, actual 12.35ms
Expected 11ms, actual 12.32ms
Expected 11ms, actual 12.33ms
Expected 11ms, actual 12.29ms
Expected 11ms, actual 12.31ms
Expected 17ms, actual 18.32ms
Expected 17ms, actual 18.33ms
Expected 17ms, actual 18.32ms
Expected 17ms, actual 18.32ms
Expected 17ms, actual 18.31ms
Expected 25ms, actual 26.32ms
Expected 25ms, actual 26.32ms
Expected 25ms, actual 26.34ms
Expected 25ms, actual 26.33ms
Expected 25ms, actual 26.33ms
Expected 50ms, actual 62.29ms
Expected 50ms, actual 62.15ms
Expected 50ms, actual 62.14ms
Expected 50ms, actual 62.20ms
Expected 50ms, actual 62.25ms

Note: for the 10ms and 50ms values the timer tick is not adjusted, hence the wait time is still much longer.
02-09-2019
The results on macOS were also interesting in a bad way:

park(0) = 10
Expected 10ms, actual 10.23ms
park(0) = 10
Expected 10ms, actual 88.12ms
park(0) = 10
Expected 10ms, actual 73.32ms
park(0) = 10
Expected 10ms, actual 75.34ms
park(0) = 10
Expected 10ms, actual 10.10ms
park(0) = 11
Expected 11ms, actual 88.29ms
park(0) = 11
Expected 11ms, actual 99.18ms
park(0) = 11
Expected 11ms, actual 11.21ms
park(0) = 11
Expected 11ms, actual 11.15ms
park(0) = 11
Expected 11ms, actual 79.22ms

I should point out though that we can't tell for certain if the issue is with the sleep or the calls to nanoTime used to measure the sleep.
31-08-2019
It seems we just can't win on Windows. I put my fix and the SleepAccuracy test through the build/test system to try other machines. Here's part of the result on Windows:

park(0) = 11
Expected 10ms, actual 12.39ms
park(0) = 11
Expected 10ms, actual 12.38ms
park(0) = 11
Expected 10ms, actual 12.38ms
park(0) = 11
Expected 10ms, actual 12.36ms
park(0) = 11
Expected 10ms, actual 12.37ms
park(0) = 12
Expected 11ms, actual 13.37ms
park(0) = 12
Expected 11ms, actual 13.36ms
park(0) = 12
Expected 11ms, actual 13.33ms
park(0) = 12
Expected 11ms, actual 13.38ms
park(0) = 12
Expected 11ms, actual 13.35ms

With the extra 1 ms sleep time we are now consistently oversleeping by 2+ milliseconds! (And this is across a range of sleep times repeated a number of times.) This suggests some Windows machines do not return early and so will not benefit from the extra 1 ms time adjustment.
31-08-2019
I restored the logic to the original and tried simply adding the extra 1ms for the initial timed-wait. Here's the result:

D:\testcode>d:\ade\apps\Java\jdk-14\fastdebug\bin\java -XX:+UseNewCode SleepAccuracy 11
park(0) = 12
Expected 11ms, actual 11.59ms
park(0) = 12
Expected 11ms, actual 12.13ms
park(0) = 12
Expected 11ms, actual 11.21ms
park(0) = 12
Expected 11ms, actual 11.33ms
park(0) = 12
Expected 11ms, actual 11.96ms

The "park(n)" lines are instrumentation of the actual park(millis) call, showing how many parks we needed for each sleep and what the millis value was. As you can see, for a sleep of 11 millis we do one park for 12 millis and the elapsed time varies between 11+ ms and 12+ ms. Here's the output without the addition of the 1ms:

D:\testcode>d:\ade\apps\Java\jdk-14\fastdebug\bin\java SleepAccuracy 11
park(0) = 11 park(1) = 1 park(2) = 1
Expected 11ms, actual 14.35ms
park(0) = 11 park(1) = 1 park(2) = 1
Expected 11ms, actual 14.59ms
park(0) = 11 park(1) = 1 park(2) = 1
Expected 11ms, actual 15.28ms
park(0) = 11 park(1) = 1 park(2) = 1 park(3) = 1 park(4) = 1
Expected 11ms, actual 17.97ms
park(0) = 11 park(1) = 1 park(2) = 1 park(3) = 1
Expected 11ms, actual 16.72ms

This result is quite erratic and I can't explain what is happening. The multiple parks can be explained if the calls to park(1) return very quickly and so you need a number of them to make up the final millisecond. But in that case the total elapsed time would be expected to be much closer to the 11ms requested value. The extremely long overall times can only really be explained if the last park(1) takes an unusually long time.
31-08-2019
The sleep code on Windows already adjusts the timer resolution during the sleep to be 1ms if the sleep time is not a multiple of 10ms. This is the best case. If we have a 1ms timer tick then setting a timeout of millis+1 for the initial blocking call seems to give the closest actual sleep times.

D:\testcode>d:\ade\apps\Java\jdk-14\fastdebug\bin\java SleepAccuracy 11
Expected 11ms, actual 12.34ms
Expected 11ms, actual 11.67ms
Expected 11ms, actual 12.08ms
Expected 11ms, actual 11.73ms
Expected 11ms, actual 11.47ms

The timer tick resolution issue is discussed in great detail in JDK-5005837. That is one sleeping dog I will let lie. I had hoped it was in the past but it seems not.
29-08-2019
Thanks for continuing to work on this. On Windows it looks like one should call timeGetDevCaps to find the minimum timer resolution and increment any sleep calls by that to get "at least" behavior. It should be possible for SleepAccuracy 11 to never report any duration less than 11 while "typically" reporting a duration less than 12.
29-08-2019
I've reworked the logic that tracks the elapsed time and it seems to be better:

D:\testcode>d:\ade\apps\Java\jdk-14\fastdebug\bin\java SleepAccuracy 11
Expected 11ms, actual 12.89ms
Expected 11ms, actual 12.20ms
Expected 11ms, actual 12.46ms
Expected 11ms, actual 12.24ms
Expected 11ms, actual 12.80ms

but we will always oversleep by 1-2 ms (with some outliers). The initial sleep of 11ms will return early forcing us to re-park for 1ms (the minimum). I wonder if I can add 1ms for the initial sleep just for windows ...
29-08-2019
With the fix in place we no longer have early return:

D:\testcode>d:\ade\apps\Java\jdk-14\fastdebug\bin\java SleepAccuracy
Expected 11ms, actual 11.14ms
Expected 11ms, actual 13.89ms
Expected 11ms, actual 12.94ms
Expected 11ms, actual 11.03ms
Expected 11ms, actual 12.35ms

D:\testcode>d:\ade\apps\Java\jdk-14\fastdebug\bin\java SleepAccuracy
Expected 11ms, actual 12.32ms
Expected 11ms, actual 11.69ms
Expected 11ms, actual 11.88ms
Expected 11ms, actual 11.14ms
Expected 11ms, actual 11.66ms

D:\testcode>d:\ade\apps\Java\jdk-14\fastdebug\bin\java SleepAccuracy
Expected 11ms, actual 15.52ms
Expected 11ms, actual 15.13ms
Expected 11ms, actual 12.50ms
Expected 11ms, actual 12.03ms
Expected 11ms, actual 13.67ms

but as you can see the sleep time is far less accurate and much more variable. Compare this to the early return case:

D:\testcode>d:\ade\apps\Java\jdk-13\bin\java SleepAccuracy
Expected 11ms, actual 10.58ms
Expected 11ms, actual 10.08ms
Expected 11ms, actual 10.96ms
Expected 11ms, actual 10.87ms
Expected 11ms, actual 10.85ms

This needs further investigation.
28-08-2019
Cleaning this up will also help with disentangling interrupts to aid with JDK-8229516. We can implement a single platform independent sleep operation, with early return prevention. Future RFEs will isolate the Windows _interrupt_event to the Windows OSThread implementation (used only for Process.waitFor after the current sleep change) and provide platform-independent interrupt support. Thread interruption is inherently about java.lang.Threads but interrupt support in the VM applies (potentially at least) to all threads. If we actually move that support to JavaThread we enable the move to storing interrupt state in the j.l.Thread instance. We can also split sleep operations so that we have the interruptible JavaThread::sleep operation, and other users of non-interruptible sleep can use os::naked_short_sleep.
28-08-2019
For reference here's the current documentation on timeouts in Windows "wait" functions: https://docs.microsoft.com/en-us/windows/win32/sync/wait-functions

Wait Functions and Time-out Intervals

The accuracy of the specified time-out interval depends on the resolution of the system clock. The system clock "ticks" at a constant rate. If the time-out interval is less than the resolution of the system clock, the wait may time out in less than the specified length of time. If the time-out interval is greater than one tick but less than two, the wait can be anywhere between one and two ticks, and so on.

To increase the accuracy of the time-out interval for the wait functions, call the timeGetDevCaps function to determine the supported minimum timer resolution and the timeBeginPeriod function to set the timer resolution to its minimum. Use caution when calling timeBeginPeriod, as frequent calls can significantly affect the system clock, system power usage, and the scheduler. If you call timeBeginPeriod, call it one time early in the application and be sure to call the timeEndPeriod function at the very end of the application.

---

So the source of early returns is obvious. Also the admonition regarding timeBeginPeriod bears thinking about in relation to how we use it with !ForceTimeHighResolution. I wish we could get rid of all that code.
22-08-2019
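As a rough illustration of the tick behaviour quoted in the comment above, here is a toy model. It assumes (purely for illustration; this is not a documented Windows algorithm) that a timed wait expires on a tick boundary, after a whole number of ticks covering the timeout, counted from the last tick before the call. The class and method names are hypothetical.

```java
// Toy model only: assumes a wait expires on a clock-tick boundary, counting
// ticks from the most recent tick before the call. This is an assumption made
// for illustration and not a description of the real Windows scheduler.
final class TickModel {

    /**
     * @param timeoutMicros requested timeout
     * @param tickMicros    system clock tick period
     * @param phaseMicros   time already elapsed since the last tick when the
     *                      wait starts (0 <= phase < tick)
     * @return modeled wait duration in microseconds
     */
    static long modeledWaitMicros(long timeoutMicros, long tickMicros, long phaseMicros) {
        long ticksNeeded = (timeoutMicros + tickMicros - 1) / tickMicros; // ceil(timeout / tick)
        return ticksNeeded * tickMicros - phaseMicros;
    }

    public static void main(String[] args) {
        long tick = 15_000;    // a 15 ms tick, as discussed in this report
        long timeout = 17_000; // between one and two ticks
        for (long phase = 0; phase < tick; phase += 3_500) {
            long wait = modeledWaitMicros(timeout, tick, phase);
            System.out.printf("phase %5d us -> wait %5d us%s%n",
                    phase, wait, wait < timeout ? " (early return)" : "");
        }
    }
}
```

Under this model a 17 ms timeout with a 15 ms tick yields waits between one and two ticks, and the wait is shorter than requested whenever the call starts late enough in a tick period.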
Yes, early return prevention can be simply implemented and is implemented on other platforms, e.g. see os_posix.cpp (for *NIX platforms we were dealing with the possibility of not using a monotonic clock).
12-08-2019
Implementing Thread.sleep using j.u.c.Semaphore.tryAcquire is still just a thought experiment. It must be possible to implement no-early-return for such a simple operation as JVM_Sleep in native code on windows, if only by retrying until elapsed time as given by JVM_NanoTime is correct. I still think this is an important bug to fix.
12-08-2019
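The "retry until the elapsed time is correct" idea from the comment above can be sketched at the Java level. This is a minimal user-level sketch, not the VM's JVM_Sleep code; the class and method names are hypothetical, and it assumes System.nanoTime() is a trustworthy elapsed-time source on the platform.

```java
import java.util.concurrent.TimeUnit;

// User-level sketch of early-return prevention; not the HotSpot implementation.
final class SleepAtLeast {

    /** Blocks until at least {@code millis} ms have elapsed according to System.nanoTime(). */
    static void sleepAtLeast(long millis) throws InterruptedException {
        final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(millis);
        long remainingNanos;
        // Re-check the clock after every wake-up; if the sleep returned early,
        // sleep again for whatever is left.
        while ((remainingNanos = deadline - System.nanoTime()) > 0) {
            // Round up to a whole millisecond so we never ask for sleep(0).
            long remainingMillis = (remainingNanos + 999_999L) / 1_000_000L;
            Thread.sleep(remainingMillis);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        sleepAtLeast(3);
        System.out.printf("elapsed %.2fms%n", (System.nanoTime() - start) / 1e6);
    }
}
```

The trade-off, visible in the measurements elsewhere in this report, is that re-sleeping for the remainder tends to overshoot, because the minimum extra sleep is a full millisecond.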
AQS has its own code for avoiding early return. The actual park() call can return early. While it would allow us to strip out a chunk of VM code that exists just for Thread.sleep if it were reimplemented to use a synchronizer, I don't think you could use one due to initialization order problems.
12-08-2019
High level synchronizers hide any spurious wakeups from user code. E.g. we have this test, which works on Windows:
---------------------------------------------------
/**
 * timed tryAcquire times out
 */
public void testTryAcquire_timeout() throws InterruptedException {
    final boolean fair = randomBoolean();
    final Semaphore s = new Semaphore(0, fair);
    final long startTime = System.nanoTime();
    assertFalse(s.tryAcquire(timeoutMillis(), MILLISECONDS));
    assertTrue(millisElapsedSince(startTime) >= timeoutMillis());
}
---------------------------------------------------
So again the thought experiment is having Thread.sleep not even call JVM_Sleep but instead tryAcquire on a Semaphore.
12-08-2019
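The thought experiment above can be tried in user code. The following is only a sketch of the idea, with hypothetical names; it is not a proposal for how Thread.sleep is or should be implemented in the JDK (a later comment in this thread points out initialization-order obstacles to that).

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Sketch: a sleep built on a timed acquire that can only ever time out.
final class SemaphoreSleep {

    // Zero permits and never released, so tryAcquire cannot succeed.
    private static final Semaphore NEVER_AVAILABLE = new Semaphore(0);

    static void sleep(long millis) throws InterruptedException {
        boolean acquired = NEVER_AVAILABLE.tryAcquire(millis, TimeUnit.MILLISECONDS);
        if (acquired) {
            throw new IllegalStateException("permit unexpectedly available");
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        sleep(3);
        System.out.printf("elapsed %.2fms%n", (System.nanoTime() - start) / 1e6);
    }
}
```

Like Thread.sleep, the timed tryAcquire throws InterruptedException if the thread is interrupted while waiting, so the interruption behaviour visible to the caller is similar.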
> I continue to believe early return from any timed wait is a serious bug.

For any timed-wait other than sleep it would just be seen as a spurious wakeup so not so serious an issue.

> Windows may be difficult, but we somehow manage to implement no-early-return for other timed synchronizers on Windows, and that ought to be much harder.

The degree of difficulty is exactly the same. But we do not do anything to prevent early-return with the other relative timed-wait APIs on Windows and they can all return early (at least at the VM level).
12-08-2019
I don't use Windows so I have not observed any actual behavior. I have a no-early-return assertion in a jtreg test (TimeUnit/Basic.java) that I would like to uncomment during my lifetime. I continue to believe early return from any timed wait is a serious bug. Windows may be difficult, but we somehow manage to implement no-early-return for other timed synchronizers on Windows, and that ought to be much harder. My thought experiment is: could we even implement Thread.sleep on windows by delegating to timed wait on a perpetually locked synchronizer?
12-08-2019
I've made 2006 comments public so the full history can be seen by everyone. [~martin] what exactly did you observe again: early return? general erratic length of sleep times? What specifically would you like to see fixed? (I'm guessing no early returns.)
12-08-2019
Turns out that ForceTimeHighResolution was not implemented correctly, hence the comments "despite the name it turns off high resolution timers" were correct. Once FTHR is working correctly more sensible results are seen with the timing tests. See JDK-6435126
12-08-2019
The following comes from 6440250 (nanoTime can be 25x slower than currentTimeMillis) and more succinctly summarises the different time sources in Windows for reading time values. This doesn't affect the triggering of time-based interrupts.

On Windows, System.currentTimeMillis() is implemented using GetSystemTimeAsFileTime, which essentially just reads the low resolution time-of-day value that windows maintains. Reading this global variable is naturally very quick - around 6 cycles according to reported information. In contrast System.nanoTime() is implemented using the QueryPerformanceCounter API (if available, else it returns currentTimeMillis*10^6). QueryPerformanceCounter is implemented in different ways depending on the hardware you are running on. The default implementation is determined by the HAL, but some systems allow you to explicitly control it using options in boot.ini, such as the /usepmtimer switch that the originator refers to.

QueryPerformanceCounter will use either the programmable-interval-timer (PIT), or the ACPI power management timer (PMT), or the CPU-level timestamp-counter (TSC). Accessing the PIT/PMT requires execution of slow I/O port instructions and as a result the execution time for QPC is in the order of microseconds. In contrast reading the TSC is on the order of 100 clock cycles (to read the TSC from the chip and convert it to a time value based on the operating frequency).

What the originator is observing (as they should expect given the use of /usepmtimer) is the use of the ACPI power management timer, which takes over a microsecond to read. You can tell if your system uses the ACPI PMT by checking if QueryPerformanceFrequency returns the signature value of 3,579,545 (ie 3.57MHz). If you see a value around 1.19MHz then your system is using the old 8254 PIT chip.

While the TSC is the fastest way to read a high resolution timestamp value, it has problems that make it an unreliable timing source on many systems:
- its update rate is determined by the CPU frequency, which on many systems is not a constant (particularly laptops but also many desktop systems with advanced power management support - Windows XP uses ACPI to modify CPU frequency to adapt the power usage of the system to the current workload - hence the TSC should not be used on such systems)
- on SMP systems the TSC on different CPU's can be different, hence corrections have to be made to account for threads changing CPU's between two timestamp readings

According to some sources Windows will synchronize the TSC across different processors, but those same sources also recommend ensuring your start and end timestamp occur on the same processor: http://www.intel.com/cd/ids/developer/asmo-na/eng/209859.htm?prn=y

There are also techniques for dealing with frequency variation by using power management API's to disable frequency changing for the duration of the timed section of code, but that requires a paired API of the form:
startTimedSection();
<code to measure>;
endTimedSection();
which isn't applicable to a simple API like System.nanoTime() or QueryPerformanceCounter itself.

Rather than try to solve the TSC problems itself the JVM simply relies on the implementation of QueryPerformanceCounter giving the "best" available result for that platform (Intel themselves recommend this over using RDTSC) - if Windows determines the TSC can be reliably used on your hardware then it will use it, else it will likely use the PMT. Note that QPC, even when not based on TSC, can still have problems on some systems, eg time leaping forward by several seconds.
A good synopsis of the state of Windows timers is given at: http://www.gamedev.net/reference/programming/features/timing/ See also: http://www.mattwalsh.com/twiki/bin/view/Main/HighFrequencyCounterInC See also: http://support.microsoft.com/?id=896256 It seems synchronization of the TSC on MP or HT systems is a hotfix for XP SP2. The general recommendation by Microsoft is to use the /usepmtimer switch to base QueryPerformanceCounter on the ACPI power management timer rather than the TSC.
12-08-2019
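The rule of thumb in the comment above (3,579,545 Hz indicates the ACPI PMT, roughly 1.19 MHz indicates the PIT) can be written down as a small classifier. This is a hypothetical helper for illustration: Java has no standard API that exposes QueryPerformanceFrequency, so the frequency is assumed to have been obtained natively, and the tolerance used for "around 1.19MHz" is an arbitrary choice.

```java
// Hypothetical helper: classifies a QueryPerformanceFrequency value using the
// rule of thumb from the comment above. The input is assumed to come from
// native code; the PIT tolerance is an arbitrary illustrative value.
final class QpcSource {

    static final long ACPI_PMT_HZ = 3_579_545L; // signature value cited above
    static final long PIT_HZ = 1_193_182L;      // nominal PIT rate, "around 1.19MHz"
    static final long PIT_TOLERANCE_HZ = 10_000L;

    static String classify(long frequencyHz) {
        if (frequencyHz == ACPI_PMT_HZ) {
            return "ACPI power management timer (PMT)";
        }
        if (Math.abs(frequencyHz - PIT_HZ) <= PIT_TOLERANCE_HZ) {
            return "programmable interval timer (PIT)";
        }
        return "other source (possibly TSC-based)";
    }

    public static void main(String[] args) {
        for (long hz : new long[] {3_579_545L, 1_193_182L, 2_400_000_000L}) {
            System.out.println(hz + " Hz -> " + classify(hz));
        }
    }
}
```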
I took the small test program for measuring the elapsed time across a small sleep and tried it on my windows laptop using different sleep times, use of nanoTime vs. currentTimeMillis, and with/without -XX:+ForceTimeHighResolution. The results are quite baffling.

1. 3ms sleep, currentTimeMillis, no force

C:\testcode\sleep>java Sleep -sleep 3 -useMillis
iteration[0]: 10 millis
iteration[1]: 10 millis
iteration[2]: 10 millis
iteration[3]: 0 millis
iteration[4]: 0 millis

Okay, no surprise. The clock resolution of 10ms means we either see the time change by one tick, or not at all. Hence 10 or 0.

2. 3ms sleep, currentTimeMillis, force high

C:\testcode\sleep>java -XX:+ForceTimeHighResolution Sleep -sleep 3 -useMillis
iteration[0]: 10 millis
iteration[1]: 10 millis
iteration[2]: 10 millis
iteration[3]: 10 millis
iteration[4]: 10 millis

Now Dave Dice previously reported that with force-high the 0 sleep times disappeared - true they have. I conjectured this is because the clock is now updated every 1ms and so we always see an update in the clock. However I expected to see values <10ms. But it turns out that regardless of the interrupt rate, windows maintains a constant tick-rate for the t-o-d - either 10ms or 15 ms. I consistently see sleeps <= 10ms will report 10ms, sleeps 11-20ms will report 20ms etc. But that doesn't make sense because it implies my sleeps are synchronized with the clock tick - and for that to be the case it would have to take around 7ms to go around the loop (which it doesn't). So the observed consistent "right answer" is very suspicious.

3. 3ms sleep, nanotime, no force

C:\testcode\sleep>java Sleep -sleep 3
iteration[0]: 9 millis
iteration[1]: 0 millis
iteration[2]: 3 millis
iteration[3]: 3 millis
iteration[4]: 3 millis

Here we see what Kumar reported: a long first sleep, a short second sleep, then settling down. However the above is not consistent eg:

C:\testcode\sleep>java Sleep -sleep 3
iteration[0]: 0 millis
iteration[1]: 0 millis
iteration[2]: 3 millis
iteration[3]: 3 millis
iteration[4]: 3 millis

And the settled values aren't always exactly 3 - every so often a 2 or 1 can appear as well. In short these results seem erratic.

4. sleep 3ms, nanotime, force high

C:\testcode\sleep>java -XX:+ForceTimeHighResolution Sleep -sleep 3
iteration[0]: 0 millis
iteration[1]: 9 millis
iteration[2]: 10 millis
iteration[3]: 10 millis
iteration[4]: 9 millis

This is the most bizarre case. The initial zero is an anomaly that isn't always there, but the first sleep is often short ie 0,1,2. As for the rest I'm totally baffled.

I've been doing some research and trying to piece together various bits of information from Microsoft knowledgebase, MS forums etc. The picture that is starting to emerge is that QueryPerformanceCounter's implementation is dependent on the HAL being used. And that QPC doesn't necessarily use the TSC - in fact one of the fixes in XP SP2 was to stop using the TSC for QPC due to the problem of "speed-step" changing the processor frequency and hence messing up the use of QPC. (otherwise the general "rule" was that only ACPI HAL uses TSC, others use PMTimer; though some claim SMP always uses TSC - <sigh>). Now if QPC isn't based on the TSC then it has to be a counter related to one of the timer interrupts - again HAL specific, but could be from RTC, PIT, APIC or PMTimer (power management). If QPC is not based on the TSC then there is an implication that changing the timer interrupt, through timeBeginPeriod, might impact the frequency of updates applied to QPC - it shouldn't as the frequency is supposed to be stable - but an unstable frequency would help explain some of the bizarre results that are being seen.

Just for good measure here is yet another problem with using QPC on 64-bit windows: http://support.microsoft.com/kb/895980/en-us "Programs that use the QueryPerformanceCounter() function may perform poorly in x64-based versions of Windows". Not clear why this is so, but the workaround is to boot windows using the /usepmtimer option in boot.ini.
12-08-2019
As Kumar notes in the evaluation you have to use a time measurement device that has a better resolution than the time interval you are trying to measure. So any results based on System.currentTimeMillis() with a timer resolution of 10ms should be ignored. The use of -XX:+ForceTimeHighResolution changes the timer period to be 1ms for the lifetime of the VM, rather than for the duration of the sleep, hence this would improve the resolution of currentTimeMillis as well thereby "fixing" the bug. If Kumar's tests using System.nanoTime were done without -XX:+ForceTimeHighResolution then the likely cause of the initial long sleep is the loading of the winmm.dll the first time the timer resolution is changed. That leaves the question as to why the second sleep appeared to be so short. That might be related to the general clock drift problem that we see when using the high-res timer period, or it might be something else, so this needs further investigation. All other CR's relating to sleep/timer problems on Windows have been closed as duplicates of 5005837, but I'll keep this one open to track the potential "not sleeping long enough" issue.
12-08-2019
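The point above about measurement resolution can be checked directly: before trusting System.currentTimeMillis() to time a 3ms sleep, observe how large a step the clock actually takes. The following is a minimal sketch with hypothetical names; it busy-waits, so it is suitable only as a diagnostic.

```java
// Diagnostic sketch: observes the size of one update step of
// System.currentTimeMillis() by spinning across two consecutive changes.
final class ClockGranularity {

    static long observedMillisStep() {
        long t0 = System.currentTimeMillis();
        long t1;
        // Spin until the clock changes, so that we start exactly on an update.
        while ((t1 = System.currentTimeMillis()) == t0) { /* busy-wait */ }
        long t2;
        // Spin until the next change; the difference is one observed step.
        while ((t2 = System.currentTimeMillis()) == t1) { /* busy-wait */ }
        return t2 - t1;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            System.out.println("currentTimeMillis step: " + observedMillisStep() + " ms");
        }
    }
}
```

If the reported step is comparable to or larger than the interval being measured, the measurement should be repeated with System.nanoTime() instead, as was done later in this report.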
Windows remains erratic though in a different way. I see consistent short-sleeps for 1ms:

D:\testcode>java -version
java version "1.8.0_221"
Java(TM) SE Runtime Environment (build 1.8.0_221-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.221-b11, mixed mode)

D:\testcode>java SleepAccuracy
Expected 1ms, actual 0.79ms
Expected 1ms, actual 0.58ms
Expected 1ms, actual 0.93ms
Expected 1ms, actual 0.40ms
Expected 1ms, actual 0.96ms

D:\testcode>java SleepAccuracy
Expected 1ms, actual 0.11ms
Expected 1ms, actual 0.12ms
Expected 1ms, actual 0.49ms
Expected 1ms, actual 0.41ms
Expected 1ms, actual 0.64ms

This is on same Windows 7 (with updates of course) but different hardware. But just for good measure there are still glitches:

D:\testcode>d:\ade\apps\Java\jdk-13\bin\java SleepAccuracy
Expected 1ms, actual 2.78ms
Expected 1ms, actual 0.22ms
Expected 1ms, actual 0.84ms
Expected 1ms, actual 0.27ms
Expected 1ms, actual 0.56ms

Changing to 10ms it seems this version of Windows on this hardware consistently returns early.

D:\testcode>d:\ade\apps\Java\jdk-13\bin\java SleepAccuracy
Expected 10ms, actual 9.79ms
Expected 10ms, actual 9.07ms
Expected 10ms, actual 9.79ms
Expected 10ms, actual 9.73ms
Expected 10ms, actual 9.62ms

D:\testcode>java SleepAccuracy
Expected 10ms, actual 9.26ms
Expected 10ms, actual 9.58ms
Expected 10ms, actual 9.14ms
Expected 10ms, actual 9.91ms
Expected 10ms, actual 9.03ms
12-08-2019
Can we rely on Thread.sleep yet?
12-08-2019
I rediscovered this bug after all these years. Has it been fixed? David said """Please re-open.""" ... so I will.
12-08-2019
This issue is not specific to Windows XP. This is a general problem on all Windows platforms - the Thread.sleep time is wildly unpredictable.

D:\testcode>more SleepAccuracy.java
---------------------------------------------------
public class SleepAccuracy {
    public static void main(String[] args) throws Throwable {
        long delay = 1;
        for (int i = 0; i < 5; i++) {
            long start = System.nanoTime();
            Thread.sleep(delay);
            long end = System.nanoTime();
            System.out.printf("Expected %dms, actual %.2fms%n", delay, (end-start)/(1000.0*1000));
        }
    }
}
---------------------------------------------------

D:\testcode>d:\tools\apps\Java\64\jdk1.8.0\bin\java SleepAccuracy
Expected 1ms, actual 4.33ms
Expected 1ms, actual 14.61ms
Expected 1ms, actual 13.98ms
Expected 1ms, actual 0.74ms
Expected 1ms, actual 14.87ms

D:\testcode>d:\tools\apps\Java\64\jdk1.8.0\bin\java -XX:+ForceTimeHighResolution SleepAccuracy
Expected 1ms, actual 12.51ms
Expected 1ms, actual 14.22ms
Expected 1ms, actual 14.62ms
Expected 1ms, actual 14.89ms
Expected 1ms, actual 13.61ms

D:\testcode>d:\tools\apps\Java\64\jdk1.8.0\bin\java -XX:+ForceTimeHighResolution SleepAccuracy
Expected 1ms, actual 8.92ms
Expected 1ms, actual 14.81ms
Expected 1ms, actual 14.59ms
Expected 1ms, actual 11.79ms
Expected 1ms, actual 13.40ms

D:\testcode>ver
Microsoft Windows [Version 6.1.7601]

ie Windows 7. Please re-open.
27-01-2014
We don't support Windows XP anymore; anyone running into this on Windows 2003 has probably already applied the patch from Microsoft.
24-01-2014
EVALUATION Note as per 5068368 there is no explicit requirement that a sleep last for at least the amount of time requested, so an early return is valid according to the JLS third edition. That said, the very short sleeps that have been observed seem to fall outside the realm of expected behaviour and so require further investigation. It is unclear at this stage whether the problem is with the sleep, or the use of System.nanoTime to measure the elapsed time.
02-06-2006
EVALUATION

First of all, the test is not quite accurate: System.currentTimeMillis has a granularity of 10ms. If the code is changed to use System.nanoTime, the VM (1.6.0-b55) does sleep for the required time, with no signs of the chronic insomnia observed earlier.

[K:/L/A/HS/6313903] cat ut.java
---------------------------------------------------
public class ut {
    public static void dojuc() throws InterruptedException {
        //long t0 = System.currentTimeMillis();
        long t0 = System.nanoTime();
        java.util.concurrent.TimeUnit.MILLISECONDS.sleep(3);
        //System.out.println("timeunit:" + (System.currentTimeMillis()-t0));
        System.out.println("timeunit:" + (System.nanoTime()-t0)/1000000);
    }
    public static void main(String[] args) throws Throwable {
        for (int i = 0 ; i < 10 ; i++) {
            dojuc();
        }
    }
}
---------------------------------------------------
[K:/L/A/HS/6313903] java ut
timeunit:15
timeunit:1
timeunit:3
timeunit:3
timeunit:3
timeunit:3
timeunit:3
timeunit:3
timeunit:3
timeunit:3

However, the testing revealed that the first sleep seems to last a long time and the second a very short time, after which it stabilizes at 3msec (perhaps thread scheduling/clock transients). This is also consistent with a native test case, as follows:

[C:/XXXX] ./Insomnia on
MIN res : 1
MAX res : 1000000
sleepy time = 3, elapsed time=12.08533 secs HighRes(on)
sleepy time = 3, elapsed time=0.42910 secs HighRes(on)
sleepy time = 3, elapsed time=3.85608 secs HighRes(on)
sleepy time = 3, elapsed time=3.86530 secs HighRes(on)
sleepy time = 3, elapsed time=3.86613 secs HighRes(on)
sleepy time = 3, elapsed time=3.86697 secs HighRes(on)
sleepy time = 3, elapsed time=3.86865 secs HighRes(on)
sleepy time = 3, elapsed time=3.86530 secs HighRes(on)
sleepy time = 3, elapsed time=3.86166 secs HighRes(on)
sleepy time = 3, elapsed time=3.86837 secs HighRes(on)
Hit CR to continue......

Here is the native test case:
---------------------------------------------------
// Insomnia.cpp : Defines the entry point for the console application.
#pragma comment( lib, "winmm" )
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <tchar.h>
#include <windows.h>

class HighResolutionInterval {
  // The default timer resolution seems to be 10 milliseconds.
  // (Where is this written down?)
  // If someone wants to sleep for only a fraction of the default,
  // then we set the timer resolution down to 1 millisecond for
  // the duration of their interval.
  // We carefully set the resolution back, since otherwise we
  // seem to incur an overhead (3%?) that we don't need.
private:
  static const int DEFAULT_RESOLUTION=1;
public:
  HighResolutionInterval(long ms) {
    MMRESULT result = timeBeginPeriod(DEFAULT_RESOLUTION);
  }
  ~HighResolutionInterval() {
    MMRESULT result = timeEndPeriod(DEFAULT_RESOLUTION);
  }
};

int os_sleep(long ms, bool highResMode) {
  HANDLE events[1];
  events[0] = CreateEvent(
    NULL,   // no security attributes
    FALSE,  // auto-reset event object
    FALSE,  // initial state is nonsignaled
    NULL);  // unnamed object
  int result = -1;
  HighResolutionInterval *phri = NULL;
  if (highResMode) phri = new HighResolutionInterval( ms );
  // The event is never signaled, so the wait is expected to time out.
  if (WaitForMultipleObjects(1, events, FALSE, (DWORD)ms) == WAIT_TIMEOUT) {
    result = -3;
  } else {
    printf("Error in WaitForMultipleObjects\n");
    result = -2;
  }
  delete phri; //if it is NULL, harmless
  CloseHandle(events[0]); // release the event created above
  return result;
}

// Returns the current high-resolution time in milliseconds.
double gethrTime(LARGE_INTEGER freq) {
  LARGE_INTEGER hrcount;
  if (QueryPerformanceCounter(&hrcount) != TRUE) {
    printf("Error: QueryPerformanceCounter\n");
    exit(-12);
  }
  return ((double)(hrcount.QuadPart)/(double)(freq.QuadPart))*1000;
}

int _tmain(int argc, _TCHAR* argv[]) {
  bool highResMode = (argc > 1 && _stricmp(argv[1],"on") == 0) ? TRUE : FALSE;
  long ms = ((argc > 2 && *argv[2] != '\0') ? atol(argv[2]) : 3L);
  TIMECAPS tc;
  memset(&tc,0,sizeof(tc));
  if (timeGetDevCaps(&tc, sizeof(TIMECAPS)) != TIMERR_NOERROR) {
    printf("Error: timeGetDevCaps failed\n");
    exit(-1);
  }
  printf("MIN res : %d\n", tc.wPeriodMin);
  printf("MAX res : %d\n", tc.wPeriodMax);
  LARGE_INTEGER freq;
  if (QueryPerformanceFrequency(&freq) != TRUE) {
    printf("Error: QueryPerformanceFrequency\n");
    exit(-10);
  }
  if (freq.QuadPart == 1000) {
    printf("Error: QueryPerformanceFrequency resolution not supported\n");
    exit(-11);
  }
  for (int i = 0 ; i < 10 ; i++) {
    double t0 = gethrTime(freq);
    os_sleep(ms, highResMode);
    double t1 = gethrTime(freq);
    // Note: t1-t0 is in milliseconds, despite the "secs" label in the output.
    printf("sleepy time = %ld, elapsed time=%2.5f secs HighRes%s\n", ms, t1-t0, (highResMode) ? "(on)" : "(off)" );
  }
  printf("Hit CR to continue......\n");
  while (getchar() != '\n');
  return 0;
}
---------------------------------------------------
Based on the above test result, I am downgrading this bug to P5 and will be reporting this to Microsoft through the established channels. Until then.....
10-10-2005