Bug ID: JDK-4985566 Equal priority threads not timslicing properly in Linux

Type: Bug
Component: hotspot
Sub-Component: runtime
Affected Version: 1.4.2,5.0u2,6

Priority: P4
Status: Closed
Resolution: Won't Fix
OS: linux,solaris_2.5.1
CPU: x86

Submitted: 2004-01-29
Updated: 2004-01-30
Resolved: 2004-01-30

Name: jl125535			Date: 01/29/2004


FULL PRODUCT VERSION :
java version "1.4.2_02"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_02-b03)
Java HotSpot(TM) Client VM (build 1.4.2_02-b03, mixed mode)

FULL OS VERSION :
Redhat 9
Kernel updated via redhat's up2date service
Kernel 2.4.20-20.9 #1 Mon Aug 18 11:45:58 EDT 2003 i686 i686 i386 GNU/Linux


EXTRA RELEVANT SYSTEM CONFIGURATION :
Single processor Pentium 4 2.4ghz 512mb
Running java via: java -server -Xmx200m


A DESCRIPTION OF THE PROBLEM :
2 or more threads of the same priority are repeatedly entering a section of code synchronized on a common Object and performing an identical blocking operation such as a sleep or sending data to a common tcp outputstream.

The threads should all timeslice, getting a crack at the synchronized section on a fairly even basis. And they do on windows xp. But on linux, one of the threads tends be annointed to starve the others for long periods of time (a minute) despite the fact that it is executing the exact same code as the other threads.


STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run the attache source code. 2 threads enter a synchronized block on a common Object, increment separate counters, sleep for 20ms forever. Main thread displays the 2 counters every second. The counters should increment pretty much evenly as the threads are timesliced. They do in fact on windows XP. But for some reason on linux (my system is redhat 9 kernel 2.4.20) one of the threads tends to get all of the processor for terribly long periods of time. It seems related to a blocking operation within the synchronized blocks. If the sleep is moved out of the sync sections, it's all good. Unfortunately, this is no good as I'm developing comms software that must send blocks of data to a single outputstream from multiple threads within a synchronized section. With timeslicing perturbed like this, I cannot acheive reasonable traffic sharing of an outputstream across multiple threads.


EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
------------
c1= 1
c2= 0
------------
c1= 25
c2= 24
------------
c1= 49
c2= 49
------------
c1= 73
c2= 73
------------
c1= 98
c2= 97
------------
c1= 122
c2= 122
------------
c1= 146
c2= 146
------------
c1= 171
c2= 170
------------
c1= 195
c2= 195
------------
c1= 219
c2= 219

ACTUAL -
------------
c1= 0
c2= 0
------------
c1= 34
c2= 0
------------
c1= 68
c2= 0
------------
c1= 101
c2= 0
------------
c1= 135
c2= 0
------------
c1= 169
c2= 0
------------
c1= 202
c2= 0
------------
c1= 236
c2= 0
------------
c1= 270
c2= 0


REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
public class TimeSlicing {
	Object gate = new Object();
	volatile long c1;
	volatile long c2;


	public static void main(String args[]) throws InterruptedException {
		new TimeSlicing().go();
	}

	void go() throws InterruptedException {
		new Thread() {
			public void run () {
				while (true) {
					synchronized (gate) {
						c1++;
						try {
							Thread.sleep(20);
						} catch (InterruptedException e) {
						}
					}
				}
			}
		}.start();

		new Thread() {
			public void run () {
				while (true) {
					synchronized (gate) {
						c2++;
						try {
							Thread.sleep(20);
						} catch (InterruptedException e) {
						}
					}
				}
			}
		}.start();


		while (true) {
			System.out.println("------------");
			System.out.println("c1= " + c1);
			System.out.println("c2= " + c2);
			Thread.sleep(1000);
		}
	}
}

---------- END SOURCE ----------
(Incident Review ID: 227908) 
======================================================================

The JVM does not entirely isolate java threads from the scheduling properites of the underlying operating system.

Under Linux:
1.  One thread acquires Gate and holds Gate for a long period.
2.  Concurrently with 1, the 2nd thread attempts to acquire Gate, but fails.  The 2nd thread enqueues itself on the
    the monitor associated with Gate.
3.  The 1st thread unlocks Gate.  In doing so it removes the 2nd thread from the monitor's queue and wakes it.  
    The 1st thread is running, the 2nd thread is reasdy, and Gate is unlocked. 
4.  The 1st thread immediately tries to reacquire Gate.  Since Gate is unlocked, it succeeds. 
5.  The 1st thread is eventually preempted by the OS or blocks.  More than likely the 1st thread is holding Gate.
6.  The 2nd thread resumes (ready->run transition), attempts to acquire Gate, fails, and (re)queues itself on
    the monitor associated with Gate.

The cycle above repeats, with the 1st thread dominating the lock.  
See www.usenix.org/events/jvm01/full_papers/dice/dice.pdf, section 3.3.
Another excellent reference is Josh Bloch's "Effective Java", Item 51, "Don't depend on the thread scheduler".

On win32 the thread scheduler may give a priority boost to a thread that wakes up after voluntarily blocking. 
In that case at step (3), above, waking the 2nd thread will cause the 1st thread to be immediately preempted
(because of the transient priority boost the 2nd thread may have a higher priority than the 1st thread).  
The 2nd thread will compete for the monitor, usually acquiring the lock.  This results in alternating between the 1st 
and 2nd thread.

On a Win32/MP sytsem I doubt you'd see fair alternation, however.  Lets assume that the 1st thread is running on CPU#1 
at the time it drops the lock.  Thread 1 will dequeue and wake thread 2.  Thread 2 may (depending on affinity and ready queue balance policy) wake immediately on CPU#2.  There's usually some latency involved with waking up (cache reload transient, etc), so the 1st and 2nd thread will race, attempting to acquire the lock.  The 1st thread, having a warm cache on CPU#1 will likely win.  

When multiple threads contend for a monitor, strict alternation of the threads is "fair" (when viewed from a short time-frame), but also results in excessive context switch costs.  It's usually the case that large commercial 
applications show higher throughput with an unfair "competive" succession policy as compared to fair alternation.

If you need truly fair locking with strict alternation then you might consider Doug Lea's
dl.util.concurrent packages.   These are freely downloadable.  In addition much of the code in the 
concurrent packages appears in the 1.5.0 JDK/JRE by virtue of JSR166.   (These constructs provide 
fair locks, regardless of whether the underlying java monitors are fair or not).  
Alternately, it's reasonably easy to do it yourself -- build synthetic fair locks from unfair java monitors.  
I'd recommend Doug's packages, however, as they're mature and well-tested.  

###@###.### 2004-01-30

============================================================================

EVALUATION See description ...

11-06-2004

Duplicate :	JDK-6316090 - 5.0 "synchronized" method causes starvation
Duplicate :	JDK-6541879 - Thread.sleep() within a synchronized block influences subsequent lock aquisition
Relates :	JDK-4799163 - Thread.yield() NOP implementation prerequsites
Relates :	JDK-4790855 - RH8.0: nsk/regression/b4510834 fails
Relates :	JDK-4910535 - -XX:-UseThreadPriorities does not totally disable priority setting
Relates :	JDK-4813310 - Map Thread priorities to system thread/process priorities
Relates :	JDK-4899307 - RegTest-test java/management/ThreadMBean/ThreadCounts.java failed
Relates :	JDK-6855216 - monitor acquisition unfairness and possible thread starvation
Relates :	JDK-4897898 - lwp_set_priority cleanup