Bug ID: JDK-6518490 Solaris TS scheduling class anti-starvation facility does not completely avoid starvation

Details
Type: Bug
Submit Date: 2007-01-28
Status: Closed
Updated Date: 2012-10-08
Project Name: JDK
Resolved Date: 2007-04-24
Component: hotspot
OS: solaris_10
Sub-Component: runtime
CPU: sparc, generic
Priority: P3
Resolution: Fixed
Affected Versions: solaris_10, 5.0u10
Fixed Versions: hs10 (b12)

Description
(Moved to comments section)
The non-realtime JVM depends on the host operating system scheduler to *eventually* grant cycles to all runnable LWPs, regardless of their assigned Java priority. (Refer to http://blogs.sun.com/dave/entry/java_thread_priorities_demystified for an explanation of how the JVM maps Java thread priorities to underlying LWP priorities.) Unfortunately, we've recently discovered that, in some exotic and rarely seen circumstances, ready threads at lower priority in the Solaris IA and TS scheduling classes can starve indefinitely when competing against higher-priority threads that park and unpark frequently. Specifically, the anti-starvation boost -- which Solaris applies to threads languishing on the ready list -- is insufficient to overcome the differences in computed effective priority between threads at different assigned priorities. (Refer to the Solaris man page for ts_dptbl, "Solaris Internals", 2E, page 206, or http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/disp/ts.c.)
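
To make the "insufficient authority" argument concrete, here is a toy single-CPU dispatcher in Java. It is a sketch, not the Solaris dispatcher: the class and method names are hypothetical, the priority numbers are illustrative rather than real ts_dptbl values, and ties are assumed to go to the thread that has been waiting. The one point it models is that the anti-starvation boost lifts a waiting thread to a fixed level, so if that level sits below the effective priority a yielding competitor keeps, the waiting thread never runs.

```java
// Toy model only (hypothetical): one CPU, one always-runnable "yielder" whose
// effective priority never decays, and one low-priority CPU-bound thread that
// receives a fixed anti-starvation boost after waiting maxWaitTicks ticks.
// Priority values are illustrative, not taken from ts_dptbl.
final class LanguishBoostModel {

    /** Returns how many of {@code ticks} dispatch decisions go to the low-priority thread. */
    static int lowThreadTicks(int yielderPriority, int lowPriority,
                              int languishBoostPriority, int maxWaitTicks, int ticks) {
        int effective = lowPriority; // low thread's current effective priority
        int waited = 0;              // ticks spent runnable but not dispatched
        int ran = 0;
        for (int t = 0; t < ticks; t++) {
            if (effective >= yielderPriority) {
                // Assumption: ties go to the thread that has been waiting.
                ran++;
                waited = 0;
                effective = lowPriority; // the boost is transient
            } else {
                waited++;
                if (waited >= maxWaitTicks) {
                    // The boost lifts the thread to a fixed level, and no higher.
                    effective = Math.max(effective, languishBoostPriority);
                }
            }
        }
        return ran;
    }

    public static void main(String[] args) {
        // Boost level below the yielder's priority: the low thread starves.
        System.out.println(lowThreadTicks(50, 10, 40, 5, 100)); // 0
        // Boost level above the yielder's priority: the low thread runs every 6th tick.
        System.out.println(lowThreadTicks(50, 10, 55, 5, 100)); // 16
    }
}
```

In this model only the relative order of the boost level and the yielder's effective priority decides whether the low thread ever runs; the wait threshold merely sets how often it gets through.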
The starvation effect is readily reproducible with a simple "C" test case as well as with the simple Java test case attached to the bug report. That is, there's nothing Java-specific about the underlying problem. Ultimately, I'd like to see the issue addressed by Solaris in the kernel, but in the interim I'll try to modify the JVM to reduce the odds of encountering the problem.
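
The attached test cases are not reproduced here. As a rough, hypothetical sketch of the same shape (it is not the attached test, and the class and method names are illustrative), the Java program below runs several MAX_PRIORITY threads that repeatedly park briefly and yield, alongside one MIN_PRIORITY thread that spins and counts iterations. Where Java priorities are mapped to OS priorities and the anti-starvation boost is too weak, the count may stay at or near zero; where all threads effectively compete at equal priority, it should keep climbing.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

// Hypothetical starvation probe: not the test case attached to the bug report.
final class StarvationProbe {

    /**
     * Runs highThreads MAX_PRIORITY threads that park briefly and yield in a loop,
     * plus one MIN_PRIORITY thread that spins, for durationMillis.
     * Returns the number of loop iterations the low-priority thread completed.
     */
    static long run(int highThreads, long durationMillis) {
        AtomicBoolean stop = new AtomicBoolean(false);
        AtomicLong lowIterations = new AtomicLong();

        Thread low = new Thread(() -> {
            while (!stop.get()) {
                lowIterations.incrementAndGet();
            }
        }, "low-priority-spinner");
        low.setPriority(Thread.MIN_PRIORITY);
        low.setDaemon(true);

        Thread[] high = new Thread[highThreads];
        for (int i = 0; i < highThreads; i++) {
            high[i] = new Thread(() -> {
                while (!stop.get()) {
                    LockSupport.parkNanos(1_000); // block very briefly, then wake
                    Thread.yield();               // and give up the CPU voluntarily
                }
            }, "high-priority-" + i);
            high[i].setPriority(Thread.MAX_PRIORITY);
            high[i].setDaemon(true); // daemon: a starved run cannot keep the JVM alive
        }

        low.start();
        for (Thread t : high) {
            t.start();
        }
        try {
            Thread.sleep(durationMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        stop.set(true);
        return lowIterations.get();
    }

    public static void main(String[] args) {
        int highThreads = 2 * Runtime.getRuntime().availableProcessors();
        System.out.println("low-priority iterations: " + run(highThreads, 2_000));
    }
}
```

A zero count is only a hint: other hangs can look similar, so it should be corroborated before being attributed to this bug.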

To help avoid this problem, I've changed the default for UseThreadPriorities to FALSE in the 1.7 source tree for Solaris. This change -- which disables the mapping of Java-level thread priorities to Solaris thread priorities -- should probably be backported into the 6.x update stream. Users on earlier releases can use -XX:-UseThreadPriorities to achieve the same effect. Starvation will not occur if the assigned priorities of all threads competing for CPU cycles are equal. Beware that you can still encounter the starvation problem if you make JNI calls to native code that changes LWP priorities, or if you assign a non-default priority to an LWP and then attach that thread to the JVM.
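
One way to confirm which setting a running JVM actually picked up is to ask the HotSpot diagnostic MXBean. This is a sketch that assumes a HotSpot-based JVM (com.sun.management.HotSpotDiagnosticMXBean is HotSpot-specific) and a release in which the UseThreadPriorities flag still exists; the class name is hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Hypothetical helper: reports the effective value of -XX:[+|-]UseThreadPriorities.
final class ThreadPriorityFlag {

    static VMOption useThreadPriorities() {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // Throws IllegalArgumentException if this JVM does not know the flag.
        return hotspot.getVMOption("UseThreadPriorities");
    }

    public static void main(String[] args) {
        VMOption option = useThreadPriorities();
        // getOrigin() distinguishes a built-in default from an explicit setting.
        System.out.println("UseThreadPriorities=" + option.getValue()
                + " (origin: " + option.getOrigin() + ")");
    }
}
```

Running it as `java -XX:-UseThreadPriorities ThreadPriorityFlag` should report a value of false.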

I believe that both Windows and Linux are immune to similar starvation pathologies. The Windows scheduler seems to provide anti-starvation effective-priority boosting with sufficient authority to overcome priorities assigned via the SetThreadPriority() API, and the design of the "new" Linux O(1) scheduler renders it immune to indefinite starvation. That is, regardless of the assigned priority, those schedulers will eventually grant threads CPU cycles.
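
The design argument for Linux can be illustrated with a deliberately simplified Java model of the O(1) scheduler's two run queues. This is a sketch under stated assumptions, not the kernel algorithm: the class is hypothetical, it ignores timeslice lengths, interactivity heuristics and real-time tasks, and it uses a PriorityQueue where the kernel uses per-priority lists with a bitmap. What it captures is that a task which has used up its timeslice moves to an "expired" queue and cannot run again until every task still in the "active" queue has had its turn, at which point the two queues are swapped.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Simplified, hypothetical model of the O(1) scheduler's active/expired run queues.
final class ActiveExpiredModel {

    /** Returns the dispatch order of always-runnable tasks over the given number of epochs. */
    static List<String> dispatchOrder(String[] names, int[] priorities, int epochs) {
        // Highest priority first; equal priorities in index order.
        Comparator<Integer> byPriority =
                Comparator.comparingInt((Integer i) -> -priorities[i]).thenComparingInt(i -> i);
        PriorityQueue<Integer> active = new PriorityQueue<>(byPriority);
        PriorityQueue<Integer> expired = new PriorityQueue<>(byPriority);
        for (int i = 0; i < names.length; i++) {
            active.add(i);
        }
        List<String> order = new ArrayList<>();
        for (int epoch = 0; epoch < epochs; epoch++) {
            while (!active.isEmpty()) {
                int next = active.poll(); // best task still holding a timeslice
                order.add(names[next]);   // it runs until the timeslice is used up...
                expired.add(next);        // ...then waits until the queues are swapped
            }
            PriorityQueue<Integer> empty = active; // swap: expired becomes active
            active = expired;
            expired = empty;
        }
        return order;
    }

    public static void main(String[] args) {
        // A high-priority and a low-priority task: the low one still runs every epoch.
        System.out.println(dispatchOrder(new String[] {"high", "low"}, new int[] {10, 1}, 3));
        // [high, low, high, low, high, low]
    }
}
```

Because a task that has used its timeslice cannot be picked again before the swap, priority in this model only affects the order within an epoch, not whether a task runs in it.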

As an aside, when the problem manifests, the process may not respond to CTRL-C and may not be pstack-able.

We should be extremely careful about attributing observed hangs to this bug.  There are a number of pending hotspot issues that can manifest in a similar fashion, including 6519515 and 6546278.  More broadly, any unbounded spinning in the JVM or Java application code could easily trigger the starvation condition.  

I also believe that some earlier bugs such as 6463925 are really instances of this bug (6518490).  

-Dave

                                    

Comments
WORK AROUND

Run java with -XX:-UseThreadPriorities, or adjust the mapping of Java priorities to OS
priorities, especially for the lower Java priorities, upward (away from OS priority 0):
-XX:JavaPriority1_To_OSPriority=20 -XX:JavaPriority2_To_OSPriority=25
etc.
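
The remapping workaround amounts to a set of -XX flags on the launcher command line. The sketch below builds such a command for ProcessBuilder; only the two mappings quoted above (Java priority 1 to OS priority 20, Java priority 2 to OS priority 25) come from this report. The class name and the main class are hypothetical placeholders, and the remaining JavaPriorityN_To_OSPriority values are left for the reader to choose.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical launcher helper for the priority-remapping workaround.
final class PriorityRemapLauncher {

    /** Builds a java command line that maps each given Java priority to an OS priority. */
    static List<String> command(Map<Integer, Integer> javaToOs, String mainClass) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        // TreeMap gives a stable, ascending flag order.
        for (Map.Entry<Integer, Integer> e : new TreeMap<>(javaToOs).entrySet()) {
            int javaPriority = e.getKey();
            if (javaPriority < Thread.MIN_PRIORITY || javaPriority > Thread.MAX_PRIORITY) {
                throw new IllegalArgumentException("Java priority out of range: " + javaPriority);
            }
            cmd.add("-XX:JavaPriority" + javaPriority + "_To_OSPriority=" + e.getValue());
        }
        cmd.add(mainClass);
        return cmd;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> mapping = Map.of(1, 20, 2, 25); // the two values quoted above
        List<String> cmd = command(mapping, "com.example.Main"); // placeholder main class
        System.out.println(String.join(" ", cmd));
        // prints: java -XX:JavaPriority1_To_OSPriority=20 -XX:JavaPriority2_To_OSPriority=25 com.example.Main
        // To launch it: new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```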
                                     
2007-01-28
EVALUATION

- The fix for 6518490 changes the default for UseThreadPriorities to
    "off" for Solaris.  The JVM assumes that native threads -- despite
    their assigned TS or IA priority -- will eventually be scheduled.
    This is usually true, as Solaris provides transient priority
    boosts to threads that appear to be languishing on run queues.
    Unfortunately, the anti-languishing feature doesn't have sufficient
    authority to boost a low-priority thread above the effective
    priority of threads that loop, yielding.  Given that, a group
    of threads blocking and yielding can indefinitely starve other
    threads running at lower priority.  For now, we'll disable
    thread priority mapping on Solaris.  This change will need to be
    mentioned in the release notes.
        
    An alternative to disabling thread priorities would be for the
    JVM itself to assume some responsibility for scheduling.
    The VM thread, for instance, might periodically stop-the-world
    and release only a subset of the otherwise runnable threads.
    (As an aside, some experiments show this might be useful in benchmark
    situations: restricting the number of runnable threads to
    approximately the number of CPUs avoids preemption and context
    switching.  But you can achieve nearly the same effect by using
    dispadmin or low TS/IA priorities to force long quanta.)  Another
    option would be to let threads set "raw" LWP priorities as they do
    today, but have the VM thread periodically check and, if a thread
    appears to be starving, apply a transient priority boost.  Generally,
    however, scheduling is best managed by the kernel, as it has the
    advantage of global information.  We don't want the JVM in the
    business of short-term scheduling.  This path ends in tears.
        
    Note that the Linux O(1) scheduler is immune to this type of starvation.
    The Windows scheduler is similar to the Solaris scheduler, but its
    transient effective-priority boost for threads languishing on the
    ready queue has sufficient authority to avoid the indefinite
    starvation we can encounter on Solaris.
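
To show what the rejected "check and boost" option would ask of the VM thread, here is a hypothetical Java sketch of its detection half only. It is fed periodic samples of per-thread progress counters (for example, counters that threads bump in their own loops) and reports threads whose counters have not moved for a given number of consecutive samples. It assumes only runnable threads are sampled, since a blocked thread also makes no progress. Deciding when to apply and later remove the boost is left out; that short-term scheduling policy is what the evaluation argues belongs in the kernel.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical detector for the "VM thread checks for starving threads" option.
final class StarvationWatchdog {
    private final long[] lastProgress;
    private final int[] staleSamples;
    private final int threshold;

    StarvationWatchdog(int threads, int threshold) {
        this.lastProgress = new long[threads];
        this.staleSamples = new int[threads];
        this.threshold = threshold;
    }

    /**
     * Records one sample of per-thread progress counters (assumed to come from
     * runnable threads only) and returns the indices of threads whose counter has
     * not changed for at least {@code threshold} consecutive samples.
     */
    List<Integer> sample(long[] progress) {
        if (progress.length != lastProgress.length) {
            throw new IllegalArgumentException("expected " + lastProgress.length + " counters");
        }
        List<Integer> starving = new ArrayList<>();
        for (int i = 0; i < progress.length; i++) {
            if (progress[i] == lastProgress[i]) {
                staleSamples[i]++;
            } else {
                staleSamples[i] = 0;
                lastProgress[i] = progress[i];
            }
            if (staleSamples[i] >= threshold) {
                starving.add(i); // candidate for a transient priority boost
            }
        }
        return starving;
    }

    public static void main(String[] args) {
        StarvationWatchdog watchdog = new StarvationWatchdog(2, 2);
        System.out.println(watchdog.sample(new long[] {1, 1})); // []
        System.out.println(watchdog.sample(new long[] {2, 1})); // []
        System.out.println(watchdog.sample(new long[] {3, 1})); // [1]
    }
}
```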
                                     
2007-04-24


