JDK-6518490 : Solaris TS scheduling class anti-starvation facility does not completely avoid starvation
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: solaris_10,5.0u10
  • Priority: P3
  • Status: Closed
  • Resolution: Fixed
  • OS: solaris_10
  • CPU: generic,sparc
  • Submitted: 2007-01-28
  • Updated: 2016-11-01
  • Resolved: 2007-04-24
Fix versions:
  • Other: 5.0u15-rev,hs10 (Fixed)
  • Other: 5.0u17 (Fixed)
  • JDK 6: 6u4 (Fixed)
  • JDK 7: 7 (Fixed)
  • Other: hs10 (Fixed)
Description
(Moved to comments section)
The non-realtime JVM depends on the host operating system scheduler to *eventually* grant cycles to all runnable LWPs, regardless of the assigned Java priority. (Refer to http://blogs.sun.com/dave/entry/java_thread_priorities_demystified to better understand how the JVM maps Java thread priorities to underlying LWP priorities). Unfortunately, we've recently discovered that in some exotic and rarely seen circumstances, ready threads at lower priority in the Solaris IA and TS scheduling classes can starve indefinitely when competing against higher-priority threads that park and unpark frequently. Specifically, the anti-starvation boost -- which Solaris applies to threads languishing on the ready list -- is insufficient to overcome differences in the computed effective priority of threads at varying assigned priorities. (Refer to the Solaris man page for ts_dptbl, "Solaris Internals", 2E, page 206, or http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/disp/ts.c).
The starvation effect is readily reproducible with a simple "C" test case as well as the simple Java test case attached to the bug report.  That is, there's nothing Java-specific about the underlying problem.  Ultimately, I'd like to see the issue addressed by Solaris in the kernel but in the interim I'll try to modify the JVM to reduce the odds of encountering the problem. 
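
For readers without access to the attachment, the scenario described above can be approximated as follows. This is a hypothetical probe, not the test case attached to the bug report: the class name StarvationProbe, the method runTrial, and the 1 ms timed park are all assumptions made for illustration. It runs one low-priority spinning thread against pairs of high-priority threads that park and unpark each other, then reports how far the low-priority thread got.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

// Hypothetical probe; NOT the test case attached to the bug report.
final class StarvationProbe {

    // Runs one MIN_PRIORITY spinner against `pairs` pairs of MAX_PRIORITY
    // threads that park and unpark each other for roughly `millis` ms.
    // Returns the number of iterations the low-priority thread completed.
    static long runTrial(int pairs, long millis) {
        AtomicBoolean stop = new AtomicBoolean(false);
        AtomicLong lowProgress = new AtomicLong();

        Thread low = new Thread(() -> {
            while (!stop.get()) {
                lowProgress.incrementAndGet();
            }
        }, "low");
        low.setPriority(Thread.MIN_PRIORITY);

        Thread[] high = new Thread[2 * pairs];
        for (int i = 0; i < high.length; i++) {
            final int partner = i ^ 1; // pairs up 0<->1, 2<->3, ...
            high[i] = new Thread(() -> {
                while (!stop.get()) {
                    LockSupport.unpark(high[partner]);
                    // Timed park (assumed 1 ms) so a lost wakeup cannot hang the probe.
                    LockSupport.parkNanos(1_000_000L);
                }
            }, "high-" + i);
            high[i].setPriority(Thread.MAX_PRIORITY);
        }

        low.start();
        for (Thread t : high) {
            t.start();
        }
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            stop.set(true);
        }
        for (Thread t : high) {
            LockSupport.unpark(t);
            join(t);
        }
        join(low);
        return lowProgress.get();
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("low-priority iterations: " + runTrial(2, 2000));
    }
}
```

Whether the low-priority count stays near zero depends on the platform, the number of CPUs relative to the number of high-priority threads, and whether Java priorities are mapped to OS priorities at all, so the number is only a rough indicator.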

To help avoid this problem I've changed the default for UseThreadPriorities to FALSE in the 1.7 source tree for Solaris. This change -- which disables the mapping of Java-level thread priorities to Solaris thread priorities -- should probably be backported into the 6.x update stream. Users on earlier releases can use -XX:-UseThreadPriorities to achieve the same effect. Starvation will not occur when the assigned priorities of all threads competing for CPU cycles are equal. Beware that you can still encounter the starvation problem if you make JNI calls to native code that changes LWP priorities, or if you assign a non-default priority to an LWP and then attach that thread to the JVM.
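
At the application level, the "equal assigned priorities" condition above can be approximated by creating every worker thread at the same Java priority. The following is a minimal sketch, assuming the application creates its threads through a ThreadFactory; the class name EqualPriorityFactory is hypothetical. It does not cover the JNI and attached-thread cases mentioned above, because those change LWP priorities outside Java's control.

```java
import java.util.concurrent.ThreadFactory;

// Hypothetical helper: every thread it creates gets the same Java priority,
// regardless of the priority of the thread that asked for it.
final class EqualPriorityFactory implements ThreadFactory {
    @Override
    public Thread newThread(Runnable task) {
        Thread t = new Thread(task);
        // A new Thread inherits its creator's priority; override that here.
        t.setPriority(Thread.NORM_PRIORITY);
        return t;
    }
}
```

Such a factory could be passed to, for example, Executors.newFixedThreadPool(n, new EqualPriorityFactory()).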

I believe that both Windows and Linux are immune to similar starvation pathologies. The Windows scheduler seems to provide anti-starvation effective priority boosting with sufficient authority to overcome priorities assigned via the SetThreadPriority() API, and the design of the "new" Linux O(1) scheduler renders it immune to indefinite starvation. That is, regardless of the assigned priority, threads will eventually be granted CPU cycles by those schedulers.

As an aside, when the problem manifests the process may not be responsive to CTRL-C and may not be pstack-able.

We should be extremely careful about attributing observed hangs to this bug.  There are a number of pending hotspot issues that can manifest in a similar fashion, including 6519515 and 6546278.  More broadly, any unbounded spinning in the JVM or Java application code could easily trigger the starvation condition.  

I also believe that some earlier bugs such as 6463925 are really instances of this bug (6518490).  

-Dave

Comments
See also:
  * https://blogs.oracle.com/dave/entry/java_thread_priorities_demystified
  * https://blogs.oracle.com/dave/entry/java_thread_priority_revisted_in
01-11-2016

EVALUATION - The fix for 6518490 changes the default for UseThreadPriorities to "off" for Solaris. The JVM assumes that native threads -- despite their assigned TS or IA priority -- will eventually be scheduled. This is usually true, as Solaris will provide transient priority boosts to threads that appear to be languishing on run queues. Unfortunately the anti-languishing feature doesn't have sufficient authority to boost a low-priority thread above the effective priority of threads that loop, yielding. Given that, a group of threads blocking and yielding can indefinitely starve other threads running at lower priority. For now, we'll disable thread priority mapping on Solaris. This change will need mention in release notes.

An alternative to disabling thread priorities would be to assume some responsibility for scheduling in the JVM itself. The VM thread, for instance, might periodically stop-the-world and release only a subset of the otherwise runnable threads. (As an aside, some experiments show this might be useful in benchmark situations. The JVM restricts the number of runnable threads to approximately the number of CPUs, avoiding preemption and context switching. But you can achieve nearly the same effect by using dispadmin or low TS/IA priorities to force long quanta).

Another option would be to allow threads to set "raw" LWP priority as they do today, but have the VM thread periodically check, and, if a thread appears to be starving, apply a transient priority boost. Generally, however, scheduling is best managed by the kernel as it has the advantage of global information. We don't want the JVM in the business of short-term scheduling. This path ends in tears.

Note that the Linux O(1) scheduler is immune to this type of starvation. The Windows scheduler is similar to the Solaris scheduler, but its transient effective priority boost for threads languishing on the ready queue has sufficient authority to avoid the indefinite starvation phenomenon we can encounter on Solaris.
24-04-2007
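
The "periodically check and apply a transient boost" option from the evaluation above can be illustrated with an application-level analogue. This is only a sketch under stated assumptions, not what the VM thread does or did: the class BoostWatchdog, its check() method, and the use of an application-maintained progress counter as the starvation signal are all hypothetical. The evaluation itself argues against doing this kind of short-term scheduling outside the kernel, and with -XX:-UseThreadPriorities in effect the Java priority set here is not mapped to an OS priority.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical application-level analogue of a "transient boost" watchdog.
final class BoostWatchdog {
    private final Thread worker;        // thread being watched
    private final AtomicLong progress;  // counter the worker bumps as it makes progress
    private final int basePriority;     // priority to restore once progress resumes
    private long lastSeen = -1;         // counter value at the previous check

    BoostWatchdog(Thread worker, AtomicLong progress) {
        this.worker = worker;
        this.progress = progress;
        this.basePriority = worker.getPriority();
    }

    // Call periodically. If the counter has not moved since the previous
    // call, treat the worker as starving and boost it; otherwise restore
    // its original priority. Returns true if the worker looked starved.
    boolean check() {
        long now = progress.get();
        boolean stalled = (now == lastSeen);
        lastSeen = now;
        worker.setPriority(stalled ? Thread.MAX_PRIORITY : basePriority);
        return stalled;
    }
}
```

A caller would invoke check() from a timer or a dedicated monitoring thread at whatever interval it considers long enough to call a thread starved.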

WORK AROUND Run java with -XX:-UseThreadPriorities, or adjust the mapping of Java priorities to OS priorities upwards (away from OS priority 0), especially for the lower Java priorities: -XX:JavaPriority1_To_OSPriority=20 -XX:JavaPriority2_To_OSPriority=25 etc.
28-01-2007
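
To confirm at startup that the first workaround was actually passed to a given JVM, an application could inspect its own launch arguments. The sketch below is an illustration only: the class PriorityFlagCheck and the method explicitSetting are hypothetical names, it assumes that the last occurrence of a repeated boolean -XX flag wins, and it sees only the arguments reported by the standard RuntimeMXBean, so it cannot tell what the platform default is when the flag is absent.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical startup check for the -XX:-UseThreadPriorities workaround.
final class PriorityFlagCheck {
    static final String FLAG = "UseThreadPriorities";

    // Returns TRUE or FALSE for the last explicit -XX:+/-UseThreadPriorities
    // in the list (assumption: last occurrence wins), or null if the flag
    // was never given explicitly.
    static Boolean explicitSetting(List<String> jvmArgs) {
        Boolean setting = null;
        for (String arg : jvmArgs) {
            if (arg.equals("-XX:+" + FLAG)) {
                setting = Boolean.TRUE;
            } else if (arg.equals("-XX:-" + FLAG)) {
                setting = Boolean.FALSE;
            }
        }
        return setting;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println(FLAG + " explicit setting: " + explicitSetting(jvmArgs));
    }
}
```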