JDK-8157478 : add option to change value returned by java.lang.Runtime.availableProcessors()
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 9
  • Priority: P4
  • Status: Closed
  • Resolution: Duplicate
  • Submitted: 2016-05-20
  • Updated: 2017-11-30
  • Resolved: 2017-11-30
JDK 11: Resolved
Related Reports
Duplicate :  
Description
It would be nice to have a command-line option to set the value returned by java.lang.Runtime.availableProcessors() and its HotSpot-internal equivalent. Various things in the system, such as the number of threads in the fork-join common pool, and I believe also the number of GC threads and such, are derived from the number of "processors." For example, a SPARC T5-2 machine might have 32 cores with 8 threads each. The OS returns 256 from sysconf(_SC_NPROCESSORS_CONF) and this value is in turn reflected in Runtime.availableProcessors(). The resulting fork-join common pool size of 255 is based on this number.
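
As an illustration, a small Java sketch can print both values side by side on a given machine. The class name ProcessorReport and the helper method are hypothetical, and the "processor count minus one, but at least one" rule is an assumption about the default sizing that matches the 256 -> 255 figure above; the actual value can differ if the common pool has been configured explicitly.

```java
import java.util.concurrent.ForkJoinPool;

// Hypothetical helper class, for illustration only.
class ProcessorReport {

    // Assumed default sizing rule: one less than the reported processor
    // count, never below 1. This mirrors the 256 -> 255 example above.
    static int expectedCommonPoolParallelism(int processors) {
        return Math.max(1, processors - 1);
    }

    public static void main(String[] args) {
        int processors = Runtime.getRuntime().availableProcessors();
        System.out.println("availableProcessors     = " + processors);
        System.out.println("common pool parallelism = "
                + ForkJoinPool.getCommonPoolParallelism());
        System.out.println("expected by default     = "
                + expectedCommonPoolParallelism(processors));
    }
}
```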

This isn't wrong, but it's potentially misleading. It's unlikely that there will be a parallel speedup of anywhere near 256. For uniform workloads a baseline speedup of around 32x is more likely, since there are 32 "real" processors. For complex or non-uniform workloads, a speedup of greater than 32 is possible, because of interleaved usage of idle functional units on the processors. (This is the advantage of chip multithreading.)

Another way availableProcessors() can be misleading -- or be misused -- is by performance tests to scale their workload. Suppose a test generates a workload that's intended to run for one minute on a single CPU. It might multiply the workload by availableProcessors() in order to run for one wall clock minute on a multi-core machine. But if the workload is multiplied by 256, and there is only 32x parallelism, the benchmark will run for 8 minutes of wall clock time. This is clearly the wrong result.
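
The arithmetic in this example can be written out as a short sketch. The class and method names are hypothetical, and the model is deliberately simplistic: it assumes the work divides evenly across whatever parallelism is actually achieved.

```java
// Hypothetical illustration of the scaling arithmetic described above.
class WorkloadScaling {

    // A test sized at baseMinutes of work per CPU multiplies its workload by
    // the reported processor count, but the machine only delivers
    // effectiveParallelism-way speedup.
    static double wallClockMinutes(double baseMinutes,
                                   int reportedProcessors,
                                   int effectiveParallelism) {
        double totalCpuMinutes = baseMinutes * reportedProcessors;
        return totalCpuMinutes / effectiveParallelism;
    }

    public static void main(String[] args) {
        // 1 minute per CPU, 256 reported processors, 32x real parallelism.
        System.out.println("wall clock minutes = "
                + wallClockMinutes(1.0, 256, 32)); // prints 8.0
    }
}
```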

The prevailing assumption of policies that use availableProcessors() is that they can use the entire resources of the system. This is a flawed assumption. Consider a case of running test jobs on this 32-core system. It might be configured to run (say) 12 jobs in parallel, in separate JVMs. But if each test scales itself so that it takes 8x wall clock time (as described above), the whole job will end up taking 96 times as long as expected in wall clock time.

I don't know what the right answer is. What would be helpful, though, is to allow a diagnostic option of some sort to alter the value returned by availableProcessors() and its equivalent internal interface. When misbehaviors occur on large multicore/multithreaded systems, then this option could be employed to try to learn more about the phenomenon.

This is a request for more of a diagnostic option than for some kind of API to expose number of "real" CPUs vs "virtual" CPUs, and such. That seems to be covered by JDK-5048379. 

For information about multi-core vs multi-threaded architectures, see this Oracle white paper on the SPARC T5:

http://www.oracle.com/technetwork/server-storage/sun-sparc-enterprise/documentation/o13-024-sparc-t5-architecture-1920540.pdf

Comments
A new VM flag (-XX:ActiveProcessorCount=xx) that allows the number of CPUs to be overridden has been added under (https://bugs.openjdk.java.net/browse/JDK-8146115). If this covers the functionality requested in this RFE, please close it as a duplicate.
30-11-2017
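
For anyone wanting to try the flag mentioned above, a minimal sketch follows. The class name ShowProcessors is hypothetical, and the run command in the comment assumes a JDK build that includes -XX:ActiveProcessorCount and supports launching a single source file directly.

```java
// Hypothetical check program: prints the processor count the JVM reports.
//
// Assumed invocation on a JDK that includes the flag:
//   java -XX:ActiveProcessorCount=32 ShowProcessors.java
// Without the flag, the value comes from the operating system as usual.
class ShowProcessors {

    static int count() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("availableProcessors = " + count());
    }
}
```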

Ok. :) BTW if this is problematic due to argument parsing, we could still have the VM lie to the Java level based on the -XX:PretendAvailableProcessors=N flag, by hooking into the JVM_ActiveProcessorCount method.
01-06-2016

Re "distortion" I assure you that this was purely unintentional. :-) I really should have written something like, "... historically, people thought that availableProcessors() meant 'all the processors on the machine'. "
01-06-2016

Yes, taskset on Linux was only honoured fairly recently, when Docker made use of cpusets more common.

Re: "Bottom line is that historically, availableProcessors() has meant "all the processors on the machine." More recently this has started to change to "the processors available to this JVM" which is in line with the way this method is specified."

I think this is a distortion. availableProcessors() has always meant "the processors available to the VM". For a long time this also equated to the number of processors on the machine. The VM implementation has drifted from this over time as different OSes provided different means to limit the number of CPUs available to the JVM process. Solaris processor sets were one way to do this and we changed the VM on Solaris to account for them. Solaris resource pools introduced a different way and the problem was flagged but not fixed as it was a low priority. Fixing resource pools never became high priority because Solaris zones became the preferred way to partition resources, and the Solaris folk made it so that asking "how many CPUs are configured" returned the number allocated to the zone, not the physical machine - so everything "just worked". Windows has never been an issue as it has always used GetProcessAffinityMask, as that was the only API available. Linux, granted, was a bit of a mess and languished until Docker started making use of processor sets easier, and by this time pthreads was implementing sched_getaffinity to "do the right thing" as well.

I don't see how j.l.Runtime.availableProcessors() and j.l.management.OperatingSystemMXBean.getAvailableProcessors() are redundant. The MXBean provides a management API for querying system information, potentially from remote processes. The MXBean could call the j.l.R version, but why bother when it can call the native layer directly?

Please note all my commentary above is trying to provide you with information and guidance on how you might achieve your diagnostic goals today with the existing VM functionality and with a "small" modification at the Java level. A VM flag as suggested would need to be investigated (as I said, it may be problematic due to the initialization sequence), but such an investigation, given we are past FC for JDK 9, will not occur in the short term. And as I said much earlier, lying about the number of processors may not help in practice, as you still have no control over how threads are allocated to virtual processors - all 8 threads may run on the same socket, for example.
30-05-2016

The taskset stuff is "interesting" because it wasn't until earlier this year that availableProcessors() started respecting it. This, combined with the Linux taskset command, makes it convenient to control the behavior of availableProcessors() externally. The taskset command doesn't appear to require any privileges.

On Solaris, availableProcessors() respects processor sets. This is controlled by psrset(1m) and maybe pbind(1m). It looks like psrset requires privileges.

There doesn't appear to be any comparable mechanism on MacOS or AIX.

There's some processor affinity stuff on Windows that affects availableProcessors() but I don't know how to control it.

Bottom line is that historically, availableProcessors() has meant "all the processors on the machine." More recently this has started to change to "the processors available to this JVM" which is in line with the way this method is specified. I think this is fine as far as it goes. But the story is incomplete; it remains the case that on some platforms, there's no way to affect the behavior of the JVM and JDK with respect to the number reported by availableProcessors().

Having a set of diagnostic options to control GC threads, compiler threads, etc. is a start. Having something at the Java level would also be helpful. There are only two relevant native methods in the JDK, as far as I can see: j.l.Runtime.availableProcessors() and j.l.management.OperatingSystemMXBean.getAvailableProcessors(). They both simply return the value from JVM_ActiveProcessorCount(). In that sense they seem redundant. They could both be modified consistently to return some other value, or one could be modified to call the other.
27-05-2016

First, re taskset. I'm not sure why it is "interesting to note" that taskset affects availableProcessors - as the whole point of availableProcessors is to report what is available. Given there are a number of ways, on different OSes, that the available CPUs can be constrained, the VM doesn't always do this correctly, but we do take steps to address this when possible, e.g. recent issues with running in a Docker environment. Also taskset operates in different modes - exec a process with a binding, change a process - and I've not seen any permission issues for regular users when using the "exec a process with a binding" mode - so for your diagnostic purposes I would hope this is available to you.

There are a number of subsystems in the VM which use ergonomics policies to "size" themselves during VM initialization and which take the number of available processors as an input. Those policies get refined over time - in particular on very large CPU-count systems we used to create an excessive number of GC threads (now we only create a very large number ;-) ). Those policies can generally be overridden by VM options, in particular:

    product(uint, ParallelGCThreads, 0,                                  \
            "Number of parallel threads parallel gc will use")           \
            constraint(ParallelGCThreadsConstraintFunc,AfterErgo)        \
    product(uint, ConcGCThreads, 0,                                      \
            "Number of threads concurrent gc will use")                  \
            constraint(ConcGCThreadsConstraintFunc,AfterErgo)            \
    product(intx, CICompilerCount, CI_COMPILER_COUNT,                    \
            "Number of compiler threads to run")                         \
            range(0, max_jint)                                           \
            constraint(CICompilerCountConstraintFunc, AtParse)           \

For parallel GC there are also:

    product(bool, UseDynamicNumberOfGCThreads, false,                    \
            "Dynamically choose the number of parallel threads "         \
            "parallel gc will use")                                      \
                                                                         \
    diagnostic(bool, ForceDynamicNumberOfGCThreads, false,               \
            "Force dynamic selection of the number of "                  \
            "parallel threads parallel gc will use to aid debugging")    \

For GC, the GC selected obviously has a bearing too.

I had hoped (without checking) that j.l.R.availableProcessors could be easily reworked at the Java level without needing to mess with native code etc. It would be a quick, simple means to adjust the Java library level uses. Supporting this in the VM may also have initialization issues - it requires argument parsing to happen before any uses.
24-05-2016

Maybe I should step back and rephrase this in the form of a problem statement. The entire JDK (JVM + class libraries + tests) has a bunch of policies that sense things about the runtime environment and scale things automatically. One of the inputs to these policies is the "number of available processors", whatever that means for the given environment. Sometimes these policies do the right thing, and sometimes they do something counterproductive.

It's interesting to note that Linux tasksets (and also Solaris processor sets and Windows processor affinity) affect the availableProcessors value. Unfortunately I believe using tasksets and processor binding requires privileges, which aren't necessarily available in all environments. This would make it difficult to use these mechanisms to customize the system's runtime behavior.

From previous comments it sounds like there are several things in the JVM that size themselves according to the availableProcessors value, such as GC and compiler threads. If these already have diagnostic options to override their values, then that might be sufficient. It would be helpful to see these enumerated.

If auto-scaling by the JVM can already be controlled by diagnostic options, then a Java-level override might indeed be sufficient. The obvious thing is a system property, but that might not work if availableProcessors ends up being called too early in startup.

j.l.Runtime.availableProcessors() is a native method that ends up calling JVM_ActiveProcessorCount(). j.l.management.OperatingSystemMXBean has getAvailableProcessors(), which is documented to be equivalent to j.l.Runtime.availableProcessors(). This boils down to a different native method that ends up calling JVM_ActiveProcessorCount().

Some things, like the size of the fork-join common pool, already have system properties that control them. There are a variety of other uses of j.l.R.availableProcessors in the libraries, though, that don't appear to have property overrides. Most of these are in java.util.concurrent. The sun.nio.ch.ThreadPool auto-sizes itself using j.l.R.availableProcessors.

Some of the samples in the JDK use j.l.R.availableProcessors(). It's probably not worthwhile worrying about these.

A number of the tests in the JDK regression suite use j.l.R.availableProcessors(). These bear further inspection, but some of them (e.g., test/java/nio/channels/Selector/HelperSlowToDie.java, and several of the j.u.concurrent tests) use this value to scale their workload. This practice in the tests should be made overridable using a property that can be set by whoever invokes the test suite.
23-05-2016
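
One possible shape for the test-side override suggested above, as a rough sketch only. The property name test.processors and the class TestScale are hypothetical; nothing in the JDK test suite is known to define them.

```java
// Hypothetical test helper: lets whoever runs the tests override the
// processor count used for workload scaling, e.g. -Dtest.processors=32.
class TestScale {

    // Assumed property name, chosen for illustration only.
    static final String OVERRIDE_PROPERTY = "test.processors";

    static int processors() {
        // Integer.getInteger returns null if the property is unset or
        // cannot be parsed as an integer.
        Integer override = Integer.getInteger(OVERRIDE_PROPERTY);
        if (override != null && override > 0) {
            return override;
        }
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("processors used for scaling = " + processors());
    }
}
```

A test would then call TestScale.processors() wherever it currently calls Runtime.getRuntime().availableProcessors() to size its workload. Because the value is read on demand by the test itself, this avoids the early-startup concern that applies to overriding availableProcessors() globally.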

Also note that having availableProcessors return a fake number would also be quite misleading in itself, because it would only constrain the number of threads created but still allow them to run on all available CPUs. Though maybe that is what you want, versus actually reducing CPUs via taskset. You can already control the number of threads used by the VM via various VM options for GC and compiler threads. You could then hack Runtime.availableProcessors at the Java level to perform your diagnostic experiments, i.e. you have a way to do this experimentation today, albeit slightly less conveniently. I'm not saying we couldn't or shouldn't provide a diagnostic VM option to do this, just that:
a) that isn't going to happen in the short term
b) it may not be that useful on its own
21-05-2016

As a diagnostic, you can also run the VM on a subset of CPUs using taskset on Linux (or use processor sets on Solaris).
21-05-2016

I agree this would be a mere band-aid if this were proposed as a solution. And I agree; accurately tuning a JVM or system does require lots more knowledge, and more knobs to control. But I don't think we're looking for a solution here. We're seeing anomalous behavior on multi-core, multi-threaded systems, and we're trying to explain it. A hypothesis is that there are a bunch of things in the system that are self-tuning based on availableProcessors(). If there were a way to adjust this, we could gather more information about the anomalous behaviors. This would be a diagnostic option, not an API.
21-05-2016

I'm somewhat sympathetic, but this would be a band-aid. You really can't escape the problem that accurately tuning the JVM or a workload requires detailed knowledge of the physical topology of the system. Further, just because you make it return 8 on an 8-CPU quad-core system, that doesn't mean you will be assigned 8 cores on 8 different CPUs by the OS scheduler. So your benchmark may still take 8 minutes. So detailed knowledge of topology is of little use without some form of placement control.
20-05-2016