JDK-8133666 : OperatingSystemMXBean reports abnormally high machine CPU consumption on Linux
  • Type: Bug
  • Component: core-svc
  • Sub-Component: java.lang.management
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: linux_2.6
  • Submitted: 2015-08-14
  • Updated: 2016-01-14
  • Resolved: 2015-08-18
Version   Build   Status
JDK 7     7u95    Fixed
JDK 8     8u72    Fixed
JDK 9     9 b81   Fixed
Description
The code in UnixOperatingSystem.c that calculates various CPU usage statistics depends on an out-of-date interpretation of the fields in /proc/stat, one that has not held true since Linux 2.6. This can result in issues such as getSystemCpuLoad() reporting 100% CPU consumption for IO-heavy loads.

Specifically, we should add irq and softirq (sirq) to the system CPU time, and we should add IO wait, irq and softirq (all three new fields) to the total time. This is similar to what tools like top and sar do on Linux.
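
For illustration, the accounting described above could be sketched in Java roughly as follows. This is not the actual change to UnixOperatingSystem.c (which is C code); ProcStatSketch, CpuTicks and all method names are hypothetical, and the "legacy" methods only mimic the pre-2.6 interpretation for comparison.

```java
/** Sketch of the corrected /proc/stat accounting; all names here are illustrative. */
final class ProcStatSketch {

    /** Tick counters taken from the aggregate "cpu" line of /proc/stat. */
    static final class CpuTicks {
        final long user, nice, system, idle, iowait, irq, softirq;

        private CpuTicks(long[] f) {
            user = f[0]; nice = f[1]; system = f[2]; idle = f[3];
            iowait = f[4]; irq = f[5]; softirq = f[6];
        }

        /** Parses a line such as "cpu 100 0 50 800 50 0 0"; any later fields are ignored. */
        static CpuTicks parse(String cpuLine) {
            String[] parts = cpuLine.trim().split("\\s+");
            if (parts.length < 8 || !parts[0].equals("cpu")) {
                throw new IllegalArgumentException("not a 2.6+ cpu line: " + cpuLine);
            }
            long[] f = new long[7];
            for (int i = 0; i < 7; i++) {
                f[i] = Long.parseLong(parts[i + 1]);
            }
            return new CpuTicks(f);
        }

        // Corrected interpretation: irq and softirq count as kernel time,
        // and the total covers all seven fields.
        long kernel() { return system + irq + softirq; }
        long busy()   { return user + nice + kernel(); }
        long total()  { return busy() + idle + iowait; }

        // Pre-2.6 interpretation: only the first four fields are considered.
        long legacyBusy()  { return user + nice + system; }
        long legacyTotal() { return legacyBusy() + idle; }
    }

    /** Load between two samples using all seven fields. */
    static double load(CpuTicks prev, CpuTicks cur) {
        return ratio(cur.busy() - prev.busy(), cur.total() - prev.total());
    }

    /** Load between two samples using only the first four fields. */
    static double legacyLoad(CpuTicks prev, CpuTicks cur) {
        return ratio(cur.legacyBusy() - prev.legacyBusy(),
                     cur.legacyTotal() - prev.legacyTotal());
    }

    private static double ratio(long busy, long total) {
        return total <= 0 ? 0.0 : Math.min(1.0, (double) busy / total);
    }

    public static void main(String[] args) {
        // Made-up samples for an IO-heavy interval: most new ticks land in iowait.
        CpuTicks prev = CpuTicks.parse("cpu 100 0 50 800 50 0 0");
        CpuTicks cur  = CpuTicks.parse("cpu 110 0 60 810 950 10 10");
        System.out.println("corrected load: " + load(prev, cur));
        System.out.println("legacy load:    " + legacyLoad(prev, cur));
    }
}
```

With these made-up samples the corrected ratio is 40/950 (about 0.04), while the four-field ratio is 20/30 (about 0.67), which is the kind of inflated reading described in this report.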
Comments
Marking this as release-note=no. JDK-8133527, which is very similar, will be release noted.
20-11-2015

The current code assumes the older (pre-2.6 Linux) output, where the cpu line of /proc/stat looked like this:

=== user nice system idle ===

Since the 2.6 kernels, this line has been expanded with 3 new fields:

=== user nice system idle iowait irq softirq ===

To be specific, a subset of the time that was reported as "idle" in 2.4 may now be reported only as "iowait". Similarly, some of the time that used to be reported as "system" may now be reported only as "irq" or "softirq". So the "true" total time is now the sum of all 7 of these numbers, not just the first 4. Also, the true system/kernel time should include both "irq" and "softirq" in addition to "system". I confirmed that this is the correct interpretation of these numbers by examining the source code for various tools like vmstat, the kernel (kernel/sched.c), and various documentation.

Because our code measures CPU load as a ratio of total CPU time, leaving the new numbers out of the total can REALLY throw off our calculations. The easiest way to see this is to run something very IO intensive (even copying a file can be enough): the system CPU usage reported by getSystemCpuLoad() will spike. All other tools, like sar or top, do not report time spent waiting for IO as CPU consumption, and neither should we.

To test this fix, I just added a println to the main loop in /jdk9-dev/jdk/test/com/sun/management/OperatingSystemMXBean/GetSystemCpuLoad.java and ran it with high IO activity.

One last note: since the 2.6 kernel release, even more fields have been added to the cpu line of /proc/stat, but after careful review I do not believe any of them should be included in our calculations. For example, the time reported for guest operating systems is already included in the "user" or "nice" fields. Also, it doesn't seem reasonable to try to include ticks "stolen" by a hypervisor or host OS.
17-08-2015
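
A standalone way to watch the reported value while generating IO load (for example, by copying a large file in another terminal) might look like the sketch below. It is not the GetSystemCpuLoad.java test itself. SystemCpuLoadPoller and its sample method are hypothetical names, and the cast assumes a JVM that exposes com.sun.management.OperatingSystemMXBean, as HotSpot-based JDKs do. Note that getSystemCpuLoad() is deprecated in newer JDK releases in favor of getCpuLoad().

```java
import java.lang.management.ManagementFactory;
import com.sun.management.OperatingSystemMXBean;

/** Polls getSystemCpuLoad() a few times; names are illustrative only. */
final class SystemCpuLoadPoller {

    /** Returns up to 'count' readings taken 'intervalMillis' apart. */
    static double[] sample(int count, long intervalMillis) {
        // Assumption: the platform bean implements the com.sun.management extension.
        OperatingSystemMXBean os =
                (OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
        double[] readings = new double[count];
        for (int i = 0; i < count; i++) {
            // A negative value means the load is not available.
            readings[i] = os.getSystemCpuLoad();
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break; // remaining entries stay 0.0
            }
        }
        return readings;
    }

    public static void main(String[] args) {
        for (double reading : sample(10, 1000)) {
            System.out.println("system CPU load: " + reading);
        }
    }
}
```

On an affected build, the printed values would be expected to jump toward 1.0 during heavy IO, whereas sar or top would show most of that time as IO wait.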