JDK-8080623 : CPU overhead in FJ due to spinning in awaitWork
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util.concurrent
  • Affected Version: 8u40,9
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2015-05-18
  • Updated: 2015-09-29
  • Resolved: 2015-05-20
  • JDK 8: 8u60 b18 (Fixed)
  • JDK 9: 9 (Fixed)
Related Reports
Duplicate :  
Relates :  
Relates :  
Description
The patch for JDK-8056248 improved FJ's handling of threads and removed the risk of starting and activating too many threads. This came with a slight performance impact for streams and FJ work on smaller data sets, which was mitigated by adding extra spinning that checks for work before the FJ worker threads go to sleep. Claiming a task is much faster from a spinning thread than from a thread that first needs to be unparked.

Unfortunately, the spinning has proven to be fairly CPU intensive, causing overall performance regressions in applications because less CPU is available for the parts of the application that do not run as FJ tasks.

The recommendation, which has been discussed with Doug Lea, is to disable the spinning for JDK 8 and, for JDK 9, to continue the ongoing work and research on solving the active waiting without spinning. Depending on the timing of this work, it may be backported to a later 8u release.
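
To illustrate the trade-off described above, here is a minimal sketch of a spin-then-park wait. It is not the ForkJoinPool implementation of awaitWork; the class SpinThenParkSketch, its single-slot hand-off, and the spins parameter are all hypothetical and exist only for this example. With spins > 0 the waiting thread can claim a task without being unparked, but burns CPU while it polls; with spins = 0 it parks immediately and pays the wake-up latency instead. Thread.onSpinWait requires Java 9 or later.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.LockSupport;

public class SpinThenParkSketch {
    // Hypothetical single-slot hand-off; a real pool uses work-stealing queues.
    private final AtomicReference<Runnable> slot = new AtomicReference<>();
    private volatile Thread waiter;

    /** Publishes a task, then wakes the waiter if one has registered. */
    public void submit(Runnable task) {
        slot.set(task);
        Thread w = waiter;
        if (w != null) {
            LockSupport.unpark(w);
        }
    }

    /** Spins up to {@code spins} times looking for a task, then parks until one arrives. */
    public Runnable awaitTask(int spins) {
        // Phase 1: active waiting. Low hand-off latency, but the core stays busy.
        for (int i = 0; i < spins; i++) {
            Runnable t = slot.getAndSet(null);
            if (t != null) {
                return t;
            }
            Thread.onSpinWait(); // Java 9+ hint that this is a busy-wait loop
        }
        // Phase 2: register, recheck, park. The recheck after registering
        // closes the window where submit() ran before the waiter was visible.
        waiter = Thread.currentThread();
        try {
            for (;;) {
                Runnable t = slot.getAndSet(null);
                if (t != null) {
                    return t;
                }
                LockSupport.park(this); // may return spuriously, hence the loop
            }
        } finally {
            waiter = null;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SpinThenParkSketch box = new SpinThenParkSketch();
        Thread worker = new Thread(() -> box.awaitTask(1_000).run());
        worker.start();
        box.submit(() -> System.out.println("task ran"));
        worker.join();
    }
}
```

The sketch handles a single waiter and ignores interruption; it only shows why a larger spin budget trades CPU time for faster task pickup.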
Comments
Raw data and graphs show the CPU usage and score when summing integers over collections of different sizes. The spinning helps for the smaller collections, but at the cost of CPU. (The graph shows how the CPU usage changes across the stages of the benchmark; the first half is the parallel stream, with each step increasing the number of entries.) For small data sets and quick calculations, using a stream is usually the better choice, as it can be 10-100x faster.
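
For readers who want to try a comparable measurement, the following is a rough sketch of summing integers over collections of increasing size with a sequential and a parallel stream. It is not the benchmark behind the attached data; the class SumBenchSketch, the collection sizes, and the repetition count are assumptions made for illustration, and the naive System.nanoTime timing is no substitute for a proper harness such as JMH.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToLongFunction;

public class SumBenchSketch {
    /** Builds a list holding 0, 1, ..., n-1. */
    static List<Integer> numbers(int n) {
        List<Integer> xs = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            xs.add(i);
        }
        return xs;
    }

    static long sumSequential(List<Integer> xs) {
        return xs.stream().mapToLong(Integer::longValue).sum();
    }

    static long sumParallel(List<Integer> xs) {
        // Runs on the common ForkJoinPool, whose workers wait for work between tasks.
        return xs.parallelStream().mapToLong(Integer::longValue).sum();
    }

    /** Average wall-clock nanoseconds per call over {@code reps} calls (naive timing). */
    static long avgNanos(ToLongFunction<List<Integer>> f, List<Integer> xs, int reps) {
        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < reps; i++) {
            sink += f.applyAsLong(xs);
        }
        long elapsed = System.nanoTime() - start;
        if (sink == Long.MIN_VALUE) {
            System.out.println(sink); // keeps the result live; practically never taken
        }
        return elapsed / reps;
    }

    public static void main(String[] args) {
        int reps = 1_000; // assumed repetition count
        for (int n : new int[] {10, 1_000, 100_000}) { // assumed sizes
            List<Integer> xs = numbers(n);
            long seq = avgNanos(SumBenchSketch::sumSequential, xs, reps);
            long par = avgNanos(SumBenchSketch::sumParallel, xs, reps);
            System.out.printf("n=%d sequential=%d ns parallel=%d ns%n", n, seq, par);
        }
    }
}
```

Observing process CPU usage with an external tool while this runs gives a rough view of the cost side of the trade-off; the timings alone only show the score side.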
18-05-2015

Analysis of FJ Benchmarks
18-05-2015