JDK-8147832 : JEP 285: Spin-Wait Hints
  • Type: JEP
  • Component: core-libs
  • Sub-Component: java.lang
  • Priority: P4
  • Status: Closed
  • Resolution: Delivered
  • Fix Versions: 9
  • Submitted: 2016-01-20
  • Updated: 2023-01-10
  • Resolved: 2016-06-06
Sub Tasks
JDK-8147844 :  
Description
Summary
-------

Define an API to allow Java code to hint that a spin loop is being executed.

Goals
-----

Define an API that would allow Java code to hint to the run-time system that it
is in a spin loop. The API will be a pure hint, and will carry no semantic
behaviour requirements (for example, a no-op is a valid implementation). Allow
the JVM to benefit from spin loop specific behaviours that may be useful on
certain hardware platforms. Provide both a no-op implementation and an intrinsic
implementation in the JDK, and demonstrate an execution benefit on at least one
major hardware platform.

Non-Goals
---------

It is not a goal to look at performance hints beyond spin loops. Other 
performance hints, such as prefetch hints, are outside the scope of this JEP.

Motivation
----------

Some hardware platforms benefit from software indication that a spin loop is in 
progress. Some common execution benefits may be observed:

  1. The reaction time of a spin loop may be improved when a spin hint is used 
     due to various factors, reducing thread-to-thread latencies in spinning wait 
     situations;

  2. The power consumed by the core or hardware thread involved in the spin loop 
     may be reduced, benefiting the overall power consumption of a program, and 
     possibly allowing other cores or hardware threads to execute at faster 
     speeds within the same power consumption envelope.

While long term spinning is often discouraged as a general user-mode programming
practice, short term spinning prior to blocking is a common practice (both
inside and outside of the JDK). Furthermore, as core-rich computing platforms
are commonly available, many performance and latency sensitive applications,
such as the [Disruptor](https://lmax-exchange.github.io/disruptor), use a pattern
that dedicates a spinning thread to a latency critical function and may involve
long term spinning as well.
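
The "short term spinning prior to blocking" pattern mentioned above can be sketched roughly as follows, using the `Thread.onSpinWait()` method that this JEP goes on to propose. This is a minimal illustration rather than code from the JDK: the class name `SpinThenParkSketch`, the `SPIN_LIMIT` budget, and the `demo()` helper are all hypothetical, and the sketch supports a single waiting thread only.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

// Hypothetical one-shot signal: the waiter spins briefly, then blocks.
class SpinThenParkSketch {
    // Assumed spin budget; a real value would be tuned per platform.
    private static final int SPIN_LIMIT = 1_000;

    private final AtomicBoolean done = new AtomicBoolean(false);
    private volatile Thread waiter;

    /** Spins for a bounded number of iterations, then parks until signal(). */
    void await() {
        for (int i = 0; i < SPIN_LIMIT; i++) {
            if (done.get()) {
                return;
            }
            Thread.onSpinWait(); // pure hint; a no-op is a valid implementation
        }
        // Publish the waiter before re-checking the flag, so that signal()
        // either sees the waiter or the waiter sees the flag.
        waiter = Thread.currentThread();
        while (!done.get()) {
            LockSupport.park(this); // may return spuriously, hence the loop
        }
    }

    /** Sets the flag and wakes the waiter if it has already parked. */
    void signal() {
        done.set(true);
        LockSupport.unpark(waiter); // unpark(null) has no effect
    }

    /** Returns true if the waiting thread was released within five seconds. */
    static boolean demo() {
        SpinThenParkSketch latch = new SpinThenParkSketch();
        Thread t = new Thread(latch::await, "waiter");
        t.start();
        try {
            Thread.sleep(20); // long enough that the waiter has likely parked
            latch.signal();
            t.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !t.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("waiter released: " + demo());
    }
}
```

The spin phase is where a hint such as `PAUSE` would pay off; the park phase bounds the cost when the wait turns out to be long.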

As a practical example and use case, current x86 processors support a `PAUSE` 
instruction that can be used to indicate spinning behavior. Using a `PAUSE` 
instruction demonstrably reduces thread-to-thread round-trip latency. Due to its 
benefits and widely recommended use, the x86 `PAUSE` instruction is commonly
used in kernel spinlocks, in POSIX libraries that perform heuristic spins prior 
to blocking, and even by the JVM itself. However, due to the inability to hint 
that a Java loop is spinning, its benefits are not available to regular Java 
code.

We include specific supporting evidence: In simple tests performed on an 
E5-2697 v2, measuring the round-trip latency behavior between two threads that 
communicate by spinning on a volatile field, round-trip latencies were 
demonstrably reduced by 18-20 nsec across a wide percentile spectrum (from the 
10th to the 99.9th percentile). This reduction can represent an improvement as high 
as 35%-50% in best-case thread-to-thread communication latency, for example
when two spinning threads execute on two hardware threads that share a physical
CPU core and an L1 data cache. The full listing of the test may be found
[here](https://github.com/giltene/GilExamples/blob/master/SpinWaitTest).

> <a href="https://bugs.openjdk.java.net/secure/attachment/56567/SpinLoopLatency_E5-2697v2_sharedCore%60-600x288.png"><img src="https://bugs.openjdk.java.net/secure/attachment/56567/SpinLoopLatency_E5-2697v2_sharedCore%60-600x288.png"/></a>

The above image shows an example latency measurement comparing the reaction
latency of a spin loop that includes an intrinsic `spinLoopHint()` call
(intrinsified as a `PAUSE` instruction) to the same loop executed without using
a `PAUSE` instruction, along with the measurements of the time it takes to
perform an actual `System.nanoTime()` call to measure time.
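
For readers who want a feel for the kind of measurement described above, here is a much-simplified sketch of a two-thread ping-pong over volatile fields, written against the `Thread.onSpinWait()` method proposed in the Description. It is not the linked test: the class name `PingPongSketch` and the method `meanRoundTripNanos` are hypothetical, and it reports only a mean, whereas the original measurements cover a percentile spectrum.

```java
// Hypothetical, simplified round-trip measurement between two spinning threads.
class PingPongSketch {
    private volatile long ping = -1;
    private volatile long pong = -1;

    /** Returns the mean round-trip time in nanoseconds over the given iterations. */
    static double meanRoundTripNanos(int iterations) {
        PingPongSketch s = new PingPongSketch();

        // Responder: waits for each ping value, then echoes it back as a pong.
        Thread responder = new Thread(() -> {
            for (long i = 0; i < iterations; i++) {
                while (s.ping != i) {
                    Thread.onSpinWait();
                }
                s.pong = i;
            }
        }, "responder");
        responder.setDaemon(true);
        responder.start();

        // Initiator: publishes a ping, then spins until the matching pong arrives.
        long start = System.nanoTime();
        for (long i = 0; i < iterations; i++) {
            s.ping = i;
            while (s.pong != i) {
                Thread.onSpinWait();
            }
        }
        long elapsed = System.nanoTime() - start;
        return (double) elapsed / iterations;
    }

    public static void main(String[] args) {
        // A serious measurement would add warm-up, far more iterations,
        // thread pinning, and per-sample percentile recording.
        System.out.printf("mean round trip: %.1f ns%n", meanRoundTripNanos(1_000));
    }
}
```

Comparing a run of this loop with and without the `Thread.onSpinWait()` calls gives only a rough impression; results depend heavily on where the two threads are scheduled.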

Description
-----------

We propose to add a method to the JDK which would hint that a spin loop is being
performed: `java.lang.Thread.onSpinWait()`.

An empty method would be a valid implementation of the
`java.lang.Thread.onSpinWait()` method, but an intrinsic implementation is the
obvious goal for hardware platforms that can benefit from it. We intend to
produce an intrinsic x86 implementation for the JDK as part of this JEP. A
prototype implementation already exists and results from initial testing show
promise. Refer to [JBS bug JDK-8147844](https://bugs.openjdk.java.net/browse/JDK-8147844)
for pointers to webrevs with the proposed changes in class libraries and JVM.
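
As an illustration of the intended use, a spin loop waiting on a volatile flag might call the hint on each iteration, roughly as sketched below. The class name `SpinWaitSketch`, its field and method names, and the `demo()` helper are hypothetical; only `Thread.onSpinWait()` is the method proposed here.

```java
// Hypothetical example of a spin loop that calls the proposed hint.
class SpinWaitSketch {
    // Written by the notifying thread, polled by the spinning thread.
    private volatile boolean eventNotified;

    /** Spins until notifyEvent() has been called by another thread. */
    void waitForEvent() {
        while (!eventNotified) {
            // Pure hint: the loop stays correct even if this call does nothing.
            Thread.onSpinWait();
        }
    }

    void notifyEvent() {
        eventNotified = true;
    }

    /** Returns true if the spinning thread observed the event and exited. */
    static boolean demo() {
        SpinWaitSketch sketch = new SpinWaitSketch();
        Thread spinner = new Thread(sketch::waitForEvent, "spinner");
        spinner.start();
        try {
            Thread.sleep(10); // let the spinner run for a moment
            sketch.notifyEvent();
            spinner.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !spinner.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("spinner observed the event: " + demo());
    }
}
```

Because the method carries no semantic requirements, removing the call changes only the potential latency and power characteristics of the loop, not its behavior.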

Alternatives
------------

JNI can be used to loop with a spin-loop-hinting CPU instruction; however, the
overhead of crossing the JNI boundary tends to be larger than the benefit
provided by the instruction, at least where latency is concerned.

We could attempt to have the JIT compilers deduce spin-loop situations and
automatically include spin-loop-hinting CPU instructions with no Java code hints
required. We suspect that the complexity of automatically and reliably detecting
spinning situations, coupled with questions about potential tradeoffs in using
the hints on some platforms, would significantly delay the availability of
viable implementations.


Testing
-------

Testing of a "vanilla" no-op implementation will obviously be fairly simple.

We believe that given the very small footprint of this API, testing of an
intrinsified x86 implementation will also be straightforward. We expect testing
to focus on confirming both the code generation correctness and latency benefits
of using the spin loop hint with an intrinsic implementation.

Should this API be accepted as a Java SE API (e.g. for inclusion in the `java.*`
namespace in a future Java SE 9 or Java SE 10), we expect to develop associated
TCK tests for the API for potential inclusion in the Java SE TCK.


Risks and Assumptions
---------------------

The "vanilla" no-op implementation is obviously fairly low risk. An intrinsic
x86 implementation will involve modifications to multiple JVM components and as
such they carry some risks, but no more than other simple intrinsics added to
the JDK.

Comments
[~ikrylov]/[~psandoz], from my point of view, the existing tests are not enough, so there should be plans to develop new performance and functional tests. I noticed that you started the conversation here, and I have also sent you an email. To have all related information in one place, I'm posting the email here as well. Please let me know which communication channel you would prefer.

On Mar 28, 2016, at 8:53 PM, Igor Ignatyev <igor.ignatyev@oracle.com> wrote:

Hi Ivan,

I've looked at JEP 285 [1] and the changes [2-3] you proposed, and I have a few questions about the planned tests. The JEP says:

> We expect testing to focus on confirming both the code generation correctness and latency benefits of using the spin loop hint with an intrinsic implementation.

- How are you going to test latency benefits? Are there any microbenchmarks that you plan to implement or integrate?
- There is only one test in your changes, 'hotspot/test/compiler/onSpinWait/TestOnSpinWait.java' [4], which checks that j.l.Thread::onSpinWait is intrinsified on x86. I doubt that it's enough to test 'the code generation correctness'. Do you plan to develop any other tests, e.g. tests that check that onSpinWait works fine on all architectures, and that onSpinWait isn't intrinsified on non-x86 architectures?

[1] http://openjdk.java.net/jeps/285
[2] http://cr.openjdk.java.net/~ikrylov/8147844.hs.04/
[3] http://cr.openjdk.java.net/~ikrylov/8147844.jdk.03/
[4] http://cr.openjdk.java.net/~ikrylov/8147844.hs.04/test/compiler/onSpinWait/TestOnSpinWait.java.html

Thanks,
Igor
28-03-2016

The JEP moves to the integrated state once code has been pushed and appears in the main JDK 9 repo. The JEP then moves to the completed state after any performance, functional and conformance tests have been delivered. In this case there are no current plans to deliver any performance or functional tests, and any conformance test (developed by the JCK team) should be trivial. We will also perform an internal audit of the code.
28-03-2016

Question: in the case of this bug, what is the difference between "Integrated" and "Complete"? I have only one regression test, and it is part of the original patch.
28-03-2016

JDK Webrev with proper JavaDoc tags: http://cr.openjdk.java.net/~ikrylov/8147844.jdk.04/
28-03-2016

I have adapted the proposed changesets to the latest naming change: the method is now added to j.l.Thread. JDK Webrev: http://cr.openjdk.java.net/~ikrylov/8147844.jdk.03/ Hotspot Webrev: http://cr.openjdk.java.net/~ikrylov/8147844.hs.04/ These webrevs have already passed code review, except for the latest change of target class. I am proposing to target the JEP to 9.
08-03-2016

Mark, thank you for your review, edits and for accepting the JEP.
04-03-2016

While the name and placement of the method are still being discussed on the core-libs mailing list, I consider this JEP a complete proposal. These details can be settled later, i.e. at the time of the actual JEP implementation and integration.
24-02-2016

Mark, thank you for the comments. I have adopted those. I also did another round of review to fix some grammar errors and added a couple of hyperlinks. I hope this looks cleaner.
22-02-2016

This mostly looks good; just a few minor comments on form:
- The JEP template does not have a "References" section. Please mention references in-line rather than at the end.
- The Scope of this JEP should be "SE", not "JDK", since what you're proposing is intended to become part of the standard Java SE API.
- Check uses of "it's" vs. "its".
19-02-2016

Updated the reference binary bundles to jdk9b105 for 3 platforms: linux, mac, windows.
15-02-2016

Both the JDK class libraries and the jvm changes passed code reviews with the relevant teams. The "References" section now contains links to the latest webrevs that passed reviews.
12-02-2016

Same image scaled to 600x288. Simple test performed on an E5-2697 v2, measuring the round-trip latency behavior between two threads that communicate by spinning on a volatile field, with and without the onSpinWait intrinsic enabled.
20-01-2016

Simple test performed on an E5-2697 v2, measuring the round-trip latency behavior between two threads that communicate by spinning on a volatile field, with and without the onSpinWait intrinsic enabled.
20-01-2016