A posting to the concurrency-interest mailing list drew attention to this piece of the Java Memory Model:
On p341 of JCIP, the Interruption Rule states:
"A thread calling interrupt on another thread happens-before the interrupted thread detects the interrupt (either by having InterruptedException thrown, or invoking isInterrupted or interrupted)"
This is referring to the final bullet point in JLS 3, 17.4.4.
While the exact semantics of this statement are unclear the expert opinion was that it means that the interrupted state of a thread should act like a Java volatile variable: interrupt() will involve a volatile-write, while explicitly checking the interrupt state will involve a volatile-read. (This does not fully define the semantics because it doesn't indicate if you can elide the volatile-write in interrupt() if the thread is already interrupted - but that is not important for this CR.)
In light of the recent memory-barrier issue with the park/unpark code (CR 6822370) the thread interruption was code was examined to see if the basic JMM requirements were being met. We discovered one definite issue:
- in os::interrupted(Thread* t, bool clear_interrupt) there is a missing fence after clearing the interrupt state
There is also some discussion over whether reading the interrupt state requires some additional barriers. Doug Lea writes:
> Stepping back a bit first: If interrupt status were a Java
> volatile, we'd place a MemBarAcquire after each read.
> For writes we'd precede with a MemBarRelease and follow
> with a MemBarVolatile. The mechanics (see for example
> insert_mem_bar in graphKit.cpp, as called in
> the usual case in parse3.cpp) prevent reorderings,
> kill opportunities to reuse reads, and maybe
> generate processor barrier instructions.
>
> Since osthread._interrupted is VM-level field, we
> can't do that, but can get the same effect.
>
> In the write case, which is all done at VM-level
> (as in os_solaris.cpp approx line 3950)
> we do need a trailing storeLoad fence inside the
> VM-level interrupted, which we have. We don't need
> any further reordering control since the compiler
> must conservatively treat it as an opaque call, which
> disables any reorderings that could matter here.
>
> The inlined part of read case (library_calls.cpp),
> manually does the reordering prevention, but doesn't
> necessarily have the reads-kill or barrier-instruction
> effects.
>
> Officially, it ought to, although the only cases
> I can see where it might make a difference are
> racy to start with.
>
> Because the inlined forms of Thread.interrupted() may
> branch to opaque call to VM-level, the entire if-else
> context can't be reordered in any interesting way.
> However, I'm not so sure about the case of
> non-clearing form isInterrupted returning true:
> r = x.aField;
> if (Thread.currentThread().isInterrupted()) {
> r = x.aField; // maybe is optimized away?
> ...
>
> Also, on machines that actually need acquire fences
> for volatile reads, like ARM and PPC, they
> ought to be generated, although because interrupt
> polling is racy anyway, it is unlikely to matter much.
>
> Anyway, for the sake of making sure it is right,
> it would be good to ask someone in c2 if/how we
> can add something like
> insert_mem_bar(Op_MemBarAcquire, p);
> in LibraryCallKit::inline_native_isInterrupted()
> ("p" is the read of _interrupted).
>
It is worth noting that for some thread t, where t is not the current thread, that t.interrupt() and t.isInterrupted() both result in the holding of the Threads_lock inside the VM - and consequently there are strong barriers already present there.
Further the c2 intrinsic is only used for the (in practice rare) case of Thread.currentThread().isInterrupted().
In that light it may suffice to make the following changes in osThread.hpp:
bool interrupted() const {
return OrderAccess::load_acquire(& _interrupted) != 0;
}
void set_interrupted(bool z) {
OrderAccess::release_store_fence(&_interrupted, z ? 1 : 0);
}
and remove the explicit release()/fence() actions in os::interrupt and os::is_interrupted.
If a change is also needed to the C2 intrinsic then a simple load_acquire should also suffice.