As discussed in JDK-8265753, the thread state transition logic contains code that does:
thread->frame_anchor()->make_walkable(thread);
thread->set_thread_state(_thread_blocked);
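This logically requires a storestore barrier between the two statements or, more generally, requires set_thread_state to have release semantics, so that any thread observing _thread_blocked also observes the stores made by make_walkable.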
Now set_thread_state actually does have release semantics on PPC and AArch64, but that is not how we normally handle memory barriers in shared code. Shared code should be written to express all needed barriers assuming the loosest memory model. It is then up to the platform-specific implementation of each barrier to reduce it to a no-op on platforms that don't need it.
There is some history as to how the current situation came about when the PPC and ARM ports were merged into mainline, but I think it would be much clearer if the barriers were more directly evident in the shared code.
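To illustrate what "directly evident" could look like, here is a minimal standalone sketch in standard C++, not HotSpot code. ModelThread, ThreadState, transition_to_blocked and anchor_consistent are hypothetical names invented for this example; the release fence stands in for a storestore barrier (it is at least as strong), and the non-atomic walkable flag stands in for whatever make_walkable writes.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-ins for the real HotSpot types (assumption, not the actual declarations).
enum ThreadState { thread_in_vm, thread_blocked };

struct ModelThread {
  bool walkable = false;                         // models the state written by make_walkable
  std::atomic<ThreadState> state{thread_in_vm};  // models the thread state field
};

// Writer side: the barrier is spelled out where the ordering requirement arises,
// rather than being hidden inside a platform-specific set_thread_state.
inline void transition_to_blocked(ModelThread* t) {
  t->walkable = true;                                         // "make_walkable"
  std::atomic_thread_fence(std::memory_order_release);        // at least as strong as storestore
  t->state.store(thread_blocked, std::memory_order_relaxed);  // "set_thread_state"
}

// Observer side: if the blocked state is observed (with acquire),
// the earlier write to walkable must be visible as well.
inline bool anchor_consistent(const ModelThread* t) {
  if (t->state.load(std::memory_order_acquire) != thread_blocked) {
    return true;  // not blocked yet, so there is nothing to check
  }
  return t->walkable;
}
```

On a strongly ordered platform a compiler may emit no hardware instruction for the fence, which matches the idea that the shared code states the requirement and the platform layer decides what, if anything, it costs.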