A long time ago, safepoint polls worked simply: generated code polled a single global page, which the runtime would mprotect to PROT_NONE when a safepoint was needed. This meant we could treat the polling page address as a constant; we even used RIP-relative encoding for it. C2 code just emitted that constant as the Safepoint node input.
With JDK-8189941 (Thread-local handshakes), the VM handles safepoints by creating two pages, one readable and one unreadable: https://github.com/openjdk/jdk/blob/8174cbd5cb797a80d48246a686897ef6fe64ed57/src/hotspot/share/runtime/safepointMechanism.cpp#L65-L75
This is smart: we do not have to mprotect anything when a safepoint is needed, we only need to swap the _address_ of the unreadable page into the thread-local field: https://github.com/openjdk/jdk/blob/8174cbd5cb797a80d48246a686897ef6fe64ed57/src/hotspot/share/runtime/safepointMechanism.inline.hpp#L94-L102
...and then generated code just reads the current page address from that thread-local field and polls it. If the page is unreadable, we trap. So far so good.
Now to the C2 code. C2 emits a load of the page address from the thread-local field and feeds that address into the Safepoint node: https://github.com/openjdk/jdk/blob/8174cbd5cb797a80d48246a686897ef6fe64ed57/src/hotspot/share/opto/parse1.cpp#L2271-L2276
Correctness question: What guarantees that the load of the page address and the safepoint poll stay close? In other words, what guarantees we _do not_ hoist the load of the *currently readable page's* address away from the Safepoint node, and thus do not miss the transition of that thread-local field to the *unreadable page*? This seems to work well currently, since the polling page load and the safepoint share control, and the safepoint produces control as well.
Performance question: Are we risking that something heavy-weight gets scheduled between the read of the polling page address and the poll itself, delaying the reaction to the safepoint?
I think both can be solved by pushing the polling address load down into the Safepoint match rules. Example for AArch64:
https://github.com/openjdk/jdk/compare/master...shipilev:jdk:JDK-8342605-c2-safepoint-poll