Bug ID: JDK-8244376 possibly stale comment above "struct SharedGlobals" in synchronizer.cpp

JDK-8244376 : possibly stale comment above "struct SharedGlobals" in synchronizer.cpp

Type: Bug
Component: hotspot
Sub-Component: runtime
Affected Version: 15

Priority: P4
Status: Resolved
Resolution: Fixed
OS: generic
CPU: generic

Submitted: 2020-05-04
Updated: 2024-12-13
Resolved: 2020-11-16

Versions (Unresolved/Resolved/Fixed)

The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.

JDK 16
16 b25Fixed

Related Reports

Relates :

JDK-8049717 - expose L1_data_cache_line_size for diagnostic/sanity checks

Description

src/hotspot/share/runtime/synchronizer.cpp contains the
following comment today:

// Performance concern:
// OrderAccess::storestore() calls release() which at one time stored 0
// into the global volatile OrderAccess::dummy variable. This store was
// unnecessary for correctness. Many threads storing into a common location
// causes considerable cache migration or "sloshing" on large SMP systems.
// As such, I avoided using OrderAccess::storestore(). In some cases
// OrderAccess::fence() -- which incurs local latency on the executing
// processor -- is a better choice as it scales on SMP systems.

Kim B. happened to notice this comment recently and thinks it
might be stale or wrong.

I did spelunking about the history behind this comment:

Before the fix for JDK-8049717 was pushed, this comment was:

394 // Performance concern:
395 // OrderAccess::storestore() calls release() which STs 0 into the global volatile
396 // OrderAccess::Dummy variable. This store is unnecessary for correctness.
397 // Many threads STing into a common location causes considerable cache migration
398 // or "sloshing" on large SMP system. As such, I avoid using OrderAccess::storestore()
399 // until it's repaired. In some cases OrderAccess::fence() -- which incurs local
400 // latency on the executing processor -- is a better choice as it scales on SMP
401 // systems. See http://blogs.sun.com/dave/entry/biased_locking_in_hotspot for a
402 // discussion of coherency costs. Note that all our current reference platforms
403 // provide strong ST-ST order, so the issue is moot on IA32, x64, and SPARC.

During the code review cycle for JDK-8049717, we considered
deleting part of the comment and we also considered various
rewrites of the comment.

https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2014-July/012138.html

We finally settled on the following rewrite from David H:

I think the performance comment has lost a bit of context. May I suggest just a slight change to the original that puts it into historical perspective:
// Performance concern:
// OrderAccess::storestore() calls release() which at one time stored 0
// into the global volatile OrderAccess::Dummy variable. This store was
// unnecessary for correctness. Many threads storing into a common location
// causes considerable cache migration or "sloshing" on large SMP systems.
// As such, I avoided using OrderAccess::storestore(). In some cases
// OrderAccess::fence() -- which incurs local latency on the executing processor
// -- is a better choice as it scales on SMP systems. See
// http://blogs.sun.com/dave/entry/biased_locking_in_hotspot for a discussion
// of coherency costs. Note that all our current reference platforms provide
// strong ST-ST order, so the issue is moot on IA32, x64, and SPARC.

Though I also note that there is no a single OrderAccess call in synchronizer.cpp

I didn't find any further tweaks to the wording as part of the
JDK-8049717 code review cycle, but it looks like other minor
edits have occurred since then (done by other bug fixes).

Comments

Changeset: 1d7ed03d Author: Daniel D. Daugherty <dcubed@openjdk.org> Date: 2020-11-16 14:10:08 +0000 URL: https://github.com/openjdk/jdk/commit/1d7ed03d
16-11-2020
There is now a single call to an OrderAccess function in the code: $ grep OrderAccess src/hotspot/share/runtime/synchronizer.cpp OrderAccess::loadload_for_IRIW(); However, that call is not related to the comments in question. I'm going with [~dholmes] last comment and deleting (almost) the entire comment block. I left the "Hash Code handling " header.
13-11-2020
Just a bit more historical context. If we go back to JDK 7 version of the code then immediately after the comment we have: static int MBFence (int x) { OrderAccess::fence(); return x; } but that static file function is not used. If we go back to JDK 1.6.0 then we note the storestore performance issue but then elide all barriers as we rely on only supporting TSO systems. The unused MBFence static file function also exists, unused. I suspect this code was updated around 6u18 or 6u23 when we started supporting other platforms. Bottom line: I think the entire comment block can just be removed now.
04-05-2020
I'm leaning toward deleting the first two paragraphs so that we have just: // ----------------------------------------------------------------------------- // Hash Code handling // // As a general policy we use "volatile" to control compiler-based reordering // and explicit fences (barriers) to control for architectural reordering // performed by the CPU(s) or platform. struct SharedGlobals { char _pad_prefix[OM_CACHE_LINE_SIZE]; // These are highly shared mostly-read variables. // To avoid false-sharing they need to be the sole occupants of a cache line. volatile int stw_random; volatile int stw_cycle; DEFINE_PAD_MINUS_SIZE(1, OM_CACHE_LINE_SIZE, sizeof(volatile int) * 2); // Hot RW variable -- Sequester to avoid false-sharing volatile int hc_sequence; DEFINE_PAD_MINUS_SIZE(2, OM_CACHE_LINE_SIZE, sizeof(volatile int)); }; static SharedGlobals GVars; Although, even that final paragraph is showing it's age since we're in the process of migrating code away from "volatile" to explicit Atomic::load() and Atomic::store() calls. See JDK-8234192 undefined behavior: C++ volatile keyword
04-05-2020
The fix for JDK-8049717 was pushed to JDK9 on 2014.07.15 so even the revised version of this comment is old. Here's the entire block as of 2020.05.04: src/hotspot/share/runtime/synchronizer.cpp (revision 4198213fc371): // ----------------------------------------------------------------------------- // Hash Code handling // // Performance concern: // OrderAccess::storestore() calls release() which at one time stored 0 // into the global volatile OrderAccess::dummy variable. This store was // unnecessary for correctness. Many threads storing into a common location // causes considerable cache migration or "sloshing" on large SMP systems. // As such, I avoided using OrderAccess::storestore(). In some cases // OrderAccess::fence() -- which incurs local latency on the executing // processor -- is a better choice as it scales on SMP systems. // // See http://blogs.oracle.com/dave/entry/biased_locking_in_hotspot for // a discussion of coherency costs. Note that all our current reference // platforms provide strong ST-ST order, so the issue is moot on IA32, // x64, and SPARC. // // As a general policy we use "volatile" to control compiler-based reordering // and explicit fences (barriers) to control for architectural reordering // performed by the CPU(s) or platform. struct SharedGlobals { char _pad_prefix[OM_CACHE_LINE_SIZE]; // These are highly shared mostly-read variables. // To avoid false-sharing they need to be the sole occupants of a cache line. volatile int stw_random; volatile int stw_cycle; DEFINE_PAD_MINUS_SIZE(1, OM_CACHE_LINE_SIZE, sizeof(volatile int) * 2); // Hot RW variable -- Sequester to avoid false-sharing volatile int hc_sequence; DEFINE_PAD_MINUS_SIZE(2, OM_CACHE_LINE_SIZE, sizeof(volatile int)); }; static SharedGlobals GVars; As David H. pointed in the JDK-8049717 review, the entire "Performance concern" paragraph talks about a past version of OrderAccess and OrderAccess isn't even used in the synchronizer.cpp file at all. The second paragraph has a link to Dice's blog entry for a discussion about coherency costs and it's not clear (to me anyway) why that paragraph is here either. The last paragraph is about the use of "volatile" and that makes sense since "struct SharedGlobals" has some volatile fields.
04-05-2020