Bug ID: JDK-8160369 Memory fences needed around setting and reading object lengths

Type: Bug
Component: hotspot
Sub-Component: gc
Affected Version: 9

Priority: P2
Status: Resolved
Resolution: Fixed

Submitted: 2016-06-27
Updated: 2016-11-23
Resolved: 2016-11-23

JDK 9
9Fixed

The size of arrays and java.lang.Class instances are stored in "length" and "oop_size" fields within the object. The contract for object initialization is that once the object header has been set non-zero (the klass field in particular), the object is parsable by concurrent GCs.

Although the source code is writing the fields in the correct order (after JDK-8158946 is fixed), there are no memory fences to ensure that the compiler or CPU doesn't reorder the writes or reads of these fields.

All subtasks have been fixed.
23-11-2016
Partial, in progress webrev at http://cr.openjdk.java.net/~drwhite/8160369/webrev.03 This is probably not the way to go, but didn't want to loose the in-progress work...
22-07-2016
Disagree with your first point. Regular objects also need the barriers. I.e. the writing of the layout helper of the Klass must happen before publishing the Klass pointer to internal data structures (i.e. it is accessible by the mutator threads). (Edit: that may already be the case, but needs to be verified) This is because regular (humongous) objects may be allocated directly into the old gen too. The issue is a bit simpler here, because we can assume that there is an implicit loadload barrier when the reader accesses the layout helper. Also, maybe it helps that at least in G1, regular (huge) objects are always allocated through the runtime. And java.lang.Class instances also only need the barrier if allocated into the old gen (afaik they are not). Not sure if it is theoretically possible to have humongous java.lang.Class instances. Also those "other" cases where the layout helper is < 0, and we take the virtual call must be considered too (whatever these are) to make sure that a loadload barrier is executed (assuming that instances of these kind can be somehow allocated in old gen). Please, also let's discuss webrevs on the mailing lists. This is not the place to do that.
29-06-2016
ok, moving discussion to mailing list...
29-06-2016
Following on to Thomas' points: - Two types of objects need to have these memory barriers: - Arrays - java.lang.Class instances (which store the instances' size in an magic "oop_size" field.) - Only old gen needs the store barriers: -> So we don't need to worry about the fast path allocators in the compiler, only deal with slow path in CollectedHeap::array_allocate and (new) class_allocate. -> These are relatively rare, so should not have performance implications. Load barriers should only be needed by concurrent GC readers. It is already the case that the concurrent readers need to handle running into a partially initialized object, i.e. an object with a zero "klass" field in the header. So this code already calls oopDesc::klass_or_null(). Once again, the load barriers are only needed reading the array length (or size), and for the j.l.Class instances. The j.l.Class case can be handled by unconditionally adding a loadload barrier in InstanceMirrorKlass::oop_size(). This should be relatively rare and not be a performance issue. To handle arrays I've tried following all code that calls oopDesc::klass_or_null() and then calls oopDesc::size_given_klass() (or just ::size()). I added a "concurrent" parameter to size_given_klass(), defaulted to false, that if true, issues a loadload barrier if the object is an array. This is all inline code, so shouldn't add overhead unless it's needed. A webrev with this fix is located here: http://cr.openjdk.java.net/~drwhite/8160369/webrev.02/ My main concern with this webrev is that it's not always clear that it handles all calls to oop::size() - some are buried in iterators etc. In addition, if some code is calling oopDesc::klass_or_null() when not doing concurrent GC, then I may have added some unnecessary barriers. Another approach is to simply add the loadload barrier to oopDesc::klass_or_null() if the oop is an array. @dholmes: There are existing load barriers in CMS that were added for PPC that fix 1/2 the problem, but nothing on the store side. And I haven't seen any G1 code that tried to handle this issue yet.
29-06-2016
Is this a GC or Runtime issue? I recall some barrier insertions in related code a couple of years ago. The intent is that a barrier is in place prior to the object being "published" such that the GC can see it. Please include runtime, or at least me, on any code review here. Thanks.
29-06-2016
Some more thoughts about this while hacking some prototype change: - only old gen allocations (or accesses in the old gen) need this barriers executed, as all concurrent gcs only ever scan the objects in the old gen (afaik); for g1 these are only humongous objects. If this in any way helps of course. Nonconcurrent gcs obviously also do not need this. - (naturally) all kinds of reading the object size needs to be handled; from what I saw there are three ways to get an object size (if we want to fix it at the call to read the object size): 1) from the Klass'es layout helper (regular objects) 2) the length field within the object 3) using a virtual call to the Klass::oop_size() 1+3 are probably not an issue because the architectures we support already have an implicit loadload barrier on address dependent accesses (which you have if you access a field via the oop's Klass. (Did not exactly check 3), but in the end it will be some field in the Klass instance that is used anyway) 2 is the problematic case, as there is no implicit loadload barrier. Alternatively one could also look at all code that could be executed concurrently that potentially reads object sizes and add barriers there. - the storestore barriers need to be added in the code that writes these fields to ensure the sequence of "length" and "klass" fields stored in order - there are the the post_allocation_setup_XXX methods, somewhere there could be a good location. They should be the ones called during old gen allocation (and some of them for out-of-tlab allocation, but they should be rare). - there may be need for another barrier somewhere in the class loading code, after the layout helper is set and before the Klass is published if there is not (I did not follow through the entire allocation call sequence), since Klass is a MetaspaceObj, i.e. allocated outside the heap. This list may not be complete.
29-06-2016
Actually the problem occurs for any array where this kind of assumption is made (klass set after size of object).
28-06-2016

Relates :	JDK-8159864 - gc/stress/TestStressRSetCoarsening.java hits assert assert(ILocked()) failed: invariant
Relates :	JDK-8166811 - Missing memory fences between memory allocation and refinement
Relates :	JDK-8158946 - btree009 fails with assert(s > 0) failed: Bad size calculated

JDK-8165808 :	Add release barriers when allocating objects with concurrent collection - Resolved
JDK-8166228 :	Remove unused HeapRegion::object_iterate_mem_careful() - Resolved
JDK-8166229 :	Eliminate ParNew's use of klass_or_null() - Resolved
JDK-8166583 :	Add oopDesc::klass_or_null_acquire() - Resolved
JDK-8166607 :	G1 needs klass_or_null_acquire - Resolved
JDK-8166663 :	Simplify oops_on_card_seq_iterate_careful - Resolved
JDK-8166862 :	CMS needs klass_or_null_acquire - Resolved