JDK-8178497 : Bug in MutableNUMASpace::ensure_parsability
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 9,10
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • Submitted: 2017-04-11
  • Updated: 2018-06-21
  • Resolved: 2017-11-27
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 10
10 b36Fixed
Related Reports
Relates :  
Description
In the following code in MutableNUMASpace::ensure_parsability:

            size_t touched_words = ... pointer_delta(s->end(), s->top()) ...; // where end() and top() are HeapWords.
[...]
 0         HeapWord *crossing_start = align_up((HeapWord*)cur_top, os::vm_page_size());
 1         HeapWord *crossing_end = align_down((HeapWord*)(cur_top + touched_words), os::vm_page_size());
 2         if (crossing_start != crossing_end) {
 3           // If object header crossed a small page boundary we mark the area
 4           // as invalid rounding it to a page_size().
 5           HeapWord *start = MAX2(align_down((HeapWord*)cur_top, page_size()), s->bottom());
 6           HeapWord *end = MIN2(align_up((HeapWord*)(cur_top + touched_words), page_size()), s->end());
 7           invalid = MemRegion(start, end);

cur_top is an intptr_t that gets added something in word_granularity in lines 1 and 6. That looks wrong, particularly because in the other uses of align_up and align_down cur_top is cast to HeapWord* first.

I.e. cur_top + touched_words doesn't use pointer arithmetic, so the the resulting values are too small.

This was found while working on JDK-8178489.
Comments
Proposed patch is about changing 'intptr_t' to 'HeapWord*' as Kim B. and Thomas S. mentioned. * Short summary: Wrong calculation at MutableNUMASpace::ensure_parsability() only affects to Solaris and the problem will be rare on product binary. * Long explanation: 0. Wrong calculation above results in creating a new 'invalid region', those regions are merged with newly selected region at MutableNUMASpace::initialize() and both ensure_parsability() and initialize() are called every GC. 1. Above problematic code is guarded with 'if (!os::numa_has_static_binding())' and only Solaris returns 'false' at numa_has_static_binding(). 2. 'touched_words' is different between product and non-product binary: #ifndef ASSERT if (!ZapUnusedHeapArea) { touched_words = MIN2((size_t)align_object_size(typeArrayOopDesc::header_size(T_INT)), touched_words); } #endif If 'touched_words' is header size(e.g. 2 byte in product binary), most cases will be same for wrong calculation and correct one since we align with page size(at least 4k or larger). e.g. cur_top=0x71d24ce80, align down value is 0x71d24c000 and align up is 0x71d24d000 for both wrong/correct cases. There will be some cases to have different values(close to page aligned size) so that adds an invalid region but it is not common. In that case, we will have wrong end. * Hundred run of GCOld on Solaris with NUMA enabled (product only) showed no problem.
22-11-2017

Making the type of cur_top be intptr_t rather than HeapWord* seem like the root of lots of messiness in this code. The update at the end would need fixing for that change, but the code would be much cleaner and the need for a bunch of cast littering would be eliminated.
15-11-2017

So as Stefan say the impact should be investigated to figure out the ILW, we think this might lead to a crash and if so maybe we should look at fixing it in 10 and the likelihood should most likely be H since this error will always occur. Also can we create a simple test that run into this, some that will rely on the heap being parsable.
15-11-2017

I haven't investigated the impact of this bug.
11-04-2017