JDK-8167077 : Limit deferred card marking for (large) objArrays with G1
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 9,10
  • Priority: P3
  • Status: Open
  • Resolution: Unresolved
  • Submitted: 2016-10-04
  • Updated: 2019-01-15
Description
Deferred card marking is an optimization that batches the barrier application for initial stores to objects to increase performance.
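
Roughly illustrated with a Java-flavored sketch (this is not HotSpot code; the two variants only mark, in comments, where the barriers conceptually end up):

// Without the optimization, every reference store into the new object gets its
// own post-barrier (card mark):
static Object[] withoutDeferral(Object a, Object b) {
  Object[] x = new Object[2];
  x[0] = a;   // card mark for the card containing x[0]
  x[1] = b;   // card mark for the card containing x[1]
  return x;
}

// With deferred card marking, the per-store barriers for the freshly allocated
// object are omitted; instead the new object is recorded once, and the GC later
// rescans it to find the references stored during initialization:
static Object[] withDeferral(Object a, Object b) {
  Object[] x = new Object[2];
  x[0] = a;   // no barrier emitted
  x[1] = b;   // no barrier emitted
  // one deferred card mark covering the entire new object
  return x;
}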

In conjunction with large objArrays, this optimization can cause unacceptable pause times, as scanning these arrays takes a significant amount of time. This by itself breaks pause time requirements.

The applicability of this optimization to objArrays also seems to be very limited (but that may be another issue).

E.g.

public class Test {
  private static Object foo() {
    Object i = new Integer(1);
    Object[] x = new Object[123];
    x[0] = i;
    return x;
  }
  public static void main(String ...args) {
    for (int i = 0; i < 1_0000_0000; i++) {
      foo();
    }
  }
}

works, but something like

private static Object b = new Object();

private static Object foo() {
  Object[] x = new Object[123];
  x[0] = b;
  return x;
}

does not.
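
For reference, a complete, compilable form of this second case might look like the following (the class name is made up here; otherwise it mirrors the first example):

public class Test2 {
  private static Object b = new Object();

  private static Object foo() {
    Object[] x = new Object[123];
    x[0] = b;   // stores a pre-existing object instead of one allocated in the same method
    return x;
  }

  public static void main(String ...args) {
    for (int i = 0; i < 1_0000_0000; i++) {
      foo();
    }
  }
}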

This example also shows the limitations of this optimization in another way: it requires each element to be assigned explicitly, because any other form of initialization (like a loop) will not omit the barriers either (and in the general case cannot, because loops typically have forced safepoint locations at their back-edges). Initializing large objArrays using individual assignments seems very unlikely.

This means that this optimization may not be too useful for (large) objArrays, but may still be very effective for member initialization of regular objects.

Measurements on some microbenchmarks that do nothing but

private static void foo() {
  for (int i = 0; i < 5000; i++) {
    // someValue is a placeholder for the objArray length being measured
    Object temp = new Object[someValue];
  }
}

show that pause time targets can easily be missed. (Note: it is not clear why the compiler emits any deferred card mark in this case, as there is no initializing store at all here. That may be another issue.)

E.g. an objArray with ~1.5M entries (~13 MB in size) results in 22ms avg/36ms max *additional* pause.
Even smaller objArrays with ~650k entries (~5 MB in size) cause 6ms avg/10ms max of additional pause.
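
For reference, a self-contained form of such a microbenchmark might look like the following sketch (class name, iteration counts and the default array length are illustrative assumptions, not the original measurement setup):

public class ObjArrayAllocBench {
  // keep a reference around so the allocations are not trivially optimized away
  private static volatile Object sink;

  private static void foo(int length) {
    for (int i = 0; i < 5000; i++) {
      sink = new Object[length];
    }
  }

  public static void main(String ...args) {
    int length = args.length > 0 ? Integer.parseInt(args[0]) : 1_500_000;
    for (int i = 0; i < 1_000; i++) {
      foo(length);
    }
  }
}

Running this with e.g. -XX:+UseG1GC -XX:MaxGCPauseMillis=30 -Xlog:gc,gc+phases=debug should make the additional pause component visible; comparing against a run with -XX:-ReduceInitialCardMarks (the flag controlling this deferred card marking, possibly requiring -XX:+UnlockDiagnosticVMOptions depending on the build) is presumably the easiest way to isolate it.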

In workloads where a small, deterministic pause time is expected (e.g. 30ms), this optimization already eats up a large part of the pause that should preferably be spent on actual space reclamation (or actually causes pause time goals to be missed).

Another option would be to be a bit more precise about the initializing stores, i.e. only pass the area(s) where initializing stores actually occur to the GC. E.g. in this case:

  private static Object foo() {
    Object i = new Integer(1);
    Object[] x = new Object[123];
    x[0] = i;
    // ... stores to the elements in between ...
    x[10] = i;
    return x;
  }

only pass an area covering at least the first eleven elements of the array to the GC for later rescanning, instead of the entire object.
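
As a rough sketch of the difference in the area that would have to be rescanned, counted in cards (all constants are assumptions for illustration only: 512-byte cards, a 16-byte array header, 8-byte elements; actual HotSpot card geometry and interfaces differ):

static final long CARD_SIZE = 512;  // assumed card size in bytes
static final long HEADER    = 16;   // assumed objArray header size in bytes
static final long ELEM      = 8;    // assumed element size (uncompressed oops)

// cards covering the whole array (what gets rescanned today)
static long cardsForWholeArray(long arrayBase, long length) {
  long end = arrayBase + HEADER + length * ELEM - 1;
  return end / CARD_SIZE - arrayBase / CARD_SIZE + 1;
}

// cards covering only elements [0, lastInitialized] (the proposed narrower area)
static long cardsForInitializedPrefix(long arrayBase, long lastInitialized) {
  long start = arrayBase + HEADER;
  long end   = start + (lastInitialized + 1) * ELEM - 1;
  return end / CARD_SIZE - start / CARD_SIZE + 1;
}

// For a ~1.5M-entry array this is tens of thousands of cards for the whole object,
// but typically a single card for the x[0]..x[10] prefix of the example above.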