JDK-8331572 : Allow using OopMapCache outside of STW GC phases
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 17,21,22,23
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2024-05-02
  • Updated: 2024-08-09
  • Resolved: 2024-05-21
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 23
23 b24Fixed
Related Reports
Blocks :  
Blocks :  
Relates :  
Relates :  
Relates :  
Relates :  
Description
`Method::mask_for` is used on hot paths in concurrent stack processing, see screenshot. The method actually caches the OopMaps when GC is active:

```
void Method::mask_for(int bci, InterpreterOopMap* mask) {
  methodHandle h_this(Thread::current(), this);
  // Only GC uses the OopMapCache during thread stack root scanning
  // any other uses generate an oopmap but do not save it in the cache.
  if (Universe::heap()->is_gc_active()) {
    method_holder()->mask_for(h_this, bci, mask);
  } else {
    OopMapCache::compute_one_oop_map(h_this, bci, mask);
  }
  return;
}
```

...but GC active is only set by `IsGCActiveMark`, which e.g. for Shenandoah is only set in `ShenandoahGCPauseMark`. The problem is that `CollectedHeap::is_gc_active` is specified to answer `true` only for STW GCs. 

So when we run in concurrent phases, we just compute the oop maps and drop them on the floor after use. We should consider relaxing this check and allowing collected heap implementations to cache oop maps even in concurrent phases.

This might need adjustments in how OopMapCache does locking, mostly for lock ordering with the locks that concurrent GCs hold, see JDK-8331714.

The slowness in processing interpreter frames can be replicated with Shenandoah like this:

```
% cat ManyThreadsStacks.java 
public class ManyThreadStacks {
        static final int THREADS = 1024;
        static final int DEPTH = 1024;
        static volatile Object sink;

        public static void main(String... args) {
                for (int t = 0; t < DEPTH; t++) {
                        int ft = t;
                        new Thread(() -> work(1024)).start();
                }

                while (true) {
                        sink = new byte[100_000];
                }
        }

        public static void work(int depth) {
                if (depth > 0) {
                        work(depth - 1);
                }
                while (true) {
                        try {
                                Thread.sleep(100);
                        } catch (Exception e) {
                                return;
                        }
                }
        }
}

% build/linux-aarch64-server-release/images/jdk/bin/java -Xmx1g -Xms1g -XX:+UseShenandoahGC -Xlog:gc -Xint ManyThreadsStacks.java 2>&1 | grep "marking roots"
[1.259s][info][gc] GC(0) Concurrent marking roots 73.293ms
[1.350s][info][gc] GC(1) Concurrent marking roots 73.405ms
[1.441s][info][gc] GC(2) Concurrent marking roots 73.303ms
[1.532s][info][gc] GC(3) Concurrent marking roots 73.115ms
[1.622s][info][gc] GC(4) Concurrent marking roots 73.156ms
[1.813s][info][gc] GC(5) Concurrent marking roots 73.998ms

# With the patch that enables access to OopMapCache from concurrent phases:

% build/linux-aarch64-server-release/images/jdk/bin/java -Xmx1g -Xms1g -XX:+UseShenandoahGC -Xlog:gc -Xint ManyThreadsStacks.java 2>&1 | grep "marking roots"
[1.191s][info][gc] GC(0) Concurrent marking roots 6.273ms
[1.214s][info][gc] GC(1) Concurrent marking roots 6.183ms
[1.235s][info][gc] GC(2) Concurrent marking roots 6.150ms
[1.254s][info][gc] GC(3) Concurrent marking roots 6.280ms
[1.274s][info][gc] GC(4) Concurrent marking roots 6.178ms
```
Comments
[jdk21u-fix-request] Approval Request from Aleksey Shipilëv Considerably improves concurrent GC performance with lots of threads. Does not apply cleanly, since it requires some hooks into cleanup sequence in JDK 21, see PR. Provides a firmer ground for backport bugfixes in OopMapCache. All tests pass. Was in mainline for 9+ weeks. There is a little bugtail, JDK-8334594, relaxing a too strong locked signalling. The risk is medium-low: changes a common path, but comes with only a little bugtail. The performance and maintenance benefits after related follow-up backports likely outweigh that risk.
08-08-2024

A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk21u-dev/pull/610 Date: 2024-05-28 09:44:28 +0000
09-07-2024

This seems to be causing a deadlocked that I've reported as JDK-8334594.
20-06-2024

Changeset: d999b81e Author: Aleksey Shipilev <shade@openjdk.org> Date: 2024-05-21 14:56:53 +0000 URL: https://git.openjdk.org/jdk/commit/d999b81e7110751be402012e1ed41b3256f5895e
21-05-2024

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/19229 Date: 2024-05-14 12:31:08 +0000
14-05-2024

Right. I am repurposing this issue to work out the solution that gets rids of GC checks completely. Your idea of using the GlobalCounter looks very promising for solving the reclamation races, I'll credit you as the contributor, Zhengyu :)
14-05-2024

Stack walk is unrelated to any GCs. I filed JDK-8317466 mainly wanted to address stack walk (but I thought using concurrent GCs as a cover was more convincing :-)). I want to point out that slowness of stack walk is a pain point in real production systems. We use Parallel GC, stack walk number was magnitude better when oopMapCache was enabled.
08-05-2024

Looking closer at JDK-8186042, I think the *had* the OopMapCache not guarded by any GC checks: https://github.com/openjdk/jdk/commit/96aa3d9dbe18f2e721cd3f2858907e19fbba0df4 So maybe we should consider just using the cache without any GC checks again. [~coleenp], do you remember why we introduced the GC checks on those paths? Was it to manage _old_entries at the GC vmops epilogs?
08-05-2024