Bug ID: JDK-8160539 Stack frame scanning acquires DerivedPointerTableGC

Versions (Unresolved/Resolved/Fixed)

JDK 13: 13 b22 (Fixed)

The JavaThread::oops_do code path currently contains at least three ways to acquire more or less global mutexes. This can lead to lock contention during parallel stack walking and thus to long root scan times.

There are three kinds of locks known to be taken in the code path:

DerivedPointerTableGC_lock
- guards DerivedPointerTable::add, which is called for every C2-compiled stack frame that contains derived pointers. It is currently unknown how common derived pointers are in real workloads.

The other two are not (any longer) a problem:

OopMapCache::_mut
- guards all retrieval of InterpreterOopMap instances, which are used to scan a specific (Method*, bci) executing in the interpreter. The per-klass OopMapCaches are lazily allocated as described below. The mutex protects the hash lookup, generation of a new oop map on a cache miss, hash table insertion after generation, and eviction of less recently used oop maps. See OopMapCache::lookup. This was dealt with by JDK-8186042, so it is no longer a problem.

OopMapCacheAlloc_lock
- guards the lazy initialization of InstanceKlass::_oop_map_cache. It is only taken if a thread observes _oop_map_cache == NULL, so unless new classes are added all the time, it should disappear after warmup. Because of the use of the double-checked locking pattern (DCLP), this lock is rarely hit and should not be a performance bottleneck.

The DerivedPointerTable is presently just a GrowableArray of DerivedPointerEntry*, protected by the DerivedPointerTableGC_lock against concurrent insertions. There don't seem to be any other uses of that lock. It might be that the lock could be eliminated via a change of representation to use LockFreeStack for the table.
05-05-2019
Not going to change OopMapCacheAlloc_lock. This lock won't affect performance.
26-03-2019
OopMapCacheAlloc_lock is taken about 212 times for specjbb2015. It's an easy thing to fix though, so I guess it's worth it. The OopMapCache::_mut lock was taken out as part of JDK-8186042. The DerivedPointerTableGC_lock issue should be filed as a new compiler issue, if it's still a bottleneck, and linked to this bug.
25-03-2019
Runtime asks if the DerivedPointerTableGC_lock issue can be moved to a separate bug report assigned to the compiler team.
15-08-2016
For the DerivedPointer stuff, the table may get larger than expected; see the (serial) iteration time reported in JDK-8152948. (That code may be completely unrelated though.)
29-06-2016

Relates :	JDK-8186042 - Optimize OopMapCache lookup
Relates :	JDK-8152948 - More unaccounted other time on large machines