JDK-8056240 : Investigate increased GC remark time after class unloading changes in CRM Fuse
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 8u40,9
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • Submitted: 2014-08-28
  • Updated: 2017-07-26
  • Resolved: 2014-10-17
Fixed in: 8u40, 9 b38
Tests of CRM Fuse, run after "Class Unloading after a concurrent marking cycle" (JDK-8049421) was pushed to 8u40-dev, show GC remark times that are larger than expected compared to earlier prototypes.

For example, log messages with -XX:G1LogLevel=finest show:
13160.468: #4064: [GC remark
  13160.468: #4064: [Finalize Marking, 0.0013290 secs]
  13160.469: #4064: [GC ref-proc
    13160.469: #4064: [SoftReference, 4610 refs, 0.0022800 secs]
    13160.472: #4064: [WeakReference, 67021 refs, 0.0236090 secs]
    13160.495: #4064: [FinalReference, 1279 refs, 0.0056960 secs]
    13160.501: #4064: [PhantomReference, 38 refs, 0.0000850 secs]
    13160.501: #4064: [JNI Weak Reference, 0.0000120 secs],
  0.0325860 secs]
13160.502: #4064: [Unloading
  13160.502: #4064: [System Dictionary Unloading, 0.2235080 secs]
  13160.725: #4064: [Parallel Unloading, 0.0444030 secs],
  0.2679750 secs],
0.3044090 secs]

The System Dictionary Unloading time in particular looks too high. Investigate the cause and, if possible, find fixes.
Is it walking the code cache that is slow or is it calling back to mark things on stack? The callback is indirect through a function pointer and a virtual call. This was only needed to generalize metadata_do because I didn't want to copy the code.

I wonder if we can have a Method* hashtable, like the Symbol* hashtable, and only walk that instead of the code cache. We can populate the table (or increase refcounts) when creating nmethods, decrease refcounts when sweeping an nmethod, and remove the Method* when its refcount drops to zero, like the SymbolTable, using the same mechanism as possibly_parallel_unlink.

No, I can't because of ordering problems. I have a patch that I'll send out when I've verified that the parallel class loading tests pass.

Can you call CodeCache::do_unloading() before SystemDictionary::do_unloading() and mark methods->on_stack() at the same time?

A few ideas:
* Create separate relocation tables for oops and metadata, so that iterating over only oops or only metadata can be done much faster.
* If we are only interested in looking at the metadata without writing to it, we could ensure that all metadata is present in the metadata section of the nmethod, and thereby avoid iterating over relocations completely in some cases.
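The second idea can be illustrated with a toy model. This is a sketch under assumed, simplified layouts (Reloc, NMethodSketch, and the metadata_do_* helpers are hypothetical): a read-only metadata walk over a contiguous metadata section skips relocation decoding entirely, while the relocation-based walk must examine and filter every record.

```cpp
// Hypothetical sketch contrasting a relocation walk with a metadata-section walk.
#include <cassert>
#include <cstdint>
#include <vector>

using Metadata = uintptr_t;  // stands in for a Metadata* pointer

enum RelocType { RELOC_OOP = 0, RELOC_METADATA = 1 };
struct Reloc { RelocType type; Metadata value; };  // simplified relocation record

struct NMethodSketch {
  std::vector<Reloc>    relocations;       // mixed oop + metadata relocations
  std::vector<Metadata> metadata_section;  // all metadata, stored contiguously

  // Slow walk: decode every relocation and filter for metadata entries.
  template <typename F> void metadata_do_via_relocs(F f) const {
    for (const Reloc& r : relocations)
      if (r.type == RELOC_METADATA) f(r.value);
  }
  // Fast walk: linear scan of the dedicated section, no relocation decoding.
  template <typename F> void metadata_do_via_section(F f) const {
    for (Metadata m : metadata_section) f(m);
  }
};
```

Both walks must visit the same metadata values; the section walk just reaches them without touching the (much larger, mixed) relocation stream.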

We could probably do something like that, but we still need to keep down the pause time spikes that this is causing, so that's not really a complete solution. We need to figure out why walking the CodeCache and using the relocIterators are so slow. We might have to rewrite parts of it to optimize it for faster iterations over oops and metadata.

Can you also try some simpler fix that doesn't call MetadataOnStackMark for every do_unloading() call? There should only be entries to purge if there has been a lot of redefinition. We could delay purging metadata until the metaspace has hit the HWM, or until there is further redefinition. Actually, further redefinition also purges the previous versions. I wonder if we can skip this entirely during class unloading. In order to deallocate CLD->deallocate_list, we have to call purge_previous_versions to clean_weak_method_links right before, so we can't do this only during class redefinition: there might be metadata on the deallocate_list that isn't from class redefinition. But we can still delay this in some way and not do it every time.

Optimization and parallelization of CodeCache marking, needed by MetadataOnStackMark, lowers the remark times from ~300 ms to 120 ms.

We know that the increased pause times are because of this code:

MetadataOnStackMark::MetadataOnStackMark(bool has_redefined_a_class) {
  ...
  if (has_redefined_a_class) {
    CodeCache::alive_nmethods_do(nmethod::mark_on_stack);
  }
}
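The parallelization mentioned above could be structured along these lines. This is a sketch under assumptions, not the actual fix: workers claim fixed-size chunks of the alive-nmethod list through an atomic cursor, so each nmethod is marked exactly once and the work load-balances across GC worker threads. The names and chunking scheme are hypothetical.

```cpp
// Hypothetical sketch of a parallel, chunk-claiming code-cache walk.
#include <algorithm>
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

struct NMethod { std::atomic<bool> on_stack{false}; };

void parallel_mark_on_stack(std::vector<NMethod>& nmethods, int num_workers) {
  std::atomic<size_t> cursor{0};
  const size_t chunk = 64;  // claim granularity: one fetch_add per 64 nmethods
  auto worker = [&] {
    for (;;) {
      size_t begin = cursor.fetch_add(chunk);   // claim the next chunk
      if (begin >= nmethods.size()) return;     // no work left
      size_t end = std::min(begin + chunk, nmethods.size());
      for (size_t i = begin; i < end; i++)
        nmethods[i].on_stack.store(true, std::memory_order_relaxed);
    }
  };
  std::vector<std::thread> threads;
  for (int i = 0; i < num_workers; i++) threads.emplace_back(worker);
  for (auto& t : threads) t.join();
}
```

Chunked claiming keeps the atomic traffic low (one fetch_add per chunk rather than per nmethod) while still balancing unevenly sized nmethods across workers.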

ILW:
I -> medium: gives somewhat bad average response time, though only on that benchmark
L -> high: always
W -> high: no known workaround
MHH -> P2