JDK-8317809 : Insertion of free code blobs into code cache can be very slow during class unloading
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 20, 21, 22
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2023-10-10
  • Updated: 2024-05-03
  • Resolved: 2023-12-05
The Version table provides details of the release(s) in which this issue/RFE is addressed.

Unresolved: Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed: Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

JDK 21: 21.0.4-oracle (Fixed)
JDK 22: 22 b27 (Fixed)
Description
Class unloading adds dead code blobs to the free list at the end of the unloading process, one freed code blob at a time. Each insertion keeps the free list sorted.

Since the free list is a linked list, sorted insertion of a single element is O(n). With tens of thousands of elements to insert, as seen with a stress-test application, this process can take a very long time, lengthening pauses by several seconds.
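To illustrate the cost being described, here is a minimal sketch (not HotSpot's actual code; the struct and function names are hypothetical) of sorted insertion into an address-ordered singly linked free list. Each insertion scans from the head, so inserting m freed blobs into a list of n free blocks costs O(n*m) overall:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical simplification of a free list kept sorted by address.
struct FreeBlock {
    uintptr_t addr;    // start address of the freed code blob
    FreeBlock* next;
};

// Sorted insertion into a singly linked list: O(n) per call,
// because it linearly scans for the insertion point from the head.
FreeBlock* insert_sorted(FreeBlock* head, FreeBlock* node) {
    if (head == nullptr || node->addr < head->addr) {
        node->next = head;
        return node;
    }
    FreeBlock* cur = head;
    while (cur->next != nullptr && cur->next->addr < node->addr) {
        cur = cur->next;   // linear scan dominates when the list is long
    }
    node->next = cur->next;
    cur->next = node;
    return head;
}
```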

This is particularly problematic since the removal of the code sweeper (JDK-8290025), which performed this work concurrently: G1, Serial, and Parallel GC now do code cache unloading in a pause.
Comments
Fix request [21u]: I am backporting this for parity with 21.0.4-oracle. It is a larger change in a central component, with medium-to-high risk, but the original author also looked at the issue that caused the backout (JDK-8329524). Small adaptations were needed. The test passes, and SAP nightly testing passes.
02-05-2024

A pull request was submitted for review. URL: https://git.openjdk.org/jdk21u-dev/pull/505 Date: 2024-04-16 09:47:42 +0000
19-04-2024

Changeset: 30817b74 Author: Thomas Schatzl <tschatzl@openjdk.org> Date: 2023-12-05 10:37:34 +0000 URL: https://git.openjdk.org/jdk/commit/30817b742300f10f566e6aee3a8c1f8af4ab3083
05-12-2023

Fwiw, to put this change in a bit more context: it is part of a series of changes to improve class unloading performance back to pre-JDK 21 levels (and better). The basic plan:

* this change, [JDK-8317809](https://bugs.openjdk.org/browse/JDK-8317809), which improves nmethod sorting/free-list handling (and introduces the ClassUnloadingContext)
* [JDK-8317007](https://bugs.openjdk.org/browse/JDK-8317007), which allows bulk unregistering of nmethods instead of (slow) per-nmethod unregistering (also out for review)

With the above two changes, Remark pause time should be <= what it was before removal of the code root sweeper (lots of changes have already gone in that improved the time taken by various parts of class/code unloading).

I am planning the following follow-ups in the next few months (after FC, time will be spent on bugfixing, and holidays are coming up):

* (for G1) move several parts of class unloading out into the concurrent phase; at a minimum this will include:
  - bulk nmethod unregistering ([JDK-8317007](https://bugs.openjdk.org/browse/JDK-8317007))
  - nmethod code blob freeing (this change)
  - metaspace unloading

  Not necessarily in a single change; this roughly halves G1 Remark pause times again in my testing.
* split up and parallelize ClassLoaderData unloading. Currently, with this change, CLD->unload() is called immediately when registering CLDs, as before. However, this is wasteful, as most of that method can either be "obviously" parallelized or restructured so that other tasks can run in parallel. So the plan is to split class unloading (`SystemDictionary::do_unloading`) into a part that iterates only over the CLD list to determine dead ones, and a parallel part.

There are no CRs/PRs out for these latter two items, but hopefully this will, short of making everything concurrent, keep class/code unloading times low enough for some time.
04-12-2023

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/16759 Date: 2023-11-21 11:03:12 +0000
21-11-2023

Fwiw, a prototype of the former (pre-sorting items) reduces the time spent freeing code blobs to 50 ms or less, even in these extreme cases.
10-10-2023

One solution is to gather and pre-sort the items, then do a merge-sort-style single pass, which is O(n+m) instead of O(n*m). Another option is to carefully redesign the free code blob management.
10-10-2023
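The first option above (gather, pre-sort, merge) could be sketched as follows. This is an illustrative simplification, not the HotSpot implementation; the `FreeBlock` struct and `merge_batch` function are hypothetical. Sorting the batch costs O(m log m), and the single merge pass over the list is O(n+m), replacing the O(n*m) of repeated per-item sorted insertion:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical simplification of a free list kept sorted by address.
struct FreeBlock {
    uintptr_t addr;    // start address of the freed code blob
    FreeBlock* next;
};

// Merge a batch of freed blobs into an address-sorted free list.
// The batch is sorted once, then spliced in during one forward pass:
// the cursor advances past each existing list node at most once.
FreeBlock* merge_batch(FreeBlock* head, std::vector<FreeBlock*>& batch) {
    std::sort(batch.begin(), batch.end(),
              [](const FreeBlock* a, const FreeBlock* b) { return a->addr < b->addr; });
    FreeBlock dummy{0, head};   // sentinel simplifies insertion at the head
    FreeBlock* cur = &dummy;
    for (FreeBlock* node : batch) {
        while (cur->next != nullptr && cur->next->addr < node->addr) {
            cur = cur->next;    // never rescans from the head
        }
        node->next = cur->next;
        cur->next = node;
        cur = node;
    }
    return dummy.next;
}
```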