Bug ID: JDK-8350621 Code cache stops scheduling GC

Type: Bug
Component: hotspot
Sub-Component: gc
Affected Version: 20,21.0.6,22,23,24,25

Priority: P2
Status: Open
Resolution: Unresolved

Submitted: 2025-02-17
Updated: 2025-07-10

JDK 25: Unresolved

ADDITIONAL SYSTEM INFORMATION :
openjdk version "17.0.12" 2024-07-16 LTS
OpenJDK Runtime Environment Zulu17.52+17-CA (build 17.0.12+7-LTS)
OpenJDK 64-Bit Server VM Zulu17.52+17-CA (build 17.0.12+7-LTS, mixed mode, sharing)
Linux sayonara 6.12.10-arch1-1 #1 SMP PREEMPT_DYNAMIC Sat, 18 Jan 2025 02:26:57 +0000 x86_64 GNU/Linux

A DESCRIPTION OF THE PROBLEM :
The sweeper was removed in JDK-8290025.
The GC is now responsible for freeing memory in the code cache (see `CodeCache::gc_on_allocation`).

There are corner cases where the GC is no longer scheduled because the `_unloading_threshold_gc_requested` flag is not reset correctly by the GC.

I observed this situation after migrating a JVM from Java 17 to 21.
I can reproduce the issue with G1GC and, less often, with ParallelGC.

The problem is that `CodeCache::gc_on_allocation` is supposed to run the GC (or schedule it) after setting `_unloading_threshold_gc_requested` to `true`, but in some situations the GC is not actually run or scheduled (for instance, if a GC is already running). As a result, the flag remains `true` and `CodeCache` can no longer run or schedule the GC.
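The latch behaviour described above can be illustrated with a small toy model. This is only a sketch and not HotSpot code: the class `StuckGcRequestSketch`, its methods, and the `collectorBusy` field are hypothetical names, and the only thing taken from the report is the idea of a request flag that is set before the collection is requested and reset only by a completed collection.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Toy model of a GC request flag that can get stuck; hypothetical, not HotSpot code. */
public class StuckGcRequestSketch {
    // Stands in for _unloading_threshold_gc_requested.
    private final AtomicBoolean gcRequested = new AtomicBoolean(false);
    // Assumption: the collector silently ignores requests while it is busy.
    private volatile boolean collectorBusy;
    private int cyclesStarted;

    /** Models the allocation path: returns true only if a collection was really started. */
    boolean requestGcOnAllocation() {
        if (!gcRequested.compareAndSet(false, true)) {
            return false; // a request is believed to be pending, so nothing is done
        }
        if (collectorBusy) {
            return false; // request dropped, but the flag is left set
        }
        cyclesStarted++;
        return true;
    }

    /** Models the notification sent by a completed collection; the only place the flag is reset. */
    void onGcFinished() {
        gcRequested.set(false);
    }

    void setCollectorBusy(boolean busy) {
        collectorBusy = busy;
    }

    boolean isRequestPending() {
        return gcRequested.get();
    }

    int cyclesStarted() {
        return cyclesStarted;
    }

    public static void main(String[] args) {
        StuckGcRequestSketch cache = new StuckGcRequestSketch();
        cache.setCollectorBusy(true);
        System.out.println("started while busy: " + cache.requestGcOnAllocation());
        cache.setCollectorBusy(false);
        System.out.println("started once idle:  " + cache.requestGcOnAllocation());
        System.out.println("request pending:    " + cache.isRequestPending());
        System.out.println("cycles started:     " + cache.cyclesStarted());
    }
}
```

In this model, the second request is rejected even though the collector is idle again, because the flag was left set by the dropped first request. Nothing clears it until some unrelated collection calls `onGcFinished()`.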

This issue is more likely to happen if:
- there is pressure on the code cache (not large enough / big application with a lot of activity)
- the heap is large and full GCs happen rarely or not at all

REGRESSION : Last worked in version 17.0.14

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run the attached Java program (see Test Case Code) with the following JVM flags:

```
-Xlog:gc*=info,codecache=info -Xmx512m -XX:ReservedCodeCacheSize=2496k -XX:StartAggressiveSweepingAt=15
```

Then attach jconsole and monitor the code cache.
jconsole serves two purposes here: it monitors the code cache, and it also generates activity in the code cache.
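For an in-process view of the same numbers, the code cache pools can also be read through the standard `java.lang.management` API. The sketch below is not part of the original report: the class name `CodeCacheUsageProbe` is hypothetical, and the name filter assumes HotSpot's usual pool naming (`CodeCache`, or `CodeHeap '...'` when the code cache is segmented). It does not replace jconsole in the steps above, since jconsole is also what generates the code cache activity.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that lists code cache memory pools; pool names are a HotSpot assumption. */
public class CodeCacheUsageProbe {

    /** Returns the non-heap pools whose names look like HotSpot code cache pools. */
    static List<MemoryPoolMXBean> codeCachePools() {
        List<MemoryPoolMXBean> pools = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            boolean looksLikeCodeCache = name.contains("CodeCache") || name.contains("CodeHeap");
            if (pool.getType() == MemoryType.NON_HEAP && looksLikeCodeCache) {
                pools.add(pool);
            }
        }
        return pools;
    }

    /** Formats one pool as "name: used=...K max=...K". */
    static String describe(MemoryPoolMXBean pool) {
        MemoryUsage usage = pool.getUsage();
        if (usage == null) {
            return pool.getName() + ": pool no longer valid";
        }
        // getMax() returns -1 when the maximum is undefined.
        String max = usage.getMax() < 0 ? "undefined" : (usage.getMax() / 1024) + "K";
        return pool.getName() + ": used=" + (usage.getUsed() / 1024) + "K max=" + max;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : codeCachePools()) {
            System.out.println(describe(pool));
        }
    }
}
```

Calling `describe` periodically from the test program would make the fill-up visible in its own output, at the cost of adding a little compiled code of its own.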

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Even if the reserved memory for the code cache is low, the JVM should be able to survive by clearing it periodically, without being forced to disable the compiler.
ACTUAL -
Everything works fine at the beginning: the GC is triggered to clean the code cache. At some point, however, the GC is no longer scheduled and the code cache fills up completely.

```
[183.393s][info][codecache   ] Code cache is full - disabling compilation
[183.393s][warning][codecache   ] CodeCache is full. Compiler has been disabled.
[183.393s][warning][codecache   ] Try increasing the code cache size using -XX:ReservedCodeCacheSize=
OpenJDK 64-Bit Server VM warning: CodeCache is full. Compiler has been disabled.
OpenJDK 64-Bit Server VM warning: Try increasing the code cache size using -XX:ReservedCodeCacheSize=
CodeCache: size=2496Kb used=2475Kb max_used=2475Kb free=20Kb
 bounds [0x00007ac6e3890000, 0x00007ac6e3b00000, 0x00007ac6e3b00000]
 total_blobs=1141 nmethods=652 adapters=399
 compilation: disabled (not enough contiguous free space left)
              stopped_count=1, restarted_count=0
 full_count=1
```

As the heap is large and the program does not allocate anything, a regular full GC will not happen, leaving the application in an unstable situation.

---------- BEGIN SOURCE ----------
package fr.alexandrejacob;

public class CodeCacheMain {

    public static void main(String[] args) throws InterruptedException {
        while (true) {
            Thread.sleep(100);
        }
    }
}

---------- END SOURCE ----------

A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk/pull/26189 Date: 2025-07-08 15:03:27 +0000
08-07-2025
The problem is a mismatch between what code cache management sees as a "marking/reclamation cycle" and what G1 sees as one. From the code cache management point of view, marking/reclamation starts at (G1's) concurrent start and ends at the Remark pause, which is when it is notified. For G1, however, the concurrent (marking) cycle starts at concurrent start and ends after the Cleanup pause. So if a request from code cache management arrives after Remark but before the end of G1's concurrent cycle, code cache management believes that a suitable garbage collection has been scheduled and waits for the notification. That never happens: the notification is sent in the Remark pause, but because G1 still sees the concurrent cycle as active, it does not schedule another one, and because G1 has already progressed beyond the Remark pause, the notification will never come. Parallel does not seem to be affected.
26-06-2025
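The cycle mismatch described in the comment above can be modeled as a small state machine. This is a simplified sketch, not G1 or code cache code: the class `G1CycleMismatchSketch`, the `Phase` enum, and all method names are hypothetical. It only encodes the three rules from the comment: the request flag is cleared at Remark, G1 considers the cycle active until Cleanup has finished, and G1 does not start a new cycle while one is active.

```java
/** Toy state machine for the Remark/Cleanup mismatch; hypothetical, not HotSpot code. */
public class G1CycleMismatchSketch {
    enum Phase { IDLE, MARKING, AFTER_REMARK }

    private Phase phase = Phase.IDLE;
    // Code cache side: "a suitable collection has been scheduled".
    private boolean gcRequested;

    /** Code cache asks for a collection; returns true if a new concurrent cycle was started. */
    boolean requestCodeCacheGc() {
        if (gcRequested) {
            return false; // code cache believes a collection is already on its way
        }
        gcRequested = true;
        if (phase != Phase.IDLE) {
            return false; // G1 still sees an active cycle and does not schedule another
        }
        phase = Phase.MARKING; // concurrent start
        return true;
    }

    /** Remark pause: the only point where code cache management is notified. */
    void remarkPause() {
        if (phase == Phase.MARKING) {
            phase = Phase.AFTER_REMARK;
            gcRequested = false;
        }
    }

    /** Cleanup pause: end of the concurrent cycle from G1's point of view. */
    void cleanupPause() {
        if (phase == Phase.AFTER_REMARK) {
            phase = Phase.IDLE;
        }
    }

    boolean isRequestPending() {
        return gcRequested;
    }

    boolean isCycleActive() {
        return phase != Phase.IDLE;
    }

    public static void main(String[] args) {
        G1CycleMismatchSketch g1 = new G1CycleMismatchSketch();
        g1.requestCodeCacheGc(); // starts a concurrent cycle
        g1.remarkPause();        // code cache is notified, flag cleared
        g1.requestCodeCacheGc(); // arrives between Remark and Cleanup: flag set, no cycle started
        g1.cleanupPause();       // G1's cycle ends
        System.out.println("request pending: " + g1.isRequestPending());
        System.out.println("cycle active:    " + g1.isCycleActive());
    }
}
```

At the end of `main`, a request is still pending while no cycle is active, so every later call to `requestCodeCacheGc()` returns early. In the model, only a new marking cycle triggered by something else would clear the flag.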
The next full GC or marking will clear that flag, and aggressive code cache GCs will be triggered again (or be blocked again in the same way). I think this is briefly mentioned in the description, but it is worth pointing out explicitly.
24-06-2025
A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk/pull/23656 Date: 2025-02-16 18:39:29 +0000
28-04-2025
Probably affects everything from 20 up to and including 25, that is, every release after JDK-8290025 (removal of the code cache sweeper).
26-02-2025