JDK-8290025 : Remove the Sweeper
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 8,11,17,19,20
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2022-07-08
  • Updated: 2024-07-18
  • Resolved: 2022-08-25
JDK 20: 20 b13 (Fixed)
Description
When the world was still young, the sweeper was built to unload bad-smelling nmethods. While it went through various revisions, the GCs gained support for class unloading, and with it the need to get rid of nmethods with a different unpleasant scent.

The two systems would now compete for unloading nmethods, and the responsibility for throwing away nmethods blurred. The sweeper could still throw away nmethods faster, as it only needed to scan thread stacks rather than perform a full GC.

With the advent of Loom, the situation has gotten even worse. Stacks now also live in the Java heap. The sweeper is unable to throw away nmethods without the liveness analysis of a full GC, which already performs code cache unloading but isn't allowed to actually delete nmethods due to races with the sweeper. In a way we have the worst of both worlds: both the sweeper and the GC are crippled, unable to unload nmethods without the help of the other. And there is a very large number of complicated races that the JVM needs to deal with, especially keeping concurrent code cache unloading from interfering with concurrent sweeping, and concurrent sweeping from interfering with the application.

The sweeper cycle exposes two flavours of nmethods that are "dead" to the system. So whenever nmethods are used, we have to know they are not dead, but we typically don't have the tools to really know that. For example, one might think that grabbing the CodeCache_lock and using an iterator that only walks is_alive() nmethods would ensure you never see dead nmethods in your iteration. However, that is not the case, because the CodeCache_lock can't be held across the entire zombie transition, for "reasons" that are not trivial to actually change. Because of this, code has to deal with nmethods flipping to a dead state seemingly at random.
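To make that hazard concrete, here is a minimal, self-contained sketch of the pattern described above. It is not HotSpot code (NMethodish, code_list_lock and friends are invented for the example); it only illustrates why filtering on is_alive() under a lock does not stop an entry from flipping to a dead state right after the snapshot is taken.

```cpp
// Hypothetical model of the race: names are made up, not HotSpot API.
#include <atomic>
#include <mutex>
#include <vector>

enum class State { alive, not_entrant, zombie };   // "dead" flavours exist

struct NMethodish {
  std::atomic<State> state{State::alive};
  bool is_alive() const { return state.load() != State::zombie; }
};

std::mutex code_list_lock;                 // stand-in for CodeCache_lock
std::vector<NMethodish*> code_list;

// Collect what looks like a safe snapshot of "alive" methods.
std::vector<NMethodish*> collect_alive() {
  std::lock_guard<std::mutex> guard(code_list_lock);
  std::vector<NMethodish*> snapshot;
  for (NMethodish* nm : code_list) {
    if (nm->is_alive()) snapshot.push_back(nm);   // filtered while locked...
  }
  return snapshot;
}

int main() {
  NMethodish nm;
  code_list.push_back(&nm);
  std::vector<NMethodish*> alive = collect_alive();
  // ...but the zombie transition does not hold the lock for its whole
  // duration, so an entry in 'alive' can flip underneath the user:
  nm.state.store(State::zombie);
  // 'alive[0]' is now a "dead" nmethod the caller believed was filtered out.
  return alive[0]->is_alive() ? 1 : 0;
}
```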

I propose to get out of this sad situation by removing the sweeper. If we need a full GC anyway to remove nmethods, we might as well let the GC do everything. This removes the notion of is_zombie(), is_unloaded() and hence is_alive() from the JVM. It also removes the orthogonal but related notion of the nmethodLocker, which keeps nmethods around without preventing them from dying. We instead throw away nmethods the way we throw away pretty much everything else in the unloading GC code:
1. Unlink
2. Global sync
3. Throw away
4. Profit!
This way, if you get a reference to an nmethod, it won't go away until the next safepoint poll, and it will not flip its liveness state due to concurrent transitions.
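A minimal sketch of the unlink / global-sync / free scheme listed above, using made-up types and functions rather than actual HotSpot code; in HotSpot the "global sync" would be a safepoint or handshake, here it is just a placeholder.

```cpp
// Hypothetical, simplified model of: 1. unlink, 2. global sync, 3. throw away.
#include <algorithm>
#include <vector>

struct FakeNMethod { bool unlinked = false; };

static std::vector<FakeNMethod*> g_code_cache;     // all published nmethods
static std::vector<FakeNMethod*> g_pending_free;   // unlinked, not yet freed

// 1. Unlink: after this, no new references to the victims can be obtained.
void unlink_unreachable(const std::vector<FakeNMethod*>& victims) {
  for (FakeNMethod* nm : victims) {
    nm->unlinked = true;
    g_code_cache.erase(std::remove(g_code_cache.begin(), g_code_cache.end(), nm),
                       g_code_cache.end());
    g_pending_free.push_back(nm);
  }
}

// 2. Global sync: every thread passes a poll, so references obtained before
//    the unlinking above can no longer be held.
void global_sync() { /* rendezvous all threads here */ }

// 3. Throw away: only now is it safe to actually delete the memory.
void free_unlinked() {
  for (FakeNMethod* nm : g_pending_free) delete nm;
  g_pending_free.clear();
}

int main() {
  FakeNMethod* doomed = new FakeNMethod();
  g_code_cache.push_back(doomed);
  unlink_unreachable({doomed});   // 1. unlink
  global_sync();                  // 2. global sync
  free_unlinked();                // 3. throw away
  return 0;                       // 4. profit!
}
```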
Comments
This change created a chunk of dead code related to `is_in_asgct`, which was subsequently removed by JDK-8297864.
05-12-2022

Changeset: 054c23f4 Author: Erik Österlund <eosterlund@openjdk.org> Date: 2022-08-25 09:48:55 +0000 URL: https://git.openjdk.org/jdk/commit/054c23f484522881a0879176383d970a8de41201
25-08-2022

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/9741 Date: 2022-08-04 09:02:02 +0000
04-08-2022

Hi [~eastigeevich], Yes, there is indeed a hook for code cache allocations, which mitigates running out of CodeCache, but GC-style. It is very similar to how metaspace unloading also has a trigger on metadata allocations to trigger GCs. The GC also tracks temperature, but through nmethod entry barriers. This is why I have lately pushed some patches to make nmethod entry barriers as fast as possible on x86_64 and AArch64. The idea is to run with them on all the time with all GCs, like when we enable Loom. This solves the problem of knowing which heap stacks have been marked through, but also adds an implicit time dimension: how long ago that was. I have a bunch of heuristics that look at the code cache allocation rate and average GC times to compute a hotness metric based on the number of GCs during which an nmethod has not been called. This is a more direct hotness heuristic compared to the previous sampling-based mechanism that the sweeper used.
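As a rough illustration of the kind of heuristic described in this comment (not the actual code from the change; the function, parameters and constants below are invented, and "average GC times" is treated as an average interval between GCs for the sake of the example), one could derive a "cold after N GCs" threshold from free code cache space, its allocation rate and the GC cadence:

```cpp
// Hypothetical back-of-the-envelope heuristic, for illustration only.
#include <algorithm>
#include <cstdio>

double cold_gc_threshold(double free_code_cache_bytes,
                         double alloc_rate_bytes_per_sec,  // code cache growth
                         double avg_gc_interval_sec) {
  // Roughly how many GC cycles until the code cache would run out if we
  // never reclaimed anything.
  double secs_until_full =
      free_code_cache_bytes / std::max(alloc_rate_bytes_per_sec, 1.0);
  double gcs_until_full = secs_until_full / std::max(avg_gc_interval_sec, 0.001);
  // Be more aggressive (smaller threshold) when the cache is under pressure,
  // but never retire something that has missed fewer than a couple of GCs.
  return std::max(2.0, gcs_until_full / 2.0);
}

int main() {
  // Example: 64 MB free, growing at 1 MB/s, a GC every 10 seconds.
  std::printf("cold after ~%.0f GCs\n",
              cold_gc_threshold(64.0 * 1024 * 1024, 1.0 * 1024 * 1024, 10.0));
  return 0;
}
```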
25-07-2022

Hi Erik, The idea of using the GC to clean up the CodeCache LGTM. At the moment the Sweeper can be triggered from CodeCache::allocate. This mitigates running out of CodeCache and stopping compilation. Will you have something similar? With thousands of compiled methods there is a problem of broken hot-code locality: hot methods are placed too far from each other. The Sweeper can do compacting relocation based on an nmethod's temperature. Will it be possible to implement this in the GC?
25-07-2022

Nmethod entry barriers were introduced in JDK 12. I am targeting JDK 20, 8 releases later.
22-07-2022

Hmm. If I were in your shoes I would ensure arm32 has nmethod entry barrier support. It is kind of expected that all platforms have it by now, and I believe arm32 is the only exception. Having said that, I might be able to give you a short-term solution where, if nmethod aging is turned off, you can run without nmethod entry barriers and dodge implementing them a little bit longer, if that is indeed an issue.
21-07-2022

Loom is not yet supported on arm32. VMContinuations + --enable-preview causes the arm32 VM to crash due to many Unimplemented things in hotspot/cpu/arm.
21-07-2022

Then it would seem that now is an excellent time to implement nmethod entry barriers. They are already required for Loom to work. What happens if, without my patch, you run the same program with --enable-preview?
20-07-2022

Thanks for the JBS reports. Now I see it was a long story of pain :) FYI, on ARM32 the change gets ShouldNotReachHere with the following stack:
- BarrierSetNMethod::is_armed(nmethod*)+0x2c
- BarrierSetNMethod::nmethod_osr_entry_barrier(nmethod*)+0x1c
- InterpreterRuntime::frequency_counter_overflow(JavaThread*, unsigned char*)+0x30c
The issue is that nmethod entry barriers are not implemented for ARM. With the change, frequency_counter_overflow gets a non-null bs_nm from select_barrier_set_nmethod and fails in the next call.
20-07-2022

Update: my code (https://github.com/openjdk/jdk/compare/master...fisk:jdk:remove_sweeper) now passes mach5 tier1-5, and brings no regressions in our perf system. Had to optimize nmethod entry barriers further to get there. Will upstream said optimizations separately.
20-07-2022

> Why does the GC need to unload nmethods? Why not mark them as "dead" and let the sweeper do its job?

The GC needs to unload nmethods because the GC is responsible for class unloading. It could be that a class is no longer reachable in the system, and hence the class will get unloaded. But nmethods could have speculatively inlined calls into that class, and could embed oops of that class (such as static final oops), which are then dead. Therefore, the GC *has* to unload nmethods that die when their classes are unloaded. And that is indeed done by marking them as a flavour of dead today (cf. make_unloaded). What I am doing is asking the exact same question, but with the roles reversed. Why does the sweeper need to unload nmethods? Why not let the GC do its job? The answer used to be that the sweeper could unload more efficiently by only walking thread stacks. But after Loom, the opposite is true: the GC can now unload more efficiently, because if you leave it to do the *entire* job, it can get the whole thing done in a single GC round, instead of the multiple GC rounds needed today when Loom is enabled.

> Why Loom? Java stacks have always resided on the Java heap.

That is wildly incorrect. Really not sure where you heard that. Loom brought stacks to the Java heap, and it required an incredible amount of engineering to make that work.

> Why does the sweeper need liveness analysis?

Because it can't delete nmethods that have activation records in the system, for a great number of reasons. For example, deoptimization requires information from the nmethod to be able to convert a compiled frame to an interpreter frame, and execution of the nmethod would never be valid because its embedded oops and metadata are not kept alive and will crash. So you can't run the nmethod and also can't deopt away from the nmethod. These activation records used to reside only in physical thread stacks, but can now (with Loom) also reside in virtual thread stacks, posing as continuations in the Java heap. Said continuations can get mounted whenever, to continue execution.

> Can you point to JBS records for these issues?

Yes, but there have been so many that it is unlikely I will get anywhere close to a complete list. It has been broken in essentially every release of the JDK to date. Here are a few:
https://bugs.openjdk.org/browse/JDK-8215491
https://bugs.openjdk.org/browse/JDK-8215621
https://bugs.openjdk.org/browse/JDK-8134493
https://bugs.openjdk.org/browse/JDK-8075805
https://bugs.openjdk.org/browse/JDK-8217468
https://bugs.openjdk.org/browse/JDK-8240693
https://bugs.openjdk.org/browse/JDK-8268524
https://bugs.openjdk.org/browse/JDK-8267972
https://bugs.openjdk.org/browse/JDK-8215754
https://bugs.openjdk.org/browse/JDK-8246442
https://bugs.openjdk.org/browse/JDK-8245877
https://bugs.openjdk.org/browse/JDK-8214302
There are many more. There are also many issues that were solved before popping up as bugs, but still caused significant amounts of headache.

> A conservative approach: it is OK to skip the "dead" methods (treat them as "live" this time).

That is what we currently do in lots of places, and it is dangerous (e.g. nmethod::_method racingly gets cleared and asserts blow up), and there have been a bunch of bugs because of this. People also don't realize this can even happen, and they shouldn't really have to think about it. It's a class of bugs that we can choose to have, or choose not to have.

> The compiled method has two lives. First, the "age" (number of sweeps) is considered. Once compiled, the method happily survives 10 sweeps. In the meantime, the stack scan updates the age of the found methods. Once the method seems not to appear in any call stacks for too long, it is marked as "dead"; after that it goes to zombie, it is evicted and recompiled. On the second run, the method gets a "hotness" counter; the counter updater is compiled into the method to track the exact number of calls. When the "hotness" counter shows no activity, the method is marked dead again.

What you described is not how it works.
1. When nmethods age, they become not_entrant. That does not make them dead (is_alive returns true, and must, because they can be on-stack at the point of the transition and potentially stay there forever, due to races).
2. When an nmethod goes to zombie, it is not evicted and recompiled. It still contains resources that have not yet been unlinked, such as CompiledICHolders for its inline caches. These have to be released one traversal before the nmethod can be recycled, for example due to ABA conflicts: if we free the memory immediately and an adapter blob gets allocated at the same address, then when we free the next nmethod and need to check whether it has a CompiledICHolder, it might see that the inline cache points at an adapter blob, which implies there should be a CompiledICHolder, but in fact it's just ABA causing an incorrect decision, leading us to free random memory that has never been allocated. And a bunch of other fun issues.

> Is it going to be a special GC pass, common for all GC implementations?

Yes. In fact, we already have a special GC pass, for all GC implementations, called "code cache unloading", with the explicit purpose of unloading bad nmethods from the code cache. It does a lot of the stuff the sweeper does, but slightly differently. This is part of class unloading, and runs for all full GC events. It just happens to be that today the GC isn't allowed to remove the nmethods, because that would interfere with what the sweeper is doing, which is mostly just getting in the way and asking the GC for help.

> Will GC be triggered by both JavaHeap and CodeCache events?

Yes. Similar to how Metaspace unloading (and hence GC) can also be triggered by class loading events filling up the Metaspace, CodeCache unloading can also be triggered by compilation events. So this isn't really all that new conceptually; we just start treating the CodeCache the way we already treat the Metaspace.

> When the JavaHeap is OK, can GC process the CodeCache only?

No. Similar to how a GC needs full Java heap liveness analysis to know what it is allowed to remove from Metaspace during class unloading, we also (with Loom) need full Java heap liveness analysis to know what we are allowed to unload from the code cache. Importantly, the sweeper is in the same boat after Loom: it can't process the CodeCache only, without a full GC, which already performs code cache unloading as part of class unloading but isn't allowed to remove the nmethods, because we really need the sweeper to take over that last thing, with a bunch of complicated state machinery that no longer makes sense. So really I'm mostly removing a bunch of code, and making small adjustments to the GCs to do that extra thing the sweeper did, which comes down to a slightly wider filter for unloading candidates, based not just on liveness but on a few more things.

> What test base would be adequate for this change?

We have some good things in mach5, I believe.

Hope this makes things clearer.
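A simplified, hypothetical illustration of the "release one traversal later" constraint from point 2 above: memory unlinked during one traversal is only handed back to the allocator once a later traversal runs, so a freshly freed address is not immediately reused and misread within the same pass (the ABA problem described for the inline-cache holders). All names below are invented for the example.

```cpp
// Hypothetical model of deferred reclamation across sweeps.
#include <cstdint>
#include <vector>

struct Resource { /* stand-in for e.g. an inline-cache holder */ };

struct Deferred {
  Resource* holder;
  uint64_t  unlinked_in_sweep;
};

static std::vector<Deferred> g_deferred;
static uint64_t g_sweep = 0;

// Called while sweeping: record the resource instead of freeing it right away.
void unlink_now(Resource* r) {
  g_deferred.push_back({r, g_sweep});
}

// Called at the start of every traversal.
void start_sweep() {
  ++g_sweep;
  std::vector<Deferred> keep;
  for (const Deferred& d : g_deferred) {
    if (d.unlinked_in_sweep < g_sweep) {
      delete d.holder;      // unlinked in an earlier traversal: safe to reclaim
    } else {
      keep.push_back(d);    // unlinked during this traversal: wait one more
    }
  }
  g_deferred.swap(keep);
}

int main() {
  unlink_now(new Resource());   // unlinked during sweep 0
  start_sweep();                // sweep 1 reclaims what sweep 0 unlinked
  start_sweep();                // later sweeps find nothing left to do
  return 0;
}
```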
15-07-2022

I don't understand some points. Let me ask here.

> While it has been going through various revisions, the GCs got support for class unloading, and the need for the GCs to get rid of nmethods with a different unpleasant scent.

Why does the GC need to unload nmethods? Why not mark them as "dead" and let the sweeper do its job?

> With the advent of Loom, the situation has gotten even worse. The stacks are now also in the Java heap.

Why Loom? Java stacks have always resided on the Java heap.

> The sweeper is unable to throw away nmethods without the liveness analysis of a full GC, which also performs code cache unloading, but isn't allowed to actually delete nmethods due to races with the sweeper. In a way we have the worst of both worlds, where both the sweeper and GC are crippled, unable to unload nmethods without the help of the other.

Why does the sweeper need liveness analysis? Yes, it is not a good idea to delete nmethods from GC threads with the current architecture.

> And there are a very large number of complicated races that the JVM needs to deal with, especially with concurrent code cache unloading not interfering with concurrent sweeping. And concurrent sweeping not interfering with the application.

Can you point to JBS records for these issues?

> The sweeper cycle exposes 2 flavours of nmethods that are "dead" to the system. So whenever nmethods are used, we have to know they are not dead. But we typically don't have the tools to really know they are not dead. For example, one might think grabbing the CodeCache_lock and using an iterator that only walks is_alive() nmethods would help make sure you don't get dead nmethods in your iterator.

A conservative approach: it is OK to skip the "dead" methods (treat them as "live" this time).

> However, that is not the case, because the CodeCache_lock can't be held across the entire zombie transition due to "reasons" that are not trivial to actually change. Because of this, code has to deal with nmethods flipping around randomly to a dead state.

This is not clear. The compiled method has two lives. First, the "age" (number of sweeps) is considered. Once compiled, the method happily survives 10 sweeps. In the meantime, the stack scan updates the age of the found methods. Once the method seems not to appear in any call stacks for too long, it is marked as "dead"; after that it goes to zombie, it is evicted and recompiled. On the second run, the method gets a "hotness" counter; the counter updater is compiled into the method to track the exact number of calls. When the "hotness" counter shows no activity, the method is marked dead again.

> I propose to get out of this sad situation, by removing the sweeper. If we need a full GC anyway to remove nmethods, we might as well let the GC do everything.

Is it going to be a special GC pass, common for all GC implementations? Will GC be triggered by both JavaHeap and CodeCache events? When the JavaHeap is OK, can GC process the CodeCache only? What test base would be adequate for this change?
15-07-2022

Check out codecache::calculate_cold_gc_count() in the current prototype.
11-07-2022

Thank you for your interest in this, [~simonis]. The idea is indeed to have the GC deal with all policy decisions as well, including when to remove old/cold nmethods. To deal with nmethods getting cold, the idea is to have the nmethod::is_unloading() method also take coldness into account, detected in an is_cold() method that calculates whether the nmethod is cold or not. With Loom, we will be running nmethod entry barriers to track which nmethods have been on-stack, tagged with an epoch number. With the epoching infrastructure I already built for Loom, I can express coldness more precisely than the previous sampling-based approach: I can see whether an nmethod has not been on-stack for N GC cycles, and then decide it's time to retire it. That seems to work pretty well. I haven't written the heuristic for calculating a good value for N yet, but it will be decided heuristically, probably when marking ends, depending on a few different factors (which I am currently figuring out). The main takeaway for your question is that the GC is already sitting on better information about which nmethods are not being used than the sweeper had through sampling, and it can make better decisions with that information. You can track my progress so far in this branch (which currently passes tier1-3, but is a work in progress): https://github.com/openjdk/jdk/compare/master...fisk:jdk:remove_sweeper Hope this answers your question.
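A minimal sketch of the coldness idea described in this comment, assuming an epoch counter bumped once per GC cycle and an entry barrier that stamps it into the nmethod. The names and fields below are illustrative only, not the prototype's actual API.

```cpp
// Hypothetical coldness tracking via an entry-barrier-stamped GC epoch.
#include <atomic>
#include <cstdint>

static std::atomic<uint64_t> g_gc_epoch{0};   // bumped once per GC cycle

struct SketchNMethod {
  std::atomic<uint64_t> last_entered_epoch{0};

  // What a (simplified) nmethod entry barrier slow path would record.
  void on_entry_barrier() {
    last_entered_epoch.store(g_gc_epoch.load(std::memory_order_relaxed),
                             std::memory_order_relaxed);
  }

  // Cold if it has not been entered for at least 'cold_after_n_gcs' cycles.
  bool is_cold(uint64_t cold_after_n_gcs) const {
    uint64_t now = g_gc_epoch.load(std::memory_order_relaxed);
    return now - last_entered_epoch.load(std::memory_order_relaxed) >=
           cold_after_n_gcs;
  }
};

int main() {
  SketchNMethod nm;
  nm.on_entry_barrier();              // entered during epoch 0
  g_gc_epoch += 10;                   // ten GC cycles pass without another entry
  return nm.is_cold(/*cold_after_n_gcs=*/8) ? 0 : 1;   // cold by now
}
```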
11-07-2022

I think the sweeper also unloads "cold" methods in order to free space in the code cache when it runs full. Do you plan to integrate this functionality into the GC code as well? Or do you want to keep the NMethodSweeper to maintain the hotness counter for nmethods and only move the actual "sweeping" activity from the sweeper into the GC?
11-07-2022