JDK-8326092 : Pause Remark sometimes has extremely long pause times on class unloading
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 21.0.2
  • Priority: P2
  • Status: Closed
  • Resolution: Delivered
  • Submitted: 2024-02-08
  • Updated: 2024-06-07
  • Resolved: 2024-04-04
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 21
21-poolResolved
Related Reports
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Sub Tasks
JDK-8333832 :  
Description
ADDITIONAL SYSTEM INFORMATION :
Running a containerized server application using eclipse-temurin:21.0.1_12-jdk on Google Kubernetes Engine, (16 core epyc-7003-series, 64 GB RAM).

Flags:

-XX:+AlwaysPreTouch -XX:ConcGCThreads=3 -XX:FlightRecorderOptions=stackdepth=256 -XX:G1ConcRefinementThreads=13 -XX:GCDrainStackTargetSize=64 -XX:InitialCodeCacheSize=272629760 -XX:InitialHeapSize=53951594496 -XX:InitialRAMPercentage=80.000000 -XX:MarkStackSize=4194304 -XX:MaxHeapFreeRatio=100 -XX:MaxHeapSize=53951594496 -XX:MaxRAM=67439493120 -XX:MaxRAMPercentage=80.000000 -XX:MinHeapSize=6815736 -XX:NewRatio=1 -XX:ObjectAlignmentInBytes=16 -XX:+PrintCommandLineFlags -XX:+PrintFlagsFinal -XX:ReservedCodeCacheSize=838860800 -XX:+SegmentedCodeCache -XX:+UseCompressedOops -XX:-UseContainerSupport -XX:+UseG1GC -XX:+UseTransparentHugePages 


A DESCRIPTION OF THE PROBLEM :
Within 10 minutes of our application startup, we often encounter very long GC pause times in "Remark". These vary from 400ms to 4 seconds long.

When capturing one such pause in the flight recorder, it shows that almost all of the pause is spent in "Class Unloading".

This long pause tends to be repeated a few times but eventually settles when our application has been running for a while.

As an experiment, I disabled class unloading with "-XX:-ClassUnloading". This completely solved the long pause times, but we instead saw much longer young generation pauses (went from around 160 ms -> 350 ms). The reason for that is not clear to us, but it was notable that "code root" scanning was very unbalanced.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
We run our server under moderate load and encounter the issue within 10 minutes. Unfortunately there is no small test case to reproduce the issue.


EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Class unloading should not be taking a few seconds.
ACTUAL -
Remark phase that is less than 100 ms?

CUSTOMER SUBMITTED WORKAROUND :
-XX:-ClassUnloading does prevent the long pause in "remark" but seems to have a side effect of causing longer pause times in young generation collection ("evacuate collection set").


FREQUENCY : often



Comments
Delivered as individual backports of changes.
04-04-2024

Evaluating the effort for backporting.
29-02-2024

Additional Information from submitter: ================================= I (original submitter) did some additional investigation using Java 22 in our test environment. I'm happy to report that the long GC remark pause issue is definitely improved in Java 22, most likely due to https://bugs.openjdk.org/browse/JDK-8317809 as noted in the last comment. I did see one slightly long pause with Java 22 which I included below. We can see "Purge Unlinked NMethods" is a little slow. I believe this will be fixed by https://bugs.openjdk.org/browse/JDK-8317007 which is part of Java 23. Even without that additional fix I think the fixes already in Java 22 look promising and hopefully sufficient for us. I should add one caveat that we can't be 100% sure until I've got the fixes into our real production server (we need to wait for Java 22 to reach GA, or hope for a backport to Java 21). This is the debug output for a slightly long GC remark pause with Java 22: [1356.824s][debug][gc,phases,start ] GC(122) Class Unloading [1356.824s][debug][gc,phases,start ] GC(122) ClassLoaderData [1356.825s][debug][gc,phases ] GC(122) ClassLoaderData 0.405ms [1356.825s][debug][gc,phases,start ] GC(122) Trigger cleanups [1356.825s][debug][gc,phases ] GC(122) Trigger cleanups 0.002ms [1356.825s][debug][gc,phases,start ] GC(122) G1 Complete Cleaning [1356.838s][debug][gc,phases ] GC(122) G1 Complete Cleaning 12.775ms [1356.838s][debug][gc,phases,start ] GC(122) Purge Unlinked NMethods [1356.927s][debug][gc,phases ] GC(122) Purge Unlinked NMethods 89.642ms [1356.927s][debug][gc,phases,start ] GC(122) Free Code Blobs [1356.935s][debug][gc,phases ] GC(122) Free Code Blobs 7.329ms [1356.935s][debug][gc,phases,start ] GC(122) Purge Class Loader Data [1356.940s][debug][gc,phases ] GC(122) Purge Class Loader Data 5.455ms [1356.940s][debug][gc,phases ] GC(122) Class Unloading 115.764ms
26-02-2024

The main issue is JDK-8317809, but there are others. The linked JDK-8317007 does impact Remark pause times significantly in such cases, but is not dominating.
20-02-2024

Submitter replied to our 1st email: 1- JDK version: openjdk version "21" 2023-09-19 LTS OpenJDK Runtime Environment Temurin-21+35 (build 21+35-LTS) OpenJDK 64-Bit Server VM Temurin-21+35 (build 21+35-LTS, mixed mode, sharing) 2- jcmd <pid> VM.info and GC log were attached. Also they provided some comments I am sharing here: >Comment on the GC logs >Our server has a fairly long startup time and does some warming of caches and other prep work then starts receiving traffic around 40s into its life. You can search through the logs for "Remark" to note that we have a couple of very long remark pauses, particularly around 731s and 1046s. >Additional information >This issue looks related to https://bugs.java.com/bugdatabase/view_bug?bug_id=8317007 >I noticed that the problem seems to be caused by slow flushing of nmethods (CodeCache::flush_unlinked_nmethods()). I cannot be 100% sure that this is the biggest contributor to the long pause times, but it seems likely given the data I have. From class unloading logs, I could see no classes are being unloaded (so we can rule this out). I collected "code cache stats" with JFR and could see that the "Remark" is cleaning up a large number of nmethods in cases when the pause time is long. For example in one "remark" I checked: >Non-profiled nmethods reduced from 25000 -> 22889 >Profiled nmethods reduced from 36361 -> 25564 >Bug vs enhancement >Although parallelization may greatly help to alleviate the issue (as implemented in JDK-8317007), I am a little unsure if this really explains why the nmethod cleanup is so slow and whether this slowness is expected. Perhaps there is something that can be improved here? >It would be ideal for us if some kind of improvement can be back-ported to Java 21 to reduce the pause times. I don't know if this is also reproducible in older versions (I can try to find out but it will take time). I'm speculating that it may be a regression related to removal of the code sweeper and more work being added to the GC pause than in earlier versions. End of quotation.
19-02-2024

> As an experiment, I disabled class unloading with "-XX:-ClassUnloading". This completely solved the long pause times, but we instead saw much longer young generation pauses (went from around 160 ms -> 350 ms). The reason for that is not clear to us, but it was notable that "code root" scanning was very unbalanced. The likely reason for all these issues is a lot of code cache contents concentrated in a few regions. Processing them then takes a lot of time when evacuated. The code root scan issue in particular has also been improved in JDK 22 with JDK-8315503 for this exact reason.
19-02-2024

Could you try JDK 22? A substantial amount of effort has been going in into fixing class unloading times. Some explanation is given in the "All STW GCs" section at https://tschatzl.github.io/2024/02/06/jdk22-g1-parallel-gc-changes.html
19-02-2024