### Please provide a brief summary of the bug
As of 21.0.3 we are experiencing excessive memory usage (and oom-killer) from our Java Tomcat containers.
This is off-heap memory usage, and we can easily trigger a 10GB mem usage (real mem usage, memstat:total_rss) with a Xmx=1G. From our current testing, the mem usage will go as high as whatever we configure as the docker container memory limit.
From a total_rss usage of 1GB, the mem usage within ~10 seconds goes up to the limit 10GB.
Application memory leaks are not a factor, we have checked heap with jmap, just to make sure.
We have no issues on 21.0.1 or 21.0.2, only starting with 21.0.3. Same issue also present on latest 21.0.5.
We have isolated the memory usage to the C2 Compiler thread, based on oom-killer syslog output, and with Native Memory Tracking.
On a lightly loaded production node we experience this after 1-2 hours, so we can quite easily reproduce the issue.
We are using Tomcat images from https://hub.docker.com/_/tomcat.
Ideally we should have some code or easy way to reproduce this outside of our application, but our application is quite large, and we don't know exactly what triggers the issue.
Realize it will be hard to track down based on this bug report, but I will post it anyway, in case others experience similar problems and / or there's some additional logs or debug output we could provide.
### Did you test with the latest update version?
- [X] Yes
### Please provide steps to reproduce where possible
- Start Java Tomcat container
- Run our closed source application (this might be a problem, I guess)
### Expected Results
Memory usage similar to 21.0.1 - 21.0.2
### Actual Results
Consumes all available memory up to docker container limit.
### What Java Version are you using?
openjdk 21.0.5 2024-10-15 LTS OpenJDK Runtime Environment Temurin-21.0.5+11 (build 21.0.5+11-LTS) OpenJDK 64-Bit Server VM Temurin-21.0.5+11 (build 21.0.5+11-LTS, mixed mode, sharing)
### What is your operating system and platform?
Ubuntu 20.04 docker host
Container image based on Ubuntu 22.04. Dockerhub: tomcat:9.0.90-jdk21-temurin-jammy
### How did you install Java?
Docker image from https://hub.docker.com/_/tomcat, which uses eclipse-temurin:21-jdk-jammy
### Did it work before?
```Shell
Works fine (no leak) on the following versions:
openjdk 21.0.1 2023-10-17 LTS
OpenJDK Runtime Environment Temurin-21.0.1+12 (build 21.0.1+12-LTS)
OpenJDK 64-Bit Server VM Temurin-21.0.1+12 (build 21.0.1+12-LTS, mixed mode, sharing)
openjdk 21.0.2 2024-01-16 LTS
OpenJDK Runtime Environment Temurin-21.0.2+13 (build 21.0.2+13-LTS)
OpenJDK 64-Bit Server VM Temurin-21.0.2+13 (build 21.0.2+13-LTS, mixed mode, sharing)
```
### Did you test with other Java versions?
```Shell
Also failing on:
openjdk 21.0.3 2024-10-15 LTS
OpenJDK Runtime Environment Temurin-21.0.5+11 (build 21.0.5+11-LTS)
OpenJDK 64-Bit Server VM Temurin-21.0.5+11 (build 21.0.5+11-LTS, mixed mode, sharing)
```
### Relevant log output
```Shell
OOM killer (syslog)
Oct 27 02:20:14 <hostname> kernel: [12313888.554885] C2 CompilerThre invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
Oct 27 02:20:14 <hostname> kernel: [12313888.554911] oom_kill_process.cold+0xb/0x10
Oct 27 02:20:14 <hostname> kernel: [12313888.554975] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Oct 27 02:20:14 <hostname> kernel: [12313888.554988] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=8846fd420713df8fdbf8d257b6e75a27893a1a339e7458d7f7310c50f2d62b6d,mems_allowed=0-1,oom_memcg=/docker/8846fd420713df8fdbf8d257b6e75a27893a1a339e7458d7f7310c50f2d62b6d,task_memcg=/docker/8846fd420713df8fdbf8d257b6e75a27893a1a339e7458d7f7310c50f2d62b6d,task=java,pid=1235475,uid=1337
Oct 27 02:20:14 <hostname> kernel: [12313888.555474] Memory cgroup out of memory: Killed process 1235475 (java) total-vm:15036484kB, anon-rss:10207028kB, file-rss:39704kB, shmem-rss:0kB, UID:1337 pgtables:22964kB oom_score_adj:0
Java Native Memory Tracking output (as the problem is happening, Compiler memory usage is growing). Running with Xmx=1G
docker@<hostname>(devxx):~$ docker exec -i tomcat-$serviceGroup jcmd 1 VM.native_memory
Native Memory Tracking:
(Omitting categories weighting less than 1KB)
Total: reserved=8444154KB, committed=6461730KB
malloc: 5747358KB #685549
mmap: reserved=2696796KB, committed=714372KB
- Java Heap (reserved=1024000KB, committed=464896KB)
(mmap: reserved=1024000KB, committed=464896KB)
- Class (reserved=1052409KB, committed=17977KB)
(classes #20859)
( instance classes #19720, array classes #1139)
(malloc=3833KB #78983) (peak=3849KB #79169)
(mmap: reserved=1048576KB, committed=14144KB)
( Metadata: )
( reserved=131072KB, committed=102144KB)
( used=100769KB)
( waste=1375KB =1,35%)
( Class space:)
( reserved=1048576KB, committed=14144KB)
( used=12346KB)
( waste=1798KB =12,71%)
- Thread (reserved=176700KB, committed=21028KB)
(thread #172)
(stack: reserved=176128KB, committed=20456KB)
(malloc=372KB #1049) (peak=494KB #1365)
(arena=200KB #342) (peak=11107KB #356)
- Code (reserved=253719KB, committed=63699KB)
(malloc=6031KB #15848) (peak=6031KB #15809)
(mmap: reserved=247688KB, committed=57668KB)
- GC (reserved=61148KB, committed=50292KB)
(malloc=8244KB #26589) (peak=8282KB #26796)
(mmap: reserved=52904KB, committed=42048KB)
- GCCardSet (reserved=733KB, committed=733KB)
(malloc=733KB #2566) (peak=740KB #2566)
- Compiler (reserved=5675077KB, committed=5675077KB)
(malloc=1348KB #1300) (peak=1682KB #1416)
(arena=5673729KB #9) (at peak)
- Internal (reserved=1799KB, committed=1799KB)
(malloc=1763KB #51500) (peak=1863KB #53933)
(mmap: reserved=36KB, committed=36KB)
- Other (reserved=2742KB, committed=2742KB)
(malloc=2742KB #213) (peak=2764KB #255)
- Symbol (reserved=30217KB, committed=30217KB)
(malloc=25127KB #295303) (peak=25128KB #295307)
(arena=5090KB #1) (at peak)
- Native Memory Tracking (reserved=10905KB, committed=10905KB)
(malloc=194KB #3433) (peak=203KB #3579)
(tracking overhead=10712KB)
- Shared class space (reserved=16384KB, committed=12972KB, readonly=0KB)
(mmap: reserved=16384KB, committed=12972KB)
- Arena Chunk (reserved=2780KB, committed=2780KB)
(malloc=2780KB #177676) (peak=5681830KB #177677)
- Tracing (reserved=32KB, committed=32KB)
(arena=32KB #1) (at peak)
- Module (reserved=290KB, committed=290KB)
(malloc=290KB #4410) (peak=297KB #4412)
- Safepoint (reserved=8KB, committed=8KB)
(mmap: reserved=8KB, committed=8KB)
- Synchronization (reserved=2454KB, committed=2454KB)
(malloc=2454KB #24125) (peak=2459KB #24175)
- Serviceability (reserved=18KB, committed=18KB)
(malloc=18KB #37) (peak=66KB #369)
- Metaspace (reserved=132724KB, committed=103796KB)
(malloc=1652KB #2421) (at peak)
(mmap: reserved=131072KB, committed=102144KB)
- String Deduplication (reserved=1KB, committed=1KB)
(malloc=1KB #8) (at peak)
- Object Monitors (reserved=14KB, committed=14KB)
(malloc=14KB #71) (peak=129KB #637)
```
Manual logging total_rss per second (from cgroup v1 memstat)
Test container Xmx=1G, container limit 2,5GB. oom-killer disabled.
Tue Oct 29 14:00:00 CET 2024: Total RSS: 933.01 MB
Tue Oct 29 14:00:01 CET 2024: Total RSS: 1071.18 MB
------ Problem starts here ------
Tue Oct 29 14:00:02 CET 2024: Total RSS: 1457.25 MB
Tue Oct 29 14:00:03 CET 2024: Total RSS: 1862.92 MB
Tue Oct 29 14:00:04 CET 2024: Total RSS: 2231.92 MB
Tue Oct 29 14:00:05 CET 2024: Total RSS: 2451.58 MB
------ Reached container mem limit 2,5GB ------
JVM freezes, out of memory, and cannot create new threads:
java.io.IOException: Cannot allocate memory
java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
Tue Oct 29 14:00:06 CET 2024: Total RSS: 2451.58 MB
Tue Oct 29 14:00:08 CET 2024: Total RSS: 2456.09 MB
Tue Oct 29 14:00:09 CET 2024: Total RSS: 2456.09 MB
Tue Oct 29 14:00:10 CET 2024: Total RSS: 2456.09 MB
Tue Oct 29 14:00:11 CET 2024: Total RSS: 2456.09 MB
Tue Oct 29 14:00:12 CET 2024: Total RSS: 2456.09 MB
Tue Oct 29 14:00:13 CET 2024: Total RSS: 2456.09 MB
Tue Oct 29 14:00:14 CET 2024: Total RSS: 2456.09 MB
Tue Oct 29 14:00:15 CET 2024: Total RSS: 2456.09 MB
Tue Oct 29 14:00:16 CET 2024: Total RSS: 2456.09 MB
Tue Oct 29 14:00:17 CET 2024: Total RSS: 2456.09 MB
------------------------
No other native components, just pure Java.
It's a large web application with many functions, running in Tomcat.
In this particular case however, it's purely MySQL database transactions using JDBC. Fetching data in a ResultSet, and serializing to JSON using gson.
What strikes me as weird is that we don't see anything like this on 21.0.1 / 21.0.2 after countless hours in production across several instances, and we can reproduce this in a few minutes on 21.0.3-21.0.5. Currently only from random clicking around in our application, but perhaps we can isolate it further at some point.
I did manage to get some more logs today when reproducing the issue. Used the following flags:
-Xlog:jit+compilation=debug
-XX:+UnlockDiagnosticVMOptions
-XX:LogFile=/var/log/tomcat9/hotspot_pid1.log
-XX:+LogCompilation
-XX:+PrintCompilation
-XX:+PrintCompilation2
Found a compilation task that takes 9995ms , and eats memory at the time I observe memory spikes, seen from outside of the docker container. In this case it consumed about +4GB of RSS memory. Xmx=1G.
**-XX:+PrintCompilation / -XX:+PrintCompilation2**
```
52631 14809 % ! 4 <package.name.redacted>.controller.ActivityController::getActivities @ 3075 (4542 bytes)
62627 14809 % size: 187248(114816) time: **9995** inlined: 5335 bytes
```
**hotspot_pid1.log (-XX:+LogCompilation)**
```
<nmethod compile_id='14809' compile_kind='osr' compiler='c2' level='4' entry='0x00007f08ee6adca0' size='203192' address='0x00007f08ee6aa610' relocation_offset='344' insts_offset='13968' stub_offset='12878
4' scopes_data_offset='134016' scopes_pcs_offset='184104' dependencies_offset='193432' handler_table_offset='193832' nul_chk_table_offset='202736' oops_offset='132440' metadata_offset='132592' method='no.
<package.name.redacted>.controller.ActivityController getActivities (Lxx/xx/portal/core/model/CorporationGroup;Ljava/lang/String;Lxx/xx/portal/xxxxx/model/ActivityFilter;Lxx/xx/portal/core/model/database/OrderColumn;Lxx/xxx/portal/core/model/database/QueryLimit;Lxx/xxx/portal/core/database/Transaction;)Ljava/util/List;' bytes='4542' count='194' backedge_count='114869' iicount='194' stamp='62,606'/>
62627 14809 % size: 187248(114816) time: 9995 inlined: 5335 bytes
```
See attached debug log from the compiler thread as well.