JDK-8343322 : C2 Memory Leak (starting from 21.0.3)
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 21.0.3, 21.0.4, 21.0.5, 23, 24
  • Priority: P3
  • Status: Open
  • Resolution: Unresolved
  • OS: linux
  • CPU: x86
  • Submitted: 2024-10-30
  • Updated: 2024-11-04
Description
### Please provide a brief summary of the bug

As of 21.0.3 we are experiencing excessive memory usage (and oom-killer) from our Java Tomcat containers.

This is off-heap memory usage: we can easily trigger 10GB of real memory usage (cgroup memstat total_rss) with -Xmx1G. From our testing so far, memory usage grows as high as whatever we configure as the Docker container memory limit.

Starting from a total_rss of 1GB, memory usage climbs to the 10GB limit within ~10 seconds.

Application (heap) memory leaks are not a factor; we have checked the heap with jmap just to make sure.

We have no issues on 21.0.1 or 21.0.2; the problem starts with 21.0.3. The same issue is also present on the latest 21.0.5.

We have isolated the memory usage to the C2 compiler thread, based on oom-killer syslog output and Native Memory Tracking.
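For reference, a minimal sketch of how the Compiler category can be watched over time with NMT. This assumes the JVM was started with -XX:NativeMemoryTracking=summary and runs as PID 1 inside the container (as in the jcmd output further below); the helper function is an illustration, not part of the original report.

```shell
# Query NMT from outside the container (assumes JVM is PID 1):
#   docker exec -i <container> jcmd 1 VM.native_memory summary | grep Compiler

# Helper: extract the reserved KB figure from the NMT "Compiler" line.
compiler_reserved_kb() {
  awk '/^- *Compiler /{
    if (match($0, /reserved=[0-9]+KB/))
      print substr($0, RSTART + 9, RLENGTH - 11)
  }'
}
```

Polling this once per second alongside total_rss makes it easy to correlate C2 arena growth with the RSS spikes described below.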

On a lightly loaded production node we experience this after 1-2 hours, so we can quite easily reproduce the issue.

We are using Tomcat images from https://hub.docker.com/_/tomcat. 

Ideally we would have some code or an easy way to reproduce this outside of our application, but our application is quite large and we don't know exactly what triggers the issue.

We realize it will be hard to track down based on this bug report, but I will post it anyway, in case others experience similar problems and/or there are additional logs or debug output we could provide.

### Did you test with the latest update version?

- [X] Yes

### Please provide steps to reproduce where possible

- Start Java Tomcat container
- Run our closed source application (this might be a problem, I guess)

### Expected Results

Memory usage similar to 21.0.1 - 21.0.2

### Actual Results

Consumes all available memory up to docker container limit. 

### What Java Version are you using?

openjdk 21.0.5 2024-10-15 LTS
OpenJDK Runtime Environment Temurin-21.0.5+11 (build 21.0.5+11-LTS)
OpenJDK 64-Bit Server VM Temurin-21.0.5+11 (build 21.0.5+11-LTS, mixed mode, sharing)

### What is your operating system and platform?

Ubuntu 20.04 docker host
Container image based on Ubuntu 22.04. Dockerhub: tomcat:9.0.90-jdk21-temurin-jammy


### How did you install Java?

Docker image from https://hub.docker.com/_/tomcat, which uses eclipse-temurin:21-jdk-jammy

### Did it work before?

```Shell
Works fine (no leak) on the following versions:

openjdk 21.0.1 2023-10-17 LTS
OpenJDK Runtime Environment Temurin-21.0.1+12 (build 21.0.1+12-LTS)
OpenJDK 64-Bit Server VM Temurin-21.0.1+12 (build 21.0.1+12-LTS, mixed mode, sharing)

openjdk 21.0.2 2024-01-16 LTS
OpenJDK Runtime Environment Temurin-21.0.2+13 (build 21.0.2+13-LTS)
OpenJDK 64-Bit Server VM Temurin-21.0.2+13 (build 21.0.2+13-LTS, mixed mode, sharing)
```


### Did you test with other Java versions?

```Shell
Also failing on:

openjdk 21.0.3 2024-10-15 LTS
OpenJDK Runtime Environment Temurin-21.0.5+11 (build 21.0.5+11-LTS)
OpenJDK 64-Bit Server VM Temurin-21.0.5+11 (build 21.0.5+11-LTS, mixed mode, sharing)
```


### Relevant log output

```Shell
OOM killer (syslog)
Oct 27 02:20:14 <hostname> kernel: [12313888.554885] C2 CompilerThre invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
Oct 27 02:20:14 <hostname> kernel: [12313888.554911]  oom_kill_process.cold+0xb/0x10
Oct 27 02:20:14 <hostname> kernel: [12313888.554975] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Oct 27 02:20:14 <hostname> kernel: [12313888.554988] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=8846fd420713df8fdbf8d257b6e75a27893a1a339e7458d7f7310c50f2d62b6d,mems_allowed=0-1,oom_memcg=/docker/8846fd420713df8fdbf8d257b6e75a27893a1a339e7458d7f7310c50f2d62b6d,task_memcg=/docker/8846fd420713df8fdbf8d257b6e75a27893a1a339e7458d7f7310c50f2d62b6d,task=java,pid=1235475,uid=1337
Oct 27 02:20:14 <hostname> kernel: [12313888.555474] Memory cgroup out of memory: Killed process 1235475 (java) total-vm:15036484kB, anon-rss:10207028kB, file-rss:39704kB, shmem-rss:0kB, UID:1337 pgtables:22964kB oom_score_adj:0

Java Native Memory Tracking output (as the problem is happening, Compiler memory usage is growing). Running with Xmx=1G
docker@<hostname>(devxx):~$ docker exec -i tomcat-$serviceGroup jcmd 1 VM.native_memory
Native Memory Tracking:

(Omitting categories weighting less than 1KB)

Total: reserved=8444154KB, committed=6461730KB
       malloc: 5747358KB #685549
       mmap:   reserved=2696796KB, committed=714372KB

-                 Java Heap (reserved=1024000KB, committed=464896KB)
                            (mmap: reserved=1024000KB, committed=464896KB) 
 
-                     Class (reserved=1052409KB, committed=17977KB)
                            (classes #20859)
                            (  instance classes #19720, array classes #1139)
                            (malloc=3833KB #78983) (peak=3849KB #79169) 
                            (mmap: reserved=1048576KB, committed=14144KB) 
                            (  Metadata:   )
                            (    reserved=131072KB, committed=102144KB)
                            (    used=100769KB)
                            (    waste=1375KB =1,35%)
                            (  Class space:)
                            (    reserved=1048576KB, committed=14144KB)
                            (    used=12346KB)
                            (    waste=1798KB =12,71%)
 
-                    Thread (reserved=176700KB, committed=21028KB)
                            (thread #172)
                            (stack: reserved=176128KB, committed=20456KB)
                            (malloc=372KB #1049) (peak=494KB #1365) 
                            (arena=200KB #342) (peak=11107KB #356)
 
-                      Code (reserved=253719KB, committed=63699KB)
                            (malloc=6031KB #15848) (peak=6031KB #15809) 
                            (mmap: reserved=247688KB, committed=57668KB) 
 
-                        GC (reserved=61148KB, committed=50292KB)
                            (malloc=8244KB #26589) (peak=8282KB #26796) 
                            (mmap: reserved=52904KB, committed=42048KB) 
 
-                 GCCardSet (reserved=733KB, committed=733KB)
                            (malloc=733KB #2566) (peak=740KB #2566) 
 
-                  Compiler (reserved=5675077KB, committed=5675077KB)
                            (malloc=1348KB #1300) (peak=1682KB #1416) 
                            (arena=5673729KB #9) (at peak)
 
-                  Internal (reserved=1799KB, committed=1799KB)
                            (malloc=1763KB #51500) (peak=1863KB #53933) 
                            (mmap: reserved=36KB, committed=36KB) 
 
-                     Other (reserved=2742KB, committed=2742KB)
                            (malloc=2742KB #213) (peak=2764KB #255) 
 
-                    Symbol (reserved=30217KB, committed=30217KB)
                            (malloc=25127KB #295303) (peak=25128KB #295307) 
                            (arena=5090KB #1) (at peak)
 
-    Native Memory Tracking (reserved=10905KB, committed=10905KB)
                            (malloc=194KB #3433) (peak=203KB #3579) 
                            (tracking overhead=10712KB)
 
-        Shared class space (reserved=16384KB, committed=12972KB, readonly=0KB)
                            (mmap: reserved=16384KB, committed=12972KB) 
 
-               Arena Chunk (reserved=2780KB, committed=2780KB)
                            (malloc=2780KB #177676) (peak=5681830KB #177677) 
 
-                   Tracing (reserved=32KB, committed=32KB)
                            (arena=32KB #1) (at peak)
 
-                    Module (reserved=290KB, committed=290KB)
                            (malloc=290KB #4410) (peak=297KB #4412) 
 
-                 Safepoint (reserved=8KB, committed=8KB)
                            (mmap: reserved=8KB, committed=8KB) 
 
-           Synchronization (reserved=2454KB, committed=2454KB)
                            (malloc=2454KB #24125) (peak=2459KB #24175) 
 
-            Serviceability (reserved=18KB, committed=18KB)
                            (malloc=18KB #37) (peak=66KB #369) 
 
-                 Metaspace (reserved=132724KB, committed=103796KB)
                            (malloc=1652KB #2421) (at peak) 
                            (mmap: reserved=131072KB, committed=102144KB) 
 
-      String Deduplication (reserved=1KB, committed=1KB)
                            (malloc=1KB #8) (at peak) 
 
-           Object Monitors (reserved=14KB, committed=14KB)
                            (malloc=14KB #71) (peak=129KB #637)
```

Manual logging of total_rss per second (from cgroup v1 memstat).
Test container with Xmx=1G, container limit 2.5GB, oom-killer disabled.

Tue Oct 29 14:00:00 CET 2024: Total RSS: 933.01 MB
Tue Oct 29 14:00:01 CET 2024: Total RSS: 1071.18 MB
------ Problem starts here ------
Tue Oct 29 14:00:02 CET 2024: Total RSS: 1457.25 MB
Tue Oct 29 14:00:03 CET 2024: Total RSS: 1862.92 MB
Tue Oct 29 14:00:04 CET 2024: Total RSS: 2231.92 MB
Tue Oct 29 14:00:05 CET 2024: Total RSS: 2451.58 MB

------ Reached container mem limit 2.5GB ------
JVM freezes, runs out of memory, and cannot create new threads:
java.io.IOException: Cannot allocate memory
java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached


Tue Oct 29 14:00:06 CET 2024: Total RSS: 2451.58 MB
Tue Oct 29 14:00:08 CET 2024: Total RSS: 2456.09 MB
Tue Oct 29 14:00:09 CET 2024: Total RSS: 2456.09 MB
Tue Oct 29 14:00:10 CET 2024: Total RSS: 2456.09 MB
Tue Oct 29 14:00:11 CET 2024: Total RSS: 2456.09 MB
Tue Oct 29 14:00:12 CET 2024: Total RSS: 2456.09 MB
Tue Oct 29 14:00:13 CET 2024: Total RSS: 2456.09 MB
Tue Oct 29 14:00:14 CET 2024: Total RSS: 2456.09 MB
Tue Oct 29 14:00:15 CET 2024: Total RSS: 2456.09 MB
Tue Oct 29 14:00:16 CET 2024: Total RSS: 2456.09 MB
Tue Oct 29 14:00:17 CET 2024: Total RSS: 2456.09 MB

------------------------
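The per-second logger above can be sketched roughly as follows. The cgroup v1 path is an assumption based on the report's mention of "cgroup v1 memstat"; on cgroup v2 the file layout differs.

```shell
# Convert the total_rss entry of a cgroup v1 memory.stat file to MB.
rss_mb() {
  awk '/^total_rss /{ printf "%.2f", $2 / 1048576 }' "$1"
}

# Example logging loop (run inside the container):
# while true; do
#   echo "$(date): Total RSS: $(rss_mb /sys/fs/cgroup/memory/memory.stat) MB"
#   sleep 1
# done
```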

No other native components, just pure Java.

It's a large web application with many functions, running in Tomcat.

In this particular case however, it's purely MySQL database transactions using JDBC. Fetching data in a ResultSet, and serializing to JSON using gson.

What strikes me as weird is that we don't see anything like this on 21.0.1 / 21.0.2 after countless hours in production across several instances, yet we can reproduce it within a few minutes on 21.0.3-21.0.5. Currently we trigger it only by randomly clicking around in our application, but perhaps we can isolate it further at some point.

I did manage to get some more logs today when reproducing the issue. Used the following flags:

-Xlog:jit+compilation=debug 
-XX:+UnlockDiagnosticVMOptions 
-XX:LogFile=/var/log/tomcat9/hotspot_pid1.log 
-XX:+LogCompilation 
-XX:+PrintCompilation 
-XX:+PrintCompilation2
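For reference, a sketch of how these flags could be wired into a Tomcat container. The CATALINA_OPTS variable and the log path are assumptions for this kind of setup, not details confirmed by the report.

```shell
# Assemble the diagnostic flags above into Tomcat's JVM options
# (CATALINA_OPTS and the log path are assumptions for this setup).
export CATALINA_OPTS="${CATALINA_OPTS:-} \
  -Xlog:jit+compilation=debug \
  -XX:+UnlockDiagnosticVMOptions \
  -XX:LogFile=/var/log/tomcat9/hotspot_pid1.log \
  -XX:+LogCompilation \
  -XX:+PrintCompilation \
  -XX:+PrintCompilation2"
```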

Found a compilation task that takes 9995ms and eats memory at the time I observe memory spikes from outside the Docker container. In this case it consumed about 4GB of additional RSS memory, with Xmx=1G.

**-XX:+PrintCompilation / -XX:+PrintCompilation2**
```
  52631 14809 % !   4       <package.name.redacted>.controller.ActivityController::getActivities @ 3075 (4542 bytes)
  62627 14809 % size: 187248(114816) time: 9995 inlined: 5335 bytes

```
**hotspot_pid1.log (-XX:+LogCompilation)**
```
<nmethod compile_id='14809' compile_kind='osr' compiler='c2' level='4' entry='0x00007f08ee6adca0' size='203192' address='0x00007f08ee6aa610' relocation_offset='344' insts_offset='13968' stub_offset='128784' scopes_data_offset='134016' scopes_pcs_offset='184104' dependencies_offset='193432' handler_table_offset='193832' nul_chk_table_offset='202736' oops_offset='132440' metadata_offset='132592' method='no.<package.name.redacted>.controller.ActivityController getActivities (Lxx/xx/portal/core/model/CorporationGroup;Ljava/lang/String;Lxx/xx/portal/xxxxx/model/ActivityFilter;Lxx/xx/portal/core/model/database/OrderColumn;Lxx/xxx/portal/core/model/database/QueryLimit;Lxx/xxx/portal/core/database/Transaction;)Ljava/util/List;' bytes='4542' count='194' backedge_count='114869' iicount='194' stamp='62,606'/>
  62627 14809 % size: 187248(114816) time: 9995 inlined: 5335 bytes
```

See attached debug log from the compiler thread as well.
Comments
ILW = Increased footprint for C2 compilation leading to OOM ("memory leak" per compilation - memory not freed early), reproducible with customer application, no workaround but increase available memory = MMM = P3
04-11-2024

[~roland] mentioned that it could be a duplicate of JDK-8340824. Checking with the submitter.
04-11-2024

Feedback from the reporter through https://github.com/adoptium/adoptium-support/issues/1190:

@karianna, following up with additional information based on upstream questions. I don't have an account at bugs.openjdk.org, so I would appreciate if you could forward this information.

Question: Could the reporter try to narrow it down to a specific change by building the VM himself?

Answer: I've built OpenJDK from source and isolated the commit that introduces this behaviour. See: https://bugs.openjdk.org/browse/JDK-8309203

I've built a version of OpenJDK up to the commit before the one above:

commit 8fbd10bc5db476abf76a55a456f811132041301f
Author: Sergey Bylokhov serb@openjdk.org 2024-01-03 22:00:07

This version does not display any of the mentioned symptoms. I will gladly assist with reproducing the issue and trying out any patches / changes. We see the same behaviour in brief tests on current versions of JDK 22 and 23 as well.

Builds / commits:

No issues: 24b3b2b66f0bed3e9812999a5b6d511f31e9ad1f (2023-12-31)
No issues: 8fbd10bc5db476abf76a55a456f811132041301f (2024-01-03)
Issue observed: 966071a5f10a7a8910b2639e10c961924a07d675 (2024-01-04) <--- Issue first observed at this commit
Issue observed: b4c163d07b2db96ce91a0f712ced60ace14dbf63 (2024-01-17)

Question: And is this really a memory leak? From the report it sounds like it's rather high memory consumption by a single compilation, and the memory is freed right after.

Answer: In this instance, the memory is not released for almost two minutes, so it at least looks like a temporary(?) leak ;) It's worth noting that although the pattern we see is the same in test/production, we've seen much higher memory usage in the few instances we've run this in production. Don't know how far it would have gone, but we've hit the container limit after consuming about 9G. Haven't seen that high a number in the test environment yet.

Logs: The abrupt rise in memory consumption is accompanied by a CPU spike, which stops after a few seconds, at the same time as the memory usage stops rising. Log output from a test run of a version of OpenJDK 21.0.3+9 that we compiled today:

Sun Nov 3 15:46:06 CET 2024: Total RSS: 503.45 MB
Sun Nov 3 15:46:08 CET 2024: Total RSS: 523.17 MB
Sun Nov 3 15:46:09 CET 2024: Total RSS: 803.67 MB
Sun Nov 3 15:46:10 CET 2024: Total RSS: 1371.32 MB
Sun Nov 3 15:46:11 CET 2024: Total RSS: 1945.73 MB
Sun Nov 3 15:46:12 CET 2024: Total RSS: 2519.87 MB
Sun Nov 3 15:46:13 CET 2024: Total RSS: 3088.35 MB
Sun Nov 3 15:46:14 CET 2024: Total RSS: 3653.48 MB
Sun Nov 3 15:46:15 CET 2024: Total RSS: 4113.67 MB
Sun Nov 3 15:46:16 CET 2024: Total RSS: 4133.04 MB
Sun Nov 3 15:46:17 CET 2024: Total RSS: 4128.20 MB
Sun Nov 3 15:46:18 CET 2024: Total RSS: 4107.51 MB
Sun Nov 3 15:46:19 CET 2024: Total RSS: 4060.11 MB
Sun Nov 3 15:46:20 CET 2024: Total RSS: 4035.06 MB
Sun Nov 3 15:46:21 CET 2024: Total RSS: 4035.06 MB
Sun Nov 3 15:46:22 CET 2024: Total RSS: 2815.38 MB
Sun Nov 3 15:46:23 CET 2024: Total RSS: 2815.51 MB
Sun Nov 3 15:46:24 CET 2024: Total RSS: 2815.51 MB
Sun Nov 3 15:46:25 CET 2024: Total RSS: 2815.23 MB
Sun Nov 3 15:46:26 CET 2024: Total RSS: 2814.51 MB
Sun Nov 3 15:46:27 CET 2024: Total RSS: 2814.51 MB
Sun Nov 3 15:46:28 CET 2024: Total RSS: 2814.51 MB
Sun Nov 3 15:46:29 CET 2024: Total RSS: 2814.51 MB
Sun Nov 3 15:46:30 CET 2024: Total RSS: 2793.37 MB
Sun Nov 3 15:46:31 CET 2024: Total RSS: 2793.37 MB
Sun Nov 3 15:46:32 CET 2024: Total RSS: 2793.37 MB
Sun Nov 3 15:46:33 CET 2024: Total RSS: 2793.21 MB
Sun Nov 3 15:46:34 CET 2024: Total RSS: 2793.21 MB
Sun Nov 3 15:46:35 CET 2024: Total RSS: 2793.21 MB
Sun Nov 3 15:46:36 CET 2024: Total RSS: 2793.21 MB
Sun Nov 3 15:46:37 CET 2024: Total RSS: 2793.21 MB
Sun Nov 3 15:46:38 CET 2024: Total RSS: 2793.33 MB
Sun Nov 3 15:46:39 CET 2024: Total RSS: 2793.33 MB
Sun Nov 3 15:46:40 CET 2024: Total RSS: 2793.33 MB
Sun Nov 3 15:46:41 CET 2024: Total RSS: 2794.50 MB
Sun Nov 3 15:46:42 CET 2024: Total RSS: 2794.50 MB
....
Sun Nov 3 15:48:00 CET 2024: Total RSS: 520.06 MB
04-11-2024

And is this really a memory leak? From the report it sounds like it's rather high memory consumption by a single compilation and the memory is freed right after.
31-10-2024

[~karianna] Could the reporter try to narrow it down to a specific change by building the VM himself?
31-10-2024

It could be JDK-8325672. Maybe the reporter could revert that simple change and check if the issue persists.
31-10-2024