JDK-8326615 : C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash)

Type: Bug
Component: hotspot
Sub-Component: compiler
Affected Version: 7u351,8u341,11,17,21,22,23,24

Priority: P4
Status: Closed
Resolution: Fixed
OS: linux
CPU: x86_64

Submitted: 2024-02-24
Updated: 2025-01-10
Resolved: 2024-09-03

Versions (Unresolved/Resolved/Fixed)

JDK 24
24 b14: Fixed

Related Reports

Cloners :	JDK-8347406 - [REDO] C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash)
Relates :	JDK-8326731 - Problem-list compiler/startup/StartupCode
Relates :	JDK-8339700 - Test compiler/startup/StartupOutput.java intermittent fatal error: Initial size of CodeCache is too small
Relates :	JDK-8339542 - compiler/codecache/CheckSegmentedCodeCache.java fails
Relates :	JDK-8326376 - java -version failed with CONF=fastdebug -XX:InitialCodeCacheSize=1024K -XX:ReservedCodeCacheSize=1200k

Sub Tasks

JDK-8326636 :	Problem list StartupOutput.java due to 8326615 - Resolved
JDK-8326731 :	Problem-list compiler/startup/StartupCode - Closed

Description

compiler/startup/StartupOutput.java intermittently fails with Internal Error (codeBlob.cpp:429): Initial size of CodeCache is too small

test command:
export test=test/hotspot/jtreg/compiler/startup/StartupOutput.java
function runJtreg() {
  jtreg -ea -esa -timeoutFactor:4 -v:fail,error,time,nopass -nr -w $dir/index-$1 $test &> $dir/$1.log
  if [[ 0 -ne $? ]] ; then echo -n "$1 " ; else rm -rf $dir/index-$1 $dir/$1.log ; fi
}
export -f runJtreg
export dir="tmp-jtreg-"`basename ${test##* } .java | sed "s|#|_|"`
rm -rf $dir ; mkdir -p $dir
time seq 1000 | xargs -i -n 1 -P `nproc` bash -c "runJtreg {}"
echo total fail number: `ls $dir/*.log 2> /dev/null | wc | awk '{print $1}'`

result:
STDERR:
 stdout: [[0.043s][warning][codecache] CodeCache is full. Compiler has been disabled.
[0.043s][warning][codecache] Try increasing the code cache size using -XX:ReservedCodeCacheSize=
CodeCache: size=1200Kb used=1199Kb max_used=1199Kb free=0Kb
 bounds [0x00007f2ca0687000, 0x00007f2ca07b3000, 0x00007f2ca07b3000]
 total_blobs=285, nmethods=0, adapters=199, full_count=1
Compilation: disabled (not enough contiguous free space left), stopped_count=1, restarted_count=0
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (codeBlob.cpp:429), pid=155190, tid=155378
#  fatal error: Initial size of CodeCache is too small
#
# JRE version: OpenJDK Runtime Environment (23.0) (build 23)
# Java VM: OpenJDK 64-Bit Server VM (23, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x6495cc]  RuntimeStub::new_runtime_stub(char const*, CodeBuffer*, int, int, OopMapSet*, bool, bool)+0x16c
#
# Core dump will be written. Default location: /var/tmp/tone/run/jtreg/jt-work/index-14/compiler/startup/StartupOutput/core.155190
#
# An error report file with more information is saved as:
# /var/tmp/tone/run/jtreg/jt-work/index-14/compiler/startup/StartupOutput/hs_err_pid155190.log

[error occurred during error reporting (), id 0xb, SIGSEGV (0xb) at pc=0x00007f2ca105c25a]
#
# If you would like to submit a bug report, please visit:
#   mailto:yansendao.ysd@alibaba-inc.com
#
];
 stderr: [OpenJDK 64-Bit Server VM warning: CodeCache is full. Compiler has been disabled.
OpenJDK 64-Bit Server VM warning: Try increasing the code cache size using -XX:ReservedCodeCacheSize=
]
 exitValue = 134


Recurrence probability: 40/1000


CPU and environment information:
# uname -a ; cat /etc/os-release ; free -h ; lscpu | head -n 25 ; java -version ; java -Xinternalversion
Linux iZbp15rwnojzp4ihzwephwZ 5.10.134-16.1.al8.x86_64 #1 SMP Thu Dec 7 14:11:24 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
NAME="Alibaba Cloud Linux"
VERSION="3 (Soaring Falcon)"
ID="alinux"
ID_LIKE="rhel fedora centos anolis"
VERSION_ID="3"
UPDATE_ID="9"
PLATFORM_ID="platform:al8"
PRETTY_NAME="Alibaba Cloud Linux 3 (Soaring Falcon)"
ANSI_COLOR="0;31"
HOME_URL="https://www.aliyun.com/"

              total        used        free      shared  buff/cache   available
Mem:           60Gi       806Mi       3.2Gi       114Mi        56Gi        58Gi
Swap:            0B          0B          0B
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              32
On-line CPU(s) list: 0-31
Thread(s) per core:  2
Core(s) per socket:  16
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
BIOS Vendor ID:      Alibaba Cloud
CPU family:          6
Model:               106
Model name:          Intel(R) Xeon(R) Platinum 8369B CPU @ 2.70GHz
BIOS Model name:     pc-i440fx-2.1
Stepping:            6
CPU MHz:             3518.694
BogoMIPS:            5399.99
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           48K
L1i cache:           32K
L2 cache:            1280K
L3 cache:            49152K
NUMA node0 CPU(s):   0-31
openjdk version "23" 2024-09-17
OpenJDK Runtime Environment (build 23)
OpenJDK 64-Bit Server VM (build 23, mixed mode, sharing)
OpenJDK 64-Bit Server VM (23) for linux-amd64 JRE (23), built on 2024-02-24T04:59:15Z by "root" with gcc 9.3.1 20200408 (Red Hat 9.3.1-2)

Comments

-XX:NonNMethodCodeHeapSize=5M is a bit too small for the (fast)debug build. Using 6M seems to help. (Added this comment to the new issue, too.)
04-09-2024
Yes, it almost certainly is. I created a new bug report for it: JDK-8339542.
04-09-2024
We now see errors in the test compiler/codecache/CheckSegmentedCodeCache.java on platform linuxppc64le, fastdebug. Is this related?
stdout: [Error occurred during initialization of VM
Not enough space in non-nmethod code heap to run VM: 5120K < 5226K
];
stderr: []
exitValue = 1
java.lang.RuntimeException: 'Invalid code heap sizes' missing from stdout/stderr
  at jdk.test.lib.process.OutputAnalyzer.shouldContain(OutputAnalyzer.java:252)
  at compiler.codecache.CheckSegmentedCodeCache.failsWith(CheckSegmentedCodeCache.java:81)
  at compiler.codecache.CheckSegmentedCodeCache.main(CheckSegmentedCodeCache.java:186)
  at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
  at java.base/java.lang.reflect.Method.invoke(Method.java:573)
  at com.sun.javatest.regtest.agent.MainWrapper$MainTask.run(MainWrapper.java:138)
  at java.base/java.lang.Thread.run(Thread.java:1575)
04-09-2024
Changeset: 633fad8e
Branch: master
Author: Damon Fenacci <dfenacci@openjdk.org>
Date: 2024-09-03 09:45:43 +0000
URL: https://git.openjdk.org/jdk/commit/633fad8e53109bef52190494a8b171035229d2ac
03-09-2024
A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk/pull/19280 Date: 2024-05-17 09:37:01 +0000
09-07-2024
Deferring to JDK 24 because fix is not ready in time due to ongoing discussions.
05-06-2024
As [~stuefe] mentioned, the origin of the problem seems indeed to be that C1 does not handle allocation failures from code heap during initialization properly. This specific issue happens while initializing C1 but a similar issue can happen when initializing C2 as well (stacktrace below).
https://github.com/openjdk/jdk/pull/15970 introduced a new argument to determine if RuntimeStub::new_runtime_stub fails fatally or not. We can probably take advantage of it for this issue as well.
C2 failure stack trace:
V [libjvm.so+0x60a154] RuntimeStub::new_runtime_stub(char const, CodeBuffer, short, int, OopMapSet, bool, bool)+0x294 (codeBlob.cpp:413)
V [libjvm.so+0xcdd474] PhaseOutput::install()+0xc4 (output.cpp:3462)
V [libjvm.so+0x63e316] Compile::Code_Gen()+0x676 (compile.cpp:3032)
V [libjvm.so+0x63ec16] Compile::Compile(ciEnv, TypeFunc const* ()(), unsigned char, char const, int, bool, bool, DirectiveSet)+0x846 (compile.cpp:992)
V [libjvm.so+0xd6f306] OptoRuntime::generate_stub(ciEnv, TypeFunc const ()(), unsigned char, char const, int, bool, bool)+0xe6 (runtime.cpp:185)
V [libjvm.so+0xd6f5f0] OptoRuntime::generate(ciEnv)+0x270 (runtime.cpp:157)
V [libjvm.so+0x569fad] C2Compiler::initialize()+0xcd (c2compiler.cpp:99)
V [libjvm.so+0x643f3c] CompileBroker::init_compiler_runtime()+0xcc (compileBroker.cpp:1771)
V [libjvm.so+0x64a241] CompileBroker::compiler_thread_loop()+0x111 (compileBroker.cpp:1913)
V [libjvm.so+0x8eaa08] JavaThread::thread_main_inner() [clone .part.0]+0xb8 (javaThread.cpp:759)
V [libjvm.so+0xeace6f] Thread::call_run()+0x9f (thread.cpp:225)
V [libjvm.so+0xcc7835] thread_native_entry(Thread*)+0xd5 (os_linux.cpp:846)
16-05-2024
No problem, thanks for confirming. [~dfenacci] will look into this.
27-02-2024
No time, sorry
27-02-2024
[~stuefe] Just double-checking, you are not planning to look into this, right?
27-02-2024
I can reproduce this with a JDK 23 release build and rr's chaos mode:
rr record -h jdks/jdk-23/bin/java -XX:InitialCodeCacheSize=1024K -XX:ReservedCodeCacheSize=1200k -XX:+UseG1GC
rr: Saving execution to trace directory `/home/tobias/.local/share/rr/java-97'.
[1,410s][warning][codecache] CodeCache is full. Compiler has been disabled.
[1,410s][warning][codecache] Try increasing the code cache size using -XX:ReservedCodeCacheSize=
Java HotSpot(TM) 64-Bit Server VM warning: CodeCache is full. Compiler has been disabled.
Java HotSpot(TM) 64-Bit Server VM warning: Try increasing the code cache size using -XX:ReservedCodeCacheSize=
CodeCache: size=1200Kb used=1199Kb max_used=1199Kb free=0Kb
 bounds [0x00000c2e66582000, 0x00000c2e666ae000, 0x00000c2e666ae000]
 total_blobs=335, nmethods=6, adapters=241, full_count=1
Compilation: disabled (not enough contiguous free space left), stopped_count=1, restarted_count=0
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (codeBlob.cpp:429), pid=1017582, tid=1017598
#  fatal error: Initial size of CodeCache is too small
#
# JRE version: Java(TM) SE Runtime Environment (23.0+12) (build 23-ea+12-854)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (23-ea+12-854, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V [libjvm.so+0x617ec4] RuntimeStub::new_runtime_stub(char const, CodeBuffer, int, int, OopMapSet*, bool, bool)+0x244
27-02-2024
A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/18020 Date: 2024-02-27 05:46:06 +0000
27-02-2024
Thank you [~stuefe] for looking into it. I agree with problem-listing the test for now to reduce noise in testing.
[~thartmann] can you re-assign it to someone in our team? Thanks!
[~syan] as Thomas pointed out, it is not related to JDK-8326376. It is a pre-existing issue.
It could be the order in which C1 and C2 runtime stubs are created. I see the presence of a C2 thread:
  0x00007f2c9c181660 JavaThread "C2 CompilerThread0" daemon [_thread_blocked, id=155377, stack(0x00007f2c6f2c4000,0x00007f2c6f3c4000) (1024K)]
=>0x00007f2c9c182d80 JavaThread "C1 CompilerThread0" daemon [_thread_in_vm, id=155378, stack(0x00007f2c6f1c3000,0x00007f2c6f2c3000) (1024K)]
If the OS scheduler runs the C2 thread first, it will create its runtime stubs first, filling the CodeCache and leaving no space for the C1 stubs.
26-02-2024
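The ordering effect described in the comment above can be illustrated with a toy model. This is only a sketch in plain C++, not HotSpot code: `SharedToyCache`, `ToyCompiler`, `failed_initializers`, and all byte sizes are made up for illustration. The only thing taken from the report is the idea that both compilers' stubs draw from one fixed-size CodeCache, so whichever initializer runs second may find no room.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical fixed-size cache shared by both compilers' stub generators.
struct SharedToyCache {
  std::size_t free_bytes;

  // Reserves `size` bytes if they fit; returns false and changes nothing otherwise.
  bool try_allocate(std::size_t size) {
    if (size > free_bytes) return false;
    free_bytes -= size;
    return true;
  }
};

// Hypothetical compiler description: a name and the space its stubs need.
struct ToyCompiler {
  std::string name;
  std::size_t stub_bytes;
};

// Runs the initializers in the given order against one shared cache and
// returns the names of those whose stub allocation did not fit.
std::vector<std::string> failed_initializers(std::size_t cache_bytes,
                                             const std::vector<ToyCompiler>& order) {
  assert(!order.empty());
  SharedToyCache cache{cache_bytes};
  std::vector<std::string> failed;
  for (const ToyCompiler& compiler : order) {
    if (!cache.try_allocate(compiler.stub_bytes)) {
      failed.push_back(compiler.name);
    }
  }
  return failed;
}
```

With made-up sizes where either compiler fits alone but not both together, the failing compiler is simply the one that initializes last, which would be consistent with the failure being intermittent and scheduler-dependent.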
Assuming this is not a regression. ILW = Fatal error because code cache is full (should be handled gracefully with VM error), intermittent with extreme flag values, no workaround but increase code cache size = MLH = P4
26-02-2024
Problem predates JDK-8326376. JDK-8326376 brought the test that triggers that bug.
The problem is that C1 does not handle allocation failures from code heap during initialization properly. It fatal()s out depending on `StubAssembler::must_gc_arguments()`:
```
RuntimeStub::new_runtime_stub( ...
  if (!alloc_fail_is_fatal) {
    return nullptr;
  }
  fatal("Initial size of CodeCache is too small");
```
This looks like a bit more work, and I don't have that time at the moment, sorry. I also cannot reproduce it. I assume stub generation at Alibaba is different from my machine.
Unless someone else wants a quick stab at it, I can disable the test for now (possibly factor it out into its own test, then problem-list it). [~kvn] what do you think?
26-02-2024
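To make the snippet quoted in the comment above more concrete, here is a minimal, self-contained sketch of the same pattern: an allocator that either returns `nullptr` or aborts depending on a flag, plus a caller that checks the result. It is not the HotSpot implementation or the eventual fix; `ToyCodeHeap`, `ToyStub`, `new_toy_stub`, and `init_toy_compiler` are hypothetical names, and only the `alloc_fail_is_fatal` flag and the error message come from the report.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <memory>
#include <vector>

// Hypothetical stub record; real runtime stubs carry code, oop maps, etc.
struct ToyStub {
  std::size_t size;
};

// Hypothetical stand-in for a fixed-size code heap (not HotSpot's CodeCache).
struct ToyCodeHeap {
  std::size_t capacity;
  std::size_t used = 0;
  std::vector<std::unique_ptr<ToyStub>> stubs;  // the heap owns its stubs

  explicit ToyCodeHeap(std::size_t cap) : capacity(cap) {}
};

// Same shape as the quoted snippet: on allocation failure, return nullptr
// when the caller asked for a non-fatal failure, otherwise abort.
ToyStub* new_toy_stub(ToyCodeHeap& heap, std::size_t size, bool alloc_fail_is_fatal) {
  assert(size > 0);
  if (size > heap.capacity - heap.used) {
    if (!alloc_fail_is_fatal) {
      return nullptr;
    }
    std::fprintf(stderr, "fatal error: Initial size of CodeCache is too small\n");
    std::abort();
  }
  heap.used += size;
  heap.stubs.push_back(std::make_unique<ToyStub>(ToyStub{size}));
  return heap.stubs.back().get();
}

// A caller that treats allocation failure as recoverable: it reports that
// initialization did not complete instead of taking the process down.
bool init_toy_compiler(ToyCodeHeap& heap, std::size_t stub_size, int stub_count) {
  for (int i = 0; i < stub_count; ++i) {
    if (new_toy_stub(heap, stub_size, /*alloc_fail_is_fatal=*/false) == nullptr) {
      return false;  // e.g. the caller could disable this compiler and continue
    }
  }
  return true;
}
```

The design point is that the flag alone is not enough: every caller passing `false` also has to check for `nullptr` and unwind cleanly, which is presumably why the comment calls this "a bit more work".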
After JDK-8326376 was fixed, this issue is still reproducible in a specific environment.
26-02-2024
[~stuefe] Please take a look. This seems like a missing check in C1.
24-02-2024