JDK-8361752 : Double free in CompileQueue::delete_all after JDK-8357473
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 26
  • Priority: P3
  • Status: Open
  • Resolution: Unresolved
  • Submitted: 2025-07-10
  • Updated: 2025-07-14
JDK 26: 26 (Unresolved)
Related Reports
Causes :  
Relates :  
Relates :  
Sub Tasks
JDK-8362122 :  
Description
Two occurrences of a crash in test compiler/debug/TestStressBailout.java:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (c:\sb\prod\1752094221\workspace\open\src\hotspot\share\nmt/mallocHeader.inline.hpp:107), pid=21140, tid=45964
#  fatal error: NMT corruption: Block at 0x000001ebef404580: header canary broken
#
# JRE version: Java(TM) SE Runtime Environment (26.0+6) (fastdebug build 26-ea+6-574)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 26-ea+6-574, compiled mode, sharing, compressed oops, compressed class ptrs, g1 gc, windows-amd64)
# Problematic frame:
# V  [jvm.dll+0xce17ef]  MallocHeader::resolve_checked+0x17f
#
# Core dump will be written. Default location: C:\sb\prod\1752106010\testoutput\test-support\jtreg_open_test_hotspot_jtreg_tier3_compiler\scratch\0\hs_err_pid21140.mdmp
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#

---------------  S U M M A R Y ------------

Command Line: -XX:MaxRAMPercentage=4.16667 -Dtest.boot.jdk=c:\ade\mesos\work_dir\jib-master\install\jdk\24\36\bundles\windows-x64\jdk-24_windows-x64_bin.zip\jdk-24 -Djava.io.tmpdir=c:\sb\prod\1752106010\testoutput\test-support\jtreg_open_test_hotspot_jtreg_tier3_compiler\tmp -XX:+CreateCoredumpOnCrash -ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -XX:+TieredCompilation -Xcomp -XX:-TieredCompilation -XX:+StressBailout -XX:StressBailoutMean=5 

Host:  AMD EPYC 7J13 64-Core Processor                , 12 cores, 23G,  Windows Server 2019 , 64 bit Build 17763 (10.0.17763.7009)
Time: Thu Jul 10 01:16:41 2025 /GM elapsed time: 0.345587 seconds (0d 0h 0m 0s)

---------------  T H R E A D  ---------------

Current thread (0x000001ebef5ed160):  JavaThread "C2 CompilerThread0" daemon [_thread_in_vm, id=45964, stack(0x0000002f56b00000,0x0000002f56c00000) (1024K)]

Stack: [0x0000002f56b00000,0x0000002f56c00000],  sp=0x0000002f56bfef70,  free space=1019k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [jvm.dll+0xce17ef]  MallocHeader::resolve_checked+0x17f  (mallocHeader.inline.hpp:113)
V  [jvm.dll+0xce1221]  MallocTracker::record_free_block+0x41  (mallocTracker.cpp:208)
V  [jvm.dll+0xddd4c6]  os::free+0x76  (os.cpp:780)
V  [jvm.dll+0x5da78a]  CompileQueue::delete_all+0x10a  (compileBroker.cpp:373)
V  [jvm.dll+0x5e04c0]  CompileBroker::shutdown_compiler_runtime+0x70  (compileBroker.cpp:1835)
V  [jvm.dll+0x5dbd36]  CompileBroker::init_compiler_runtime+0x196  (compileBroker.cpp:1786)
V  [jvm.dll+0x5d9f05]  CompileBroker::compiler_thread_loop+0x125  (compileBroker.cpp:1920)
V  [jvm.dll+0x930548]  JavaThread::thread_main_inner+0x288  (javaThread.cpp:774)
V  [jvm.dll+0x1049942]  Thread::call_run+0x1b2  (thread.cpp:248)
V  [jvm.dll+0xdf0cb1]  thread_native_entry+0xe1  (os_windows.cpp:562)
C  [ucrtbase.dll+0x2268a]  (no source info available)
C  [KERNEL32.DLL+0x17ac4]  (no source info available)
C  [ntdll.dll+0x5a8c1]  (no source info available)

Comments
A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk/pull/26294 Date: 2025-07-14 13:57:09 +0000
14-07-2025

Ah, hold on a second. So the coordination is basically:

CompileBroker::wait_for_completion:

    {
      MonitorLocker ml(thread, CompileTaskWait_lock);
      free_task = true;
      task->inc_waiting_for_completion();
      while (!task->is_complete() && !is_compilation_disabled_forever()) {
        ml.wait();
      }
      task->dec_waiting_for_completion();
    }
    if (free_task) {
      delete task;
    }

CompileQueue::delete_all:

    {
      MutexLocker ct_lock(CompileTaskWait_lock);
      if (current->waiting_for_completion_count() > 0) {
        CompileTaskWait_lock->notify_all();
        found_waiter = true;
      }
    }
    if (!found_waiter) {
      delete current;
    }

AFAICS, the coordination is still racy, as this sequence of events is possible:

1. Thread A comes into CompileBroker::wait_for_completion
2. Thread B comes into CompileQueue::delete_all
3. Thread A acquires the CTW_lock, records free_task=true, incs the waiting_for_completion count, successfully waits for compilation, decs the waiting_for_completion count, releases the CTW_lock, exits. Since free_task == true, it proceeds to delete task.
4. Thread B acquires the CTW_lock, looks at waiting_for_completion count, discovers it is zero, releases the CTW_lock, proceeds to delete task.

So end result we do double free. There is a timing window when Thread A had not yet tried to wait or already exited to wait, which Thread B completely misses and assumes it is a single owner of the task. It was "fine" before JDK-8357473, because the free-listing code used to check for `is_free`, which fail-safed this condition. Now we just call plain `delete`, and things go off the rails.
14-07-2025
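
The interleaving described in the comment above can be replayed deterministically in a small single-threaded model. This is only a sketch under stated assumptions: `FakeTask`, `waiter_runs_to_completion`, `delete_all_runs`, and `replay_race` are hypothetical names, not HotSpot code; the lock is represented by a plain scope, and each `delete` is replaced by a counter so the program itself never double frees.

```cpp
#include <cassert>

// Hypothetical stand-in for CompileTask; only the fields the race needs.
struct FakeTask {
  int  waiting_for_completion_count = 0;
  bool complete = false;
  int  delete_calls = 0;  // counts would-be `delete task` calls instead of freeing
};

// Step 3: thread A runs its whole critical section, then deletes unconditionally.
void waiter_runs_to_completion(FakeTask& task) {
  bool free_task = false;
  {
    // In the real code CompileTaskWait_lock is held for this scope.
    free_task = true;
    task.waiting_for_completion_count++;
    // The wait loop is skipped here: the task is assumed to be complete already.
    task.waiting_for_completion_count--;
  }
  if (free_task) {
    task.delete_calls++;  // models `delete task`
  }
}

// Step 4: thread B checks the waiter count only after A has left the lock.
void delete_all_runs(FakeTask& task) {
  bool found_waiter = false;
  {
    // In the real code CompileTaskWait_lock is held for this scope.
    if (task.waiting_for_completion_count > 0) {
      found_waiter = true;  // the real code also calls notify_all()
    }
  }
  if (!found_waiter) {
    task.delete_calls++;  // models `delete current`
  }
}

// Replays steps 3 and 4 in order and reports how often the task would be deleted.
int replay_race() {
  FakeTask task;
  task.complete = true;
  waiter_runs_to_completion(task);
  delete_all_runs(task);
  return task.delete_calls;
}
```

Calling `replay_race()` returns 2 in this model: once thread A has decremented the count, the count no longer tells thread B that someone else has already taken responsibility for the delete.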

What's worse, this coordination assumes that one thread can proceed to delete the task, while other threads might still be accessing it. Dang. Let me see if there is an easy way out of this. Otherwise, we need to backout JDK-8357473 and reconsider CompileTask lifecycle.
14-07-2025
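
One way to picture a lifecycle in which no party deletes the task while another may still touch it is shared ownership: the queue and the waiter each hold a reference, and the task is destroyed only when the last reference is dropped. The sketch below uses `std::shared_ptr` purely for illustration. It is not the fix proposed in the pull request and not how HotSpot manages `CompileTask`; `CountedTask` and `destroy_count` are hypothetical names.

```cpp
#include <cassert>
#include <memory>

// Hypothetical task type; the destructor bumps a counter so frees can be counted.
struct CountedTask {
  explicit CountedTask(int* destroyed) : _destroyed(destroyed) {}
  ~CountedTask() { ++*_destroyed; }
  int* _destroyed;
  bool complete = true;
};

// Drops the two references in either order and returns how many times the
// task was destroyed. The remaining holder can still read the task safely
// after the other party has let go.
int destroy_count(bool waiter_releases_first) {
  int destroyed = 0;
  auto queue_ref = std::make_shared<CountedTask>(&destroyed);
  std::shared_ptr<CountedTask> waiter_ref = queue_ref;

  if (waiter_releases_first) {
    waiter_ref.reset();                       // waiter is done with the task
    bool still_valid = queue_ref->complete;   // queue side may still access it
    (void)still_valid;
    queue_ref.reset();                        // last reference: task destroyed here
  } else {
    queue_ref.reset();                        // queue shuts down first
    bool still_valid = waiter_ref->complete;  // waiter may still access it
    (void)still_valid;
    waiter_ref.reset();                       // last reference: task destroyed here
  }
  return destroyed;
}
```

In both orders `destroy_count` returns 1, which is the property the count-based check above does not guarantee. Whether reference counting fits the constraints of the compile queue is a separate question that this sketch does not answer.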

Weird. I see no clear path to failure yet. Surely this part of the stack means the compiler runtime was shut down immediately at the attempt to initialize, i.e. generated OptoRuntime blobs:
V [jvm.dll+0x5e04c0] CompileBroker::shutdown_compiler_runtime+0x70 (compileBroker.cpp:1835)
V [jvm.dll+0x5dbd36] CompileBroker::init_compiler_runtime+0x196 (compileBroker.cpp:1786)
14-07-2025

Maybe it's more likely on Windows because 'os::random' behaves differently? You could try with different values for 'os::_rand_seed'. I would also try with different values of '-XX:StressBailoutMean='.
14-07-2025

It happened 48 times in our CI, with all kinds of different arguments, only once on Linux x64, twice on Linux AArch64 and the rest on Windows.
14-07-2025

I cannot reproduce it locally on my Linux AArch64 or x86_64 servers. Without a reproducer, I cannot really take a good look. Are there any extra test options involved?
$ CONF=linux-x86_64-server-fastdebug make images test TEST=compiler/debug/TestStressBailout.java JTREG=REPEAT_COUNT=100
14-07-2025

ILW = Memory corruption during compiler shutdown (double free?), single test with -XX:+StressBailout, no known workaround but avoid compiler thread shutdown = HLM = P3
14-07-2025

FTR, we had a race condition in this code triggered by `-XX:+StressBailout` before, see JDK-8343938.
14-07-2025

Aleksey, would you have time to take a look at this?
14-07-2025

JDK-8357473 potentially triggers this. Let's problem-list the test.
14-07-2025

Okay, thanks for testing David! Let's re-open this bug then.
14-07-2025

[~thartmann] I ran testing on the fix for 8360048 and I still see the crashes in compiler/debug/TestStressBailout.java.
14-07-2025

I think I have an easy fix, testing it now.
14-07-2025

Tentatively closing this as duplicate of JDK-8360048.
10-07-2025

[~thartmann] I do not see any such linkages in JDK-8360048?? As the stack was quite different here I did not assume it was a duplicate - just another case of NMT detecting memory corruption. But if the bug is actually in NMT's memory accounting ...
10-07-2025

[~dholmes] This looks like a duplicate of JDK-8360048 to me ([~dlong] already linked some of the earlier failures of the TestStressBailout.java test to JDK-8360048).
10-07-2025

[~dholmes] I meant linkages in our CI. And yes, looks like an issue in NMT to me.
10-07-2025