JDK-8336095 : Use-after-free in Superword leads to memory corruption
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 24
  • Priority: P2
  • Status: Resolved
  • Resolution: Fixed
  • OS: windows
  • CPU: x86_64
  • Submitted: 2024-07-10
  • Updated: 2024-10-04
  • Resolved: 2024-07-29
Versions:
  • JDK 23: Resolved
  • JDK 24: Fixed (b09)
Description
Since around the middle of June (or shortly before), we have been seeing various C1 and C2 compiler crashes on Windows x86_64.
These have been observed with both opt and fastdebug binaries when running an internal test suite.
The crashes occurred both on test servers (running Windows Server) and on a Win11-based notebook.
Unfortunately, we cannot easily reproduce them with externally available tests.
The stacks in the hs_err files differ a bit. Some examples:

1)

#  Internal Error (utilities/growableArray.hpp:256), pid=7436, tid=18160
#  Error: ShouldNotReachHere()
#

---------------  T H R E A D  ---------------

Current thread (0x000001c7c11059c0):  JavaThread "C2 CompilerThread0" daemon [_thread_in_native, id=18160, stack(0x0000009eba000000,0x0000009eba100000) (1024K)]


Current CompileTask:
C2:14090 13241       4       sun.font.HBShaper::store_layout_results (440 bytes)

Stack: [0x0000009eba000000,0x0000009eba100000]
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [jvm.dll+0x6fce59]  os::win32::platform_print_native_stack+0xd9  (os_windows_x86.cpp:235)
V  [jvm.dll+0x8eda25]  VMError::report+0xd95  (vmError.cpp:1011)
V  [jvm.dll+0x8efcdd]  VMError::report_and_die+0x5fd  (vmError.cpp:1846)
V  [jvm.dll+0x8f0347]  VMError::report_and_die+0x47  (vmError.cpp:1611)
V  [jvm.dll+0x28d587]  report_vm_error+0x57  (debug.cpp:193)
V  [jvm.dll+0x28d5ac]  report_vm_error+0x1c  (debug.cpp:149)
V  [jvm.dll+0x28d500]  report_should_not_reach_here+0x10  (debug.cpp:240)
V  [jvm.dll+0x260b57]  Compile::remove_useless_node+0x3a7  (compile.cpp:403)
V  [jvm.dll+0x724606]  PhaseIterGVN::remove_globally_dead_node+0x366  (phaseX.cpp:1304)
V  [jvm.dll+0x724f0a]  PhaseIterGVN::subsume_node+0x2ca  (phaseX.cpp:1430)
V  [jvm.dll+0x7258fd]  PhaseIterGVN::transform_old+0x1bd  (phaseX.cpp:1284)
V  [jvm.dll+0x723d12]  PhaseIterGVN::optimize+0x182  (phaseX.cpp:1048)
V  [jvm.dll+0x2576b1]  Compile::Optimize+0x1101  (compile.cpp:2425)
V  [jvm.dll+0x254d17]  Compile::Compile+0xe47  (compile.cpp:853)
V  [jvm.dll+0x1d06ea]  C2Compiler::compile_method+0x11a  (c2compiler.cpp:145)
V  [jvm.dll+0x264c31]  CompileBroker::invoke_compiler_on_method+0x811  (compileBroker.cpp:2306)
V  [jvm.dll+0x262efb]  CompileBroker::compiler_thread_loop+0x26b  (compileBroker.cpp:1962)
V  [jvm.dll+0x4051f6]  JavaThread::run+0x116  (javaThread.cpp:742)
V  [jvm.dll+0x892bf8]  Thread::call_run+0xc8  (thread.cpp:235)
V  [jvm.dll+0x6fb695]  thread_native_entry+0x95  (os_windows.cpp:553)
C  [ucrtbase.dll+0x1fb80]  (no source info available)
C  [KERNEL32.DLL+0x84d4]  (no source info available)
C  [ntdll.dll+0x51a11]  (no source info available)


2)

#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00007ffcf58b6202, pid=14520, tid=28428
#
# V  [jvm.dll+0x156202]  LIR_OpVisitState::append+0x52

---------------  T H R E A D  ---------------

Current thread (0x000001cff5b98820):  JavaThread "C1 CompilerThread1" daemon [_thread_in_native, id=28428, stack(0x00000019a9800000,0x00000019a9900000) (1024K)]

Current CompileTask:
C1:7053 8888   !   3       com.pietjonas.wmfwriter2d.WMFGraphics::setGDIFillBrush (231 bytes)

Stack: [0x00000019a9800000,0x00000019a9900000],  sp=0x00000019a98fde40,  free space=1015k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [jvm.dll+0x156202]  LIR_OpVisitState::append+0x52  (c1_LIR.hpp:2488)
V  [jvm.dll+0x174e83]  LIR_OpVisitState::visit+0xd3  (c1_LIR.cpp:928)
V  [jvm.dll+0x19dfc2]  LinearScan::build_intervals+0x2e2  (c1_LinearScan.cpp:1364)
V  [jvm.dll+0x19ffb4]  LinearScan::do_linear_scan+0x34  (c1_LinearScan.cpp:3099)
V  [jvm.dll+0x158cb2]  Compilation::emit_lir+0x132  (c1_Compilation.cpp:277)
V  [jvm.dll+0x1584fe]  Compilation::compile_java_method+0x16e  (c1_Compilation.cpp:409)
V  [jvm.dll+0x1587bb]  Compilation::compile_method+0x1db  (c1_Compilation.cpp:484)
V  [jvm.dll+0x157a61]  Compilation::Compilation+0x201  (c1_Compilation.cpp:611)
V  [jvm.dll+0x159911]  Compiler::compile_method+0xe1  (c1_Compiler.cpp:261)
V  [jvm.dll+0x264c31]  CompileBroker::invoke_compiler_on_method+0x811  (compileBroker.cpp:2306)
V  [jvm.dll+0x262efb]  CompileBroker::compiler_thread_loop+0x26b  (compileBroker.cpp:1962)
V  [jvm.dll+0x4051f6]  JavaThread::run+0x116  (javaThread.cpp:742)
V  [jvm.dll+0x892be8]  Thread::call_run+0xc8  (thread.cpp:235)
V  [jvm.dll+0x6fb685]  thread_native_entry+0x95  (os_windows.cpp:553)
C  [ucrtbase.dll+0x2268a]  (no source info available)
C  [KERNEL32.DLL+0x17ac4]  (no source info available)
C  [ntdll.dll+0x5a8c1]  (no source info available)

3)

#  Internal Error (node.cpp:2955), pid=9520, tid=7900
#  Error: ShouldNotReachHere()

Current thread (0x0000015428937860):  JavaThread "C2 CompilerThread0" daemon [_thread_in_native, id=7900, stack(0x000000336c400000,0x000000336c500000) (1024K)]

Current CompileTask:
C2:10741 8161       4       sun.font.HBShaper::get_nominal_glyph (57 bytes)

Stack: [0x000000336c400000,0x000000336c500000]
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [jvm.dll+0x6fce49]  os::win32::platform_print_native_stack+0xd9  (os_windows_x86.cpp:235)
V  [jvm.dll+0x8eda15]  VMError::report+0xd95  (vmError.cpp:1011)
V  [jvm.dll+0x8efccd]  VMError::report_and_die+0x5fd  (vmError.cpp:1846)
V  [jvm.dll+0x8f0337]  VMError::report_and_die+0x47  (vmError.cpp:1611)
V  [jvm.dll+0x28d587]  report_vm_error+0x57  (debug.cpp:193)
V  [jvm.dll+0x28d5ac]  report_vm_error+0x1c  (debug.cpp:149)
V  [jvm.dll+0x28d500]  report_should_not_reach_here+0x10  (debug.cpp:240)
V  [jvm.dll+0x6d7e15]  Unique_Node_List::remove+0xc5  (node.cpp:2955)
V  [jvm.dll+0x7245ea]  PhaseIterGVN::remove_globally_dead_node+0x35a  (phaseX.cpp:1380)
V  [jvm.dll+0x724efa]  PhaseIterGVN::subsume_node+0x2ca  (phaseX.cpp:1430)
V  [jvm.dll+0x3cc525]  idealize_test+0x1c5  (ifnode.cpp:1919)
V  [jvm.dll+0x3cadae]  IfNode::Ideal_common+0x6e  (ifnode.cpp:1479)
V  [jvm.dll+0x3ca885]  IfNode::Ideal+0x15  (ifnode.cpp:1494)
V  [jvm.dll+0x72578e]  PhaseIterGVN::transform_old+0x5e  (phaseX.cpp:1198)
V  [jvm.dll+0x723d02]  PhaseIterGVN::optimize+0x182  (phaseX.cpp:1048)
V  [jvm.dll+0x2567b4]  Compile::Optimize+0x204  (compile.cpp:2239)
V  [jvm.dll+0x254d17]  Compile::Compile+0xe47  (compile.cpp:853)
V  [jvm.dll+0x1d06ea]  C2Compiler::compile_method+0x11a  (c2compiler.cpp:145)
V  [jvm.dll+0x264c31]  CompileBroker::invoke_compiler_on_method+0x811  (compileBroker.cpp:2306)
V  [jvm.dll+0x262efb]  CompileBroker::compiler_thread_loop+0x26b  (compileBroker.cpp:1962)
V  [jvm.dll+0x4051f6]  JavaThread::run+0x116  (javaThread.cpp:742)
V  [jvm.dll+0x892be8]  Thread::call_run+0xc8  (thread.cpp:235)
V  [jvm.dll+0x6fb685]  thread_native_entry+0x95  (os_windows.cpp:553)
C  [ucrtbase.dll+0x1fb80]  (no source info available)
C  [KERNEL32.DLL+0x84d4]  (no source info available)
C  [ntdll.dll+0x51a11]  (no source info available)





Comments
Changeset: 90641a60 Branch: master Author: Tobias Hartmann <thartmann@openjdk.org> Date: 2024-07-29 05:05:32 +0000 URL: https://git.openjdk.org/jdk/commit/90641a601c9385b5e76e1abc5362ace3ab1fff3d
29-07-2024

I put the fix into our build/test queue.
23-07-2024

Fix is out for review. [~mbaesken], [~rrich], could you please confirm that it fixes the issue?
23-07-2024

A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk/pull/20297 Date: 2024-07-23 10:46:01 +0000
23-07-2024

I implemented some verification code similar to the 'GrowableArrayNestingCheck' we have for GrowableArrays (see attached assert.patch), and it already triggers during the build:

# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/workspace/open/src/hotspot/share/opto/node.cpp:2776), pid=2797772, tid=2811603
#  fatal error: allocation bug: Node_Array could grow within nested ResourceMark
#
# JRE version: Java(TM) SE Runtime Environment (24.0) (slowdebug build 24-internal-2024-07-22-1258214.tobias.hartmann.jdk2)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (slowdebug 24-internal-2024-07-22-1258214.tobias.hartmann.jdk2, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x13068b6]  Node_Array::grow(unsigned int)+0x5e

Current CompileTask:
C2:43320 2223 4 java.util.regex.Pattern::atom (422 bytes)

Stack: [0x0000153a55bfd000,0x0000153a55cfd000], sp=0x0000153a55cf6cb0, free space=999k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x13068b6] Node_Array::grow(unsigned int)+0x5e (node.cpp:2776)
V [libjvm.so+0x51e4ec] Node_Array::map(unsigned int, Node*)+0x24 (node.hpp:1621)
V [libjvm.so+0x51e53b] Node_List::push(Node*)+0x33 (node.hpp:1653)
V [libjvm.so+0x149a51f] PhaseIdealLoop::register_new_node(Node*, Node*)+0xf3 (split_if.cpp:440)
V [libjvm.so+0x1197813] PhaseIdealLoop::register_new_node_with_ctrl_of(Node*, Node*)+0x3d (loopnode.hpp:1682)
V [libjvm.so+0x170b1d7] VTransformNode::register_new_node_from_vectorization(VLoopAnalyzer const&, Node*, Node*) const+0x47 (vtransform.cpp:357)
V [libjvm.so+0x170b124] VTransformVectorNode::register_new_node_from_vectorization_and_replace_scalar_nodes(VLoopAnalyzer const&, Node*) const+0x64 (vtransform.cpp:347)
V [libjvm.so+0x170af20] VTransformLoadVectorNode::apply(VLoopAnalyzer const&, GrowableArray<Node*> const&) const+0x244 (vtransform.cpp:322)
V [libjvm.so+0x1564232] VTransformGraph::apply_vectorization_for_each_vtnode(unsigned int&, unsigned int&) const+0xd2 (superword.cpp:2008)
V [libjvm.so+0x15643b0] VTransform::apply_vectorization() const+0x76 (superword.cpp:2029)
V [libjvm.so+0x15637bf] VTransform::apply()+0x201 (superword.cpp:1902)
V [libjvm.so+0x156359a] SuperWord::schedule_and_apply() const+0x15c (superword.cpp:1876)
V [libjvm.so+0x155d90e] SuperWord::SLP_extract()+0x192 (superword.cpp:477)
V [libjvm.so+0x155d705] SuperWord::transform_loop()+0x157 (superword.cpp:392)
V [libjvm.so+0x11ccbed] PhaseIdealLoop::auto_vectorize(IdealLoopTree*, VSharedData&)+0x12d (loopopts.cpp:4448)
V [libjvm.so+0x11af8fb] PhaseIdealLoop::build_and_optimize()+0x1869 (loopnode.cpp:4894)
V [libjvm.so+0x9d4331] PhaseIdealLoop::PhaseIdealLoop(PhaseIterGVN&, LoopOptsMode)+0x163 (loopnode.hpp:1117)
V [libjvm.so+0x9d4599] PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x47 (loopnode.hpp:1197)
V [libjvm.so+0x9c46b8] Compile::optimize_loops(PhaseIterGVN&, LoopOptsMode)+0x68 (compile.cpp:2171)
V [libjvm.so+0x9c57c8] Compile::Optimize()+0xf56 (compile.cpp:2418)
V [libjvm.so+0x9be072] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x15ba (compile.cpp:852)

Similar to JDK-8325672, we could simply allocate IdealLoopTree::_body in the C->comp_arena().
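To make the verification idea concrete, here is a minimal C++ sketch under simplifying assumptions. It is not the attached assert.patch and not HotSpot's GrowableArrayNestingCheck. The toy resource area only counts the active marks, and the list remembers that count when it is created. All names (ToyResourceArea, ToyMark, CheckedList, growth_allowed) are hypothetical.

```cpp
#include <cassert>

// Hypothetical toy model (not HotSpot code): the "resource area" only tracks
// how many marks are currently active.
class ToyResourceArea {
 public:
  int depth() const { return _depth; }
  void enter_mark() { _depth++; }
  void leave_mark() {
    assert(_depth > 0);
    _depth--;
  }

 private:
  int _depth = 0;
};

// RAII scope standing in for a ResourceMark.
class ToyMark {
 public:
  explicit ToyMark(ToyResourceArea& ra) : _ra(ra) { _ra.enter_mark(); }
  ~ToyMark() { _ra.leave_mark(); }

 private:
  ToyResourceArea& _ra;
};

// A list that records the mark depth it was created at. Growing at a deeper
// depth would place the new backing array in memory that an inner mark will
// release, so the check reports it instead of letting it happen silently.
class CheckedList {
 public:
  explicit CheckedList(const ToyResourceArea& ra)
      : _ra(ra), _creation_depth(ra.depth()) {}

  // A debug VM would stop here with a fatal error along the lines of
  // "allocation bug: ... could grow within nested ResourceMark".
  bool growth_is_safe() const { return _ra.depth() == _creation_depth; }

 private:
  const ToyResourceArea& _ra;
  int _creation_depth;
};

// Creates a list, then asks whether growing it would be safe, either in the
// list's own scope or inside one additional nested mark.
bool growth_allowed(bool inside_nested_mark) {
  ToyResourceArea ra;
  CheckedList list(ra);
  if (inside_nested_mark) {
    ToyMark rm(ra);
    return list.growth_is_safe();
  }
  return list.growth_is_safe();
}
```

Here growth_allowed(false) is true and growth_allowed(true) is false, which corresponds to the "could grow within nested ResourceMark" failure above. Allocating IdealLoopTree::_body in C->comp_arena(), as suggested, would move the list out of the thread's resource area, so a ResourceMark would no longer release its storage.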
22-07-2024

I got a crash with slowdebug with -XX:+ZapResourceArea. It shows that the 0xab patterns seen in hs_err crash logs aren't false positives. 0xab is the `badResourceValue` used to fill resource area chunks when they are freed. Below we see a local variable `node` at L586 referencing memory filled with 0xab. So this is a use-after-free error, which causes memory corruption.

(gdb) down
#24 0x00007fbd95679199 in PhaseCFG::get_block_for_node (this=0x7fbc7f1f9f60, node=0x7fbbf400bef8) at .../src/hotspot/share/opto/block.hpp:586
586       return _node_to_block_mapping[node->_idx];
(gdb) list
581       _node_to_block_mapping.map(node->_idx, nullptr);
582     }
583
584     // get the block in which this node resides
585     Block* get_block_for_node(const Node* node) const {
586       return _node_to_block_mapping[node->_idx];
587     }
588
589     // does this node reside in a block; return true
590     bool has_block(const Node* node) const {
(gdb) p node
$3 = (const Node *) 0x7fbbf400bef8
(gdb) x/4gx node
0x7fbbf400bef8: 0xabababababababab 0xabababababababab
0x7fbbf400bf08: 0xabababababababab 0xabababababababab

Current thread (0x00007fbc38010db0): JavaThread "C2 CompilerThread5" daemon [_thread_in_native, id=22638, stack(0x00007fbc7f0fe000,0x00007fbc7f1fe000) (1024K)]

Current CompileTask:
C2:302436 20751 4 java.util.TimSort::mergeLo (659 bytes)

Stack: [0x00007fbc7f0fe000,0x00007fbc7f1fe000], sp=0x00007fbc7f1f9850, free space=1006k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x729f1a] Block_Array::operator[](unsigned int) const+0x46 (block.hpp:69)
V [libjvm.so+0x72a199] PhaseCFG::get_block_for_node(Node const*) const+0x29 (block.hpp:586)
V [libjvm.so+0xd9aec8] Node_Backward_Iterator::next()+0x18a (gcm.cpp:927)
V [libjvm.so+0xd9c64c] PhaseCFG::schedule_late(VectorSet&, Node_Stack&)+0x8a2 (gcm.cpp:1286)
V [libjvm.so+0xd9c8a6] PhaseCFG::global_code_motion()+0x23a (gcm.cpp:1521)
V [libjvm.so+0xd9d10a] PhaseCFG::do_global_code_motion()+0x60 (gcm.cpp:1644)
V [libjvm.so+0x9e89d9] Compile::Code_Gen()+0x1cd (compile.cpp:2948)
V [libjvm.so+0x9dec62] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1760 (compile.cpp:885)
V [libjvm.so+0x8a5711] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1f3 (c2compiler.cpp:142)
V [libjvm.so+0xa018dc] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xa1c (compileBroker.cpp:2303)
V [libjvm.so+0xa00360] CompileBroker::compiler_thread_loop()+0x40e (compileBroker.cpp:1961)
V [libjvm.so+0xa1fc2b] CompilerThread::thread_entry(JavaThread*, JavaThread*)+0x89 (compilerThread.cpp:65)
V [libjvm.so+0xe6f9bc] JavaThread::thread_main_inner()+0x168 (javaThread.cpp:757)
V [libjvm.so+0xe6f851] JavaThread::run()+0x1d7 (javaThread.cpp:742)
V [libjvm.so+0x16c90fd] Thread::call_run()+0x1b9 (thread.cpp:225)
V [libjvm.so+0x139024e] thread_native_entry(Thread*)+0x1cb (os_linux.cpp:858)
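The zapping behavior described in this comment can be modeled in a few lines. This is a toy C++ sketch that assumes a single contiguous bump-pointer buffer; the real resource area works with chunks. ToyArena, ToyMark and read_after_scope_exit are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical toy resource area: a bump-pointer buffer whose allocations are
// released in bulk when a mark goes out of scope. On release, the freed bytes
// are overwritten with 0xab, mimicking badResourceValue zapping in debug builds.
constexpr unsigned char kBadResourceValue = 0xab;

class ToyArena {
 public:
  explicit ToyArena(size_t capacity) : _buf(capacity, 0), _top(0) {}

  unsigned char* allocate(size_t bytes) {
    assert(_top + bytes <= _buf.size() && "toy arena exhausted");
    unsigned char* p = _buf.data() + _top;
    _top += bytes;
    return p;
  }

  size_t top() const { return _top; }

  // Roll back to an earlier top and zap everything that was freed.
  void release_to(size_t saved_top) {
    std::memset(_buf.data() + saved_top, kBadResourceValue, _top - saved_top);
    _top = saved_top;
  }

 private:
  std::vector<unsigned char> _buf;
  size_t _top;
};

// RAII scope, analogous in spirit to a ResourceMark.
class ToyMark {
 public:
  explicit ToyMark(ToyArena& arena) : _arena(arena), _saved_top(arena.top()) {}
  ~ToyMark() { _arena.release_to(_saved_top); }

 private:
  ToyArena& _arena;
  size_t _saved_top;
};

// Stores 42 in a word allocated inside a mark, then reads it back through the
// saved pointer after the mark has been released.
uint64_t read_after_scope_exit(ToyArena& arena) {
  unsigned char* stale = nullptr;
  {
    ToyMark rm(arena);
    stale = arena.allocate(sizeof(uint64_t));
    const uint64_t answer = 42;
    std::memcpy(stale, &answer, sizeof(answer));  // valid while the mark is live
  }
  // Logically a use-after-free. In this toy the bytes still belong to _buf, so
  // the read is well defined, and it yields the zap pattern instead of 42.
  uint64_t value = 0;
  std::memcpy(&value, stale, sizeof(value));
  return value;
}
```

Called on a fresh ToyArena, read_after_scope_exit returns 0xabababababababab rather than 42. This is the same pattern that gdb shows behind the stale `node` pointer.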
22-07-2024

I tried to come up with a stress option to trigger this more reliably and reproduce it on our end (see attached stress.patch), but no luck so far. I'm running out of time this week, but I think it's pretty obvious from the preliminary investigation that this is a problem with the ResourceMark scopes. The obvious fix would be to restore the scopes to what they were before JDK-8333684, and your experiments showed that this solves the issue.
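In its simplest form, a stress mode like the one mentioned here could force a list to move to a new array on every insertion. The following C++ sketch is hypothetical: StressableList and reallocations_for are invented names, and the code is neither the attached stress.patch nor an existing HotSpot flag. It only shows the knob itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical stress knob: with stress enabled, every push moves the list to
// a freshly allocated array. A re-allocation in the wrong scope then happens
// on every insertion instead of only when capacity runs out.
class StressableList {
 public:
  StressableList(size_t cap, bool stress)
      : _data(new int[cap]), _cap(cap), _len(0), _stress(stress) {}

  void push(int v) {
    if (_len == _cap || _stress) {
      // Double only when actually full; under stress, move at the same size.
      const size_t new_cap = (_len == _cap) ? 2 * _cap : _cap;
      std::unique_ptr<int[]> fresh(new int[new_cap]);
      std::copy(_data.get(), _data.get() + _len, fresh.get());
      _data = std::move(fresh);
      _cap = new_cap;
      _reallocations++;
    }
    _data[_len++] = v;
  }

  int reallocations() const { return _reallocations; }

 private:
  std::unique_ptr<int[]> _data;
  size_t _cap;
  size_t _len;
  bool _stress;
  int _reallocations = 0;
};

// Pushes n elements into a list with initial capacity 4 and reports how many
// times its backing array moved.
int reallocations_for(int n, bool stress) {
  StressableList list(4, stress);
  for (int i = 0; i < n; i++) list.push(i);
  return list.reallocations();
}
```

With an initial capacity of 4, ten pushes move the array twice normally and ten times under stress. Any push made inside a wrongly nested mark would therefore immediately leave the list pointing at memory that the mark releases.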
19-07-2024

> It is some BI/reporting related software, that also works quite a lot with fonts. It comes as a number of large jars, and some native code is in there too. Not sure about unsafe.

I checked with JFR (the JFR event for @Deprecated methods) for Unsafe usage but could not find anything. The cause of the crashes is something else anyway, so this is just for the 'full picture'.

Regarding the '0xab' pattern in the register to memory mapping:
......
RBX=0x000001c76aeaa808 points into unknown readable memory: 0xabababababababab | ab ab ab ab ab ab ab ab

Should we maybe add some kind of "Warning" to hs_err in case those show up, indicating bad resources?
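The suggested hs_err hint could, for example, count the 0xab bytes in each 64-bit value that gets printed. This is a hypothetical C++ sketch: count_zapped_bytes and zap_hint are invented names, and the four-byte threshold is an arbitrary choice.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of HotSpot): count how many bytes of a 64-bit
// value from an hs_err register or stack dump equal badResourceValue (0xab).
constexpr uint64_t kBadResourceByte = 0xab;

int count_zapped_bytes(uint64_t word) {
  int n = 0;
  for (int i = 0; i < 8; i++) {
    if (((word >> (8 * i)) & 0xff) == kBadResourceByte) {
      n++;
    }
  }
  return n;
}

// Returns a hint string for suspicious values and nullptr otherwise. The
// four-byte threshold is arbitrary. Any value can contain 0xab bytes by
// coincidence, so this can only ever be a hint, never a diagnosis.
const char* zap_hint(uint64_t word) {
  const int n = count_zapped_bytes(word);
  if (n == 8) return "all bytes are 0xab: likely freed resource area memory";
  if (n >= 4) return "partly 0xab: possibly stale resource area data";
  return nullptr;
}
```

For example, 0xabababababababab yields 8 and 0xabababab00000003 yields 4. An ordinary pointer such as 0x000001c76aeaa808 yields 0, so it gets no hint.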
19-07-2024

> > I got a crash with slowdebug with -XX:+ZapResourceArea
> Isn't that flag true by default in debug anyway?

You're right. I forgot and somehow didn't check. Unfortunately, the badResourceValue (0xab) wasn't that prominent in the crash logs... Great that you found the root cause!
19-07-2024

I think I found the issue. In SuperWord::apply_vectorization() we call register_new_node_with_ctrl_of on PhaseIdealLoop:
https://github.com/openjdk/jdk/blob/5d2a19def154b81c8ebada5594e080fe76c5ffee/src/hotspot/share/opto/superword.cpp#L2469
-> https://github.com/openjdk/jdk/blob/5d2a19def154b81c8ebada5594e080fe76c5ffee/src/hotspot/share/opto/loopnode.hpp#L1676
-> https://github.com/openjdk/jdk/blob/5d2a19def154b81c8ebada5594e080fe76c5ffee/src/hotspot/share/opto/split_if.cpp#L437
This then modifies the resource-area-allocated _body NodeList. If the NodeLists are full, this can lead to their re-allocation. If that happens inside another (nested) ResourceMark, the newly allocated memory is incorrectly freed once we leave the scope of that ResourceMark. JDK-8333684 changed the code such that a re-allocation would happen inside a nested ResourceMark. That's incorrect.

Separately, given that we have hit these kinds of issues before, I think we need a stress mode that forces more frequent re-allocation of the NodeLists.
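The failure mode described in this comment can be reproduced in a toy C++ model. This is a sketch under simplifying assumptions. ToyArena, ToyMark, ToyList and first_after_growth are hypothetical stand-ins for the resource area, ResourceMark, the _body NodeList and the call path, not HotSpot code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy model, not HotSpot code. The arena hands out 32-bit words
// from a bump pointer. A mark remembers the top and, when it goes out of
// scope, zaps (0xab) and releases everything allocated after it.
class ToyArena {
 public:
  explicit ToyArena(size_t words) : _buf(words, 0), _top(0) {}
  uint32_t* allocate(size_t words) {
    assert(_top + words <= _buf.size() && "toy arena exhausted");
    uint32_t* p = _buf.data() + _top;
    _top += words;
    return p;
  }
  size_t top() const { return _top; }
  void release_to(size_t saved) {
    for (size_t i = saved; i < _top; i++) _buf[i] = 0xababababu;
    _top = saved;
  }

 private:
  std::vector<uint32_t> _buf;
  size_t _top;
};

class ToyMark {
 public:
  explicit ToyMark(ToyArena& a) : _a(a), _saved(a.top()) {}
  ~ToyMark() { _a.release_to(_saved); }

 private:
  ToyArena& _a;
  size_t _saved;
};

// Stand-in for a resource-area-allocated node list. When full, it doubles its
// backing array by allocating at whatever the arena top is at that moment.
class ToyList {
 public:
  ToyList(ToyArena& a, size_t cap)
      : _a(a), _data(a.allocate(cap)), _cap(cap), _len(0) {}
  void push(uint32_t v) {
    if (_len == _cap) {
      uint32_t* grown = _a.allocate(2 * _cap);
      for (size_t i = 0; i < _len; i++) grown[i] = _data[i];
      _data = grown;
      _cap *= 2;
    }
    _data[_len++] = v;
  }
  uint32_t at(size_t i) const { return _data[i]; }

 private:
  ToyArena& _a;
  uint32_t* _data;
  size_t _cap;
  size_t _len;
};

// Fills a two-element list, pushes a third element either in the list's own
// scope or inside a nested mark, and then reads element 0.
uint32_t first_after_growth(bool grow_in_nested_mark) {
  ToyArena arena(64);
  ToyList body(arena, 2);
  body.push(7);
  body.push(8);  // the list is now full
  if (grow_in_nested_mark) {
    ToyMark rm(arena);  // plays the role of the inner ResourceMark
    body.push(9);       // re-allocation happens inside the nested scope
  } else {
    body.push(9);       // re-allocation happens in the list's own scope
  }
  // In the nested case, ~ToyMark released (and zapped) body's new array.
  return body.at(0);
}
```

first_after_growth(false) returns 7. first_after_growth(true) returns 0xabababab, because the grown array was handed out above the inner mark's saved top. In the VM, that memory is presumably not only zapped but also reused by later allocations. The list and unrelated objects would then overwrite each other, which would be consistent with the varied crash sites in the description.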
19-07-2024

> I got a crash with slowdebug with -XX:+ZapResourceArea

Isn't that flag true by default in debug anyway?

> The crashes don't reproduce with that change.

Great, I think we found the root cause then! A resource-area-allocated object is probably passed/used outside of the scope of the corresponding ResourceMark (similar to JDK-8325672). [~epeter], please have a look.
19-07-2024

> One thing I just noticed when staring at the JDK-8333684 changes again:
> SuperWord::output (now called "apply_vectorization") got moved into the scope
> of this ResourceMark in SuperWord::schedule/schedule_and_apply:
> https://github.com/openjdk/jdk/commit/5d2a19def154b81c8ebada5594e080fe76c5ffee#diff-8f29dd005a0f949d108687dabb7379c73dfd85cd782da453509dc9b6cb8c9f81R2102
> Before, it was in the scope of the ResourceMark outside of
> SuperWord::SLP_extract. That could make a big difference, if I'm not missing
> anything, so [~mbaesken] you might want to try reverting that.

I've tried that. See https://github.com/reinrich/jdk/commit/b4797d8222b7f68d397f377e705051510880740f
The crashes don't reproduce with that change. HEAD looks different, though. Not sure what to do there.
18-07-2024

> Before, it was in the scope of the ResourceMark outside of SuperWord::SLP_extract. That could make a big difference, if I'm not missing anything, so [~mbaesken] you might want to try reverting that.

I am not really sure what I should move where. Do you maybe have a little patch I could apply on top of jdk-head?
18-07-2024

>> - Could you try to remove the ResourceMarks from superword.cpp and see if that changes anything?
> Thanks for the advice, I'll try it.

I commented out the ResourceMarks in superword.cpp. With this special build, the crashes do not show up.
18-07-2024

I can indeed reproduce crashes on Linux x86_64 by running one test case of the suite. It works sufficiently well.

> My best guess is still memory corruption caused by native code in your test suite or some dependent library.

I agree. I'm currently testing with -XX:+ZapResourceArea to see if the corruption might originate from a use of a resource-area-allocated object after it went out of scope of its ResourceMark. So far I got one crash with:

R11=0x00007f2b74932008 points into unknown readable memory: 0xabababababababab | ab ab ab ab ab ab ab ab
stack at sp + 4 slots: 0x00007f2bc805f9e0 points into unknown readable memory: 0xabababab00000003 | 03 00 00 00 ab ab ab ab

Of course, the pointers could be dead. Currently I'm trying to reproduce with a slowdebug build. It looks like this doesn't work well.
18-07-2024

One thing I just noticed when staring at the JDK-8333684 changes again: SuperWord::output (now called "apply_vectorization") got moved into the scope of this ResourceMark in SuperWord::schedule/schedule_and_apply: https://github.com/openjdk/jdk/commit/5d2a19def154b81c8ebada5594e080fe76c5ffee#diff-8f29dd005a0f949d108687dabb7379c73dfd85cd782da453509dc9b6cb8c9f81R2102 Before, it was in the scope of the ResourceMark outside of SuperWord::SLP_extract. That could make a big difference, if I'm not missing anything, so [~mbaesken] you might want to try reverting that.
18-07-2024

Thanks [~mbaesken]! So a toolchain / OS issue seems rather unlikely. My best guess is still memory corruption caused by native code in your test suite or some dependent library.

> I thought about this already, any suggestions what part(s) I could revert separately?

Not really, maybe [~epeter] has some suggestions.
18-07-2024

ILW = Various crashes during C1/C2 compilation (looks like memory corruption, might be external), intermittent on Windows x64 with test suite, disable superword = HMM = P2
18-07-2024

[~mbaesken] could you share some more information about what the test suite is doing? Is there any native/unsafe code that could cause a memory corruption? We would like to help investigating this but it's impossible without a reproducer. It's weird that we've never seen that in any of our testing. Could you please check the following:
- Which Windows version and toolchain versions (Visual Studio) are you using? Maybe it's a C++ compiler issue.
- Running the same test suite on other operating systems never triggered the issue, correct?
- Could you try to remove the ResourceMarks from superword.cpp and see if that changes anything?
- Could you try to revert parts of JDK-8333684 to see which of the refactoring changes causes the issue?
18-07-2024

> could you share some more information about what the test suite is doing? Is there any native/unsafe code that could cause a memory corruption?

It is some BI/reporting related software, that also works quite a lot with fonts. It comes as a number of large jars, and some native code is in there too. Not sure about unsafe.

> We would like to help investigating this but it's impossible without a reproducer. It's weird that we've never seen that in any of our testing.

We run a lot of other tests too, and see it only with this suite.

> - Which Windows version and toolchain versions (Visual Studio) are you using? Maybe it's a C++ compiler issue.

Win11 and some Windows Server versions; we see it on various Windows versions. Regarding VS, we use VS2019, and locally I also use VS2022. The issue shows up with both.

> - Running the same test suite on other operating systems never triggered the issue, correct?

Until today, yes. But I heard today from a colleague that we also saw it on Linux x86_64 (not sure about the details so far).

> - Could you try to remove the ResourceMarks from superword.cpp and see if that changes anything?

Thanks for the advice, I'll try it.

> - Could you try to revert parts of JDK-8333684 to see which of the refactoring changes causes the issue?

I thought about this already. Any suggestions which part(s) I could revert separately?
18-07-2024

Okay, thanks for confirming. Both [~epeter] and I had another look at JDK-8333684 but we didn't spot anything suspicious. So either it's a side effect (Difference in memory consumption leading to different re-allocation pattern in ResourceMark scope and triggering some existing bug? Different timing? ...) or we are missing something.
17-07-2024

> I meant going back to the changeset just after JDK-8333684, verify that the issue still reproduces, revert only JDK-8333684 and check if it does not reproduce anymore.

Please see the commit history here: https://github.com/openjdk/jdk/commits/master/?after=a44b60c8c14ad998e51239f48e64779304aaac50+279
I tested the commit just before JDK-8333684, this one: 8311110: multichar warning in WinAccessBridge.cpp (https://github.com/openjdk/jdk/commit/301bd7085654328f941c462bc786e995051d1a9c); no crashes seen. The commit just AFTER JDK-8333684 has only test changes, so regarding src/make, that commit with JDK-8333684 reverted is identical to the previous commit.
17-07-2024

> This one I think has follow up changes, so how could I simply revert just this one?

I meant going back to the changeset just after JDK-8333684, verify that the issue still reproduces, revert only JDK-8333684 and check if it does not reproduce anymore.
17-07-2024

> You could try to trace some things from SuperWord.
> TraceNewVectors: we would see if something gets vectorized - and maybe wrongly - creating memory corruption elsewhere?
> TraceAutoVectorization: you could see if SuperWord is run at all - maybe the VM code creates memory corruption?

Yes, I can do some runs with those flags. Should I set both flags at the same time? Which output should I look for?
17-07-2024

> that only reverting JDK-8333684 also makes the issue disappear?

This one I think has follow up changes, so how could I simply revert just this one?
17-07-2024

Just double checking: [~mbaesken] since -XX:-SuperWord is quite invasive and as Emanuel explained will have a huge side-effect, did you verify that only reverting JDK-8333684 also makes the issue disappear? I wonder if it might be memory corruption similar to JDK-8325672, lots of Node_List objects are used / passed in SuperWord.
16-07-2024

> With disabled SuperWord (UseSuperWord = false) the issue does not occur.

If it is a SuperWord bug, then disabling SuperWord makes the bug disappear. But we have seen bugs in the past where the bug is elsewhere, and disabling SuperWord makes the bug go away as well; it is a kind of "side-effect".

> Some testing showed the issue started to appear since
> 8333684: C2 SuperWord: multiple smaller refactorings in preparation for JDK-8332163
> https://github.com/openjdk/jdk/commit/5d2a19def154b81c8ebada5594e080fe76c5ffee

I looked through the change again. I cannot see how this creates memory corruption. This was merely supposed to be a refactoring, so behaviour change is not expected. Hard to say more without seeing more data.

You could try to trace some things from SuperWord.
TraceNewVectors: we would see if something gets vectorized - and maybe wrongly - creating memory corruption elsewhere?
TraceAutoVectorization: you could see if SuperWord is run at all - maybe the VM code creates memory corruption?
Ideally, we could compare a "healthy" and "bad" run, and see the cause that way.
16-07-2024

That's really weird because PhaseRemoveUseless is run *before* SuperWord. So disabling SuperWord should not have any effect. Also it would obviously not have any effect on C1 compilation. Or am I missing something? As I mentioned in our email discussion, this looks like memory corruption.
16-07-2024

With disabled SuperWord (UseSuperWord = false) the issue does not occur. Some testing showed the issue started to appear since 8333684: C2 SuperWord: multiple smaller refactorings in preparation for JDK-8332163 https://github.com/openjdk/jdk/commit/5d2a19def154b81c8ebada5594e080fe76c5ffee
16-07-2024

Here is another one (replay_pid23516.log) for the crash below. But a) it uses internal jars not available to you, and b) even with the internal jars I could not reproduce a crash with the replay file. I still attach it, just in case it helps somehow.

# EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00007ffeea478cf3, pid=23516, tid=4244

Current CompileTask:
C2:4208 8844 4 sun.awt.geom.AreaOp::addEdges (52 bytes)

Stack: [0x00000068b8f00000,0x00000068b9000000], sp=0x00000068b8ffd950, free space=1014k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [jvm.dll+0x258cf3] Compile::identify_useful_nodes+0xc3 (compile.cpp:318)
V [jvm.dll+0x7164ab] PhaseRemoveUseless::PhaseRemoveUseless+0x6b (phaseX.cpp:374)
V [jvm.dll+0x250505] Compile::Compile+0xd25 (compile.cpp:826)
V [jvm.dll+0x1ce1da] C2Compiler::compile_method+0x11a (c2compiler.cpp:145)
V [jvm.dll+0x2607eb] CompileBroker::invoke_compiler_on_method+0x80b (compileBroker.cpp:2306)
V [jvm.dll+0x25eaaa] CompileBroker::compiler_thread_loop+0x26a (compileBroker.cpp:1962)
V [jvm.dll+0x3fcfe6] JavaThread::run+0x116 (javaThread.cpp:744)
V [jvm.dll+0x886eb8] Thread::call_run+0xc8 (thread.cpp:235)
V [jvm.dll+0x6f02b5] thread_native_entry+0x95 (os_windows.cpp:553)
C [ucrtbase.dll+0x29333] (no source info available)
C [KERNEL32.DLL+0x1257d] (no source info available)
C [ntdll.dll+0x5af28] (no source info available)
11-07-2024

[~mbaesken], please attach the replay log files. Thanks.
10-07-2024