JDK-8303805 : [REDO] JDK-8302189 and JDK-8302799
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 21
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: os_x
  • CPU: aarch64
  • Submitted: 2023-03-08
  • Updated: 2024-02-16
  • Resolved: 2023-03-29
JDK 21
21 b17 Fixed
Sub Tasks
JDK-8303839 :  
Description
After the integration of JDK-8302189 (which had a very large merge with master immediately before integration), builds of macos-aarch64-debug are failing due to a VM crash:

[2023-03-08T02:49:48,236Z] Optimizing the exploded image
[2023-03-08T02:49:49,737Z] #
[2023-03-08T02:49:49,737Z] # A fatal error has been detected by the Java Runtime Environment:
[2023-03-08T02:49:49,737Z] #
[2023-03-08T02:49:49,737Z] #  SIGSEGV (0xb) at pc=0x000000010297ec24, pid=35266, tid=24579
[2023-03-08T02:49:49,737Z] #
[2023-03-08T02:49:49,737Z] # JRE version: OpenJDK Runtime Environment (21.0+13) (fastdebug build 21-ea+13-1054)
[2023-03-08T02:49:49,737Z] # Java VM: OpenJDK 64-Bit Server VM (fastdebug 21-ea+13-1054, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, bsd-aarch64)
[2023-03-08T02:49:49,737Z] # Problematic frame:
[2023-03-08T02:49:49,737Z] # V  [libjvm.dylib+0xc7ac24]  InitializeNode::coalesce_subword_stores(long, Node*, PhaseGVN*)+0x380
[2023-03-08T02:49:49,737Z] #
[2023-03-08T02:49:49,737Z] # Core dump will be written. Default location: core.35266
[2023-03-08T02:49:49,737Z] #
[2023-03-08T02:49:49,737Z] # An error report file with more information is saved as:
[2023-03-08T02:49:49,737Z] # /System/Volumes/Data/mesos/work_dir/slaves/91e16c40-06d4-468a-9fc3-7198a5bb7d5a-S109737/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/001a5962-3ba0-468e-b3f9-f4edc3999b64/runs/a1ec2ebc-1bce-4fd4-9a93-8e12baf0773f/workspace/open/make/hs_err_pid35266.log
[2023-03-08T02:49:49,757Z] [thread 15107 also had an error]
[2023-03-08T02:49:49,757Z] [thread 24323 also had an error]
[2023-03-08T02:49:49,757Z] [thread 6147 also had an error]
Comments
Changeset: b3ff8d1c
Author: Kim Barrett <kbarrett@openjdk.org>
Date: 2023-03-29 23:45:03 +0000
URL: https://git.openjdk.org/jdk/commit/b3ff8d1c89b0f968b7b5ec2105502778524e4e4a
29-03-2023

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/13199 Date: 2023-03-27 23:57:45 +0000
28-03-2023

The problem exists with Xcode 12.4. Upgrading to either Xcode 13.2.1 or Xcode 14.2 fixes the problem.
17-03-2023

The core file isn't very useful. The thread with the error shows the location in coalesce_subword_stores as "memnode.cpp:0:15". Selecting that frame in lldb, we're told "Note: this address is compiler-generated code in function InitializeNode::coalesce_subword_stores(long, Node*, PhaseGVN*) that has no source code associated with it." Not too surprisingly, slowdebug builds fine.
17-03-2023

I can reproduce. I can also confirm that changing _enabled from int to unsigned removes the failure, which makes no sense at all.

coalesce_subword_stores has 9 asserts. Commenting out all but the last still fails in the same way. Commenting out just the last builds successfully. That assert is
  assert(offset >= header_size, "do not smash header");

Changing that assert to
  if (offset >= header_size) fatal("...");
also fails in the same way. Changing that to a "wrapper" around `fatal` that isn't declared noreturn successfully builds.

So the problem isn't really the complications introduced in the assert test, but rather the noreturn. Continuing to explore.
17-03-2023
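
For readers following the experiment in the comment above, here is a minimal stand-alone sketch of the three forms that were compared. It is not HotSpot code: `fatal_noreturn`, `fatal_wrapper`, and the `check_offset_*` functions are hypothetical names, and the check is written in the sense of the original assert (fail when `offset < header_size`). In a real build the wrapper would also have to stay opaque to the optimizer, since an inlined wrapper can let the compiler see the noreturn call again.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for HotSpot's fatal(). It is declared noreturn, so
// the compiler may assume control never continues past a call to it.
[[noreturn]] inline void fatal_noreturn(const char* msg) {
  std::fprintf(stderr, "fatal: %s\n", msg);
  std::abort();
}

// Hypothetical wrapper that is NOT declared noreturn. From the caller's
// point of view this is an ordinary call that might return.
inline void fatal_wrapper(const char* msg) {
  fatal_noreturn(msg);
}

// Form 1: a plain assert (compiled out when NDEBUG is defined).
inline long check_offset_assert(long offset, long header_size) {
  assert(offset >= header_size && "do not smash header");
  return offset - header_size;
}

// Form 2: an explicit test whose failure path ends in a noreturn call.
inline long check_offset_noreturn(long offset, long header_size) {
  if (!(offset >= header_size)) fatal_noreturn("do not smash header");
  return offset - header_size;
}

// Form 3: the same test, but the failure path goes through the wrapper.
inline long check_offset_wrapper(long offset, long header_size) {
  if (!(offset >= header_size)) fatal_wrapper("do not smash header");
  return offset - header_size;
}
```

All three forms return the same value when the check passes; they differ only in what the compiler is told about the failure path, which is the variable the experiment isolated.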

Filed JDK-8303810 to restore the attribute positions undone by this backout, to match the Style Guide's attribute position requirement.
09-03-2023

I confirmed that changing to unsigned int also fixes the issue.
08-03-2023

I tried to debug things and modified is_enabled():

   static bool is_enabled() {
+    if (_enabled > 0) {
+      printf("ERROR: Unexpected _enabled state: %d\n", _enabled);
+      return true;
+    } else {
+      return false;
+    }
+  }

but the build passed with that change, so I'm guessing it was sufficient to fix whatever compiler bug is present. I guess we need to get hold of the created object files and disassemble them to see what the compiler is generating.
08-03-2023

Sorry I missed that there were two bug ids covered by the same PR - thanks for changing to the more appropriate one.
08-03-2023

Moving from hotspot/compiler -> hotspot/runtime since that's where JDK-8302799 lives.
08-03-2023

Thanks Damon. Given that we still don't understand the root cause and suspect a toolchain issue, I would recommend a backout of the offending change and re-assign to [~kbarrett] to decide.
08-03-2023

Based on [~dholmes] comment here: https://bugs.openjdk.org/browse/JDK-8303805?focusedCommentId=14565223&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14565223 it looks like this regression is caused by: JDK-8302799 Refactor Debugging variable usage for noreturn crash reporting so I've updated the synopsis line and added a link to JDK-8302799.
08-03-2023

The issue seems to disappear when "_enabled" is made an "unsigned int" (which I guess could be OK for its purpose). Still, it's not really clear why this happens.
08-03-2023

ILW = Build fails due to SIGSEGV during C2 compilation (regression), macos-aarch64-debug build, no workaround but disable C2 compilation = HMH = P1
08-03-2023

I just wonder why testing did not catch this, maybe [~kbarrett] missed the failing build on the results page. Or was there a last minute change?
08-03-2023

[~dfenacci], please have a look at this. Maybe we can go with a point fix like disabling the assert (as a subtask of this issue) to re-enable the build.
08-03-2023

I changed the code so that is_enabled() always returns false and the build passed. There is an assert at the start of InitializeNode::coalesce_subword_stores, so I have to suspect that the assert itself somehow crashes due to something toolchain related (else why only this one platform!).
08-03-2023

I grabbed the PR branch for JDK-8302189 and the build failed. I then reverted the merge and the build still failed. I then reverted "make Debugging::_enabled a nesting counter" and the build passed! I then manually added back the patch to make _enabled a counter and the build failed again! This is that patch:

diff --git a/src/hotspot/share/utilities/debug.cpp b/src/hotspot/share/utilities/debug.cpp
index 3ba7ede8f6fc..ec67c18734f7 100644
--- a/src/hotspot/share/utilities/debug.cpp
+++ b/src/hotspot/share/utilities/debug.cpp
@@ -76,18 +76,15 @@ static intx g_asserting_thread = 0;
 static void* g_assertion_context = nullptr;
 #endif // CAN_SHOW_REGISTERS_ON_ASSERT
 
-bool DebuggingContext::_enabled = false;
+int DebuggingContext::_enabled = 0; // Initially disabled.
 
 DebuggingContext::DebuggingContext() {
-  // Not an assert, since asserts are disabled by DebuggingContext.
-  if (is_enabled()) {
-    fatal("Multiple Debugging contexts");
-  }
-  _enabled = true;
+  _enabled += 1; // Increase nesting count.
 }
 
 DebuggingContext::~DebuggingContext() {
-  _enabled = false;
+  assert(is_enabled(), "Debugging nesting confusion");
+  _enabled -= 1; // Decrease nesting count.
 }
 
 #ifndef ASSERT
diff --git a/src/hotspot/share/utilities/debug.hpp b/src/hotspot/share/utilities/debug.hpp
index 1b14e4b45ce7..91b3e6564155 100644
--- a/src/hotspot/share/utilities/debug.hpp
+++ b/src/hotspot/share/utilities/debug.hpp
@@ -55,14 +55,14 @@ bool handle_assert_poison_fault(const void* ucVoid, const void* faulting_address
 // those operations are not normally permitted, with the state checked by an
 // assert. We want the debugging commands to bypass those checks.
 class DebuggingContext {
-  static bool _enabled;
+  static int _enabled; // Nesting counter.
 
 public:
   DebuggingContext();
   ~DebuggingContext();
   // Asserts and other code use this to determine whether to bypass checks
   // that would otherwise lead to program termination.
-  static bool is_enabled() { return _enabled; }
+  static bool is_enabled() { return _enabled > 0; }
 };
 
 // VMASSERT_CHECK_PASSED(P) provides the mechanism by which DebuggingContext
08-03-2023
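
The nesting-counter idea in the patch above can be modelled outside HotSpot roughly as follows. This is only a sketch under stated assumptions: `NestingContext` and `nesting_behaves` are hypothetical names, the standard `assert` replaces HotSpot's two-argument macro, and the counter type is a template parameter purely so that `int` and the `unsigned int` variant mentioned elsewhere in this thread can be compared.

```cpp
#include <cassert>

// Hypothetical stand-alone model of the nesting counter; NestingContext is
// not a HotSpot class. Counter is a parameter only so that int and
// unsigned int can be exercised side by side.
template <typename Counter>
class NestingContext {
  static Counter _enabled;  // Nesting counter; zero means disabled.

 public:
  NestingContext() { _enabled += 1; }  // Increase nesting count.
  ~NestingContext() {
    assert(is_enabled() && "Debugging nesting confusion");
    _enabled -= 1;  // Decrease nesting count.
  }
  // Copying would bypass the counter, so it is disabled in this model.
  NestingContext(const NestingContext&) = delete;
  NestingContext& operator=(const NestingContext&) = delete;

  static bool is_enabled() { return _enabled > 0; }
};

template <typename Counter>
Counter NestingContext<Counter>::_enabled = 0;

// Returns true if contexts nest as expected: enabled while at least one
// context is alive, and disabled again once the outermost one is gone.
template <typename Counter>
bool nesting_behaves() {
  typedef NestingContext<Counter> Ctx;
  if (Ctx::is_enabled()) return false;
  {
    Ctx outer;
    if (!Ctx::is_enabled()) return false;
    {
      Ctx inner;
      if (!Ctx::is_enabled()) return false;
    }
    // The inner context is gone, but the outer one is still active.
    if (!Ctx::is_enabled()) return false;
  }
  return !Ctx::is_enabled();
}
```

In this model `nesting_behaves<int>()` and `nesting_behaves<unsigned int>()` both return true, which matches the intuition in the thread that the choice of counter type should not change the semantics.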

---------------  T H R E A D  ---------------

Current thread (0x000000015400d410): JavaThread "C2 CompilerThread1" daemon [_thread_in_native, id=41475, stack(0x0000000170e64000,0x0000000171067000)]

Current CompileTask:
C2: 511 349 4 java.util.stream.ReferencePipeline$3$1::accept (23 bytes)

Stack: [0x0000000170e64000,0x0000000171067000], sp=0x0000000171063010, free space=2044k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.dylib+0xc785c4] InitializeNode::coalesce_subword_stores(long, Node*, PhaseGVN*)+0x380
V [libjvm.dylib+0xc790a4] InitializeNode::complete_stores(Node*, Node*, Node*, long, Node*, PhaseIterGVN*)+0xc4
V [libjvm.dylib+0xbea3f0] PhaseMacroExpand::initialize_object(AllocateNode*, Node*, Node*, Node*, Node*, Node*, Node*)+0x3dc
V [libjvm.dylib+0xbe92b4] PhaseMacroExpand::expand_allocate_common(AllocateNode*, Node*, TypeFunc const*, unsigned char*, Node*)+0x73c
V [libjvm.dylib+0xbeeca0] PhaseMacroExpand::expand_macro_nodes()+0x5e8
V [libjvm.dylib+0x487074] Compile::Optimize()+0xd48
V [libjvm.dylib+0x484d30] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0xfc0
V [libjvm.dylib+0x36df40] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x160
V [libjvm.dylib+0x4a1448] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x6d8
V [libjvm.dylib+0x4a0ad0] CompileBroker::compiler_thread_loop()+0x2e8
V [libjvm.dylib+0x825ca4] JavaThread::thread_main_inner()+0x178
V [libjvm.dylib+0xf54c60] Thread::call_run()+0xf4
V [libjvm.dylib+0xd3a2c4] thread_native_entry(Thread*)+0x120
C [libsystem_pthread.dylib+0x7240] _pthread_start+0x94

siginfo: si_signo: 11 (SIGSEGV), si_code: 2 (SEGV_ACCERR), si_addr: 0xfffffff1710630a8
08-03-2023
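
The addresses in the crash report above can be cross-checked with a little arithmetic. The sketch below copies the stack bounds, sp, and si_addr from the report; the constant and function names are hypothetical, and it assumes that "free space" is the distance from sp down to the low end of the stack, which is at least consistent with the reported 2044k.

```cpp
#include <cassert>
#include <cstdint>

// Values copied from the crash report; the names are illustrative only.
constexpr std::uint64_t kStackLow  = 0x0000000170e64000ULL;
constexpr std::uint64_t kStackHigh = 0x0000000171067000ULL;
constexpr std::uint64_t kSp        = 0x0000000171063010ULL;
constexpr std::uint64_t kFaultAddr = 0xfffffff1710630a8ULL;

// Assumption: the stack grows downward, so the free space is the distance
// from sp to the low end of the stack, reported in units of 1024 bytes.
constexpr std::uint64_t free_space_kib(std::uint64_t low, std::uint64_t sp) {
  return (sp - low) / 1024;
}

// True if addr lies inside the half-open range [low, high).
constexpr bool in_range(std::uint64_t addr, std::uint64_t low,
                        std::uint64_t high) {
  return addr >= low && addr < high;
}

// Re-derives the figures printed in the report.
inline void verify_report() {
  assert(free_space_kib(kStackLow, kSp) == 2044);       // "free space=2044k"
  assert((kStackHigh - kStackLow) / 1024 == 2060);      // total stack size
  assert(in_range(kSp, kStackLow, kStackHigh));         // sp is on the stack
  assert(!in_range(kFaultAddr, kStackLow, kStackHigh)); // si_addr is not
}
```

Calling `verify_report()` confirms that sp lies within the thread's stack while the faulting address does not, so the access that raised SEGV_ACCERR was not to a valid location on this thread's stack.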