JDK-8303276 : Secondary assertion failure in AdapterHandlerLibrary::contains during crash reporting
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 20,21
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2023-02-28
  • Updated: 2023-05-01
  • Resolved: 2023-04-24
JDK 21 : 21 b20 (Fixed)
Description
We have the following call chain:

VMError::report
  os::print_register_info
    CodeBlob::dump_for_addr
      AdapterHandlerLibrary::contains

where:

bool AdapterHandlerLibrary::contains(const CodeBlob* b) {
  bool found = false;
  auto findblob = [&] (AdapterFingerPrint* key, AdapterHandlerEntry* a) {
    return (found = (b == CodeCache::find_blob(a->get_i2c_entry())));
  };
  assert_locked_or_safepoint(AdapterHandlerLibrary_lock);
  _adapter_handler_table.iterate(findblob);
  return found;
}

but the error-reporting thread neither holds AdapterHandlerLibrary_lock nor is at a safepoint, so the assert fires.

This was spotted in one of the tests that deliberately crashes the VM. The hs_err file shows:

Register to memory mapping:

RIP=0x00007ffaf5ab1000 LingeredApp.dll
RAX=0x00007ffaf5ab1000 LingeredApp.dll
RBX={method} {0x0000023fd4401dc0} 'crash' '()I' in 'jdk/test/lib/apps/LingeredApp'
RCX=0x0000023fb77581e0 points into unknown readable memory: 0x00007ffaeaeede60 | 60 de ee ea fa 7f 00 00
RDX=0x000000f29a5ff1b8 is pointing into the stack for thread: 0x0000023fb7757ec0
RSP=0x000000f29a5ff138 is pointing into the stack for thread: 0x0000023fb7757ec0
RBP=0x000000f29a5ff1a8 is pointing into the stack for thread: 0x0000023fb7757ec0
RSI=0x000000000000000c is an unknown value
RDI=0x0000023fd4017ac0 is pointing into metadata
R8 ={method} {0x0000023fd4401dc0} 'crash' '()I' in 'jdk/test/lib/apps/LingeredApp'
R9 =
[error occurred during error reporting (printing register info), id 0xe0000000, Internal Error (c:\sb\prod\1677455098\workspace\open\src\hotspot\share\runtime\mutexLocker.cpp:179)]

and the Windows debugger stack dump showed the problematic call chain.
Comments
Changeset: 2ea62c13
Author: Coleen Phillimore <coleenp@openjdk.org>
Date: 2023-04-24 21:23:56 +0000
URL: https://git.openjdk.org/jdk/commit/2ea62c136925299d4b767a0149419e7e9de3629a
24-04-2023

A pull request was submitted for review.
URL: https://git.openjdk.org/jdk/pull/13500
Date: 2023-04-17 18:57:50 +0000
17-04-2023

I think assert_locked_or_safepoint code should be disabled if:

if (DebuggingContext::is_enabled() || VMError::is_error_reported()) {

We shouldn't take a lock for the adapter handler hash table in the unlikely event that another thread is accessing it during class linking during error reporting.
10-03-2023
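
The suggestion in the comment above can be illustrated with a minimal standalone sketch. It is not HotSpot code: the namespace sketch, the OwnedMutex class, the two global flags, and locked_or_safepoint_ok are all hypothetical stand-ins invented for illustration (the flags play the roles of VMError::is_error_reported() and the safepoint state). The sketch returns a bool instead of asserting so that the behaviour is easy to observe.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical stand-ins only; none of these are the real VM types.
namespace sketch {

// Plays the role of VMError::is_error_reported() (assumption).
std::atomic<bool> error_reported{false};
// Plays the role of "the VM is at a safepoint" (assumption).
std::atomic<bool> at_safepoint{false};

// A mutex that records its owner so ownership can be queried.
class OwnedMutex {
 public:
  void lock() {
    _m.lock();
    _owner.store(std::this_thread::get_id());
  }
  void unlock() {
    _owner.store(std::thread::id());  // reset to "no thread"
    _m.unlock();
  }
  bool owned_by_self() const {
    return _owner.load() == std::this_thread::get_id();
  }

 private:
  std::mutex _m;
  std::atomic<std::thread::id> _owner{};
};

// Returns true when the locking precondition holds, or when the check
// is waived because an error report is already in progress.
bool locked_or_safepoint_ok(const OwnedMutex& lock) {
  if (error_reported.load()) {
    return true;  // waived during crash reporting
  }
  return lock.owned_by_self() || at_safepoint.load();
}

}  // namespace sketch
```

In this shape, a caller on the error-reporting path passes the check without taking the lock. Whether the unlocked table walk is itself safe is the separate concern raised elsewhere in this thread; the sketch only shows how the check could be waived.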

Yes exactly - while the error reporting thread is calling contains, some other thread could be updating the table. Perhaps this was just as unsafe before the change.
28-02-2023

I think the assertion may need to accommodate being called during error reporting, rather than just being disabled, as we really should hold the lock or be at a safepoint to access the table. Though I guess without the lock we could easily crash when accessing the table, so maybe this just isn't safe to use during error reporting?
28-02-2023

ILW = Assert during error reporting when printing the contents of registers, intermittent during error reporting, no workaround = MMH = P3
28-02-2023

The assert in AdapterHandlerLibrary::contains was added by JDK-8292384 in JDK 20. [~coleenp], do you think we can simply remove it? Ideally, we would have a test that uses -XX:CICrashAt and other means to crash the VM at random points and then checks that there was no additional error during error reporting.
28-02-2023

I think that depends on what exactly can happen during that stage of error reporting. Can new classes/methods be linked? If so, the table could be updated concurrently and I think it would be unsafe to access without holding the lock.
28-02-2023