JDK-8319784 : VM crash during heap dump after JDK-8287061
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 22
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • Submitted: 2023-11-09
  • Updated: 2024-01-22
  • Resolved: 2023-11-21
JDK 22
22 b25 (Fixed)
Description
The release VM crashes with SIGSEGV during the heap dump. The failure is highly intermittent.

In release builds it manifests like this:

```
$ build/macosx-aarch64-server-release/images/jdk/bin/java -XX:+UseParallelGC -XX:+HeapDumpAfterFullGC -Xms256m -Xmx256m -jar dacapo-23.11-chopin.jar kafka

Heap dump file created [219361950 bytes in 0.108 secs]
Dumping heap to java_pid12474.hprof.9 ...
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x000000010399a1a0, pid=12474, tid=20995
#
# JRE version: OpenJDK Runtime Environment (22.0) (build 22-internal-adhoc.shipilev.shipilev-jdk)
# Java VM: OpenJDK 64-Bit Server VM (22-internal-adhoc.shipilev.shipilev-jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, parallel gc, bsd-aarch64)
# Problematic frame:
# V  [libjvm.dylib+0x2e61a0]  ObjectMergeValue::value() const+0xc
```

In debug builds it trips an assert instead:

```
$ build/macosx-aarch64-server-fastdebug/images/jdk/bin/java -XX:+UseParallelGC -XX:+HeapDumpAfterFullGC -Xms256m -Xmx256m -jar dacapo-23.11-chopin.jar kafka
...
Heap dump file created [237306088 bytes in 0.746 secs]
Dumping heap to java_pid13476.hprof.78 ...
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/Users/shipilev/Work/shipilev-jdk/src/hotspot/share/code/debugInfo.hpp:235), pid=13476, tid=16899
#  assert(_selected != nullptr) failed: Should call select() first.


---------------  S U M M A R Y ------------

Command Line: -XX:+UseParallelGC -XX:+HeapDumpAfterFullGC -Xms256m -Xmx256m /Users/shipilev/Work/dacapo-23.11-chopin.jar kafka

Host: bcd074129a52, "MacBookPro18,3" arm64, 10 cores, 32G, Darwin 22.6.0, macOS 13.5.2 (22G91)
Time: Thu Nov  9 12:19:19 2023 CET elapsed time: 65.902768 seconds (0d 0h 1m 5s)

---------------  T H R E A D  ---------------

Current thread (0x000000012a709bb0):  VMThread "VM Thread"          [id=16899, stack(0x000000016e300000,0x000000016e503000) (2060K)]

Stack: [0x000000016e300000,0x000000016e503000],  sp=0x000000016e501690,  free space=2053k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.dylib+0x12378e8]  VMError::report_and_die(int, char const*, char const*, char*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x554  (debugInfo.hpp:235)
V  [libjvm.dylib+0x12380f4]  VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*)+0x0
V  [libjvm.dylib+0x57170c]  print_error_for_unit_test(char const*, char const*, char*)+0x0
V  [libjvm.dylib+0x577400]  ObjectMergeValue::set_value(oop)+0x0
V  [libjvm.dylib+0x108c508]  StackValue* StackValue::create_stack_value<RegisterMap>(ScopeValue*, unsigned char*, RegisterMap const*)+0x388
V  [libjvm.dylib+0x122371c]  compiledVFrame::create_stack_value(ScopeValue*) const+0x140
V  [libjvm.dylib+0x1223344]  compiledVFrame::locals() const+0xac
V  [libjvm.dylib+0x7ee5e8]  ThreadDumper::dump_stack_refs(AbstractDumpWriter*)+0x2a4
V  [libjvm.dylib+0x7efbc0]  VM_HeapDumper::dump_threads()+0x70
V  [libjvm.dylib+0x7f037c]  VM_HeapDumper::work(unsigned int)+0x2b4
V  [libjvm.dylib+0x7eff2c]  VM_HeapDumper::doit()+0x17c
V  [libjvm.dylib+0x123eee8]  VM_Operation::evaluate()+0xf4
V  [libjvm.dylib+0x126172c]  VMThread::evaluate_operation(VM_Operation*)+0x114
V  [libjvm.dylib+0x12621b4]  VMThread::inner_execute(VM_Operation*)+0x1d0
V  [libjvm.dylib+0x7f124c]  HeapDumper::dump(char const*, outputStream*, int, bool, unsigned int)+0x194
V  [libjvm.dylib+0x7f1a4c]  HeapDumper::dump_heap(bool)+0x22c
V  [libjvm.dylib+0x4ba624]  CollectedHeap::full_gc_dump(GCTimer*, bool)+0x178
V  [libjvm.dylib+0xedcc0c]  PSParallelCompact::invoke_no_policy(bool)+0x810
V  [libjvm.dylib+0xef50ac]  PSScavenge::invoke()+0x198
```
Comments
Changeset: 3544d2dd
Author: Cesar Soares Lucas <cslucas@openjdk.org>
Committer: Tobias Hartmann <thartmann@openjdk.org>
Date: 2023-11-21 07:20:28 +0000
URL: https://git.openjdk.org/jdk/commit/3544d2dd869c4c712f5c5ed172ddb7b1683e9a7f
21-11-2023

A pull request was submitted for review.
URL: https://git.openjdk.org/jdk/pull/16622
Date: 2023-11-11 01:52:54 +0000
14-11-2023

I was able to create a test case that reproduces the problem, and I think the fix may be just a matter of returning `Handle()` if `_selected == nullptr` in `ObjectMergeValue::value`. Essentially the same thing is done for scalar-replaced objects that do not participate in merges. I created this _DRAFT_ pull request to start running additional tests: https://github.com/openjdk/jdk/pull/16622. Aleksey's initial investigation was of great help, thank you! I'll take another look this Monday and hopefully turn the draft PR into a regular PR.
11-11-2023
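
The idea in the comment above can be illustrated with a small standalone model. This is only a sketch, not the HotSpot source and not the change that was eventually committed: `HandleModel`, `MergeValueWithFallback`, and `count_object_refs` are hypothetical names, and a plain `std::string` pointer stands in for an oop. It shows `value()` returning an empty handle when nothing has been selected, so that a caller can skip the entry instead of dereferencing a null pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a handle: empty when it refers to no object.
struct HandleModel {
  const std::string* obj;
  explicit HandleModel(const std::string* o = nullptr) : obj(o) {}
  bool is_null() const { return obj == nullptr; }
};

// Hypothetical model of a merge value whose value() tolerates a missing select().
class MergeValueWithFallback {
 public:
  // Records which merge input is the live one.
  void select(const std::string* candidate) { _selected = candidate; }

  // Returns an empty handle when nothing was selected, instead of
  // dereferencing _selected unconditionally.
  HandleModel value() const {
    if (_selected == nullptr) {
      return HandleModel();
    }
    return HandleModel(_selected);
  }

 private:
  const std::string* _selected = nullptr;
};

// Hypothetical stack walker: counts values that refer to a real object
// and skips the ones that only yield an empty handle.
std::size_t count_object_refs(const std::vector<MergeValueWithFallback>& values) {
  std::size_t count = 0;
  for (const MergeValueWithFallback& v : values) {
    if (!v.value().is_null()) {
      ++count;
    }
  }
  return count;
}
```

In this model a walker that never calls `select()` simply sees no object for the merged value, which is the same outcome the comment describes for scalar-replaced objects outside of merges.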

Thanks for assigning this to me, Aleksey. I'm investigating it as a high priority on my side. /cc @Christian
10-11-2023

ILW = Crash when dumping heap, intermittent, use -XX:-ReduceAllocationMerges = HML = P2
10-11-2023

I see, thanks for the clarification, Aleksey. I first thought it only happened with HeapDumpAfterFullGC. In that case I think HML is justified. As a workaround we could probably use -XX:-ReduceAllocationMerges. I'm updating the ILW accordingly. We should definitely try to fix this in JDK 22.
10-11-2023

AFAICS, it is not just HeapDumpAfterFullGC, but any heap dump request on a live VM when there are scalar-replaced objects referenced on the stack. So ILW is HML, making it P2?
10-11-2023

Can you have a look [~cslucas]?
10-11-2023

Note the relevant code in HeapDumper was changed recently by JDK-8316691, but I don't see how it might break the heap dumping code. Reverting JDK-8316691 from current mainline still crashes the reproducer.
09-11-2023

Assigning to Cesar for evaluation.
09-11-2023

I suspect it relates to JDK-8287061. I actually wonder if JDK-8287061 relies on the deopt machinery to enter rematerialize_objects(), which would call select() on the needed values. But the heap dumping code does not deoptimize AFAICS, so it skips the select() and crashes.
09-11-2023
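
The ordering problem suspected in this comment can be sketched with a standalone model. This is an illustration under stated assumptions, not HotSpot code: `MergeValueModel`, `read_after_select`, and `read_without_select` are hypothetical names, strings stand in for heap objects, and an exception stands in for the debug assert or the release SIGSEGV. One reader selects before reading, as the comment expects the deoptimization path to do; the other reads directly, as the comment suspects the heap dumper does.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a merged value with a "select() before value()" contract.
class MergeValueModel {
 public:
  explicit MergeValueModel(std::vector<std::string> candidates)
      : _candidates(std::move(candidates)) {}

  // Copying is disabled because _selected points into _candidates.
  MergeValueModel(const MergeValueModel&) = delete;
  MergeValueModel& operator=(const MergeValueModel&) = delete;

  // Picks which merge input is the live one.
  void select(std::size_t index) { _selected = &_candidates.at(index); }

  // Enforces the precondition quoted in the assert message; the model
  // throws where the real VM asserts (debug) or crashes (release).
  const std::string& value() const {
    if (_selected == nullptr) {
      throw std::logic_error("Should call select() first.");
    }
    return *_selected;
  }

 private:
  std::vector<std::string> _candidates;
  const std::string* _selected = nullptr;
};

// Hypothetical deopt-like reader: selects an input, then reads it.
std::string read_after_select(MergeValueModel& merge, std::size_t index) {
  merge.select(index);
  return merge.value();
}

// Hypothetical dumper-like reader: reads without ever selecting.
std::string read_without_select(const MergeValueModel& merge) {
  return merge.value();
}
```

With these definitions, `read_after_select` returns the chosen candidate, while `read_without_select` on a fresh value hits the precondition check, mirroring the `Should call select() first.` failure in the debug trace.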