JDK-8305489 : runtime/ErrorHandling/TestDwarf.java fails in some Linux configurations after JDK-8303805
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 21
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2023-04-03
  • Updated: 2024-09-16
  • Resolved: 2024-09-10
Fix Version: JDK 24 (build 24 b15, Fixed)
Sub Tasks
JDK-8306029 :  
Description
Also seen in GHA. Bisect points to JDK-8303805.

Reproduces with GCC 11.3.0.

$ make images run-test TEST=runtime/ErrorHandling/TestDwarf.java

STDERR:
java.lang.RuntimeException: Could not find filename or line number in "V  [libjvm.so+0x8f5181]  (debug.cpp:271)"
	at jdk.test.lib.Asserts.fail(Asserts.java:594)
	at TestDwarf.checkNoSourceLine(TestDwarf.java:185)
	at TestDwarf.runAndCheck(TestDwarf.java:150)
	at TestDwarf.test(TestDwarf.java:104)
	at TestDwarf.main(TestDwarf.java:91)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:578)
	at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:125)
	at java.base/java.lang.Thread.run(Thread.java:1623)

Comments
Changeset: 125f7432
Branch: master
Author: Christian Hagedorn <chagedorn@openjdk.org>
Date: 2024-09-10 08:14:40 +0000
URL: https://git.openjdk.org/jdk/commit/125f743223f2beb6e73f520c48a9a2de7ba5dce7
10-09-2024

A pull request was submitted for review.
Branch: master
URL: https://git.openjdk.org/jdk/pull/20811
Date: 2024-09-02 10:36:10 +0000
02-09-2024

Hi [~lfoltan], should we keep an open umbrella RFE listing everything that currently does not work with DWARF parsing, so that it could be picked up by someone in the future, or by us if priorities change at some point? Personally, I don't think I will have time to work on anything DWARF related in the near future either. For this bug, I think we could just remove the failing TestDwarf.java#checkDecoder configuration and clean up the problemlist entry. We could then note in the umbrella RFE that the test removal can be reverted once there is a fix. I could take care of that cleanup; we can probably use this bug for the test removal change.
16-08-2024

[RT Triage]: Please remove the test if it is no longer valid or can't be made reliable. Further problems in DWARF processing are going to be closed as WNF due to priorities.
12-07-2024

Sounds good, thanks David!
14-07-2023

I've re-assigned this to [~chagedorn] for an underlying fix to the decoder sometime. Meanwhile I will file a new RFE and split the test as suggested.
13-07-2023

We haven't seen this failure on our machines yet, maybe because we are using GCC 11.2 and not 11.3. Anyhow, since this failure is in tier1, I'll increase the priority to P3 for now. The best solution would be to fix the decoder, but that might be more difficult if the "address - 1" approach does not work. It certainly needs more time to investigate. In the meantime, I guess we have two options for how to proceed if there is too much noise in testing:
- Problemlist TestDwarf for the affected architectures.
- Update TestDwarf to only test the source information (it now assumes that the decoder is always able to find a method name). However, it would still be nice to have such a decoder test, so maybe we can split TestDwarf into a DWARF-only and a decoder-only test and problemlist the latter.
Either way, I think we should fix the misleading exception message in the test (could be done separately).
14-04-2023

stderr from the linux_x86_64 failure (we use gcc 11.3.0 [gcc (GCC) 11.3.0], in case that makes a difference):
java.lang.RuntimeException: Could not find filename or line number in "V [libjvm.so+0xa8ec1f] (debug.cpp:291)"
	at jdk.test.lib.Asserts.fail(Asserts.java:594)
	at TestDwarf.checkNoSourceLine(TestDwarf.java:185)
	at TestDwarf.runAndCheck(TestDwarf.java:150)
	at TestDwarf.test(TestDwarf.java:104)
	at TestDwarf.main(TestDwarf.java:91)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:578)
	at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:125)
	at java.base/java.lang.Thread.run(Thread.java:1630)
The text printed by the assertion is a bit misleading; it should probably mention what really fails (because the filename and line number are present in the string).
14-04-2023

We see this reproducibly in our SapMachine CI for linux x86_64 and linux Alpine/Musl x64, for instance here:
https://ci.sapmachine.io/job/build-21-pr-validation-linux_x86_64/54/testReport/
https://ci.sapmachine.io/job/build-21-pr-validation-linux_alpine_x86_64/52/testReport/
So it is not only related to x86_32. I'll change the title of this bug accordingly. It would be nice if this issue could be handled with priority; otherwise, the test exclusion could be extended to linux x86_64.
14-04-2023

I'm doing the same thing in the DWARF parser at a call-site: I walk through "address -> source information" mappings. When I find the mapping for the address of the PC, I look back at previous mappings and find the last one with an address smaller than PC (i.e. the address of the call) and go with that one. That has worked quite well so far and in this regard, using DWARF 5 for the source information will probably not be necessary. I guess we could indeed do something similar for the decoder and try "address - 1" at call-sites instead as suggested by [~kbarrett].
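The look-back over "address -> source information" mappings described above can be sketched as follows. This is only an illustration, not the HotSpot DWARF parser: the function name and the table rows are hypothetical. The addresses are taken from the objdump output elsewhere in this report, but which file and line each row maps to is an assumption made for the example.

```python
# Hypothetical line-table rows, sorted by address: (address, file, line).
# The addresses appear in the objdump output in this report; the file/line
# assigned to each row is ASSUMED purely for illustration.
LINE_TABLE = [
    (0x8F5176, "debug.cpp", 271),      # push before the call
    (0x8F517C, "debug.cpp", 271),      # the call instruction itself
    (0x8F5181, "next_unit.cpp", 1),    # whatever follows the call (made up)
]


def source_for_return_address(pc, rows):
    """Return (file, line) of the last row whose address is smaller than pc.

    pc is a return address, i.e. it points just past the call instruction,
    so the row starting exactly at pc belongs to the code after the call.
    """
    best = None
    for address, file, line in rows:
        if address >= pc:
            break  # rows are sorted, nothing further can be smaller than pc
        best = (file, line)
    return best


print(source_for_return_address(0x8F5181, LINE_TABLE))
```

With the return address 0x8f5181, the row starting at 0x8f5181 is skipped and the row for the call at 0x8f517c is used instead.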
11-04-2023

Using return address - 1 rather than trying to figure out exactly where the call instruction starts seems worthy of consideration. In fact, others have used that approach. For example:
https://chromium.googlesource.com/external/github.com/abseil/abseil-cpp/+/HEAD/absl/debugging/internal/examine_stack.cc
DumpPCAndSymbol symbolizes pc-1 because of noreturns. The full comment is
  // Symbolizes the previous address of pc because pc may be in the
  // next function. The overrun happens when the function ends with
  // a call to a function annotated noreturn (e.g. CHECK).
  // If symbolization of pc-1 fails, also try pc on the off-chance
  // that we crashed on the first instruction of a function (that
  // actually happens very often for e.g. __restore_rt).
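A minimal sketch of the "pc-1 first, then pc" strategy, assuming a symbol table of (start, size, name) entries. This is neither the HotSpot decoder nor the abseil code. The start addresses come from the objdump output in this report, while the sizes are assumptions: report_java_out_of_memory is taken to end right after its final call instruction, so that the return address lies outside the symbol.

```python
from bisect import bisect_right

# Hypothetical symbol table, sorted by start address: (start, size, name).
# Start addresses are from the objdump output in this report; the sizes are
# ASSUMED for illustration only.
SYMBOLS = [
    (0x8F4E60, 0x100, "report_fatal"),
    (0x8F5070, 0x111, "report_java_out_of_memory"),  # assumed end: 0x8f5181
]
STARTS = [start for start, _, _ in SYMBOLS]


def symbolize(addr):
    """Return 'name+0xoffset' if addr lies inside a known symbol, else None."""
    i = bisect_right(STARTS, addr) - 1
    if i < 0:
        return None
    start, size, name = SYMBOLS[i]
    if addr < start + size:
        return f"{name}+0x{addr - start:x}"
    return None


def symbolize_return_address(pc):
    """Try pc-1 first (inside the call instruction), then fall back to pc."""
    return symbolize(pc - 1) or symbolize(pc)


print(symbolize(0x8F5181))                 # plain lookup at the return address
print(symbolize_return_address(0x8F5181))  # lookup via pc-1
```

Under these assumptions, the plain lookup at 0x8f5181 finds no symbol, while the pc-1 lookup lands inside the call instruction and resolves to report_java_out_of_memory. Note that the reported offset is then relative to pc-1 (+0x110), not to the start of the call (+0x10c).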
10-04-2023

noreturn support was added in dwarf-5:
https://dwarfstd.org/issues/140331.1.html
  Issue 140331.1: C11 _Noreturn function specifier attribute
  DW_AT_noreturn added in Dwarf version 5
We are explicitly using dwarf-4:
make/autoconf/flags-cflags.m4
  when TOOLCHAIN_TYPE is gcc
    CFLAGS_DEBUG_SYMBOLS="-g -gdwarf-4"
  when TOOLCHAIN_TYPE is clang
    GDWARF_FLAGS="-gdwarf-4 -gdwarf-aranges"
[~chagedorn] mentioned that we are using the return address when decoding the stack. But the return address for a non-returning call seems like it could be completely unrelated to the caller. I don't see any reason why this would be restricted to x86-32 either. It seems like this should be a generic problem that is currently only showing up on x86-32 by happenstance, because of the way the compiler is laying out some code.
07-04-2023

I have the feeling that this is related to the analysis I did for JDK-8305005. There, I noticed that we should decode the call instruction and not the pc pointing at the next instruction after the call. Now with noreturn, this mistake seems to become worse. I've tried to see what I get with llvm-symbolizer for the offset 0x89fc81:
hs_err:
  V [libjvm.so+0x89f9d8] report_fatal(VMErrorType, char const*, int, char const*, ...)+0x78 (debug.cpp:212)
  V [libjvm.so+0x89fc81] (debug.cpp:271)
objdump --syms --demangle -D /home/christian/jdk2/build/linux-x86-debug/jdk/lib/server/libjvm.so | less:
  89fc7c: e8 df fc ff ff call 89f960 <report_fatal(VMErrorType, char const*, int, char const*, ...)>
  89fc81: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
  89fc88: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
  89fc8f: 90 nop
  0089fc90 <blob>:
llvm-symbolizer --demangle --functions --obj=/home/christian/jdk2/build/linux-x86-debug/jdk/lib/server/libjvm.so 0x0089fc81
  ??
  /home/christian/jdk2/open/src/hotspot/share/utilities/debug.cpp:271:19
But when calling frame::print_C_frame() with the offset at the call instruction (0x89fc7c), I get the correct method name:
  V [libjvm.so+0x89fc7c] report_java_out_of_memory(char const*)+0x10c
The same as with:
llvm-symbolizer --demangle --functions --obj=/home/christian/jdk2/build/linux-x86-debug/jdk/lib/server/libjvm.so 0x0089fc7c
  report_java_out_of_memory(char const*)
  /home/christian/jdk2/open/src/hotspot/share/utilities/debug.cpp:271:19
So maybe we really need to change the decoder to work with the actual address at the call and not the pc pointing to the instruction after the call to fix these problems. But as [~dlong] has pointed out in JDK-8305005, the decoder uses dladdr() which returns the outer-most frame, while the DWARF parser returns the inner-most frame - this would still be a problem.
04-04-2023

Christian, maybe you have any ideas?
04-04-2023

More debugging:
  V [libjvm.so+0x8f5181] (debug.cpp:271)
For that offset, objdump says:
$ objdump --syms --demangle -D ./build/linux-x86-server-fastdebug/jdk/lib/server/libjvm.so | less
  008f5070 <report_java_out_of_memory(char const*)>:
  ...
  8f5176: 50 push %eax
  8f5177: 68 04 00 00 e0 push $0xe0000004
  8f517c: e8 df fc ff ff call 8f4e60 <report_fatal(VMErrorType, char const*, int, char const*, ...)>
  8f5181: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi ; <------------ points here
  8f5188: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
  8f518f: 90 nop
  <end>
04-04-2023

[~kbarrett] this seems to be yet another compiler bug with "no return".
04-04-2023

Disabling
  #define ATTRIBUTE_NORETURN [[noreturn]]
in attributeNoreturn.hpp makes the test pass. I suspect that either the compiler or the ELF decoder has a bug with [[noreturn]].
03-04-2023

Stack trace from deliberately crashed VM:
```
Current thread (0xf5e17f20): JavaThread "main" [_thread_in_vm, id=2010191, stack(0xf5f60000,0xf5fb0000)]
Stack: [0xf5f60000,0xf5fb0000], sp=0xf5faec30, free space=315k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x15c4672] VMError::report_and_die(int, char const*, char const*, char*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned int)+0x4a2 (debug.cpp:271)
V [libjvm.so+0x8f5cf8] report_fatal(VMErrorType, char const*, int, char const*, ...)+0x78 (debug.cpp:212)
V [libjvm.so+0x8f5fa1] (debug.cpp:271)
V [libjvm.so+0x105fedc] MemAllocator::Allocation::check_out_of_memory()+0x6c (memAllocator.cpp:128)
V [libjvm.so+0x1061afc] MemAllocator::allocate() const+0xdc (memAllocator.cpp:85)
V [libjvm.so+0xc1bf9f] InstanceKlass::allocate_objArray(int, int, JavaThread*)+0x26f (collectedHeap.inline.hpp:41)
V [libjvm.so+0x115e053] oopFactory::new_objArray(Klass*, int, JavaThread*)+0xc3 (oopFactory.cpp:122)
V [libjvm.so+0x1298f65] OptoRuntime::new_array_C(Klass*, int, JavaThread*)+0x285 (runtime.cpp:270)
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
v ~RuntimeStub::_new_array_Java 0xee20b514
J 53% c2 TestDwarf.crashOutOfMemory()V (14 bytes) @ 0xee733434 [0xee733260+0x000001d4]
j TestDwarf.main([Ljava/lang/String;)V+203
v ~StubRoutines::call_stub 0xee1e6cdb
```
Note the same "debug.cpp:271" line.
03-04-2023

The error message is bogus: it is from the method that checks cases where the main Pattern in TestDwarf.runAndCheck breaks:
https://github.com/openjdk/jdk/blob/df819cfa5a0330205fed89923df6dd5f7d5ffb45/test/hotspot/jtreg/runtime/ErrorHandling/TestDwarf.java#L135
The main pattern accepts lines like:
  V [libjvm.so+0x8f4ed8] report_fatal(VMErrorType, char const*, int, char const*, ...)+0x78 (debug.cpp:212)
...but the failing line is:
  V [libjvm.so+0x8f5181] (debug.cpp:271)
It misses the symbol name.
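To illustrate why a frame line without a symbol name falls through to the misleading check, here is a minimal sketch. The regular expression is a hypothetical stand-in written for this example and is not the pattern used in TestDwarf.java. It only assumes, as described above, that the main pattern requires a "symbol+offset" part between the library offset and the source location.

```python
import re

# Hypothetical pattern (NOT copied from TestDwarf.java): library+offset,
# then a mandatory "symbol+0xoffset", then "(file:line)".
FRAME_WITH_SYMBOL = re.compile(
    r"^[CV]\s+\[(?P<lib>[^+\]]+)\+0x[0-9a-f]+\]\s+"
    r"(?P<symbol>\S.*)\+0x[0-9a-f]+\s+"
    r"\((?P<file>[^:()]+):(?P<line>\d+)\)$"
)
# Matches only the trailing "(file:line)" part of a frame line.
SOURCE_INFO = re.compile(r"\([^:()]+:\d+\)$")


def classify(frame_line):
    """Say which part of a native frame line is missing, if any."""
    if FRAME_WITH_SYMBOL.match(frame_line):
        return "ok"
    if SOURCE_INFO.search(frame_line):
        return "missing symbol name"
    return "missing source information"


good = ("V [libjvm.so+0x8f4ed8] report_fatal(VMErrorType, char const*, "
        "int, char const*, ...)+0x78 (debug.cpp:212)")
bad = "V [libjvm.so+0x8f5181] (debug.cpp:271)"
print(classify(good))
print(classify(bad))
```

Distinguishing the two failure modes like this would let a test report that the symbol name is missing rather than claiming that the filename or line number could not be found.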
03-04-2023