JDK-8261702 : ClhsdbFindPC can fail due to PointerFinder incorrectly thinking an address is in a .so
  • Type: Bug
  • Component: hotspot
  • Sub-Component: svc-agent
  • Affected Version: 17
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2021-02-13
  • Updated: 2021-03-02
  • Resolved: 2021-02-21
Fixed in: JDK 17 (17 b11)
Description
I had one test failure of ClhsdbFindPC that I could not reproduce again, but I'm pretty sure I know the cause. The part of the test that failed was using findpc on a pc address from the jstack output, and that address was in the interpreter. However, findpc found it at some large offset from the start of a .so without a symbol match.

The cause is a known issue with the Linux DSO.java support. A DSO is created for each .so. A bug causes the size given to the created DSO to be too large: it is the size of the file rather than the size of the segments actually mapped in. For the most part this is harmless, and when an address is in a .so, findpc will find the proper .so and symbol for it. However, if the address is just outside of the .so, PointerFinder can think it is inside of it. Here's the relevant code:
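To make the size problem concrete, here is a minimal sketch of a containment check of the form base <= pc < base + size. DsoRange is a hypothetical stand-in, not the real DSO.java, and the base address and both sizes are made-up values chosen only so that the file size exceeds the mapped size.

```java
// Hypothetical stand-in for the SA's per-.so bookkeeping; not the real DSO.java.
final class DsoRange {
    final String name;
    final long base;
    final long size;

    DsoRange(String name, long base, long size) {
        this.name = name;
        this.base = base;
        this.size = size;
    }

    // True when base <= pc < base + size.
    boolean contains(long pc) {
        return pc >= base && pc < base + size;
    }
}

final class DsoSizeDemo {
    public static void main(String[] args) {
        long base = 0x7f0000000000L;     // assumed load address
        long mappedSize = 0x200000L;     // assumed size of the mapped segments
        long fileSize = 0x10000000L;     // assumed size of the .so file on disk
        long pc = base + 0x300000L;      // just past the mapped segments

        DsoRange accurate = new DsoRange("libexample.so", base, mappedSize);
        DsoRange oversized = new DsoRange("libexample.so", base, fileSize);

        // With the mapped size the address is correctly rejected.
        System.out.println("mapped size:  " + accurate.contains(pc));
        // With the file size the same address is wrongly claimed by the DSO.
        System.out.println("file size:    " + oversized.contains(pc));
    }
}
```

With the oversized range, any address that happens to sit shortly after the library in memory is attributed to it, which is the situation described above.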

        loc.loadObject = cdbg.loadObjectContainingPC(a);
        if (loc.loadObject != null) {
            loc.nativeSymbol = loc.loadObject.closestSymbolToPC(a);
            return loc;
        }

loadObjectContainingPC(a) returns the DSO that contains the address. As mentioned above, this might not always be accurate. If the address is not actually in the returned DSO, but is in some other DSO, then the call to loadObject.closestSymbolToPC() will still work properly, because it actually ignores the DSO info and just searches all DSOs, in native code, to find the symbol in the correct DSO, even if that was not the specified DSO.

However, if the address is not in any DSO, yet a DSO was returned by loadObjectContainingPC(), then there are problems. The most likely way this will happen is when specifying an address just after the last DSO in memory. In the case of this bug the address was in the interpreter. However, because this code thinks it is in a DSO (and we haven't yet done the check to see if it is in the interpreter), we end up calling loadObject.closestSymbolToPC(a). This will fail to find a symbol match, but that is not treated as a failure because not all addresses in a DSO have a symbol associated with them. So we later end up in the code that prints the findpc info for the address:

        if (nativeSymbol != null) {
            String name = nativeSymbol.getName();
            if (cdbg.canDemangle()) {
                name = cdbg.demangle(name);
            }
            tty.print(name);
            diff = nativeSymbol.getOffset();
        } else {
            tty.print(loadObject.getName());
            diff = addr.minus(loadObject.getBase());
        }
        if (diff != 0L) {
            tty.print(" + 0x" + Long.toHexString(diff));
        }

In this case nativeSymbol is null, so we just end up printing the name of the DSO plus the offset from the start of the DSO. The test fails because what it expected was for "In interpreter codelet:" to appear in the output.
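The fallback branch can be sketched in isolation as follows. FindpcFormatSketch and describe are hypothetical names, and the real code prints to tty rather than returning a string. The libjvm.so base used in main is not in the log; it is inferred by subtracting the logged offset 0xff50284 from the logged address 0x00002b28f0f2a284, and the path is shortened to the file name.

```java
// Hypothetical, string-returning rendition of the printing logic quoted above.
final class FindpcFormatSketch {
    // symbolName == null models "no native symbol matched".
    static String describe(String symbolName, long symbolOffset,
                           String dsoName, long dsoBase, long addr) {
        StringBuilder out = new StringBuilder();
        long diff;
        if (symbolName != null) {
            out.append(symbolName);
            diff = symbolOffset;
        } else {
            // No symbol: fall back to the DSO name and the offset from its base.
            out.append(dsoName);
            diff = addr - dsoBase;
        }
        if (diff != 0L) {
            out.append(" + 0x").append(Long.toHexString(diff));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        long addr = 0x00002b28f0f2a284L;   // address from the log in the comments
        long base = addr - 0xff50284L;     // inferred base, not stated in the log
        System.out.println(describe(null, 0L, "libjvm.so", base, addr));
    }
}
```

Because a missing symbol is a normal outcome for a real DSO address, nothing in this path signals that the DSO match itself was wrong.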

The workaround for this DSO bug is simple. Just have PointerFinder defer searching DSOs for the address to be the last thing it does. That way we will first attempt to find the address in the interpreter.
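The effect of the reordering can be illustrated with a small sketch that tries a sequence of checks in order and returns the first match. This is not the PointerFinder code from the changeset; LookupOrderSketch, the "In DSO" label, and the address ranges are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongPredicate;

// Hypothetical model of "first matching check wins"; not the real PointerFinder.
final class LookupOrderSketch {
    static String classify(long addr, Map<String, LongPredicate> checks) {
        for (Map.Entry<String, LongPredicate> check : checks.entrySet()) {
            if (check.getValue().test(addr)) {
                return check.getKey();
            }
        }
        return "In unknown location";
    }

    // Builds the checks in insertion order, with the DSO check first or last.
    static Map<String, LongPredicate> checks(boolean dsoFirst) {
        // Made-up ranges: the oversized DSO range swallows the interpreter range.
        LongPredicate inInterpreter = a -> a >= 0x5000L && a < 0x6000L;
        LongPredicate inOversizedDso = a -> a >= 0x1000L && a < 0x9000L;

        Map<String, LongPredicate> ordered = new LinkedHashMap<>();
        if (dsoFirst) {
            ordered.put("In DSO", inOversizedDso);
        }
        ordered.put("In interpreter codelet", inInterpreter);
        if (!dsoFirst) {
            ordered.put("In DSO", inOversizedDso);
        }
        return ordered;
    }

    public static void main(String[] args) {
        long pc = 0x5800L; // an interpreter address that the oversized DSO also claims
        System.out.println("DSO first: " + classify(pc, checks(true)));
        System.out.println("DSO last:  " + classify(pc, checks(false)));
    }
}
```

Putting the least reliable check last does not fix the DSO size itself, but it keeps the inaccurate range from shadowing the more precise checks.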
Comments
Changeset: 539c80bf
Author: Chris Plummer <cjplummer@openjdk.org>
Date: 2021-02-21 18:59:02 +0000
URL: https://git.openjdk.java.net/jdk/commit/539c80bf
21-02-2021

Failures with "'In java stack' missing from stdout/stderr" and the following in the output should be filed under JDK-8261929:

        stderr: [ + findpc 0x0000000000000000
        Address 0x0: In unknown location
18-02-2021

This issue has now been reproduced by someone else. See JDK-8261844. It includes the log output which I had lost by the time I filed this bug. As you can see below, it was trying to find an NMethod, but instead found a very large offset off of libjvm.so.

        stderr: [ + findpc 0x00002b28f0f2a284
        Address 0x00002b28f0f2a284: /work/shared/mirrors/src_clones/jdk/jdk_baseline.git/build/linux-x86_64-normal-server-fastdebug/images/jdk/lib/server/libjvm.so + 0xff50284
        ]
        exitValue = -1
        java.lang.RuntimeException: Test ERROR java.lang.RuntimeException: 'In code in NMethod for LingeredAppWithTrivialMain.main' missing from stdout/stderr
16-02-2021