JDK-8294316 : SA core file support is broken on macosx-x64 starting with macOS 12.x
  • Type: Bug
  • Component: hotspot
  • Sub-Component: svc-agent
  • Affected Version: 20
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: os_x
  • CPU: x86_64
  • Submitted: 2022-09-23
  • Updated: 2024-02-05
  • Resolved: 2023-06-27
JDK 22: 22 b04 (Fixed)
Related Reports
Relates :  
Relates :  
Relates :  
Sub Tasks
JDK-8294548 :  
Description
It appears that SA no longer works with core files on macosx-x64, starting, I believe, with macOS 12.x. macosx-aarch64 seems to be fine, as do earlier macOS versions on macosx-x64. All the SA core file tests in test/hotspot/jtreg/serviceability/sa fail with:

ERROR: failed to workaround classshareing
Unable to open core file

I added some debugging code to SA's init_classsharing_workaround(), and it indicated that the failure was related to fetching the value of SharedArchivePath from the core file. This value is supposed to point to a C string containing the classes.jsa path, but instead it seemed to contain garbage. I modified hotspot to print out &SharedArchivePath, SharedArchivePath, and the C string it points to:

log_info(cds)("Got default archive path: %p %p %s", &SharedArchivePath, SharedArchivePath, SharedArchivePath);

When SA fails to open the core file, I see:

[0.003s][info][cds] Got default archive path: 0x10faccb30 0x6000008b8010 /System/Volumes/Data/mesos/work_dir/jib-master/install/2022-09-22-2232312.chris.plummer.jdk/macosx-x64-debug.jdk/jdk-20/fastdebug/lib/server/classes.jsa

This all looks fine. However, SA looks up the "SharedArchivePath" symbol to get its address, so that it can in turn read its value, which then points to the classes.jsa path. So I also modified SA to print out this info:

      printf("sharedArchivePathAddrAddr(%p)\n", (void*)sharedArchivePathAddrAddr);
      printf("sharedArchivePathAddr        (%p)\n", (void*)sharedArchivePathAddr);

In the passing test cases, this output matches the CDS log output above. When it fails, you get something different:

Opening core file, please wait...
hsdb>
sharedArchivePathAddrAddr(0x10f881b30)
sharedArchivePathAddr        (0x7364616572687420)

sharedArchivePathAddrAddr should match the hotspot &SharedArchivePath output, but it doesn't. SA does a symbol table lookup to get this value, so there appears to be a bug in SA's Mach-O symbol table handling code.

This problem went unnoticed because all core file testing on macosx-x64 has been problem-listed for probably a year now, due to occasional timeouts (slow core dumps). The issue seems to happen only on 12.3.1, 12.4 and 12.5.1 hosts, and it happens every time on those hosts, so it was likely introduced with macOS 12.

I'm not seeing this on macosx-aarch64, although there I did occasionally see the same "ERROR: failed to workaround classshareing" failure message. However, I believe that was for a different reason. From what I could tell while debugging with lldb, it looked like the memory SharedArchivePath pointed to was not in the core file. For some reason, I can no longer reproduce this. It could be related to JDK-8293563, which is caused by the java heap not being in the core file; possibly other areas of memory are sometimes missing as well.

Note that if you use -Xshare:off, you still see the same issue with SharedArchivePath, even though SA should not need to access it. This is because SA first reads UseSharedSpaces to see whether it is 0 or 1. It should be 0. However, the same broken symbol lookup seen with SharedArchivePath affects it too, so the value read for UseSharedSpaces could be anything, and usually it is not 0. To work around this, I forced SA to exit init_classsharing_workaround() immediately regardless of the value of UseSharedSpaces. SA then failed at a later point during initialization, when trying to look up some hotspot types. It does so through vmstructs, which SA accesses via other global symbols that it also appears not to be looking up properly. So it appears that SA's symbol table lookups with core files on 12.x are broken in general, not just for some global symbols.
Comments
Changeset: 269852b9 Author: Tom Rodriguez <never@openjdk.org> Date: 2023-06-27 19:57:06 +0000 URL: https://git.openjdk.org/jdk/commit/269852b90634aa43d4d719c93563608e42792fc6
27-06-2023

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/14569 Date: 2023-06-20 18:05:09 +0000
20-06-2023

I believe I have a fix for this. I was investigating a mac core and needed working SA support. The problem seems to be that the core file contains a randomly sized section just before the actual mapping for the beginning of the core file. The otool -l output looks like this:

Load command 82
      cmd LC_SEGMENT_64
  cmdsize 72
  segname
   vmaddr 0x00000001076ea000
   vmsize 0x0000000000008000
  fileoff 5267456
 filesize 0
  maxprot 0x00000007
 initprot 0x00000001
   nsects 0
    flags 0x0
Load command 83
      cmd LC_SEGMENT_64
  cmdsize 72
  segname
   vmaddr 0x00000001082e3000
   vmsize 0x0000000000b84000
  fileoff 5267456
 filesize 12075008
  maxprot 0x00000007
 initprot 0x00000005
   nsects 0
    flags 0x0

The fix seems to be to ignore load commands with filesize == 0 when looking for the library mappings, so that you select the real mapping.

diff --git a/src/jdk.hotspot.agent/macosx/native/libsaproc/ps_core.c b/src/jdk.hotspot.agent/macosx/native/libsaproc/ps_core.c
index 721eb625797..508df64696e 100644
--- a/src/jdk.hotspot.agent/macosx/native/libsaproc/ps_core.c
+++ b/src/jdk.hotspot.agent/macosx/native/libsaproc/ps_core.c
@@ -297,13 +297,17 @@ static bool read_core_segments(struct ps_prochandle* ph) {
         print_debug("failed to read LC_SEGMENT_64 i = %d!\n", i);
         goto err;
       }
-      if (add_map_info(ph, fd, segcmd.fileoff, segcmd.vmaddr, segcmd.vmsize, segcmd.flags) == NULL) {
-        print_debug("Failed to add map_info at i = %d\n", i);
-        goto err;
+      // The base of the library is offset by a random amount which ends up as a load command with a
+      // filesize of 0. This must be ignored otherwise the base address of the library is wrong.
+      if (segcmd.filesize != 0) {
+        if (add_map_info(ph, fd, segcmd.fileoff, segcmd.vmaddr, segcmd.vmsize, segcmd.flags) == NULL) {
+          print_debug("Failed to add map_info at i = %d\n", i);
+          goto err;
+        }
+        print_debug("LC_SEGMENT_64 added: nsects=%d fileoff=0x%llx vmaddr=0x%llx vmsize=0x%llx filesize=0x%llx %s\n",
+                    segcmd.nsects, segcmd.fileoff, segcmd.vmaddr, segcmd.vmsize,
+                    segcmd.filesize, &segcmd.segname[0]);
       }
-      print_debug("LC_SEGMENT_64 added: nsects=%d fileoff=0x%llx vmaddr=0x%llx vmsize=0x%llx filesize=0x%llx %s\n",
-                  segcmd.nsects, segcmd.fileoff, segcmd.vmaddr, segcmd.vmsize,
-                  segcmd.filesize, &segcmd.segname[0]);
     } else if (lcmd.cmd == LC_THREAD || lcmd.cmd == LC_UNIXTHREAD) {
       typedef struct thread_fc {
         uint32_t flavor;

With this fix, I was able to successfully open core files. Looking at the live process with vmmap shows the library mapping with the extra offset, but it doesn't seem to show any mapping related to the filesize == 0 part.
20-06-2023

ILW=MLH=P4
27-09-2022