JDK-8184042 : several serviceability/sa tests timed out on MacOS X
  • Type: Bug
  • Component: hotspot
  • Sub-Component: svc-agent
  • Affected Version: 10
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: os_x
  • CPU: x86_64
  • Submitted: 2017-07-09
  • Updated: 2018-10-17
  • Resolved: 2017-10-17
JDK 10: 10 b31 (Fixed)
Related Reports
Duplicate :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Sub Tasks
JDK-8185872 :  
JDK-8191961 :  
Description
Several serviceability/sa tests timed out in the 2017-07-06
JDK10-hs nightly on MacOS X:

serviceability/sa/JhsdbThreadInfoTest.java
serviceability/sa/TestInstanceKlassSize.java
serviceability/sa/TestInstanceKlassSizeForInterface.java
serviceability/sa/TestPrintMdo.java
serviceability/sa/jmap-hprof/JMapHProfLargeHeapTest.java

The first four tests use LingeredApp; the last does not. I don't
know if that's a factor or not. I poked around the environment
and process information for the machine and I don't see any
obvious signs of resource exhaustion or overloading.
Comments
The gdb (mostly under <gdb_source_root>/gdb/darwin*) and lldb (mostly under <lldb_source_root>/source/Plugins/Process/Darwin/) sources have been my main reference points for the APIs used. From what I have gathered, these debuggers still use ptrace along with the exception/message handling mechanism to control the debuggee. The other references I used are http://web.mit.edu/darwin/src/modules/xnu/osfmk/man/ and the book "Mac OS X and iOS Internals: To the Apple's Core" by Jonathan Levin.
24-08-2017

This seems to be a very intrusive change, and my own view is that we should restore the deprecated version for now and disable the deprecation warning. That gives more time to investigate what the current preferred APIs are on OS X for this kind of thing, as it seems ptrace is a historical API that is no longer intended to be used. I was unable to find any official documentation regarding the deprecation of PT_ATTACH and was surprised that even the manpages that refer to it leave the bulk of the documentation incorrect, as it still refers to the use of signals!
24-08-2017

This fix will require careful reviews and is likely to take some time. Considering the limited scope of the effects of this bug and the fact that all tests are already quarantined it has been decided to not consider this a blocker for integration at this point.
18-08-2017

The modifications for fixing this include the following steps:

1. While attaching:
   a. Allocate an exception port to receive exceptions for the target process.
   b. Save the existing exception ports registered with the target process (for later restoration while detaching from the process).
   c. Register the newly created exception port with the target process, so that it is in place to receive the exception that arrives as a part of ptrace attach with PT_ATTACHEXC.
   d. Invoke ptrace with PT_ATTACHEXC. This should cause the kernel to deliver the "mach soft signal" (EXC_SOFTWARE) exception to the controlling process (SA in this case), with the code EXC_SOFT_SIGNAL and sub-code SIGSTOP.
   e. Wait for the exception message from the kernel with the mach_msg call (MACH_RCV_MSG).
   f. Invoke the exception handling routine mach_exc_server() to parse the received exception message. This routine invokes one of catch_mach_exception_raise(), catch_mach_exception_raise_state() or catch_mach_exception_raise_state_identity(), passing the parsed message contents as arguments.
      With 64-bit MacOSX, mach_exc_server() is not automatically available; it has to be generated with the Mach Interface Generator (mig) from /usr/include/mach/mach_exc.defs. I ran it thus to generate the files mach_exc.h, mach_excServer.c, and mach_excUser.c:
          mig -v /usr/include/mach/mach_exc.defs
      Running this mig command needs to become part of the build process. That is not done as a part of this fix, but will be taken up as a part of JDK-8186427. For now, the generated files are being included in the repository.
   g. Suspend all the threads in the task with task_suspend().

2. While detaching:
   a. Invoke ptrace with PT_DETACH on the target process, causing the threads in the target (except the one triggering the exception) to resume execution.
   b. Restore the pre-saved exception ports registered with the target process.
   c. Reply to the previous "mach soft signal" exception. Unless this acknowledgement is sent, the thread raising the exception (in the target) remains suspended. The reply message to be sent is obtained from the previous call to mach_exc_server().
   d. Release the exception port allocated while attaching.
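
As a rough illustration of steps 1.d to 1.f, the check that an exception handler might apply to recognise the stop caused by PT_ATTACHEXC could look like the sketch below. The helper name is_attach_stop and its parameter layout are hypothetical and are not taken from the actual fix; the fallback constants mirror the values in <mach/exception_types.h> and are only there so the sketch compiles where the Mach headers are unavailable.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#ifdef __APPLE__
#include <mach/exception_types.h>
#else
/* Assumption: same values as <mach/exception_types.h> on macOS. */
#define EXC_SOFTWARE    5
#define EXC_SOFT_SIGNAL 0x10003
#endif

/*
 * Hypothetical helper: returns true when an exception message describes
 * the "mach soft signal" expected after ptrace(PT_ATTACHEXC, ...), i.e.
 * exception == EXC_SOFTWARE, code == EXC_SOFT_SIGNAL, sub-code == SIGSTOP.
 */
bool is_attach_stop(int exception, const int64_t *codes, unsigned code_count) {
    assert(codes != NULL || code_count == 0);
    if (exception != EXC_SOFTWARE || code_count < 2) {
        return false;   /* wrong exception type, or no sub-code present */
    }
    return codes[0] == EXC_SOFT_SIGNAL && codes[1] == SIGSTOP;
}
```

A handler such as catch_mach_exception_raise() could pass its exception, code array and code count arguments to a check like this before treating the target as stopped; anything else would be an unrelated exception that needs separate handling.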
18-08-2017

Move integration_blocker label to the test quarantine sub-task. Update: The bug fix that caused these failures (JDK-8182299) was only pushed to JDK10/hs so this bug is still an integration_blocker.
04-08-2017

OS X version 10.2 introduced the PT_ATTACHEXC command for ptrace, with which mach exceptions (as opposed to just UNIX signals) are delivered via mach messages. With the newer versions of macOS or OS X, the old PT_ATTACH form is deprecated. SA hangs at waitpid() waiting for a signal which doesn't arrive (at least not in the form of a signal). To handle this, SA needs to register an exception port with the target process to be notified of mach exceptions, and listen on this port for them.
19-07-2017

From the man page of ptrace (under PT_ATTACHEXC): Note that this call differs from the prior call (PT_ATTACH) in that signals from the child are delivered to the parent as Mach exceptions (see EXC_SOFT_SIGNAL).
17-07-2017

SA attach hangs on MacOS X. This seems to have been caused by the following change in src/jdk.hotspot.agent/macosx/native/libsaproc/MacosxDebuggerLocal.m, done as a part of 8182299 (Enable disabled clang warnings, build on OSX 10 + Xcode 8):

-  if ((res = ptrace(PT_ATTACH, pid, 0, 0)) < 0) {
-    print_error("ptrace(PT_ATTACH, %d) failed with %d\n", pid, res);
+  if ((res = ptrace(PT_ATTACHEXC, pid, 0, 0)) < 0) {
+    print_error("ptrace(PT_ATTACHEXC, %d) failed with %d\n", pid, res);
     return false;

Reverting the PT_ATTACHEXC to PT_ATTACH gets the attach working.
13-07-2017