JDK-8006423 : SA: NullPointerException in sun.jvm.hotspot.debugger.bsd.BsdThread.getContext(BsdThread.java:67)
  • Type: Bug
  • Component: hotspot
  • Sub-Component: svc
  • Affected Version: hs24,hs25
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: os_x
  • Submitted: 2013-01-16
  • Updated: 2013-06-26
  • Resolved: 2013-02-08
  • Fixed in: 7u40 (JDK 7), 8 (JDK 8), hs24 (Other)
In many cases when running the SA version of JStack (or other SA tools), an NPE is thrown in BsdThread.getContext(). The underlying cause is that SA fails to read the context of the thread in the native method getThreadIntegerRegisterSet0() (thread_get_state returns an error).

SA gets the list of C++ thread objects by looking up the Threads::_thread_list field and reading the data from there. The thread port is needed to get the register values for the threads (which in turn are needed for walking the stack). SA then combines this information in the output of, for example, "jstack -F".

That all makes perfect sense. Get the task port then get the thread ports from that and then query the thread state etc. I guess what I'm unclear about is what the SA wants to do with a thread port such that we somehow need to map it back to the C++ Thread object in the other process? I'm not even sure how we communicate this C++ object information?

I'm going to quote some more. These are snippets from chapters 6 and 9.
"Rights are owned at the task level. For example, although the code to create a port executes in a thread, the associated rights are granted to the thread's task. Thereafter, any other thread within that task can use or manipulate the rights."
"The namespace for ports is per-task private - that is, a given port name is valid only within the IPC space of a task."
"We saw earlier that ports are used to represent both tasks and threads. When a task creates another task or a thread, it automatically gets access to the newly created entity's port. Since port ownership is task-level, all per-thread ports in a task are accessible to all threads within that task. A thread can send messages to other threads within its task - say, to suspend or resume their execution. It follows that having access to a task's port implicitly provides access to all threads within that task. The converse does not hold, however: Having access to a thread's port does not give access to its containing task's port."
"Since a port is a per-task resource, all threads within a task automatically have access to the task's ports. A task can allow other tasks to access one or more of its ports. It does so by passing port rights in IPC messages to other tasks. Moreover, a thread can access a port only if the port is known to the containing task - there is no global, system-wide port namespace."
So to summarize: Having access to a task means that you have access to the threads within that task. But a port's name (or id) is task-private and not valid outside that task. For SA this means that the call to task_for_pid() gets us access to both the task and the threads. It also means that the port names (mach_thread_self() / _thread_id) inside Hotspot cannot be reused in SA to access the threads. Instead SA has to use task_threads() to get all the thread ports within the task. Incidentally, this is also what lldb does.
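To make the summary above concrete, here is a toy Java model of task-private port names. It is only a sketch: it does not call any Mach API, and every name in it (PortNamespaceModel, Task, KernelThread, insertRight, lookup, the port numbers) is hypothetical and invented for illustration. Each task is modeled as its own table from port name to kernel object, so a name handed out in the Hotspot task does not resolve in the debugger task.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only: hypothetical types, no real Mach calls. */
final class PortNamespaceModel {

    /** Stands in for the kernel object that a thread port refers to. */
    static final class KernelThread {
        final long uniqueId;
        KernelThread(long uniqueId) { this.uniqueId = uniqueId; }
    }

    /** A task with its own private IPC space (port name -> kernel object). */
    static final class Task {
        private final Map<Integer, KernelThread> ipcSpace = new HashMap<>();
        private int nextName;

        Task(int firstName) { this.nextName = firstName; }

        /** Models receiving a send right: the task gets a name valid only in its own space. */
        int insertRight(KernelThread thread) {
            int name = nextName;
            nextName += 0x100;
            ipcSpace.put(name, thread);
            return name;
        }

        /** Resolves a port name in this task's space; null if the name is unknown here. */
        KernelThread lookup(int name) {
            return ipcSpace.get(name);
        }
    }

    public static void main(String[] args) {
        KernelThread javaThread = new KernelThread(7001L);
        Task hotspot = new Task(0x1203);
        Task debugger = new Task(0x2003);

        // Name the thread sees for itself (what mach_thread_self() would return in this model).
        int nameInHotspot = hotspot.insertRight(javaThread);
        // Name the debugger obtains on its own (what task_threads() would return in this model).
        int nameInDebugger = debugger.insertRight(javaThread);

        System.out.println("Hotspot's name resolves in debugger: "
                + (debugger.lookup(nameInHotspot) != null));
        System.out.println("Debugger's own name resolves: "
                + (debugger.lookup(nameInDebugger) == javaThread));
    }
}
```

In this model the two names refer to the same kernel thread but are only meaningful inside the task that holds them, which is why the debugger has to obtain its own names rather than reuse the value stored in _thread_id.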

Argh. Confused by "task" and "thread" again...
> Is this perhaps a simple permission issue?
If the permissions are wrong, we will fail at task_for_pid(). This has been a source of many problems in the past, with the need to compile binaries in a special way and to sign them. I think these problems are well understood and sorted out. See:
https://wikis.oracle.com/display/OpenJDK/Mac+OS+X+Port+Using+jsadebug%2C+jinfo%2C+jmap
https://jbs.oracle.com/bugs/browse/JDK-7129704
https://jbs.oracle.com/bugs/browse/JDK-7185367
https://jbs.oracle.com/bugs/browse/JDK-7193201
SA does not currently work when running as root (without my patch).

But the task's self-port (to be used by debuggers) is not the same as a thread's self-port. Though somehow a debugger must be able to access the threads as well. Is this perhaps a simple permission issue? http://sourceware.org/gdb/wiki/BuildingOnDarwin Does the SA work if you run it as root?

Experiments show that the kernel port is the same as the port returned by mach_thread_self(), i.e. thread_get_kernel_port() returns the same thing as mach_thread_self(). Quoting from "Mac OS X Internals", 9.3.3: "A self port - also known as the task's kernel port - represents the task itself. The kernel holds receive rights to this port. The self port is used by the task to invoke operations on itself. Other programs (such as debuggers) wishing to perform operations on a task also use this port." This indicates that SA should indeed use mach_thread_self() as it does today; however, this doesn't work... Confused.

But again, is a thread's kernel port not system-wide as opposed to the task-based thread ports?

Reading a bit more (especially in "Mac OS X Internals" by Amit Singh - an old-fashioned thing called a "book") it's clear that a port right is specific to a task (a process). So a port right has to be specifically assigned to the task that needs to use it. Hotspot can't assign a port right to SA, since Hotspot doesn't know what process SA is or even if it exists. SA gets the right to the Hotspot task by the task_for_pid() call. My proposed patch then uses task_threads() to get the send rights to the threads in the process. These send rights will be different from the ones that Hotspot allocates. Thus the need to correlate them.

Found some more documentation that I need to digest: https://developer.apple.com/library/mac/#documentation/Darwin/Conceptual/KernelProgramming/Mach/Mach.html http://www.gnu.org/software/hurd/microkernel/mach/port.html

Well, as I said, I thought the SA did this using IPC to invoke in-process code. The Mach API docs are rather terse with little conceptual discussion. Nothing clearly indicates that the port send right of a thread_t is either global or constrained to its task. Ports and IPC do seem to be used together a lot, so it seems a little odd that the thread_t port would be local. I don't really understand the proposed solution as I can't tell what is happening in which process. But I was wondering if use of the thread's kernel port might work instead of storing the (presumably task-based) thread port: thread_get_kernel_port?

It has to be OS specific APIs since SA is essentially acting as a debugger on another process. On OS X we use the Mach API to do this. Some links are in a previous comment.

Are you saying that on Linux, for example, we directly use the pthread_t obtained in one process in a second process (to access the original thread)? That is not valid under the pthreads specification. I thought the SA used IPC to communicate with the original process to query thread state etc.

The thread "id" we store is a handle to that thread on that system *in that process*. Any other process wanting access to that thread needs a different id or handle to access the thread. (At least this is my understanding of the problem). "Remote" in my note above refers to a different process on the same system, not on a different system. Bad wording, sorry. As far as I know there is no pthread API to read the context of a thread in a different process. Pthreads on OS X is built on top of the Mach API.

Ah I see - we're using OS-specific APIs for this. That is unfortunate. So what is the OS X API we use in place of ptrace?

On Linux we use ptrace to access the second process. Ptrace uses the result of os::Linux::gettid() to identify the thread. Pthreads is not involved.

I don't quite understand. Whatever we store as the thread "id" is a handle to that thread on that system. When SA requests information about a specific thread it is asking on the system in question, so the thread "id" should still be valid (or no less valid than the result of thr_self or pthread_self on other platforms). There is no remote access to a thread per se. That aside, why do we resort to using the Mach thread API here instead of the pthreads API?

Links to some documentation of these functions: http://www.gnu.org/software/hurd/gnumach-doc/Thread-Execution.html http://www.gnu.org/software/hurd/gnumach-doc/Thread-Information.html

The following is my understanding of what the cause is and a suggestion for a fix - my experience with OS X is a bit limited so I may be off on some details.
thread_get_state() takes a thread_t as a parameter. The value of this parameter comes from SA reading the value of the OSThread._thread_id field in the Hotspot process being debugged. This value is set in HotSpot to ::mach_thread_self(), which is documented as "The mach_thread_self system call returns the calling thread's thread port." My theory is that this "thread port" is not valid when a remote process calls thread_get_state(). Instead, the remote process (SA in this case) needs its own "thread port" for the thread it wants to access.
There is a way to list all the thread ports in a remote process (or "task" as they are called in Mach) via the task_threads() function. So now we have the thread ports, we just need to correlate them with the C++ Thread objects in the Hotspot process.
One way to do this correlation is via the stack pointer. We can get the current value of the stack pointer (rsp) in SA and look through all the Thread objects to see which one the stack pointer belongs to (by looking at Thread._stack_base and Thread._stack_size).
Another way seems to be to use the thread_info() function with the THREAD_IDENTIFIER_INFO parameter. This gives us a struct which has a field called thread_id. The comment for this field in the thread_info.h file says "system-wide unique 64-bit thread id". The value for this thread_id is the same when called from Hotspot and when called from the remote debugging process (SA), so this looks like a way to do the correlation. This requires Hotspot to store this value in OSThread and SA to first list all the "thread ports", then find the thread_id for each one and select the right "thread port" for the thread we are looking for.
Using a thread_id provided by the system seems more reliable than using the stack pointer for correlation.
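The two correlation strategies described above can be sketched in Java as follows. This is a rough illustration under stated assumptions, not the actual patch. The types PortInfo and VmThread and both method names are hypothetical. The sketch assumes the (port, thread_id) pairs were already gathered natively via task_threads() and thread_info(), and that the stack occupies the range [stack_base - stack_size, stack_base), i.e. that it grows downward from the base.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of the correlation step; these types do not exist in SA. */
final class ThreadPortCorrelation {

    /** Per thread port as seen by the debugger: its own port name plus the system-wide thread id. */
    static final class PortInfo {
        final int port;
        final long uniqueThreadId;
        PortInfo(int port, long uniqueThreadId) {
            this.port = port;
            this.uniqueThreadId = uniqueThreadId;
        }
    }

    /** Per C++ Thread object as read from the target process. */
    static final class VmThread {
        final String name;
        final long uniqueThreadId; // assumed to be stored in OSThread by the VM
        final long stackBase;      // assumed to be the high end of the stack
        final long stackSize;
        VmThread(String name, long uniqueThreadId, long stackBase, long stackSize) {
            this.name = name;
            this.uniqueThreadId = uniqueThreadId;
            this.stackBase = stackBase;
            this.stackSize = stackSize;
        }
    }

    /** Option 1: map each VM thread name to the debugger's port via the unique thread id. */
    static Map<String, Integer> byThreadId(List<PortInfo> ports, List<VmThread> threads) {
        Map<Long, Integer> portById = new HashMap<>();
        for (PortInfo p : ports) {
            portById.put(p.uniqueThreadId, p.port);
        }
        Map<String, Integer> result = new HashMap<>();
        for (VmThread t : threads) {
            Integer port = portById.get(t.uniqueThreadId);
            if (port != null) {
                result.put(t.name, port);
            }
        }
        return result;
    }

    /** Option 2: find the VM thread whose stack range contains the given stack pointer. */
    static Optional<VmThread> byStackPointer(long sp, List<VmThread> threads) {
        for (VmThread t : threads) {
            long low = t.stackBase - t.stackSize;
            // Signed comparison; assumes user-space addresses fit in the positive long range.
            if (sp >= low && sp < t.stackBase) {
                return Optional.of(t);
            }
        }
        return Optional.empty();
    }
}
```

The id-based lookup is a plain map join and does not depend on where a thread happens to be executing, whereas the stack-pointer variant relies on the register value read for each port falling inside exactly one recorded stack range.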