JDK-8022335 : Native stack walk while generating hs_err does not work on Windows x64
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: hs25
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • OS: windows
  • CPU: x86
  • Submitted: 2013-08-06
  • Updated: 2019-04-29
  • Resolved: 2013-09-06
Version table:
  • JDK 8: 8 (Fixed)
  • Other: hs25 (Fixed)
Description
The implementation of stack unwinding in frame_x86.cpp does not handle the Windows x64 stack frame layout generated by the Microsoft VC compiler. As a result, any native stack trace in an hs_err file contains only the first native frame encountered: when we try to find that frame's sender, we compute an invalid IP.
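
To make the failure mode concrete, here is a minimal C sketch of a frame-pointer chain walk using the conventional x86 layout (the caller's saved frame pointer at [fp], the return address one word above it). It is not the frame_x86.cpp code: the stack is simulated as an array of words indexed by "address", and sim_stack, is_code and fp_walk are hypothetical names chosen for illustration. If a frame was compiled without a frame pointer, fp holds an arbitrary value, the next "return address" read from the stack is not code, and the walk stops after the first frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: stack memory is an array of 64-bit words and an
 * "address" is an index into that array. A pc counts as valid code only
 * if it lies in [code_lo, code_hi). */
typedef struct {
    const uint64_t *words;   /* simulated stack memory */
    size_t          nwords;  /* number of words in words[] */
    uint64_t        code_lo; /* lowest valid code address */
    uint64_t        code_hi; /* one past the highest valid code address */
} sim_stack;

static int is_code(const sim_stack *s, uint64_t pc) {
    return pc >= s->code_lo && pc < s->code_hi;
}

/* Follow the frame-pointer chain: [fp] holds the caller's saved fp and
 * [fp + 1] holds the return address into the caller. Stores up to max_out
 * pcs (innermost first) in out and returns how many were stored. */
static size_t fp_walk(const sim_stack *s, uint64_t pc, uint64_t fp,
                      uint64_t *out, size_t max_out) {
    size_t n = 0;
    assert(s != NULL && (out != NULL || max_out == 0));
    while (n < max_out && is_code(s, pc)) {
        out[n++] = pc;
        if (s->nwords < 2 || fp > s->nwords - 2)
            break;                  /* fp does not point into the stack */
        pc = s->words[fp + 1];      /* return address into the caller */
        fp = s->words[fp];          /* caller's saved frame pointer */
    }
    return n;
}
```

With a well-formed chain the walker returns every frame; starting from an fp value that is really leftover data (as when frames do not save rbp), it reads a non-code "return address" and returns only the first pc, which matches the single-frame hs_err output in the example below.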

In short, on Windows x64 frame pointers are not pushed on the stack. Instead, each frame has a static size encoded in the PE file header, plus a dynamic size recorded in a dynamic function table. This information is most easily retrieved through the SymFunctionTableAccess64 function in dbghelp.dll. (A very good description can be found at http://www.codejury.com/a-walk-in-x64-land/)
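
The unwinding described above can be sketched as a table-driven walk that finds each caller from the pc and sp alone, with no frame pointer involved. This is a deliberately simplified model and an assumption throughout: each function's unwind record is collapsed into one fixed frame size, whereas real x64 unwind data (RUNTIME_FUNCTION entries pointing to UNWIND_INFO unwind codes) describes the prologue step by step. unwind_entry, find_entry and table_walk are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical per-function unwind record: one fixed frame
 * size instead of the real list of unwind codes. */
typedef struct {
    uint64_t begin;        /* first code address of the function */
    uint64_t end;          /* one past the last code address */
    uint64_t frame_words;  /* stack words allocated by the prologue */
} unwind_entry;

static const unwind_entry *find_entry(const unwind_entry *tab, size_t n,
                                      uint64_t pc) {
    for (size_t i = 0; i < n; i++)
        if (pc >= tab[i].begin && pc < tab[i].end)
            return &tab[i];
    return NULL;
}

/* Unwind without frame pointers: find the function containing pc, skip its
 * fixed-size frame to reach the return address, then pop that slot to get
 * the caller's sp. Stack "addresses" are indices into stack[]. Returns the
 * number of pcs written to out (innermost first). */
static size_t table_walk(const uint64_t *stack, size_t nwords,
                         const unwind_entry *tab, size_t ntab,
                         uint64_t pc, uint64_t sp,
                         uint64_t *out, size_t max_out) {
    size_t n = 0;
    assert(out != NULL || max_out == 0);
    while (n < max_out) {
        const unwind_entry *e = find_entry(tab, ntab, pc);
        if (e == NULL)
            break;                     /* pc not inside any known function */
        out[n++] = pc;
        if (sp >= nwords || e->frame_words >= nwords - sp)
            break;                     /* return slot is off the stack */
        uint64_t ra_slot = sp + e->frame_words;
        pc = stack[ra_slot];           /* return address into the caller */
        sp = ra_slot + 1;              /* caller's sp after the return */
    }
    return n;
}
```

The design point is that the frame size comes from per-function metadata looked up by pc, which is why a walker that only follows saved frame pointers cannot recover these frames.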


Example from an hs_err file (also attached):

Stack: [0x0000000018e90000,0x0000000018f90000],  sp=0x0000000018f8e520,  free space=1017k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [jvm.dll+0x27060]  ciInstance::field_value+0x90


Note that the same issue exists in the Serviceability Agent (SA), affecting tools such as jstack -F and CLHSDB.
Comments
The hsx links above don't work anymore. Here's the updated location of this fix: http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/38f750491293
07-08-2017

The bug description suggested using SymFunctionTableAccess64. This is lower-level than StackWalk64 and should be easier to integrate with the way we unwind the stack. However, the problem with SymFunctionTableAccess64 is that it relies on internal data structures. In the public headers, we have:

    PVOID WINAPI SymFunctionTableAccess64(
      _In_ HANDLE  hProcess,
      _In_ DWORD64 AddrBase
    );

x64: If the image is for an x64 system, the return value is a pointer to an _IMAGE_RUNTIME_FUNCTION_ENTRY structure.

    typedef struct _IMAGE_RUNTIME_FUNCTION_ENTRY {
      DWORD BeginAddress;
      DWORD EndAddress;
      DWORD UnwindInfoAddress;
    } _IMAGE_RUNTIME_FUNCTION_ENTRY, *_PIMAGE_RUNTIME_FUNCTION_ENTRY;

However, the structure that UnwindInfoAddress refers to is not defined in public headers. Some information is available on MSDN (title "struct UNWIND_INFO", under "Exception Handling (x64)"): http://msdn.microsoft.com/en-us/library/ddssxxy8.aspx

So it's better to use StackWalk64, which is more awkward to integrate with our unwinding code but would be more portable.

Note that HotSpot already has code that tries to replicate some of these internal data structures in:

    src/os_cpu/windows_x86/vm/unwind_windows_x86.hpp

They are used in the following file to support Windows structured exception handling in generated code:

    src/os_cpu/windows_x86/vm/os_windows_x86.cpp

... but doing something like this for printing the stack trace seems like overkill, and I don't want to test all version combinations of Windows + Visual Studio to make sure the code works.
12-08-2013
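
The lookup half of the SymFunctionTableAccess64 approach discussed in the comment above can be modeled portably: subtract the image base from the address to get an RVA, then search the function entries. The sketch below assumes the entries are sorted by BeginAddress with non-overlapping ranges, and runtime_function_entry is a hypothetical portable mirror of the documented three-field layout (on Windows the real type comes from the SDK headers). It deliberately stops at returning the entry, because interpreting the data at UnwindInfoAddress is exactly the part that is missing from the public headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable mirror of the documented x64 layout. */
typedef struct {
    uint32_t BeginAddress;       /* RVA of the function start */
    uint32_t EndAddress;         /* RVA one past the function end */
    uint32_t UnwindInfoAddress;  /* RVA of the UNWIND_INFO data */
} runtime_function_entry;

/* Binary search for the entry covering addr, assuming the table is sorted
 * by BeginAddress. Returns NULL if no function covers the address. */
static const runtime_function_entry *
lookup_function_entry(const runtime_function_entry *table, size_t count,
                      uint64_t image_base, uint64_t addr) {
    assert(table != NULL || count == 0);
    if (addr < image_base || addr - image_base > UINT32_MAX)
        return NULL;                     /* address is outside the image */
    uint32_t rva = (uint32_t)(addr - image_base);
    size_t lo = 0, hi = count;           /* search window is [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (rva < table[mid].BeginAddress)
            hi = mid;
        else if (rva >= table[mid].EndAddress)
            lo = mid + 1;
        else
            return &table[mid];
    }
    return NULL;
}
```

A NULL result for an address inside the image corresponds to code that has no entry in this table, which a real unwinder has to handle separately.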

We're already using DbgHelp to write minidumps when we're crashing so that shouldn't be a problem.
09-08-2013

The comments in 6655385 suggested this was tested on 32-bit and 64-bit, but perhaps that referred only to the performance regressions (though if the flag doesn't affect 64-bit, how could there be a regression?). And there is nothing in the makefiles about /Oy- working only on 32-bit. That was a big oversight, though I do note that the VS 2003 docs don't mention that it is 32-bit only. Does NMT detail break because of this? On Linux, if we omit the frame pointer, it causes a crash. BTW, MSDN says StackWalk64 is preferred, but it still implies there are ways to walk the stack yourself, though perhaps only with the information in the symbol file. Also, using StackWalk64 requires linking with the DbgHelp library.
09-08-2013

It's possible that we didn't have any windows 64 bit machines to test this on back then. :(
09-08-2013

If we call StackWalk64 from the error handler, get the C frames at the top of the stack, and bail out at the Java frames, that would be good enough for the error report. We print the Java frames in the section below.
08-08-2013
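
The approach in the comment above can be sketched as a loop that takes one unwinding step at a time and stops at the first pc inside the VM's code cache, leaving the Java frames to the existing Java section of the report. In this sketch, unwind_step_fn stands for a wrapper around a single StackWalk64 call (an assumption; no such wrapper is defined here), and code_range, collect_native_frames, replay_ctx and replay_step are hypothetical names. replay_step only replays a fixed list of pcs so the loop can be exercised without Windows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address range of the VM's code cache. */
typedef struct { uint64_t lo, hi; } code_range;

/* One unwinding step: replace *pc with the caller's pc and return 1, or
 * return 0 when there is no caller. In the VM this would wrap one
 * StackWalk64 call (assumption). */
typedef int (*unwind_step_fn)(void *ctx, uint64_t *pc);

/* Collect native frames innermost first, stopping at the first pc that
 * lies in the code cache (a Java frame) or when unwinding ends. */
static size_t collect_native_frames(unwind_step_fn step, void *ctx,
                                    uint64_t pc, code_range code_cache,
                                    uint64_t *out, size_t max_out) {
    size_t n = 0;
    assert(step != NULL && (out != NULL || max_out == 0));
    while (n < max_out) {
        if (pc >= code_cache.lo && pc < code_cache.hi)
            break;              /* Java frame: printed in the Java section */
        out[n++] = pc;
        if (!step(ctx, &pc))
            break;              /* outermost frame reached */
    }
    return n;
}

/* Stand-in step for testing: replays a fixed list of caller pcs. */
typedef struct {
    const uint64_t *callers; /* caller pcs, innermost first */
    size_t count;            /* number of entries in callers[] */
    size_t next;             /* index of the next caller to return */
} replay_ctx;

static int replay_step(void *ctx, uint64_t *pc) {
    replay_ctx *r = (replay_ctx *)ctx;
    if (r->next >= r->count)
        return 0;
    *pc = r->callers[r->next++];
    return 1;
}
```

Keeping the step function separate from the stopping rule means the loop does not need to know how each caller is found, which fits the concern that StackWalk64 may not be able to walk the Java frames themselves.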

/Oy- has never worked. From the MSDN page: "/Oy enables frame-pointer omission and /Oy- disables omission. /Oy is available only in x86 compilers." It doesn't work on x64 at all. I think the problem with StackWalk64 is that we might not be able to walk our Java frames with it at all. I seem to remember Peter trying to use it for SA without luck.
08-08-2013

StackWalk64 may be the way to go in the future but meanwhile I think we need to resolve why /Oy- has stopped working as that will be the much quicker fix.
08-08-2013

From MSDN, it seems StackWalk64 is the recommended method (http://msdn.microsoft.com/en-us/library/windows/desktop/ms680650(v=vs.85).aspx): "The StackWalk64 function provides a portable method for obtaining a stack trace. Using the StackWalk64 function is recommended over writing your own function because of all the complexities associated with stack walking on platforms. In addition, there are compiler options that cause the stack to appear differently, depending on how the module is compiled. By using this function, your application has a portable stack trace that continues to work as the compiler and operating system change."
08-08-2013

I wonder if the frame code has been incorrectly modified ...
07-08-2013

This should not be happening. 6655385 added /Oy- to disable frame-pointer omission precisely so that we would get useful stack walks. The flag is present in all VS versions, up to and including 2012.
07-08-2013

That test needs further investigation - there are a few different errors hidden in the logs so the test should not be passing!
07-08-2013

This is the Windows equivalent of the -fomit-frame-pointer problem. Is this a new problem, or is it specific to certain compiler versions?
07-08-2013

I have to wonder in that case why this has never previously been noticed - surely the jstack tests will fail because of this?
07-08-2013

AFAIK this is the way it's always worked for Windows x64 with Visual C++/Intel C++ compilers and is not an option/flag.
07-08-2013