JDK-6583347 : compute_exception_return_address() can fail during stack overflow.
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version:
    1.4.2_12,1.4.2_13,1.4.2_14,1.4.2_16,1.4.2_17
  • Priority: P3
  • Status: Closed
  • Resolution: Fixed
  • OS: windows,windows_2003
  • CPU: x86
  • Submitted: 2007-07-20
  • Updated: 2012-10-08
  • Resolved: 2008-05-01
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

  • Other: 1.4.2_17-rev b12, Fixed
  • Other: 1.4.2_18-rev, Fixed
  • Other: 1.4.2_19, Fixed
Description
The symptom is a JVM crash, with an hs_err log like the following:

#
# An unexpected error has been detected by HotSpot Virtual Machine:
#
#  Internal Error (53484152454432554E54494D450E435050017C), pid=2448, tid=3628
#
# Java VM: Java HotSpot(TM) 64-Bit Server VM (1.4.2_12-b03 mixed mode)

---------------  T H R E A D  ---------------

Current thread (0x00000000085f4ce0):  JavaThread "SAPEngine_Application_Thread[impl:3]_11" [_thread_in_Java, id=3628]

Stack: [0x000000000a1d0000,0x000000000a3d0000)
[error occurred during error reporting, step 110, id 0xe0000000]


[error occurred during error reporting, step 120, id 0xe0000000]


----------------------

The failure is at sharedRuntime.cpp, line 380, where a guarantee fails.
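
The Internal Error ID in the hs_err header appears to encode that same location. The sketch below assumes, based only on this one sample, that each byte except the last two is a file-name character with 0x20 subtracted, and that the last two bytes are the line number. The class and method names are illustrative, not part of HotSpot.

```java
class ErrorIdDecoder {
    /**
     * Decodes a 1.4.2-style internal error ID into "file, line N".
     * Assumption (inferred from this report only): file-name bytes are
     * stored as (char - 0x20); the final two bytes hold the line number.
     */
    static String decode(String hexId) {
        int byteCount = hexId.length() / 2;
        StringBuilder file = new StringBuilder();
        // All bytes except the last two belong to the file name.
        for (int i = 0; i < byteCount - 2; i++) {
            int value = Integer.parseInt(hexId.substring(2 * i, 2 * i + 2), 16);
            file.append((char) (value + 0x20));
        }
        // The trailing four hex digits are read as a big-endian line number.
        int line = Integer.parseInt(hexId.substring(2 * (byteCount - 2)), 16);
        return file + ", line " + line;
    }

    public static void main(String[] args) {
        // ID copied from the hs_err header in this report.
        System.out.println(decode("53484152454432554E54494D450E435050017C"));
    }
}
```

For the ID in this report, the sketch yields the file and line quoted above (sharedRuntime.cpp, line 380).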


The customer (running a 1.4.2 64-bit JVM on Windows/AMD64) originally reported that they were hitting 6511772, but there are differences.

Their crashes had a stack similar to that in 6511772, but not quite the same. For example (wide output):

0:042> kv
Child-SP          RetAddr           : Args to Child                                                           : Call Site
...
00000000`0c1d8d40 00000000`080b323d : fffffadf`33f9e310 fffff800`0105ca27 fffff680`00854a38 fffffadf`3acc9c20 : jvm!VMError::report_and_die+0x14a
00000000`0c1d8df0 00000000`08246cf7 : fffffadf`00000003 fffffadf`3aefb580 00000000`00000000 fffffadf`246c8c70 : jvm!report_fatal+0x4d
00000000`0c1d8e90 00000000`029bdc2e : 00000000`00000001 00000000`0c1d2fa8 fffffadf`246c8628 00000000`00000000 : jvm!SharedRuntime::compute_exception_return_address+0x107
00000000`0c1d8f00 00000000`00000001 : 00000000`0c1d2fa8 fffffadf`246c8628 00000000`00000000 00000000`077480b0 : 0x29bdc2e
00000000`0c1d8f08 00000000`0c1d2fa8 : fffffadf`246c8628 00000000`00000000 00000000`077480b0 00000000`00000100 : 0x1
00000000`0c1d8f10 fffffadf`246c8628 : 00000000`00000000 00000000`077480b0 00000000`00000100 00000000`2dcf20a7 : 0xc1d2fa8
00000000`0c1d8f18 00000000`00000000 : 00000000`077480b0 00000000`00000100 00000000`2dcf20a7 00000000`875c1030 : 0xfffffadf`246c8628
00000000`0c1d8f20 00000000`077480b0 : 00000000`00000100 00000000`2dcf20a7 00000000`875c1030 00000000`ffff5e58 : 0x0
00000000`0c1d8f28 00000000`00000100 : 00000000`2dcf20a7 00000000`875c1030 00000000`ffff5e58 00000001`00c76448 : 0x77480b0
00000000`0c1d8f30 00000000`2dcf20a7 : 00000000`875c1030 00000000`ffff5e58 00000001`00c76448 00000000`98109a00 : 0x100
00000000`0c1d8f38 00000000`875c1030 : 00000000`ffff5e58 00000001`00c76448 00000000`98109a00 00000000`9836a428 : 0x2dcf20a7
00000000`0c1d8f40 00000000`ffff5e58 : 00000001`00c76448 00000000`98109a00 00000000`9836a428 00000000`976250f8 : 0x875c1030
00000000`0c1d8f48 00000001`00c76448 : 00000000`98109a00 00000000`9836a428 00000000`976250f8 00000000`00000052 : 0xffff5e58
(the stack trace is obviously not accurate after the native methods)

Although we don't have a large sample of memory dumps from these crashes, what we do see is a crash in compute_exception_return_address(). In 6511772, by contrast, that function is reached from CompiledCodeSafepointHandler::handle_illegal_instruction_exception.

The Solaris crash originally reported in 6511772 was not a stack overflow situation (plenty of room left!)


So, this CR describes the failure of SharedRuntime::compute_exception_return_address() during stack overflow:
There is <24k of available space on the stack.
The Java/compiled code has filled the stack, and faulted.
But it hasn't been recognised as a stack overflow error.
compute_exception_return_address() has failed to match up this PC to a method...

Comments
EVALUATION The customer is happy to use -Xss to work around this problem. Configurations that reproduce the crash are not available, so this will be marked as not reproducible.
31-07-2007

WORK AROUND -Xss2m was already in use in all cases when the failure happened, so the customer used -Xss4m as a workaround. -Xss3m would probably work as well, but we don't know.
27-07-2007
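
As a cross-check, the Stack line in the hs_err excerpt above gives the thread's stack bounds, and their difference can be compared with the -Xss setting. The following is a minimal sketch of that arithmetic; the class and method names are illustrative only.

```java
class StackRange {
    /**
     * Returns the size in bytes of a "[low,high)" address range
     * formatted like the hs_err "Stack:" line.
     */
    static long sizeBytes(String range) {
        // Drop the leading '[' and trailing ')' and split the two bounds.
        String[] bounds = range.substring(1, range.length() - 1).split(",");
        // Strip the "0x" prefix and parse each bound as hexadecimal.
        long low = Long.parseUnsignedLong(bounds[0].trim().substring(2), 16);
        long high = Long.parseUnsignedLong(bounds[1].trim().substring(2), 16);
        return high - low;
    }

    public static void main(String[] args) {
        // Range copied from the hs_err excerpt in this report.
        long size = sizeBytes("[0x000000000a1d0000,0x000000000a3d0000)");
        System.out.println(size / 1024 + " KB");
    }
}
```

For the range in this report the difference is 0x200000 bytes (2048 KB), which is consistent with the -Xss2m setting mentioned above.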

WORK AROUND -Xss2M or any large enough stack size to avoid stack overflows in Java threads.
20-07-2007
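
To illustrate why a larger stack avoids the overflow, here is a rough sketch that recurses until StackOverflowError in threads created with different stack sizes. It uses the Thread constructor that takes a stackSize argument as a stand-in for -Xss. That argument is only a hint and some platforms may ignore it, and the depths reached depend on the JVM and platform. The class name and the sizes chosen are illustrative, not taken from the customer's configuration.

```java
class StackDepthDemo {
    // Recurses until the stack is exhausted; the counter lives on the heap.
    private static void recurse(int[] counter) {
        counter[0]++;
        recurse(counter);
    }

    /** Runs the probe in a new thread with the requested stack size hint. */
    static int maxDepth(long stackSizeBytes) {
        final int[] counter = new int[1];
        Thread probe = new Thread(null, () -> {
            try {
                recurse(counter);
            } catch (StackOverflowError expected) {
                // The overflow was recognised and delivered as a Java error.
            }
        }, "stack-probe", stackSizeBytes);
        probe.start();
        try {
            probe.join(); // join() makes the counter update visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println("512 KB stack: depth " + maxDepth(512L * 1024));
        System.out.println("4 MB stack:   depth " + maxDepth(4L * 1024 * 1024));
    }
}
```

On the command line, the equivalent setting for Java threads is the -Xss option described in the workaround, for example -Xss4m.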