JDK-5040096 : Vtest/Vmark fail after 6 hrs run on windows2003 AMD 64bits with C2 flag
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 5.0
  • Priority: P1
  • Status: Closed
  • Resolution: Fixed
  • OS: windows_2003
  • CPU: x86
  • Submitted: 2004-04-29
  • Updated: 2004-07-02
  • Resolved: 2004-06-21
Fixed in: 5.0 b57
JDK/VM version
vm_info: Java HotSpot(TM) 64-Bit Server VM (1.5.0-beta2-b49) for windows-amd64, 
built on Apr 28 2004 01:31:31 by "java_re" with unknown MS VC++:1400

Windows_NT JTG-AMD2 5 02 586

Error message
# An unexpected error has been detected by HotSpot Virtual Machine:
#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x0000000077fa0c46, pid=2572, t
# Java VM: Java HotSpot(TM) 64-Bit Server VM (1.5.0-beta2-b49 mixed mode)
# Problematic frame:
# C
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
J  java.net.SocketInputStream.socketRead0(Ljava/io/FileDescriptor;[BIII)I
J  java.net.SocketInputStream.read([BII)I
J  java.io.BufferedInputStream.read()I
J  COM.volano.e.run()V
J  java.lang.Thread.run()V
v  ~I2CAdapter
v  ~StubRoutines::call_stub

Please telnet with root/admin and change to d:\tmp directory to check it.

CONVERTED DATA
BugTraq+ Release Management Values
COMMIT TO FIX: tiger-rc
FIXED IN: tiger-rc
INTEGRATED IN: tiger-b57 tiger-rc
VERIFIED IN: tiger-rc

EVALUATION

I have not been able to reproduce it yet. VolanoMark is a total resource hog, and when I run it, it fails for various reasons, such as running out of port numbers. Looking at the stack trace, it appears to be failing in ntdll.dll rather than winsock, so it might be the result of a problem with the malloc() in the socketRead0 routine. This code has not changed in a long time, so I doubt there is a bug in it. I will continue trying to reproduce it.
###@###.### 2004-05-07

OK. A WinDbg stack trace shows it is definitely crashing in malloc. Interestingly, it is not the server that crashes but the client, which is run repeatedly as a standalone application and runs successfully several hundred times before the crash happens. I am pretty confident that this is not a JDK bug but an OS bug. malloc should never crash regardless of what parameters it is given.
###@###.### 2004-05-11

Stack trace shown below:

ntdll!NtRaiseHardError+0xa
USER32!MB_GetString+0x5f7
USER32!SoftModalMessageBox+0xefc
USER32!MessageBoxTimeoutA+0x17d
USER32!MessageBoxA+0x50
jvm!os::message_box+0x15
jvm!VMError::show_message_box+0x89
jvm!VMError::report_and_die+0x137
jvm!topLevelExceptionFilter+0x40a
ntdll!RtlQueryProcessDebugInformation+0xcbe
ntdll!RtlLookupFunctionEntry+0x69
ntdll!KiUserExceptionDispatcher+0x2d
ntdll!RtlQueryProcessBackTraceInformation+0xc7b9
ntdll!RtlAllocateHeap+0xe5
MSVCRT!malloc+0x3a
net!Java_java_net_SocketInputStream_socketRead0+0xb6
0x7ff`bf0bc9ad
0x2`00000001
0x7ff`bf0a0904
0x7ff`b39df3a8
###@###.### 2004-05-11

I have run several experiments attempting to isolate the cause of this failure. I have reproduced it on three different builds of Windows Server 2003: 1068, 1073 and 1184. The bug also reproduces in interpreted mode (-Xint). It fails on B48 as well as on B49, where it was originally reported.
I've attempted to use the Microsoft Debugging Utilities to perform a heap verification on each malloc / free call, but since this slows down the test, it does not fail. I've also run the test on the Microsoft Checked build and it did not fail. When it does crash, I get the same stack dump in the failing thread as Michael reports. One interesting note is that the Volano client appears to be in the process of terminating its connection threads, because there are only 52 threads left at the time of the crash and normally there are over 400. I've scanned the VM sources involved in thread termination and I don't see any race conditions involved in freeing memory when a GC could occur, etc.

This may be a coincidence, but I've done a successful overnight run on two different systems using the switch -XX:+UseDefaultStackSize. This causes the memory for the thread stacks to be reserved and not committed. I don't understand why this would fix the problem, but it might be a clue for determining what is going on.

I don't agree with the statement that the OS should never get a segv from within malloc. If a program corrupts the C heap by writing garbage to heap header data structures, this would cause the crash that we are seeing.
###@###.### 2004-05-19

I have been able to reproduce the exact crash reported here with a small C++ test case. This is a bug in Windows. Compile the small test case with "cl /MD mallocbasher.cpp" and run it on a multi-CPU Windows Server 2003 AMD64 box, and it will crash in a few minutes. I submitted a bug on Microsoft's beta web site at beta.microsoft.com. The bug number is 154243817.
###@###.### 2004-06-07

It turns out that Microsoft expects these access violations (AVs). They are using structured exception handling (SEH, try/except blocks) in the malloc library and need to get control on AVs caused by their code. Since we are using Vectored Exception Handling rather than SEH, we see this AV first and report it as a fatal error.
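The ordering problem described here can be illustrated with a small portable model. The sketch below is not Windows or HotSpot code: the names Disposition, Fault, Handler, and dispatch are hypothetical, and the two handlers are only stand-ins for the VM's vectored handler and for the try/except block inside the allocator. It assumes nothing beyond the fact that vectored handlers are consulted before frame-based (SEH) handlers, and shows why a first-consulted handler that treats every AV as fatal preempts a later handler that would have recovered.

```cpp
#include <functional>
#include <string>
#include <vector>

// Toy model only. None of these names are Windows or HotSpot APIs.
enum class Disposition { ContinueSearch, Recovered, FatalError };

struct Fault {
    std::string module;  // module containing the faulting instruction
};

using Handler = std::function<Disposition(const Fault&)>;

// Ask each handler in turn; the first one that does not decline decides.
Disposition first_claim(const Fault& fault, const std::vector<Handler>& handlers) {
    for (const Handler& h : handlers) {
        Disposition d = h(fault);
        if (d != Disposition::ContinueSearch) return d;
    }
    return Disposition::ContinueSearch;
}

// Vectored handlers are consulted before frame-based (SEH) handlers.
Disposition dispatch(const Fault& fault,
                     const std::vector<Handler>& vectored,
                     const std::vector<Handler>& frame_based) {
    Disposition d = first_claim(fault, vectored);
    if (d != Disposition::ContinueSearch) return d;
    d = first_claim(fault, frame_based);
    if (d != Disposition::ContinueSearch) return d;
    return Disposition::FatalError;  // nobody claimed the fault
}

// Stand-in for a VM vectored handler that reports every AV as fatal.
Disposition vm_vectored_handler(const Fault&) {
    return Disposition::FatalError;
}

// Stand-in for the allocator's try/except block, which expects the AV.
Disposition allocator_seh_handler(const Fault& fault) {
    return fault.module == "ntdll.dll" ? Disposition::Recovered
                                       : Disposition::ContinueSearch;
}

// Outcome of a fault inside ntdll.dll, with or without the VM handler installed.
Disposition simulate_allocator_fault(bool vm_handler_installed) {
    std::vector<Handler> vectored;
    if (vm_handler_installed) vectored.push_back(vm_vectored_handler);
    std::vector<Handler> frame_based;
    frame_based.push_back(allocator_seh_handler);
    return dispatch(Fault{"ntdll.dll"}, vectored, frame_based);
}
```

In this model, simulate_allocator_fault(true) ends in FatalError even though the allocator's own handler was prepared to recover, while simulate_allocator_fault(false) ends in Recovered.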
The full and correct fix for this problem is to stop using Vectored Exceptions and support SEH in compiled code. This will require significant effort and testing, and it carries risk. We must register all dynamically generated code in the VM using the RtlInstallFunctionTableCallback or RtlAddFunctionTable APIs. Since it is late in the 1.5 release, I will put a short-term fix into 1.5 which will pass AVs on from our exception handling code if the AV is generated from NTDLL.DLL. I will open up a new bug to keep track of this issue.
###@###.### 2004-06-14
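The decision at the heart of the short-term fix can be sketched as an address-range check. This is a minimal portable model, not the change that went into 1.5: ModuleRange, FilterResult, contains, and filter_access_violation are hypothetical names, and the sketch assumes the caller already knows where NTDLL.DLL is mapped (on Windows that information could be obtained with something like GetModuleInformation, and "pass on" would correspond to the handler returning EXCEPTION_CONTINUE_SEARCH).

```cpp
#include <cstdint>

// Hypothetical types for illustration; not HotSpot or Windows definitions.
struct ModuleRange {
    std::uintptr_t base;  // load address of the module
    std::uintptr_t size;  // size of the mapped image in bytes
};

enum class FilterResult { ContinueSearch, ReportFatal };

// True when pc lies in the half-open range [base, base + size).
// Written as a subtraction so that base + size cannot overflow.
bool contains(const ModuleRange& m, std::uintptr_t pc) {
    return pc >= m.base && pc - m.base < m.size;
}

// Pass the AV on when it was raised inside ntdll; otherwise treat it as
// fatal, which is what the handler did for every AV before the fix.
FilterResult filter_access_violation(std::uintptr_t pc, const ModuleRange& ntdll) {
    return contains(ntdll, pc) ? FilterResult::ContinueSearch
                               : FilterResult::ReportFatal;
}
```

A check like this is deliberately narrow: it lets the allocator's own handlers see the AVs they expect, while faults anywhere else are still reported as before.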