Bug ID: JDK-6987812 SAJDI: "gHotSpotVMTypes was not initialized properly in the remote process"

Details
Type: Bug
Submit Date: 2010-09-28
Status: Closed
Updated Date: 2012-02-01
Project Name: JDK
Resolved Date: 2011-03-08
Component: hotspot
OS: windows_xp,windows_7,windows_2000
Sub-Component: svc
CPU: x86,generic
Priority: P2
Resolution: Fixed
Affected Versions: hs19,7
Fixed Versions: hs20 (b06)

Related Reports
Backport:
Backport:
Duplicate:
Duplicate:
Duplicate:

Sub Tasks

Description
SAPIDAttachingConnector, SACoreAttachingConnector, SADebugServerAttachingConnector are unable to connect to a process on Windows due to:

java.lang.RuntimeException: gHotSpotVMTypes was not initialized properly in the remote process; can not continue

A typical stack is:

[2010-09-21T10:13:09.80] java.io.IOException
[2010-09-21T10:13:09.80] 	at sun.jvm.hotspot.jdi.SAPIDAttachingConnector.attach(SAPIDAttachingConnector.java:126)
...
[2010-09-21T10:13:09.80] Caused by: java.lang.reflect.InvocationTargetException
[2010-09-21T10:13:09.80] 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[2010-09-21T10:13:09.80] 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
[2010-09-21T10:13:09.80] 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[2010-09-21T10:13:09.80] 	at java.lang.reflect.Method.invoke(Method.java:613)
[2010-09-21T10:13:09.80] 	at sun.jvm.hotspot.jdi.SAPIDAttachingConnector.createVirtualMachine(SAPIDAttachingConnector.java:87)
[2010-09-21T10:13:09.80] 	at sun.jvm.hotspot.jdi.SAPIDAttachingConnector.attach(SAPIDAttachingConnector.java:111)
[2010-09-21T10:13:09.80] 	... 5 more
[2010-09-21T10:13:09.80] Caused by: java.lang.RuntimeException: gHotSpotVMTypes was not initialized properly in the remote process; can not continue
[2010-09-21T10:13:09.80] 	at sun.jvm.hotspot.HotSpotTypeDataBase.readVMTypes(HotSpotTypeDataBase.java:118)
[2010-09-21T10:13:09.80] 	at sun.jvm.hotspot.HotSpotTypeDataBase.<init>(HotSpotTypeDataBase.java:85)
[2010-09-21T10:13:09.80] 	at sun.jvm.hotspot.HotSpotAgent.setupVM(HotSpotAgent.java:388)
[2010-09-21T10:13:09.80] 	at sun.jvm.hotspot.HotSpotAgent.go(HotSpotAgent.java:315)
[2010-09-21T10:13:09.80] 	at sun.jvm.hotspot.HotSpotAgent.attach(HotSpotAgent.java:158)
[2010-09-21T10:13:09.80] 	at sun.jvm.hotspot.jdi.VirtualMachineImpl.createVirtualMachineForPID(VirtualMachineImpl.java:222)
[2010-09-21T10:13:10.75] 	... 11 more

                                    

Comments
EVALUATION

I got sidetracked for several days trying to deal with a different
failure mode for VM/NSK sajdi tests. While investigating that problem,
I happened to notice that this failure mode only happens with "product"
bits and doesn't happen with "fastdebug" bits. At least that's the case
here in my lab in Colorado. I have a query into the submitter to see
if that observation holds true in VM/SQE testing also.

Doing a slightly more refined grep of my baseline JDK7-B119 test
results shows:

$ grep 'gHotSpotVMTypes' vm-sajdi-prod-*/dcubed*/*/*.log \
| sed -e 's/:.*//' -e 's#/[^/][^/]*$##' | sort -u \
| sed 's#/dcubed.*##' | uniq -c
     75 vm-sajdi-prod-client-prod-comp.windows-i586
     73 vm-sajdi-prod-client-prod-mixed.windows-i586
     69 vm-sajdi-prod-server-prod-comp.windows-i586
     68 vm-sajdi-prod-server-prod-mixed.windows-i586

The grep gets all instances of gHotSpotVMTypes in all Client
and Server VM log files; sometimes gHotSpotVMTypes appears in
more than one log file per test. The first sed line followed
by the 'sort -u' strips us down to one line per test failure.
The second sed line allows us to see failure counts per config.
Only product bits ("prod") here; no fastdebug bits ("fast").
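
For readers who do not read sed fluently, here is a minimal Java sketch of
the same counting logic. The class name, method name, and sample paths are
made up for illustration; only the three substitutions mirror the pipeline
above.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

final class FailureCounts {
    /** Counts distinct failing test directories per config from grep "file:text" lines. */
    static Map<String, Integer> countPerConfig(List<String> grepLines) {
        // First sed + 'sort -u': drop the matched text after the first ':',
        // drop the log file name, and keep each test directory once.
        TreeSet<String> testDirs = new TreeSet<>();
        for (String line : grepLines) {
            String path = line.replaceFirst(":.*", "");
            testDirs.add(path.replaceFirst("/[^/][^/]*$", ""));
        }
        // Second sed + 'uniq -c': cut back to the config directory and count.
        Map<String, Integer> counts = new TreeMap<>();
        for (String dir : testDirs) {
            counts.merge(dir.replaceFirst("/dcubed.*", ""), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical grep output; the real paths are not shown in this note.
        List<String> sample = List.of(
            "cfgA/dcubed.x/test1/a.log:gHotSpotVMTypes was not initialized",
            "cfgA/dcubed.x/test1/b.log:gHotSpotVMTypes was not initialized",
            "cfgA/dcubed.x/test2/a.log:gHotSpotVMTypes was not initialized",
            "cfgB/dcubed.x/test1/a.log:gHotSpotVMTypes was not initialized");
        countPerConfig(sample).forEach((cfg, n) -> System.out.println(n + " " + cfg));
    }
}
```

In the sample, test1 under cfgA shows up in two log files but is only
counted once, which is the point of the 'sort -u' step.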

Update: My sync_jdk script has a bug where the jvm.pdb files
were not being copied from the fastdebug sub directories to the
client-fast and server-fast configs. That's the reason that
there are no fastdebug failures. This bug does reproduce with
fastdebug bits also (when the jvm.pdb file is present).
                                     
2010-12-02
EVALUATION

Based on the experiments that I did in my build area (pasted into
comment note #5), the problem is not with the built bits. The problem
is with the jvm.pdb file. If that file is moved aside, then at least
the Client VM is happy. Haven't tested the Server VM yet...
                                     
2010-12-04
EVALUATION

I installed JDK7-B120 from the Windows installer bundle:

$ ls -l jdk-7-ea-bin-b120-windows-i586-01_dec_2010.exe
-rwxrwxr-x   1 nobody   java_re  90778376 Dec  1 17:44 jdk-7-ea-bin-b120-windows-i586-01_dec_2010.exe

into c:/java_jdks/jdk1.7.0_b120 on my WinXP VMware client.

I ran the v7r02 VM/NSK sajdi tests using those bits. I only
ran the tests with product bits since fastdebug bits aren't
available via the installer bundle:

Results dir: vm-sajdi-prod-client-prod-comp.windows-i586
    executed: 93  passed: 83  ignored: 0  failed: 10
    time: 25 minute(s)
Results dir: vm-sajdi-prod-client-prod-mixed.windows-i586
    executed: 93  passed: 83  ignored: 0  failed: 10
    time: 25 minute(s)
Results dir: vm-sajdi-prod-server-prod-comp.windows-i586
    executed: 93  passed: 83  ignored: 0  failed: 10
    time: 27 minute(s)
Results dir: vm-sajdi-prod-server-prod-mixed.windows-i586
    executed: 93  passed: 83  ignored: 0  failed: 10
    time: 25 minute(s)

Summary of Test Results (4 result dirs)
=========================================
    all executed: 372  all passed: 332  all ignored: 0  all failed: 40
    time: 1 hour(s) 42 minute(s)


I found it interesting that there were exactly 10 failures in all
four product bit configs. Here is a breakdown of the failure counts
by test name:

      4 XXX/nsk/sajdi/ReferenceType/allMethods/allmethods001
      4 XXX/nsk/sajdi/ReferenceType/visibleMethods/vsbmethods001
      4 XXX/nsk/sajdi/SADebugServerAttachingConnector/attach/attach001
      4 XXX/nsk/sajdi/SADebugServerAttachingConnector/attach/attach002
      4 XXX/nsk/sajdi/SADebugServerAttachingConnector/attach/attach011
      4 XXX/nsk/sajdi/SADebugServerAttachingConnector/attach/attach012
      4 XXX/nsk/sajdi/ThreadReference/frames/frames001
      4 XXX/nsk/sajdi/ThreadReference/status/status002
      4 XXX/nsk/sajdi/jdb/options/connect/connect003
      4 XXX/nsk/sajdi/jdb/options/connect/connect004

The test names are prefixed by "XXX/" so that this note won't cause
them to show up on the known fail_list.

A number of the tests failed due to:

    java.net.BindException: Address already in use: JVM_Bind

Which is usually a configuration or stale/leftover process problem.

Here is a breakdown of the "Address already in use" exceptions per test:

      4 attach001
      4 attach002
      3 attach011
      4 attach012
      4 connect003
      4 connect004

The one attach011 test that didn't fail with an "Address already in use"
exception failed due to "ERROR: Unable to find pid of started process
with jps in 30 tries".

The allmethods001, frames001, status002, and vsbmethods001
failures look like "regular" test failures.

This failure mode does not affect the Windows bits that we deliver
to customers. However, that doesn't mean that this bug isn't
important because this bug is obscuring proper execution of 69-75
tests per configuration in our automated testing environment.
                                     
2010-12-10
EVALUATION

I mirrored /java/re/jdk/1.7.0/promoted/all/b120/binaries/windows-i586
to a scratch directory on my WinXP VMware client. I removed the jvm.map
and jvm.pdb files from jre/bin/{client,server}. I copied just the jvm.dll
file from fastdebug/jre/bin/{client,server} to jre/bin/{client,server}-fast.

I ran the v7r02 VM/NSK sajdi tests using those bits:

Results dir: vm-sajdi-prod-client-fast-comp.windows-i586
    executed: 93  passed: 83  ignored: 0  failed: 10
    time: 26 minute(s)
Results dir: vm-sajdi-prod-client-fast-mixed.windows-i586
    executed: 93  passed: 83  ignored: 0  failed: 10
    time: 25 minute(s)
Results dir: vm-sajdi-prod-client-prod-comp.windows-i586
    executed: 93  passed: 83  ignored: 0  failed: 10
    time: 25 minute(s)
Results dir: vm-sajdi-prod-client-prod-mixed.windows-i586
    executed: 93  passed: 83  ignored: 0  failed: 10
    time: 23 minute(s)
Results dir: vm-sajdi-prod-server-fast-comp.windows-i586
    executed: 93  passed: 83  ignored: 0  failed: 10
    time: 29 minute(s)
Results dir: vm-sajdi-prod-server-fast-mixed.windows-i586
    executed: 93  passed: 83  ignored: 0  failed: 10
    time: 25 minute(s)
Results dir: vm-sajdi-prod-server-prod-comp.windows-i586
    executed: 93  passed: 83  ignored: 0  failed: 10
    time: 27 minute(s)
Results dir: vm-sajdi-prod-server-prod-mixed.windows-i586
    executed: 93  passed: 83  ignored: 0  failed: 10
    time: 23 minute(s)

Summary of Test Results (8 result dirs)
=========================================
    all executed: 744  all passed: 664  all ignored: 0  all failed: 80
    time: 3 hour(s) 23 minute(s)

I found it interesting that there were exactly 10 failures in all
eight configs. Here is a breakdown of the failure counts by test name:

      8 XXX/nsk/sajdi/ReferenceType/allMethods/allmethods001
      8 XXX/nsk/sajdi/ReferenceType/visibleMethods/vsbmethods001
      8 XXX/nsk/sajdi/SADebugServerAttachingConnector/attach/attach001
      8 XXX/nsk/sajdi/SADebugServerAttachingConnector/attach/attach002
      8 XXX/nsk/sajdi/SADebugServerAttachingConnector/attach/attach011
      8 XXX/nsk/sajdi/SADebugServerAttachingConnector/attach/attach012
      8 XXX/nsk/sajdi/ThreadReference/frames/frames001
      8 XXX/nsk/sajdi/ThreadReference/status/status002
      8 XXX/nsk/sajdi/jdb/options/connect/connect003
      8 XXX/nsk/sajdi/jdb/options/connect/connect004

The test names are prefixed by "XXX/" so that this note won't cause
them to show up on the known fail_list.

Same issue with tests failing due to:

    java.net.BindException: Address already in use: JVM_Bind

Which is usually a configuration or stale/leftover process problem.

Here is a breakdown of the "Address already in use" exceptions per test:

      8 attach001
      8 attach002
      8 attach011
      8 attach012
      8 connect003
      8 connect004

The allmethods001, frames001, status002, and vsbmethods001
failures look like "regular" test failures.

These test results indicate that the only real difference
between the installer bits and the /java/re/jdk/... bits
is the presence of the jvm.map and jvm.pdb files. At least
for the VM/NSK sajdi subsuite.
                                     
2010-12-10
EVALUATION

I built a slightly old RT_Baseline clone (tip = a6b067997c7e) with
VS2003 and then ran the VM/NSK sajdi tests with the following
debugger option set:

    -Dsun.jvm.hotspot.debugger.windbg.disableNativeLookup=1

This option disables use of the native symbol lookup code and
causes the COFF based code to be used instead. Here are the results:

Results dir: vm-sajdi-prod-client_bh_hsx_rt_latest_exp_vs2003_dcubed-fast-comp.windows-i586
    executed: 93  passed: 83  ignored: 0  failed: 10
    time: 26 minute(s)
Results dir: vm-sajdi-prod-client_bh_hsx_rt_latest_exp_vs2003_dcubed-fast-mixed.windows-i586
    executed: 93  passed: 83  ignored: 0  failed: 10
    time: 25 minute(s)
Results dir: vm-sajdi-prod-client_bh_hsx_rt_latest_exp_vs2003_dcubed-prod-comp.windows-i586
    executed: 93  passed: 83  ignored: 0  failed: 10
    time: 25 minute(s)
Results dir: vm-sajdi-prod-client_bh_hsx_rt_latest_exp_vs2003_dcubed-prod-mixed.windows-i586
    executed: 93  passed: 83  ignored: 0  failed: 10
    time: 25 minute(s)
Results dir: vm-sajdi-prod-server_bh_hsx_rt_latest_exp_vs2003_dcubed-fast-comp.windows-i586
    executed: 93  passed: 83  ignored: 0  failed: 10
    time: 29 minute(s)
Results dir: vm-sajdi-prod-server_bh_hsx_rt_latest_exp_vs2003_dcubed-fast-mixed.windows-i586
    executed: 93  passed: 83  ignored: 0  failed: 10
    time: 25 minute(s)
Results dir: vm-sajdi-prod-server_bh_hsx_rt_latest_exp_vs2003_dcubed-prod-comp.windows-i586
    executed: 93  passed: 83  ignored: 0  failed: 10
    time: 26 minute(s)
Results dir: vm-sajdi-prod-server_bh_hsx_rt_latest_exp_vs2003_dcubed-prod-mixed.windows-i586
    executed: 93  passed: 83  ignored: 0  failed: 10
    time: 25 minute(s)

Summary of Test Results (8 result dirs)
=========================================
    all executed: 744  all passed: 664  all ignored: 0  all failed: 80
    time: 3 hour(s) 26 minute(s)


Again, 10 failures per config. Here is a breakdown of the failure
counts by test name:

      8 XXX/nsk/sajdi/ReferenceType/allMethods/allmethods001
      8 XXX/nsk/sajdi/ReferenceType/visibleMethods/vsbmethods001
      8 XXX/nsk/sajdi/SADebugServerAttachingConnector/attach/attach001
      8 XXX/nsk/sajdi/SADebugServerAttachingConnector/attach/attach002
      8 XXX/nsk/sajdi/SADebugServerAttachingConnector/attach/attach011
      8 XXX/nsk/sajdi/SADebugServerAttachingConnector/attach/attach012
      8 XXX/nsk/sajdi/ThreadReference/frames/frames001
      8 XXX/nsk/sajdi/ThreadReference/status/status002
      8 XXX/nsk/sajdi/jdb/options/connect/connect003
      8 XXX/nsk/sajdi/jdb/options/connect/connect004

The test names are prefixed by "XXX/" so that this note won't cause
them to show up on the known fail_list.

Same issue with tests failing due to:

    java.net.BindException: Address already in use: JVM_Bind

Which is usually a configuration or stale/leftover process problem.

Here is a breakdown of the "Address already in use" exceptions per test:

      8 attach001
      8 attach002
      8 attach011
      8 attach012
      8 connect003
      8 connect004

This time I didn't look at the allmethods001, frames001, status002,
and vsbmethods001 failures.

These test results indicate that the COFF fall back code has the
same test results as the native symbol lookup code on VS2003.
So the COFF fall back code isn't bit rotted.
                                     
2010-12-11
EVALUATION

I used VS2010 created bits of the same repo, removed the jvm.pdb file
from the installed VM directory and from the build directory and
verified that nsk/sajdi/SACoreAttachingConnector/attach/attach001
passed (as expected). I added the following option to the debugger:

    -Dsun.jvm.hotspot.debugger.windbg.disableNativeLookup=1

and the test failed. This indicates that the COFF problem is with
the bits built by VS2010 so I'm thinking we have at least two
different problems here:

- we need new dbgeng.dll and dbghelp.dll libs in order to handle
  VS2010 jvm.pdb files
- the COFF fall back code needs to be updated in order to process
  VS2010 built bits
                                     
2010-12-11
EVALUATION

I short circuited the native lookup code to dump info about the
symbol that was found and then return NULL. This causes the COFF
fall back code to be used and combined with the other symbol
dumping code that I added, this allows me to compare the two sets
of symbol info. With VS2003 built bits, the two symbol lookup
code paths are in perfect agreement. With VS2010 built bits, the
symbol addresses returned by the COFF code are off from the symbol
addresses returned by the native lookup code by 0x1A00.

There are some comments in the code about there being possible
issues with the symbol address calculation logic.
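
The relationship between the two lookup paths can be sketched as follows.
This is an illustration only, not the SA implementation: LookupSelector
and the two function parameters are hypothetical stand-ins, and the sketch
assumes that the mere presence of the disableNativeLookup property turns
the native path off.

```java
import java.util.Optional;
import java.util.function.Function;

final class LookupSelector {
    static final String DISABLE_PROP =
        "sun.jvm.hotspot.debugger.windbg.disableNativeLookup";

    /** Hypothetical helper: try the native lookup first, then fall back to COFF. */
    static Optional<Long> lookup(String symbol,
                                 Function<String, Optional<Long>> nativeLookup,
                                 Function<String, Optional<Long>> coffLookup) {
        // Assumption: any value of the property disables the native path.
        boolean useNative = System.getProperty(DISABLE_PROP) == null;
        if (useNative) {
            Optional<Long> addr = nativeLookup.apply(symbol);
            if (addr.isPresent()) {
                return addr;
            }
        }
        // Reached when native lookup is disabled or found nothing.
        return coffLookup.apply(symbol);
    }

    public static void main(String[] args) {
        // Stub lookups with made-up addresses, just to show which path is taken.
        Function<String, Optional<Long>> nativeStub = s -> Optional.of(0x1000L);
        Function<String, Optional<Long>> coffStub = s -> Optional.of(0x2000L);
        long addr = lookup("gHotSpotVMTypes", nativeStub, coffStub).get();
        System.out.println("0x" + Long.toHexString(addr));
    }
}
```

Short circuiting the native lookup to return NULL, as described above, has
the same effect as setting the property: the COFF path supplies the address.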
                                     
2010-12-15
EVALUATION

A lot of tests fail,
                                     
2010-12-15
EVALUATION

It is looking like the switch in compilers from VS2003 -> VS2010
has revealed a latent bug in the way that the COFF symbol lookup
code calculates the symbol's address. I believe I have found the
reason and now I need to make sure that my idea makes sense in
the context of both VS2003 and VS2010.
                                     
2010-12-17
EVALUATION

I found a useful utility class/program in the SA code:

    sun.jvm.hotspot.debugger.win32.coff.DumpExports

The source lives in:

    agent/src/share/classes/sun/jvm/hotspot/debugger/win32/coff/DumpExports.java

The original class dumped information about:

- number of sections in the COFF file
- the names of each section
- DLL name, timestamp, major and minor version numbers
- list of exported names and addresses

I modified the class to dump additional information about
the sections and I reorganized the exported symbol information.

Here are some snippets from the VS2003 bits using the updated class:

Export table: RVA = 2469840/0x25afd0, size = 44330/0xad2a
5 sections in file
  Section 1:
    Name = '.text'
    VirtualSize = 2149396/0x20cc14
    VirtualAddress = 4096/0x1000
    SizeOfRawData = 2150400/0x20d000
    PointerToRawData = 4096/0x1000
  Section 2:
    Name = '.rdata'
    VirtualSize = 359674/0x57cfa
    VirtualAddress = 2154496/0x20e000
    SizeOfRawData = 360448/0x58000
    PointerToRawData = 2154496/0x20e000
  Section 3:
    Name = '.data'
    VirtualSize = 138076/0x21b5c
    VirtualAddress = 2514944/0x266000
    SizeOfRawData = 57344/0xe000
    PointerToRawData = 2514944/0x266000
  Section 4:
    Name = '.rsrc'
    VirtualSize = 888/0x378
    VirtualAddress = 2654208/0x288000
    SizeOfRawData = 4096/0x1000
    PointerToRawData = 2572288/0x274000
  Section 5:
    Name = '.reloc'
    VirtualSize = 161062/0x27526
    VirtualAddress = 2658304/0x289000
    SizeOfRawData = 163840/0x28000
    PointerToRawData = 2576384/0x275000
DLL name: jvm.dll
Time/date stamp 0x4d0fd1c0
Major version 0x0
Minor version 0x0
1188 exports found
[0] '??_7?$G1ParCopyClosure@$00$0A@$00@@6B@': [0] = 0x22c90c

<snip>

[1181] 'gHotSpotVMTypeEntryTypeNameOffset': [1181] = 0x285fe0
[1182] 'gHotSpotVMTypes': [1182] = 0x26eb2c

<snip>

[1187] 'jio_vsnprintf': [1187] = 0x11af10

The 'gHotSpotVMTypes' 'Export RVA' value is 0x26eb2c. When that value
is passed to rvaToFileOffset(), the value is found to belong to the
.data section which starts at VA 2514944/0x266000 and ends at VA
2653020/0x287b5c. The RVA is converted to a file offset by adding
its offset in the Virtual Address space to the start of the raw file
address space:

    if ((va <= rva) && (rva < (va + sz))) {
      return sec.getPointerToRawData() + (rva - va);
    }

In the VS2003 bits .data section, the Virtual Address base value and
the raw file address base value are the same:

    VirtualAddress = 2514944/0x266000
    PointerToRawData = 2514944/0x266000

This means that the call to rvaToFileOffset() is a no-op for symbols
found in the .data section.


Here are some snippets from the VS2010 bits using the updated class:

Export table: RVA = 2747232/0x29eb60, size = 44330/0xad2a
5 sections in file
  Section 1:
    Name = '.text'
    VirtualSize = 2340570/0x23b6da
    VirtualAddress = 4096/0x1000
    SizeOfRawData = 2340864/0x23b800
    PointerToRawData = 1024/0x400
  Section 2:
    Name = '.rdata'
    VirtualSize = 444554/0x6c88a
    VirtualAddress = 2347008/0x23d000
    SizeOfRawData = 444928/0x6ca00
    PointerToRawData = 2341888/0x23bc00
  Section 3:
    Name = '.data'
    VirtualSize = 171004/0x29bfc
    VirtualAddress = 2793472/0x2aa000
    SizeOfRawData = 88064/0x15800
    PointerToRawData = 2786816/0x2a8600
  Section 4:
    Name = '.rsrc'
    VirtualSize = 1300/0x514
    VirtualAddress = 2965504/0x2d4000
    SizeOfRawData = 1536/0x600
    PointerToRawData = 2874880/0x2bde00
  Section 5:
    Name = '.reloc'
    VirtualSize = 200356/0x30ea4
    VirtualAddress = 2969600/0x2d5000
    SizeOfRawData = 200704/0x31000
    PointerToRawData = 2876416/0x2be400
DLL name: jvm.dll
Time/date stamp 0x4d0fd6b1
Major version 0x0
Minor version 0x0
1188 exports found
[0] '??_7?$G1ParCopyClosure@$00$0A@$00@@6B@': [0] = 0x25c898

<snip>

[1181] 'gHotSpotVMTypeEntryTypeNameOffset': [1181] = 0x2d1d68
[1182] 'gHotSpotVMTypes': [1182] = 0x2ba77c

<snip>

[1187] 'jio_vsnprintf': [1187] = 0x136d50

The 'gHotSpotVMTypes' 'Export RVA' value is 0x2ba77c. When that value
is passed to rvaToFileOffset(), the value is found to belong to the
.data section which starts at VA 2793472/0x2aa000 and ends at VA
2964476/0x2d3bfc. The RVA is converted to a file offset by adding
its offset in the Virtual Address space to the start of the raw file
address space:

    if ((va <= rva) && (rva < (va + sz))) {
      return sec.getPointerToRawData() + (rva - va);
    }

In the VS2010 bits .data section, the Virtual Address base value and
the raw file address base value are different:

    VirtualAddress = 2793472/0x2aa000
    PointerToRawData = 2786816/0x2a8600

There is a difference of 6656/0x1A00 between VirtualAddress and
PointerToRawData. This means that the call to rvaToFileOffset()
will change the 'Export RVA' value by 6656/0x1A00 which is what
was seen in earlier debugging sessions.
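
The arithmetic can be checked with a small standalone sketch. RvaDemo and
Section are illustrative names, not SA classes; the section values and
RVAs are copied from the two dumps above, and the range test is the same
one shown in the snippet.

```java
final class RvaDemo {
    /** The section header fields that the conversion needs. */
    static final class Section {
        final String name;
        final long virtualAddress;
        final long virtualSize;
        final long pointerToRawData;

        Section(String name, long virtualAddress, long virtualSize, long pointerToRawData) {
            this.name = name;
            this.virtualAddress = virtualAddress;
            this.virtualSize = virtualSize;
            this.pointerToRawData = pointerToRawData;
        }
    }

    /** Returns the file offset for rva, or -1 if rva is outside the section. */
    static long rvaToFileOffset(Section sec, long rva) {
        long va = sec.virtualAddress;
        long sz = sec.virtualSize;
        if ((va <= rva) && (rva < (va + sz))) {
            return sec.pointerToRawData + (rva - va);
        }
        return -1;
    }

    public static void main(String[] args) {
        Section vs2003 = new Section(".data", 0x266000L, 0x21b5cL, 0x266000L);
        Section vs2010 = new Section(".data", 0x2aa000L, 0x29bfcL, 0x2a8600L);
        long rva2003 = 0x26eb2cL;  // gHotSpotVMTypes in the VS2003 bits
        long rva2010 = 0x2ba77cL;  // gHotSpotVMTypes in the VS2010 bits

        long off2003 = rvaToFileOffset(vs2003, rva2003);
        long off2010 = rvaToFileOffset(vs2010, rva2010);
        System.out.printf("VS2003: rva=0x%x offset=0x%x delta=0x%x%n",
                          rva2003, off2003, rva2003 - off2003);
        System.out.printf("VS2010: rva=0x%x offset=0x%x delta=0x%x%n",
                          rva2010, off2010, rva2010 - off2010);
    }
}
```

The VS2003 delta is zero because VirtualAddress and PointerToRawData match;
the VS2010 delta is 0x1a00, the same skew observed when comparing the COFF
and native lookup results.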

The 'Export RVA' field should not be passed to rvaToFileOffset()
because that field is not used to fetch another value in the COFF
file. It is a real address. However, when the field is interpreted
as a 'Forwarder RVA' value, then it needs to be passed to
rvaToFileOffset() because it is used to fetch another value.
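
A minimal sketch of that distinction, assuming the PE/COFF rule that an
export address table entry is a 'Forwarder RVA' only when it points inside
the export table itself. ExportEntryDemo and its methods are hypothetical
names and this is not the code from the fix; the export table RVA and size
and the .rdata values come from the VS2010 dump above, and the forwarder
entry in the usage note is invented.

```java
import java.util.function.LongUnaryOperator;

final class ExportEntryDemo {
    /** True when the entry lies inside the export table, i.e. it is a forwarder. */
    static boolean isForwarder(long entryRva, long tableRva, long tableSize) {
        return tableRva <= entryRva && entryRva < tableRva + tableSize;
    }

    /**
     * Export RVA: returned unchanged, since it is not used to read the file.
     * Forwarder RVA: converted to a file offset so the forwarder string can be read.
     */
    static long resolve(long entryRva, long tableRva, long tableSize,
                        LongUnaryOperator rvaToFileOffset) {
        return isForwarder(entryRva, tableRva, tableSize)
                ? rvaToFileOffset.applyAsLong(entryRva)
                : entryRva;
    }

    public static void main(String[] args) {
        long tableRva = 0x29eb60L;   // 'Export table: RVA' in the VS2010 dump
        long tableSize = 0xad2aL;    // 'size' in the VS2010 dump
        // The export table sits in .rdata: VirtualAddress 0x23d000, raw 0x23bc00.
        LongUnaryOperator rdataToFile = rva -> 0x23bc00L + (rva - 0x23d000L);

        long gHotSpotVMTypes = 0x2ba77cL;
        System.out.println("forwarder? "
            + isForwarder(gHotSpotVMTypes, tableRva, tableSize));
        System.out.printf("resolved: 0x%x%n",
            resolve(gHotSpotVMTypes, tableRva, tableSize, rdataToFile));
    }
}
```

The gHotSpotVMTypes entry lies outside the export table, so it comes back
unmodified. An invented entry such as 0x29f000, which does fall inside the
table, would instead be converted through the .rdata mapping.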
                                     
2010-12-21
EVALUATION

In my JDK7-B109 Win32 baseline testing, there are 286 tests with
this failure mode:

$ grep 'gHotSpotVMTypes' vm-sajdi-prod-client-*/dcubed*/*/*.log  vm-sajdi-prod-server-*/dcubed*/*/*.log | wc -l
286


In my JDK7-B110 Win32 baseline testing, there are 288 tests with
this failure mode:

$ grep 'gHotSpotVMTypes' vm-sajdi-prod-client-*/dcubed*/*/*.log  vm-sajdi-prod-server-*/dcubed*/*/*.log | wc -l
288


In my JDK7-B111 Win32 baseline testing, there are 286 tests with
this failure mode:

$ grep 'gHotSpotVMTypes' vm-sajdi-prod-client-*/dcubed*/*/*.log  vm-sajdi-prod-server-*/dcubed*/*/*.log | wc -l
286
                                     
2010-09-28
EVALUATION

The RuntimeException that is thrown is coming from

agent/src/share/classes/sun/jvm/hotspot/HotSpotTypeDataBase.java:

    110     // Fetch the address of the VMTypeEntry*
    111     Address entryAddr = lookupInProcess("gHotSpotVMTypes");
    112     //    System.err.println("gHotSpotVMTypes address = " + entryAddr);
    113     // Dereference this once to get the pointer to the first VMTypeEntry
    114     //    dumpMemory(entryAddr, 80);
    115     entryAddr = entryAddr.getAddressAt(0);
    116 
    117     if (entryAddr == null) {
    118       throw new RuntimeException("gHotSpotVMTypes was not initialized properly in the remote process; can not continue");
    119     }


The entryAddr value returned on line 111 is non-NULL which indicates
that the symbol (gHotSpotVMTypes) was found, but when we attempt
to dereference the returned memory, we get a NULL which indicates
that the returned memory is not in the memory layout that we expect.

Here is dumpMemory() output for the failing run:

    gHotSpotVMTypes address = 0x082b7f4c
    0x082b7f4c: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
    0x082b7f54: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
    0x082b7f5c: 0x00 0x00 0x00 0x00 0x20 0x00 0x00 0x00
    0x082b7f64: 0x00 0x00 0x00 0x00 0x90 0x05 0x28 0x08
    0x082b7f6c: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
    0x082b7f74: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
    0x082b7f7c: 0x00 0x00 0x00 0x00 0x3c 0x00 0x00 0x00
    0x082b7f84: 0x00 0x00 0x00 0x00 0x80 0x05 0x28 0x08
    0x082b7f8c: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
    0x082b7f94: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00

Here is dumpMemory() output for a passing run:

    gHotSpotVMTypes address = 0x0826db3c
    0x0826db3c: 0x78 0xb4 0x26 0x08 0x28 0x20 0x27 0x08
    0x0826db44: 0xa0 0x25 0x27 0x08 0x04 0x00 0x00 0x00
    0x0826db4c: 0x00 0x00 0x00 0x00 0x08 0x00 0x00 0x00
    0x0826db54: 0x00 0x00 0x00 0x00 0x0c 0x00 0x00 0x00
    0x0826db5c: 0x00 0x00 0x00 0x00 0x10 0x00 0x00 0x00
    0x0826db64: 0x00 0x00 0x00 0x00 0x18 0x00 0x00 0x00
    0x0826db6c: 0x00 0x00 0x00 0x00 0x20 0x00 0x00 0x00
    0x0826db74: 0x00 0x00 0x00 0x00 0x04 0x00 0x00 0x00
    0x0826db7c: 0x00 0x00 0x00 0x00 0x08 0x00 0x00 0x00
    0x0826db84: 0x00 0x00 0x00 0x00 0x0c 0x00 0x00 0x00

I suspect that the layout of the symbol information has changed
from VS2003 -> VS2010.
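
To make the two dumps easier to read, here is a small sketch that decodes
the first pointer-sized value the same way the dereference on line 115
would on 32-bit x86 (little-endian, 4-byte pointers). DerefDemo and
pointerAt0 are illustrative names and not part of the SA code; the bytes
are the first row of each dump above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class DerefDemo {
    /** Reads a 32-bit little-endian pointer at offset 0; a result of 0 means NULL. */
    static long pointerAt0(byte[] dump) {
        int raw = ByteBuffer.wrap(dump).order(ByteOrder.LITTLE_ENDIAN).getInt(0);
        return raw & 0xFFFFFFFFL;  // widen without sign extension
    }

    public static void main(String[] args) {
        // First eight bytes at the gHotSpotVMTypes address in each run.
        byte[] failing = {0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
        byte[] passing = {0x78, (byte) 0xb4, 0x26, 0x08, 0x28, 0x20, 0x27, 0x08};

        System.out.printf("failing run: 0x%08x%n", pointerAt0(failing));
        System.out.printf("passing run: 0x%08x%n", pointerAt0(passing));
    }
}
```

The failing run decodes to zero, which is what trips the null check on
line 117; the passing run decodes to a plausible address near the symbol
itself.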
                                     
2010-11-16
EVALUATION

I jumped too far down in the failure stack for symbol lookup when
I said that I think that the symbol layout has changed. The passing
test under VS2003 and the failing test under VS2010 actually diverge
in their code execution paths quite a bit earlier.

In the passing test under VS2003, lookupByName0() succeeds in finding
our symbol, and in the failing test under VS2010, lookupByName0()
fails when GetOffsetByName() returns != S_OK.

agent/src/os/win32/windbg/sawindbg.cpp:

/*
 * Class:     sun_jvm_hotspot_debugger_windbg_WindbgDebuggerLocal
 * Method:    lookupByName0
 * Signature: (Ljava/lang/String;Ljava/lang/String;)J
 */

JNIEXPORT jlong JNICALL Java_sun_jvm_hotspot_debugger_windbg_WindbgDebuggerLocal_lookupByName0
(JNIEnv *env, jobject obj, jstring objName, jstring sym) {
  IDebugSymbols* ptrIDebugSymbols = (IDebugSymbols*) env->GetLongField(obj,
                                                      ptrIDebugSymbols_ID);
  CHECK_EXCEPTION_(0);

  jboolean isCopy;
  const char* buf = env->GetStringUTFChars(sym, &isCopy);
  CHECK_EXCEPTION_(0);
  AutoJavaString name(env, sym, buf);

  ULONG64 offset = 0L;
  if (strstr(name, "::") != 0) {
    ptrIDebugSymbols->AddSymbolOptions(SYMOPT_UNDNAME);
  } else {
    ptrIDebugSymbols->RemoveSymbolOptions(SYMOPT_UNDNAME);
  }
  if (ptrIDebugSymbols->GetOffsetByName(name, &offset) != S_OK) {
    return (jlong) 0;
  }
  return (jlong) offset;
}
                                     
2010-11-16
EVALUATION

It dawned on me this morning that I hadn't quite finished characterizing
the failure mode. Here are the test results when the two interesting
components are built by VS2003 and VS2010:

VM      sawindbg  result
------  --------  ------
VS2003  VS2003    PASS
VS2010  VS2010    FAIL
VS2010  VS2003    FAIL
VS2003  VS2010    PASS

I had been focusing on "what's wrong with sawindbg?". However,
the above table shows that when sawindbg is built by either
VS2003 or VS2010, it works on a VM built by VS2003. However,
when the VM is built by VS2010, neither sawindbg works. So the
question should be "what's wrong with the VM?".
                                     
2010-11-17
SUGGESTED FIX

Here is the suggested set of changes:

- Reorder the code in HotSpotTypeDataBase.readVMTypes() to look up
  and decode the 'gHotSpotVMTypes' symbol first. This should help
  make any future failures in this area more consistent.

- COFFFileParser changes:
  - add comments to clarify some of the algorithms
  - read 'Base Of Data' field in optional header when PE32
    format COFF file is read
  - change ExportDirectoryTableImpl to return the 'Export RVA' field
    without modification and to return the 'Forwarder RVA' field
    after adjusting the address into a file offset

- update debugger/win32/coff/DumpExports to include more info about
  the section headers and to dump the exported symbol info in a more
  understandable order

- update debugger/win32/coff/TestParser to more clearly access the
  section header using 1-based indices instead of 0-based indices

- update the static initializer in debugger/windbg/WindbgDebuggerLocal
  to use different library loading logic for dbgeng.dll and dbghelp.dll.
  The library pair is searched for in:

   - $JAVA_HOME/jre/bin
   - dir named by DEBUGGINGTOOLSFORWINDOWS environment variable
   - the "Debugging Tools For Windows" program directory
   - the "Debugging Tools For Windows (x86)" program directory
   - the "Debugging Tools For Windows (x64)" program directory
   - the system directory (WINDOWS/system32 is searched last)

- the sawindbg.dll is now explicitly loaded from $JAVA_HOME/jre/bin
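
The search order for the dbgeng.dll/dbghelp.dll pair could be sketched
like this. It is only an illustration of "first directory that has both
libraries wins", not the code from the fix: DebugLibLocator, its methods,
and the use of the ProgramFiles and SystemRoot environment variables to
locate the program and system directories are assumptions.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

final class DebugLibLocator {
    /** Returns the first directory that contains both dbgeng.dll and dbghelp.dll. */
    static Optional<File> findLibraryPair(List<File> searchDirs) {
        for (File dir : searchDirs) {
            if (new File(dir, "dbgeng.dll").isFile()
                    && new File(dir, "dbghelp.dll").isFile()) {
                return Optional.of(dir);
            }
        }
        return Optional.empty();
    }

    /** Adds base/child to the list when the base location is known. */
    static void add(List<File> dirs, String base, String child) {
        if (base != null) {
            dirs.add(child.isEmpty() ? new File(base) : new File(base, child));
        }
    }

    public static void main(String[] args) {
        // Candidate directories in the order listed above (locations assumed).
        List<File> dirs = new ArrayList<>();
        add(dirs, System.getenv("JAVA_HOME"), "jre/bin");
        add(dirs, System.getenv("DEBUGGINGTOOLSFORWINDOWS"), "");
        String programFiles = System.getenv("ProgramFiles");
        add(dirs, programFiles, "Debugging Tools For Windows");
        add(dirs, programFiles, "Debugging Tools For Windows (x86)");
        add(dirs, programFiles, "Debugging Tools For Windows (x64)");
        add(dirs, System.getenv("SystemRoot"), "system32");

        System.out.println(findLibraryPair(dirs).map(File::getPath).orElse("not found"));
    }
}
```

Requiring both files in the same directory keeps the libraries as a
matched pair, which fits the list's description of searching for the pair
together rather than for each library separately.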
                                     
2010-12-21
SUGGESTED FIX

Attached the fix for Code Review Round 0 as 6987812-webrev-cr0.tgz.
                                     
2010-12-21
EVALUATION

http://hg.openjdk.java.net/jdk7/hotspot-rt/hotspot/rev/d6cd0d55d0b5
                                     
2010-12-23


