JDK-8235220 : ClhsdbScanOops.java fails with sun.jvm.hotspot.types.WrongTypeException
  • Type: Bug
  • Component: hotspot
  • Sub-Component: svc-agent
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2019-12-03
  • Updated: 2022-06-10
  • Resolved: 2020-04-13
JDK 11: 11.0.16 (Fixed)
JDK 15: 15 b19 (Fixed)
Description
At the moment this is not actually a test failure, since the test does not detect the exception and passes anyway. It will start to fail once JDK-8234277 is pushed and does more error checking on the clhsdb output. What I'm seeing in ClhsdbScanOops.java on linux-x64 is the following:

0x00000006c6eb4528 java/lang/String
0x00000006c6eb45b8 java/lang/String
0x00000006c6eb45f0 java/lang/String
0x00000006c6eb47c8 java/lang/String
Error: sun.jvm.hotspot.types.WrongTypeException: No suitable match for type of address 0x0000000800000028
sun.jvm.hotspot.types.WrongTypeException: No suitable match for type of address 0x0000000800000028
	at jdk.hotspot.agent/sun.jvm.hotspot.runtime.InstanceConstructor.newWrongTypeException(InstanceConstructor.java:62)
	at jdk.hotspot.agent/sun.jvm.hotspot.runtime.VirtualBaseConstructor.instantiateWrapperFor(VirtualBaseConstructor.java:109)
	at jdk.hotspot.agent/sun.jvm.hotspot.oops.Metadata.instantiateWrapperFor(Metadata.java:74)
	at jdk.hotspot.agent/sun.jvm.hotspot.utilities.RobustOopDeterminator.oopLooksValid(RobustOopDeterminator.java:73)
	at jdk.hotspot.agent/sun.jvm.hotspot.CommandProcessor$33.doit(CommandProcessor.java:1187)
	at jdk.hotspot.agent/sun.jvm.hotspot.CommandProcessor.executeCommand(CommandProcessor.java:1983)
	at jdk.hotspot.agent/sun.jvm.hotspot.CommandProcessor.executeCommand(CommandProcessor.java:1953)
	at jdk.hotspot.agent/sun.jvm.hotspot.CommandProcessor.run(CommandProcessor.java:1833)
	at jdk.hotspot.agent/sun.jvm.hotspot.CLHSDB.run(CLHSDB.java:99)
	at jdk.hotspot.agent/sun.jvm.hotspot.CLHSDB.main(CLHSDB.java:40)
	at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.runCLHSDB(SALauncher.java:270)
	at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.main(SALauncher.java:406)

But since the output the test is looking for is still there, the test passes. The changes for JDK-8234277 will do additional output error checking and detect this exception. As a result the test will fail on linux-x64 every time.

On windows-x64, this test always runs into JDK-8230731, so it never sees this particular error. However, even JDK-8230731 does not cause this test to fail since it still finds the output it is looking for, but the extra error checking done by JDK-8234277 will cause it to start to fail every time on windows-x64 also.

On macosx-x64 the failure is different as it sees a NullPointerException rather than the WrongTypeException seen on linux-x64. This also would normally not cause the test to fail, but does after JDK-8234277. It's unclear if this is related to the linux-x64 failure. A separate bug may need to be filed for it.

0x00000007aaba9268 java/lang/String
0x00000007aaba92a0 java/lang/String
0x00000007aaba9478 java/lang/String
Error: java.lang.NullPointerException
java.lang.NullPointerException
	at jdk.hotspot.agent/sun.jvm.hotspot.memory.FileMapInfo$FileMapHeader.inCopiedVtableSpace(FileMapInfo.java:124)
	at jdk.hotspot.agent/sun.jvm.hotspot.memory.FileMapInfo.inCopiedVtableSpace(FileMapInfo.java:104)
	at jdk.hotspot.agent/sun.jvm.hotspot.types.basic.BasicTypeDataBase.findDynamicTypeForAddress(BasicTypeDataBase.java:302)
	at jdk.hotspot.agent/sun.jvm.hotspot.runtime.VirtualBaseConstructor.instantiateWrapperFor(VirtualBaseConstructor.java:102)
	at jdk.hotspot.agent/sun.jvm.hotspot.oops.Metadata.instantiateWrapperFor(Metadata.java:74)
	at jdk.hotspot.agent/sun.jvm.hotspot.utilities.RobustOopDeterminator.oopLooksValid(RobustOopDeterminator.java:73)
	at jdk.hotspot.agent/sun.jvm.hotspot.CommandProcessor$33.doit(CommandProcessor.java:1187)
	at jdk.hotspot.agent/sun.jvm.hotspot.CommandProcessor.executeCommand(CommandProcessor.java:1983)
	at jdk.hotspot.agent/sun.jvm.hotspot.CommandProcessor.executeCommand(CommandProcessor.java:1953)
	at jdk.hotspot.agent/sun.jvm.hotspot.CommandProcessor.run(CommandProcessor.java:1833)
	at jdk.hotspot.agent/sun.jvm.hotspot.CLHSDB.run(CLHSDB.java:99)
	at jdk.hotspot.agent/sun.jvm.hotspot.CLHSDB.main(CLHSDB.java:40)
	at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.runCLHSDB(SALauncher.java:270)
	at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.main(SALauncher.java:406)
Comments
Fix request [11u] I backport this to improve the SA agent. We see the test failing in our CI. The ProblemList excludes the test in fewer cases than in later JDKs before this and other fixes. No risk for the VM, touches only a serviceability tool. Clean backport except for ProblemList. SAP nightly testing passed.
13-05-2022

A pull request was submitted for review. URL: https://git.openjdk.java.net/jdk11u-dev/pull/1083 Date: 2022-05-13 11:54:23 +0000
13-05-2022

Git URL: https://github.com/openjdk/jdk/commit/77041dc4ec7542b34c85c59b4e53f14032d9548b
12-05-2022

I determined that tlabs are indeed allocated out of gc regions, and you cannot always safely scan to the top of any region's allocated space, as indicated by the following code in ObjectHeap.iterateLiveRegions():

    catch (AddressException e) {
      // This is okay at the top of these regions
    }
    catch (UnknownOopException e) {
      // This is okay at the top of these regions
    }

So scanoops is working properly.
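
The "okay at the top of these regions" handling can be illustrated with a small model. This is only a sketch under stated assumptions, not the ObjectHeap code: RegionWalkSketch, Region, parsableEnd, and the fixed 16-byte object size are hypothetical, and the two exception classes are local stand-ins for the SA exceptions of the same name. It shows a walk that abandons the rest of a region when parsing fails and then continues with the next region.

```java
import java.util.ArrayList;
import java.util.List;

final class RegionWalkSketch {
    // Local stand-ins for the SA exception types named in the comment above.
    static class AddressException extends RuntimeException {}
    static class UnknownOopException extends RuntimeException {}

    // Hypothetical region: [bottom, top) is "allocated", but only
    // [bottom, parsableEnd) actually contains parsable objects.
    static final class Region {
        final long bottom;
        final long top;
        final long parsableEnd;

        Region(long bottom, long top, long parsableEnd) {
            this.bottom = bottom;
            this.top = top;
            this.parsableEnd = parsableEnd;
        }
    }

    // Toy object parser: every object is 16 bytes; anything at or beyond
    // parsableEnd cannot be parsed as an object.
    static long objectSizeAt(Region r, long addr) {
        if (addr >= r.parsableEnd) {
            throw new UnknownOopException();
        }
        return 16;
    }

    // Walks each region object by object and collects object start addresses.
    static List<Long> walk(List<Region> regions) {
        List<Long> found = new ArrayList<>();
        for (Region r : regions) {
            try {
                long addr = r.bottom;
                while (addr < r.top) {
                    long size = objectSizeAt(r, addr);
                    found.add(addr);
                    addr += size;
                }
            } catch (AddressException | UnknownOopException e) {
                // Tolerated near the top of a region: give up on the rest
                // of this region and move on to the next one.
            }
        }
        return found;
    }
}
```

In this model a failure ends the walk of one region only, so objects found before the failure and objects in later regions are still reported.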
15-04-2020

URL: https://hg.openjdk.java.net/jdk/jdk/rev/13de0f2d8067 User: cjplummer Date: 2020-04-13 20:24:06 +0000
13-04-2020

It's unclear to me why the address range that scanoops is scanning when it triggers this bug is not "initialized", or maybe "allocated" would be a better way to put it. It should be a valid range since the test scans from the "begin" address of the eden space up to (but not including) the current "top", not the "end" address of the space.

However, I'm also seeing this issue with Windows. It results in a "ReadVirtual failed" error. See JDK-8230731. My conclusion there was that Windows was being asked to access an invalid address, and this results in an inability to read in the page from the process. WindbgDebuggerLocal.readBytesFromProcess0() throws a DebuggerException when this happens, which no part of the code can recover from. My assumption there (and I still believe this) is that it should instead return null like other platforms do. Further up the stack this will result in an AddressException, and in cases where the code knows it might be a bad address, it will handle it gracefully. RobustOopDeterminator.oopLooksValid() is a good example of code that recovers from AddressException. When I fix WindbgDebuggerLocal.readBytesFromProcess0() to return null, I don't see any issues with ClhsdbScanOops on Windows.

However, I still question why the area of the eden space being scanned is not valid. Is "top" not accurate? If it's not, then it looks like it may not be even close to representing the end of the allocated part of the space. For example, in one Windows failure I looked at, the top value was 0x00000000d805c310 and the exception happened after printing a java.lang.String at 0x00000000d71b33e8. That's about 15 MB before where top points to. So why would any address less than "top" ever not be valid? Maybe tlabs are allocated there. I'm not sure how they are allocated, but I guess it would make sense that they are in the eden space. So this could be the reason memory below "top" could still be unallocated.
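
The "return null, then raise AddressException further up" convention argued for above can be sketched as follows. This is an illustrative model only: ReadSketch and its methods are hypothetical, the AddressException here is a local stand-in rather than the SA class, and the single readable range and little-endian byte order are assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

final class ReadSketch {
    // Local stand-in for the SA AddressException.
    static class AddressException extends RuntimeException {
        final long address;

        AddressException(long address) {
            super("bad address 0x" + Long.toHexString(address));
            this.address = address;
        }
    }

    // Toy process image: one readable range [base, base + memory.length).
    private final long base;
    private final byte[] memory;

    ReadSketch(long base, byte[] memory) {
        this.base = base;
        this.memory = memory;
    }

    // Low-level read: reports an unreadable range by returning null
    // instead of throwing an unrecoverable exception.
    byte[] readBytesFromProcess(long addr, int len) {
        long off = addr - base;
        if (off < 0 || off + len > memory.length) {
            return null;
        }
        return Arrays.copyOfRange(memory, (int) off, (int) off + len);
    }

    // Mid-level read: turns the null into an exception callers can catch.
    long readLong(long addr) {
        byte[] bytes = readBytesFromProcess(addr, 8);
        if (bytes == null) {
            throw new AddressException(addr);
        }
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getLong();
    }

    // Caller that expects bad addresses and recovers from them.
    boolean looksReadable(long addr) {
        try {
            readLong(addr);
            return true;
        } catch (AddressException e) {
            return false;
        }
    }
}
```

The design point is that only the layer that knows an address may be bad decides whether a failed read is fatal; the low-level reader just reports that nothing could be read.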
12-04-2020

Ioi's fix does appear to be working for the WrongTypeException failure. The OSX NPE failure still happens, but it appears to be the same as JDK-8241158.
10-04-2020

[~iklam] I'm going to keep this fix in my local repo for a while and see how it does.
04-04-2020

This test runs SA with something like:

hsdb> + scanoops 0x00000006b0d00000 0x00000006b17147c8

Apparently the end value (0x00000006b17147c8) is actually past the end of the heap allocation top, so we are scanning into uninitialized memory.

RobustOopDeterminator.oopLooksValid() is supposed to catch this -- it scans memory by a stride of 8 for 64-bit and 4 for 32-bit. For every location, it tries to read the location as an object. If it fails, just go to the next location.

The bug can be fixed by this patch

==================
diff -r 1b6cb377d024 src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/utilities/RobustOopDeterminator.java
--- a/src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/utilities/RobustOopDeterminator.java Mon Mar 23 13:27:22 2020 -0700
+++ b/src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/utilities/RobustOopDeterminator.java Wed Mar 25 15:42:13 2020 -0700
@@ -74,10 +74,13 @@
 } else {
 Metadata.instantiateWrapperFor(klassField.getValue(oop));
 }
- return true;
- }
+ return true;
+ }
 catch (AddressException e) {
 return false;
 }
+ catch (WrongTypeException e) {
+ return false;
+ }
 }
==================

The problem happens if the location contains a small integer, like (0x05 ... or 0x28??), reading this location as an oop with UseCompressedOops will yield an InstanceKlass of 0x800000028. This falls in the MD region of the CDS archive, but the first word of this address doesn't point to a valid vtable, so db.findDynamicTypeForAddress returns null here:

http://hg.openjdk.java.net/jdk/jdk/file/1b6cb377d024/src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/types/basic/BasicTypeDataBase.java#l303

    if (VM.getVM().isSharingEnabled()) {
      // Check if the value falls in the _md_region
      FileMapInfo cdsFileMapInfo = VM.getVM().getFileMapInfo();
      if (cdsFileMapInfo.inCopiedVtableSpace(loc1)) {
        return cdsFileMapInfo.getTypeForVptrAddress(loc1); <<<<< HERE
      }
    }

as a result, instantiateWrapperFor throws WrongTypeException here:

http://hg.openjdk.java.net/jdk/jdk/file/1b6cb377d024/src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/VirtualBaseConstructor.java#l109

    Type type = db.findDynamicTypeForAddress(addr, baseType);
    if (type != null) {
      return (T) VMObjectFactory.newObject((Class) map.get(type.getName()), addr);
    } else if (unknownTypeHandler != null) {
      return (T) VMObjectFactory.newObject(unknownTypeHandler, addr);
    }
    throw newWrongTypeException(addr); <<<<< HERE

In the past, catching AddressException was enough, because reading invalid vptrs will generally lead to invalid memory (AddressException is similar to SEGV), but when CDS is enabled, it's quite possible to get WrongTypeException.
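
The failure mode and the effect of the extra catch clause can be modelled in isolation. This is a minimal sketch, not the SA code: ScanSketch, decodeKlass, and the toy instantiateWrapperFor are hypothetical, the exception classes are local stand-ins, and the class-space base of 0x800000000 with a shift of 0 is an assumption chosen because it reproduces the 0x800000028 address in the report. For simplicity each scanned word is treated directly as a narrow klass value.

```java
import java.util.ArrayList;
import java.util.List;

final class ScanSketch {
    // Local stand-ins for the SA exceptions named in the report.
    static class AddressException extends RuntimeException {}
    static class WrongTypeException extends RuntimeException {}

    // Assumed encoding parameters, not read from a real VM.
    static final long KLASS_BASE = 0x800000000L;
    static final int KLASS_SHIFT = 0;

    // Decodes a 32-bit narrow klass value into a full address.
    static long decodeKlass(int narrowKlass) {
        return KLASS_BASE + ((narrowKlass & 0xFFFFFFFFL) << KLASS_SHIFT);
    }

    // Toy model of wrapper instantiation: addresses at or beyond mappedEnd
    // are unreadable; readable addresses other than validKlass are mapped
    // memory that does not hold a recognizable type.
    static void instantiateWrapperFor(long klassAddr, long validKlass, long mappedEnd) {
        if (klassAddr >= mappedEnd) {
            throw new AddressException();
        }
        if (klassAddr != validKlass) {
            throw new WrongTypeException();
        }
    }

    // Mirrors the shape of the patched check: both failures mean "not an oop".
    static boolean oopLooksValid(int narrowKlass, long validKlass, long mappedEnd) {
        try {
            instantiateWrapperFor(decodeKlass(narrowKlass), validKlass, mappedEnd);
            return true;
        } catch (AddressException e) {
            return false;
        } catch (WrongTypeException e) {
            return false;
        }
    }

    // Visits every word and returns the indices that look like valid oops;
    // a failed check skips that location and the scan continues.
    static List<Integer> scan(int[] words, long validKlass, long mappedEnd) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            if (oopLooksValid(words[i], validKlass, mappedEnd)) {
                hits.add(i);
            }
        }
        return hits;
    }
}
```

Under these assumptions a word holding 0x28 decodes to a mapped address with no valid type behind it, which is the case the original single catch clause did not cover.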
25-03-2020

I see this on linux-aarch64 as well, the error is exactly the same:

Error: sun.jvm.hotspot.types.WrongTypeException: No suitable match for type of address 0x0000000800000028
sun.jvm.hotspot.types.WrongTypeException: No suitable match for type of address 0x0000000800000028
	at jdk.hotspot.agent/sun.jvm.hotspot.runtime.InstanceConstructor.newWrongTypeException(InstanceConstructor.java:62)
	at jdk.hotspot.agent/sun.jvm.hotspot.runtime.VirtualBaseConstructor.instantiateWrapperFor(VirtualBaseConstructor.java:109)
14-01-2020