JDK-8292201 : serviceability/sa/ClhsdbThreadContext.java fails with "'Thread "Common-Cleaner"' missing from stdout/stderr"
  • Type: Bug
  • Component: hotspot
  • Sub-Component: svc-agent
  • Affected Version: 20
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2022-08-10
  • Updated: 2022-10-27
  • Resolved: 2022-09-03
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 20
20 b14Fixed
Related Reports
Relates :  
Description
Although the summary is rather specific, the root cause is much more general, and can cause a variety of test failures. We'll leave it specific to ClhsdbThreadContext.java for now. The test failed with:

java.lang.RuntimeException: Test ERROR java.lang.RuntimeException: 'Thread "Common-Cleaner"' missing from stdout/stderr

And indeed the Common-Cleaner thread dump is missing from the output. However, the root cause is an unexpected exception threadcontext dumping:

Thread "C1 CompilerThread0" id=18 Address=0x00000253136a0650
r15: 0x000000750e4ff2e8: In java stack for thread "C1 CompilerThread0" sun.jvm.hotspot.runtime.CompilerThread@0x00000253136a0650
r14: 0x0000025377384470
r13: null
r12: 0x000000750e4ff378: In java stack for thread "C1 CompilerThread0" sun.jvm.hotspot.runtime.CompilerThread@0x00000253136a0650
r11: 0x000000750e4fdeb8: In java stack for thread "C1 CompilerThread0" sun.jvm.hotspot.runtime.CompilerThread@0x00000253136a0650
r10: 0x000000750e4fe7c0: In java stack for thread "C1 CompilerThread0" sun.jvm.hotspot.runtime.CompilerThread@0x00000253136a0650
r9: null
r8: 0x000000750e4fdffcError: java.lang.ArrayIndexOutOfBoundsException: Index 4099 out of bounds for length 4096
java.lang.ArrayIndexOutOfBoundsException: Index 4099 out of bounds for length 4096
	at jdk.hotspot.agent/sun.jvm.hotspot.debugger.Page.getLong(Page.java:182)
	at jdk.hotspot.agent/sun.jvm.hotspot.debugger.PageCache.getLong(PageCache.java:100)
	at jdk.hotspot.agent/sun.jvm.hotspot.debugger.DebuggerBase.readCInteger(DebuggerBase.java:364)
	at jdk.hotspot.agent/sun.jvm.hotspot.debugger.DebuggerBase.readAddressValue(DebuggerBase.java:462)
	at jdk.hotspot.agent/sun.jvm.hotspot.debugger.windbg.WindbgDebuggerLocal.readAddress(WindbgDebuggerLocal.java:312)
	at jdk.hotspot.agent/sun.jvm.hotspot.debugger.windbg.WindbgAddress.getAddressAt(WindbgAddress.java:71)
	at jdk.hotspot.agent/sun.jvm.hotspot.utilities.PointerFinder.find(PointerFinder.java:58)
	at jdk.hotspot.agent/sun.jvm.hotspot.runtime.JavaThread.printThreadContextOn(JavaThread.java:493)
	at jdk.hotspot.agent/sun.jvm.hotspot.CommandProcessor$46.doit(CommandProcessor.java:1699)
	at jdk.hotspot.agent/sun.jvm.hotspot.CommandProcessor.executeCommand(CommandProcessor.java:2212)
	at jdk.hotspot.agent/sun.jvm.hotspot.CommandProcessor.executeCommand(CommandProcessor.java:2182)
	at jdk.hotspot.agent/sun.jvm.hotspot.CommandProcessor.run(CommandProcessor.java:2053)
	at jdk.hotspot.agent/sun.jvm.hotspot.CLHSDB.run(CLHSDB.java:112)
	at jdk.hotspot.agent/sun.jvm.hotspot.CLHSDB.main(CLHSDB.java:44)
	at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.runCLHSDB(SALauncher.java:281)
	at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.main(SALauncher.java:500)

It looks like it's trying to fetch a 64-bit value (an address) that starts in the last 4-byte word of the page, so the read overflows the page. 64-bit values are expected always be 64-bit aligned.

This exception is triggered by doing a PointerFinder.find() on $r8, which is 0x000000750e4fdffc, so this end result is not surprising given that threadcontext calls PointerFinder.find() on whatever (possibly random) value is in each register.

PointerFinder.find() already expects a bad address to possibly fail with an exception, and it catches AddressException and WrongTypeException to deal with it (as do many other parts of SA code). I don't think adding ArrayIndexOutOfBoundsException to the list of possible exceptions is appropriate. 

It appears that a UnalignedAddressException (a subclass of AddressException) should have been thrown earlier on. There is in fact an alignment check made in DebuggerBase.readCInteger(), but the check is faulty:

           public void checkAlignment(long address, long alignment) {
             // Need to override default checkAlignment because we need to
             // relax alignment constraints on Windows/x86
             if ( (address % alignment != 0)
                &&(alignment != 8 || address % 4 != 0)) {
                throw new UnalignedAddressException(
                        "Trying to read at address: "
                      + addressValueToString(address)
                      + " with alignment: " + alignment,
                        address);
             }
           }

It purposefully allows 4 byte alignment of 8 byte values in order to support 32-bit x86. However, I don't see how this softening of this check can be expected to work. If indeed 64-bit values on x86 are allowed to be just 4 byte aligned, then x86 can run into this same problem of trying to read a 64-bit value starting in the last 4 bytes of a page and continuing in the first 4 bytes of the next page. That will fail with the same exception we are seeing here, even if the address and 64-bit value are valid.

So our options here are to either get rid of the support for 4 byte aligned 64-bit values (and hope it doesn't break x86), or modify the above check to only allow 4-byte alignment if getAddressSize() == 4.

Comments
Changeset: 767262e6 Author: Chris Plummer <cjplummer@openjdk.org> Date: 2022-09-03 16:06:25 +0000 URL: https://git.openjdk.org/jdk/commit/767262e67cec8e7a5e5eba2c6ebea7f60186d2cb
03-09-2022

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/10090 Date: 2022-08-30 23:57:55 +0000
01-09-2022

The failure in this CR is the result of using PointerFinder ("findpc") on invalid address that starts on the last 32-bit word of a page. "page" in this case is referring to a page in SA's PageCache, which are always 4k in size. Note findpc is frequently done using an invalid address. In this test case it is being called on each x64 register, would could contain anything. findpc relies on getting some sort of AddressException when this happens, and will then say the address points to something that is unknown. However, in the case of the address pointing to the last 32-bot word of a page, it results in an ArrayIndexOutOfBoundsException when the page cache tries to read past the end of the page. This is happening in Page.getLong(), which you can see in the stack trace. The main culprit here is some weakening of the alignment checks being done. The alignment check should have resulted in an UnalignedAddressException long before we ever got to Page.getLong(). However, we have the following override, which is allowing the unaligned address to pass the alignment check. public void checkAlignment(long address, long alignment) { // Need to override default checkAlignment because we need to // relax alignment constraints on Bsd/x86 if ( (address % alignment != 0) &&(alignment != 8 || address % 4 != 0)) { throw new UnalignedAddressException( "Trying to read at address: " + addressValueToString(address) + " with alignment: " + alignment, address); } } }; This allows a pointer to a 64-bit value to only be 32-bit aligned. But there's more to it than that. I modified ClhsdbFindPC.java to fetch a tid (a thread address), and did a findpc on the address OR'd with 0xffc. `findpc` uses PointerFinder. This forced a read of a 64-bit value that starts in the last 32-bits of a page. It passed on linux-x64 but failed on windowx-x64 in the same manner described in the description of this CR. The difference between the two implementations is that windows relies on the default implementation of DebuggerBase.readCInteger() whereas linux has an override. DebuggerBase.readCInteger() does the following: public long readCInteger(long address, long numBytes, boolean isUnsigned) { utils.checkAlignment(address, numBytes); if (useFastAccessors) { if (isUnsigned) { switch((int) numBytes) { case 1: return cache.getByte(address) & 0xFF; case 2: return cache.getShort(address, bigEndian) & 0xFFFF; case 4: return cache.getInt(address, bigEndian) & 0xFFFFFFFFL; case 8: return cache.getLong(address, bigEndian); ... There is an alignment check here, but it is the "relaxed" override shown above, which allows 64-bit addresses on 32-bit boundaries. The override in LinuxDebuggerLocal is: /** Need to override this to relax alignment checks on x86. */ public long readCInteger(long address, long numBytes, boolean isUnsigned) throws UnmappedAddressException, UnalignedAddressException { // Only slightly relaxed semantics -- this is a hack, but is // necessary on x86 where it seems the compiler is // putting some global 64-bit data on 32-bit boundaries if (numBytes == 8) { utils.checkAlignment(address, 4); } else { utils.checkAlignment(address, numBytes); } byte[] data = readBytes(address, numBytes); return utils.dataToCInteger(data, isUnsigned); Although there is a relaxed alignment check here also, the code that reads from the address does not assume all the bytes are on the same page, so it won't run into the PageCache issue with reading a 64-bit value that starts in the last 32-bit word of a page. I think the introduction of these relaxed alignment checks has a muddled history, probably made more muddled by ports cloning code that maybe wasn't necessary. Probably also initial fixes (the relaxed alignment check) seemed to work at first, but later the PageCache issue was discovered, leading to readBytes() workaround in the LinuxDebuggerLocal.readCInteger() override, but was not also done on other ports (so we got this CR for windows). For 64-bit support it's clear this easing of the 64-bit alignment is not needed, and getting rid of it would result in the proper UnalignedAddressException being thrown. The question is whether it is still needed for 32-bit x86, and if so is it needed on all ports. I can't test linux-x86, so I can't tell if it still allows 64-bit values on 32-bit aligned addresses, so for now I'll assume it does. So the approach being taken is whenever an address of a 64-bit type points to the last 32-bit word of a page, use readBytes() to get the 64-bit value one byte at a time. It still uses the page cache in the end, but it doesn't try to get all 8 bytes from the same page. Note for 64-bit systems, the conditoin will never arise because of the removal of the relaxed alignment check. Instead there will be an UnalignedAddressException at an early point when the alignment check is made. One windfall of this change is now we always read 64-bit values from the page cache in a way that is much more efficient than reading a byte at a time. This has resulted in about a 25% performance improvement in a test I have that does a heap dump.
30-08-2022