JDK-8299555 : Missing timeout info
  • Type: Bug
  • Component: hotspot
  • Sub-Component: test
  • Priority: P4
  • Status: Closed
  • Resolution: Other
  • Submitted: 2023-01-03
  • Updated: 2023-01-09
  • Resolved: 2023-01-09
Related Reports
Relates :  
Description
Several tests time out without producing any additional information, which looks like this in the output:

```
Timeout information: 
--- Timeout information end.
```

Here is a partial list of the issues from various components that seem to be affected by this:

```
https://bugs.openjdk.org/browse/JDK-8184445 JShell tests: fail intermittently if tests are run in high concurrent mode.
https://bugs.openjdk.org/browse/JDK-8286554 gc/stress/TestStressG1Humongous.java timed out
https://bugs.openjdk.org/browse/JDK-8288279 gc/z/TestHighUsage.java timed out
https://bugs.openjdk.org/browse/JDK-8251969 java/lang/invoke/RicochetTest.java timed out
https://bugs.openjdk.org/browse/JDK-8293289 gc/cslocker/TestCSLocker.java timed out
https://bugs.openjdk.org/browse/JDK-8270799 vmTestbase/nsk/jvmti/ tests timing out with JFR
https://bugs.openjdk.org/browse/JDK-8268379 java/util/Locale/LocaleProvidersRun.java and sun/util/locale/provider/CalendarDataRegression.java timed out
https://bugs.openjdk.org/browse/JDK-8265037 serviceability/sa/ClhsdbPmap.java#id1 failed with "RuntimeException: Process is still alive. Can't get its output."
https://bugs.openjdk.org/browse/JDK-8289918 serviceability/attach/AttachWithStalePidFile.java timed out with "IOException: Premature EOF"
https://bugs.openjdk.org/browse/JDK-8278369 java/nio/channels/Channels/TransferTo.java hangs in testStreamContents
https://bugs.openjdk.org/browse/JDK-8258648 vmTestbase/vm/mlvm/indy/stress/jdi/breakpointInCompiledCode/Test.java timed out
https://bugs.openjdk.org/browse/JDK-8249684 java/foreign/TestMismatch.java timed out
```

and of course the one that started me on this road:

```
https://bugs.openjdk.org/browse/JDK-8286345 runtime/NMT/ThreadedMallocTestType.java failed with java.lang.RuntimeException, we also see "exitValue = 134" and time out
```

My investigation suggests that, by walking the process hierarchy and killing the hanging child processes, we could get the hung processes to flush their output to the test's log, which would hopefully allow further analysis. I developed this into a small fix for jtreg (https://bugs.openjdk.org/browse/CODETOOLS-7903217) and opened a PR (https://github.com/openjdk/jtreg/pull/97), but I could not build consensus to get it checked into the jtreg framework.
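A minimal C sketch of the process-tree walk, assuming Linux: it reads the parent PID of every process from `/proc/<pid>/stat` (the layout documented in proc(5)), collects the descendants of a root PID in post-order, and signals them so that children are hit before their parents. This is not the code from the jtreg PR or from the prototype mentioned in the comments; the names `table_snapshot`, `collect_descendants` and `kill_descendants` are hypothetical, and other platforms (macOS, Windows) would need a different way to enumerate processes.

```c
/* Hypothetical sketch, Linux-only: walk a process tree via /proc and signal it. */
#define _GNU_SOURCE
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* One (pid, ppid) pair per process visible in /proc at snapshot time. */
typedef struct {
    pid_t *pid, *ppid;
    char *seen;          /* visited flags used by the tree walk */
    size_t len, cap;
} proc_table;

/* Parent PID read from /proc/<pid>/stat, or -1 if the process is gone. */
static pid_t parent_of(pid_t pid) {
    char path[64], buf[512];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    FILE *f = fopen(path, "r");
    if (!f) return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    /* Field 2 is "(comm)" and may contain spaces, so parse after the last ')'. */
    char *p = strrchr(buf, ')');
    char state; int ppid;
    if (!p || sscanf(p + 1, " %c %d", &state, &ppid) != 2) return -1;
    return (pid_t)ppid;
}

static void table_free(proc_table *t) {
    free(t->pid); free(t->ppid); free(t->seen);
    *t = (proc_table){0};
}

/* Scan /proc once; returns 0 on success, -1 on error. */
static int table_snapshot(proc_table *t) {
    *t = (proc_table){0};
    DIR *d = opendir("/proc");
    if (!d) return -1;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (!isdigit((unsigned char)e->d_name[0])) continue;
        pid_t pid = (pid_t)atoi(e->d_name);
        pid_t ppid = parent_of(pid);
        if (ppid < 0) continue;           /* exited while we were scanning */
        if (t->len == t->cap) {
            size_t cap = t->cap ? 2 * t->cap : 256;
            pid_t *a = realloc(t->pid, cap * sizeof *a);
            if (a) t->pid = a;
            pid_t *b = realloc(t->ppid, cap * sizeof *b);
            if (b) t->ppid = b;
            if (!a || !b) { closedir(d); table_free(t); return -1; }
            t->cap = cap;
        }
        t->pid[t->len] = pid;
        t->ppid[t->len] = ppid;
        t->len++;
    }
    closedir(d);
    t->seen = calloc(t->len ? t->len : 1, 1);
    if (!t->seen) { table_free(t); return -1; }
    return 0;
}

/* Append root's descendants to out[] in post-order (children before parents).
   out[] needs room for t->len entries. */
static size_t collect_descendants(proc_table *t, pid_t root, pid_t *out, size_t n) {
    for (size_t i = 0; i < t->len; i++) {
        if (t->ppid[i] != root || t->seen[i]) continue;
        t->seen[i] = 1;                   /* guards against cycles from PID reuse */
        n = collect_descendants(t, t->pid[i], out, n);
        assert(n < t->len);
        out[n++] = t->pid[i];
    }
    return n;
}

/* Signal every descendant of root, children before their parents, never the
   calling process. Returns the number of processes signalled, or -1 on error. */
static int kill_descendants(pid_t root, int sig) {
    proc_table t;
    if (table_snapshot(&t) != 0) return -1;
    pid_t *out = malloc((t.len ? t.len : 1) * sizeof *out);
    if (!out) { table_free(&t); return -1; }
    size_t n = collect_descendants(&t, root, out, 0);
    int count = 0;
    for (size_t i = 0; i < n; i++)
        if (out[i] != getpid() && kill(out[i], sig) == 0) count++;
    free(out);
    table_free(&t);
    return count;
}
```

A caller such as a timeout handler might invoke `kill_descendants(test_pid, SIGKILL)`, where `test_pid` stands for the hung test process (hypothetical). Any such snapshot is inherently racy, because processes can start or exit while `/proc` is being scanned; the `seen` flags only guarantee that the walk terminates.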

Instead, we can try to get the MACH5 framework modified to use a custom timeout handler, a mechanism that jtreg already provides, as broadly suggested in the PR.
Comments
Such a solution requires a custom mechanism that walks the process tree and kills the hanging child processes. I have written a prototype of such a tool in C, which I chose because it keeps the number of additional processes required for this task to a minimum. It would be better still to do this without creating even a single additional process, but that would require a server process that starts at the beginning of MACH5, serves all the subsequent tests, and talks to them through inter-process communication such as shared memory (we don't want to create additional pipes). For now, however, we should try creating a dedicated process that does the cleanup and get it tested.
03-01-2023
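The single-server alternative could be built around a request slot in shared memory instead of pipes. The C sketch below is hypothetical and not part of MACH5 or jtreg: the slot layout, the `slot_*` functions and the 1 ms polling interval are assumptions, and `kill_target` is a placeholder that signals only the target PID rather than walking its tree. Because the mapping uses `MAP_SHARED | MAP_ANONYMOUS`, it is only shared between processes forked from the process that created it; a server started independently of the tests would need a named object, for example one created with `shm_open`.

```c
/* Hypothetical sketch: one cleanup request slot shared between processes. */
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Slot states; CLAIMED stops concurrent clients from overwriting each other. */
enum { SLOT_EMPTY = 0, SLOT_CLAIMED = 1, SLOT_REQUEST = 2, SLOT_DONE = 3 };

typedef struct {
    atomic_int state;  /* one of the SLOT_* values */
    pid_t target;      /* root of the process tree to clean up */
    int result;        /* 0 on success, -1 on failure */
} cleanup_slot;

/* Placeholder for the real work: a full tool would walk and kill the whole tree. */
static int kill_target(pid_t pid) {
    return kill(pid, SIGKILL) == 0 ? 0 : -1;
}

/* Map a slot that stays shared across fork(); returns NULL on failure. */
static cleanup_slot *slot_create(void) {
    void *p = mmap(NULL, sizeof(cleanup_slot), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return NULL;
    cleanup_slot *s = p;
    atomic_init(&s->state, SLOT_EMPTY);
    return s;
}

static void nap(void) {
    struct timespec ts = {0, 1000000};  /* 1 ms polling interval */
    nanosleep(&ts, NULL);
}

/* Server side: wait for one request, run the cleanup, publish the result.
   A long-lived server would call this in a loop. */
static void slot_serve_one(cleanup_slot *s) {
    assert(s != NULL);
    while (atomic_load_explicit(&s->state, memory_order_acquire) != SLOT_REQUEST)
        nap();
    s->result = kill_target(s->target);
    atomic_store_explicit(&s->state, SLOT_DONE, memory_order_release);
}

/* Client side: claim the slot, post a request, and wait for the answer. */
static int slot_request(cleanup_slot *s, pid_t target) {
    assert(s != NULL);
    int expected = SLOT_EMPTY;
    while (!atomic_compare_exchange_weak_explicit(&s->state, &expected, SLOT_CLAIMED,
                                                  memory_order_acquire,
                                                  memory_order_relaxed)) {
        expected = SLOT_EMPTY;  /* CAS stored the observed value; retry from EMPTY */
        nap();
    }
    s->target = target;
    atomic_store_explicit(&s->state, SLOT_REQUEST, memory_order_release);
    while (atomic_load_explicit(&s->state, memory_order_acquire) != SLOT_DONE)
        nap();
    int r = s->result;
    atomic_store_explicit(&s->state, SLOT_EMPTY, memory_order_release);
    return r;
}
```

Polling keeps the sketch short; a production server would more likely block on a process-shared semaphore (`sem_init` with a non-zero `pshared` argument) than sleep in a loop.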