Bug ID: JDK-8217413 [Testbug] runtime/ErrorHandling/TimeoutInErrorHandlingTest.java doesn't handle longer than expected timeouts

Type: Bug
Component: hotspot
Sub-Component: runtime
Affected Version: 13

Priority: P4
Status: Closed
Resolution: Duplicate
OS: solaris
CPU: sparc

Submitted: 2019-01-19
Updated: 2019-01-19
Resolved: 2019-01-19

The test describes itself as follows:

 /* Start the VM and let it crash. Specify TestUnresponsiveErrorHandler which will
         * let five subsequent error reporting steps hang. The Timeout handling triggered
         * by the WatcherThread should kick in and interrupt those steps. In theory, the
         * text "timeout occurred during error reporting in step .." (the little timeouts)
         * should occur in the error log up to four times, followed by the final big timeout
         * "------ Timeout during error reporting after xx s. ------"
         *
         * Note that there are a number of uncertainties which make writing a 100% foolproof
         * test challenging. The time the error reporting thread takes to react to the
         * timeout triggers is unknown. So it is difficult to predict how many little timeouts
         * will be visible before the big timeout kicks in. Also, once the big timeout hits,
         * error reporting thread and Watcherthread will race. The former writes his last
         * message to the error logs and flushes, the latter waits 200ms and then exits the
         * process without further synchronization with the error reporting thread.
         *
         * Because of all this and the desire to write a bullet proof test which does
         * not fail sporadically, we will not test for the final timeout message nor for all
         * of the optimally expected little timeout messages. We just test for two of the
         * little timeout messages to see that repeated timeout handling is basically working.
         */

 "-XX:ErrorLogTimeout=16", // 16 seconds big timeout = 4 seconds per little timeout

However in the failing case we see a "little timeout" of 8 seconds, which means we may hit the big timeout before seeing a second little timeout:

----------System.out:(3/1157)----------
Command line: [/opt/mach5/mesos/work_dir/jib-master/install/jdk13-jdk.184/solaris-sparcv9-debug.jdk/jdk-13/fastdebug/bin/java -cp /opt/mach5/mesos/work_dir/44f96277-5e42-4ded-8c25-00977506cd0e/testOutput/test-support/jtreg_open_test_hotspot_jtreg_hotspot_tier2_runtime/classes/6/runtime/ErrorHandling/TimeoutInErrorHandlingTest.d:/opt/mach5/mesos/work_dir/jib-master/install/jdk13-jdk.184/src.full/open/test/hotspot/jtreg/runtime/ErrorHandling:/opt/mach5/mesos/work_dir/44f96277-5e42-4ded-8c25-00977506cd0e/testOutput/test-support/jtreg_open_test_hotspot_jtreg_hotspot_tier2_runtime/classes/6/test/lib:/opt/mach5/mesos/work_dir/jib-master/install/java/re/jtreg/4.2/promoted/all/b13/bundles/jtreg_bin-4.2.zip/jtreg/lib/javatest.jar:/opt/mach5/mesos/work_dir/jib-master/install/java/re/jtreg/4.2/promoted/all/b13/bundles/jtreg_bin-4.2.zip/jtreg/lib/jtreg.jar -XX:+UnlockDiagnosticVMOptions -Xmx100M -XX:ErrorHandlerTest=14 -XX:+TestUnresponsiveErrorHandler -XX:ErrorLogTimeout=16 -XX:-CreateCoredumpOnCrash -version ]
Found hs_err file. Scanning...
Found: [timeout occurred during error reporting in step "test unresponsive error reporting step"] after 8 s..
----------System.err:(12/764)----------
java.lang.RuntimeException: hs-err file incomplete (first missing pattern: 1)
	at TimeoutInErrorHandlingTest.main(TimeoutInErrorHandlingTest.java:127)

Or we need to understand why we get such a long timeout in the first place - but our Solaris sparcv9 debug testing is notorious for inducing timeouts!

This test has been failing for a long time, intermittently. It seems the timeout steps are really error prone. See JDK-8188872 for more issues/concerns.

19-01-2019