JDK-8214797 : TestJmapCoreMetaspace.java timed out
  • Type: Bug
  • Component: hotspot
  • Sub-Component: svc-agent
  • Affected Version: 11,12,13,14,15
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: os_x
  • CPU: x86_64
  • Submitted: 2018-12-04
  • Updated: 2020-12-17
  • Resolved: 2020-04-30
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 11 JDK 15
11.0.9-oracleFixed 15 b22Fixed
Related Reports
Relates :  
Relates :  
Relates :  
Description
The following test timed out in the jdk/jdk CI:

serviceability/sa/TestJmapCoreMetaspace.java

Here's a snippet of the log file:

#section:driver
----------messages:(8/260)----------
command: driver TestJmapCore run metaspace
reason: User specified action: run driver/timeout=240 TestJmapCore run metaspace 
Mode: agentvm
Agent id: 6
Timeout refired 960 times
Timeout information:
--- Timeout information end.
elapsed time (seconds): 1294.349
----------configuration:(11/1480)----------
Boot Layer
  class path: /scratch/mesos/jib-master/install/java/re/jtreg/4.2/promoted/all/b13/bundles/jtreg_bin-4.2.zip/jtreg/lib/javatest.jar 
              /scratch/mesos/jib-master/install/java/re/jtreg/4.2/promoted/all/b13/bundles/jtreg_bin-4.2.zip/jtreg/lib/jtreg.jar 
  patch:      java.base /scratch/mesos/slaves/2dd962d0-8988-479b-a804-57ab764ada59-S17888/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/3cc25363-7043-4ff2-8b71-364a6e907977/runs/c5fe3e2c-4a2a-43eb-8d8c-929eb27fe33b/testOutput/test-support/jtreg_open_test_hotspot_jtreg_hotspot_tier2_runtime/patches/java.base

Test Layer
  class path: /scratch/mesos/slaves/2dd962d0-8988-479b-a804-57ab764ada59-S17888/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/3cc25363-7043-4ff2-8b71-364a6e907977/runs/c5fe3e2c-4a2a-43eb-8d8c-929eb27fe33b/testOutput/test-support/jtreg_open_test_hotspot_jtreg_hotspot_tier2_runtime/classes/1/serviceability/sa/TestJmapCoreMetaspace.d
              /scratch/mesos/jib-master/install/jdk12-jdk.1015/src.full/open/test/hotspot/jtreg/serviceability/sa
              /scratch/mesos/slaves/2dd962d0-8988-479b-a804-57ab764ada59-S17888/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/3cc25363-7043-4ff2-8b71-364a6e907977/runs/c5fe3e2c-4a2a-43eb-8d8c-929eb27fe33b/testOutput/test-support/jtreg_open_test_hotspot_jtreg_hotspot_tier2_runtime/classes/1/test/lib
              /scratch/mesos/jib-master/install/jdk12-jdk.1015/src.full/open/test/lib

----------rerun:(25/4795)*----------
cd /scratch/mesos/slaves/2dd962d0-8988-479b-a804-57ab764ada59-S17888/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/3cc25363-7043-4ff2-8b71-364a6e907977/runs/c5fe3e2c-4a2a-43eb-8d8c-929eb27fe33b/testOutput/test-support/jtreg_open_test_hotspot_jtreg_hotspot_tier2_runtime/scratch/3 && \\
HOME=/var/folders/ph/ypsq99350ng4tf90y368jp2w0000mv/T/sparky-temp-home-5996191615045418354/user_home \\
JDK8_HOME=/scratch/mesos/jib-master/install/jdk/11/28/bundles/osx-x64/jdk-11_osx-x64_bin.tar.gz/jdk-11.jdk/Contents/Home \\
JIB_DATA_DIR=/scratch/mesos/jib-master \\
JIB_HOME=/scratch/mesos/jib-master/install/com/oracle/java/jib/jib/3.0-SNAPSHOT/jib-3.0-SNAPSHOT-distribution.zip/jib-3.0-SNAPSHOT-distribution \\
JTREG_KEYWORDS="!headful" \\
JTREG_VERBOSE=fail,error,time \\
PATH=/bin:/usr/bin \\
TEST_IMAGE_GRAAL_DIR=/scratch/mesos/jib-master/install/jdk12-jdk.1015/macosx-x64-debug.test/hotspot/jtreg/graal \\
    /scratch/mesos/jib-master/install/jdk12-jdk.1015/macosx-x64-debug.jdk/jdk-12/fastdebug/bin/java \\
        -Dtest.class.path.prefix=/scratch/mesos/slaves/2dd962d0-8988-479b-a804-57ab764ada59-S17888/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/3cc25363-7043-4ff2-8b71-364a6e907977/runs/c5fe3e2c-4a2a-43eb-8d8c-929eb27fe33b/testOutput/test-support/jtreg_open_test_hotspot_jtreg_hotspot_tier2_runtime/classes/1/serviceability/sa/TestJmapCoreMetaspace.d:/scratch/mesos/jib-master/install/jdk12-jdk.1015/src.full/open/test/hotspot/jtreg/serviceability/sa:/scratch/mesos/slaves/2dd962d0-8988-479b-a804-57ab764ada59-S17888/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/3cc25363-7043-4ff2-8b71-364a6e907977/runs/c5fe3e2c-4a2a-43eb-8d8c-929eb27fe33b/testOutput/test-support/jtreg_open_test_hotspot_jtreg_hotspot_tier2_runtime/classes/1/test/lib \\
        -Dtest.src=/scratch/mesos/jib-master/install/jdk12-jdk.1015/src.full/open/test/hotspot/jtreg/serviceability/sa \\
        -Dtest.src.path=/scratch/mesos/jib-master/install/jdk12-jdk.1015/src.full/open/test/hotspot/jtreg/serviceability/sa:/scratch/mesos/jib-master/install/jdk12-jdk.1015/src.full/open/test/lib \\
        -Dtest.classes=/scratch/mesos/slaves/2dd962d0-8988-479b-a804-57ab764ada59-S17888/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/3cc25363-7043-4ff2-8b71-364a6e907977/runs/c5fe3e2c-4a2a-43eb-8d8c-929eb27fe33b/testOutput/test-support/jtreg_open_test_hotspot_jtreg_hotspot_tier2_runtime/classes/1/serviceability/sa/TestJmapCoreMetaspace.d \\
        -Dtest.class.path=/scratch/mesos/slaves/2dd962d0-8988-479b-a804-57ab764ada59-S17888/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/3cc25363-7043-4ff2-8b71-364a6e907977/runs/c5fe3e2c-4a2a-43eb-8d8c-929eb27fe33b/testOutput/test-support/jtreg_open_test_hotspot_jtreg_hotspot_tier2_runtime/classes/1/serviceability/sa/TestJmapCoreMetaspace.d:/scratch/mesos/slaves/2dd962d0-8988-479b-a804-57ab764ada59-S17888/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/3cc25363-7043-4ff2-8b71-364a6e907977/runs/c5fe3e2c-4a2a-43eb-8d8c-929eb27fe33b/testOutput/test-support/jtreg_open_test_hotspot_jtreg_hotspot_tier2_runtime/classes/1/test/lib \\
        -Dtest.vm.opts=-XX:MaxRAMPercentage=6 \\
        -Dtest.tool.vm.opts=-J-XX:MaxRAMPercentage=6 \\
        -Dtest.compiler.opts= \\
        -Dtest.java.opts= \\
        -Dtest.jdk=/scratch/mesos/jib-master/install/jdk12-jdk.1015/macosx-x64-debug.jdk/jdk-12/fastdebug \\
        -Dcompile.jdk=/scratch/mesos/jib-master/install/jdk12-jdk.1015/macosx-x64-debug.jdk/jdk-12/fastdebug \\
        -Dtest.timeout.factor=4.0 \\
        -Dtest.nativepath=/scratch/mesos/jib-master/install/jdk12-jdk.1015/macosx-x64-debug.test/hotspot/jtreg/native \\
        -classpath /scratch/mesos/slaves/2dd962d0-8988-479b-a804-57ab764ada59-S17888/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/3cc25363-7043-4ff2-8b71-364a6e907977/runs/c5fe3e2c-4a2a-43eb-8d8c-929eb27fe33b/testOutput/test-support/jtreg_open_test_hotspot_jtreg_hotspot_tier2_runtime/classes/1/serviceability/sa/TestJmapCoreMetaspace.d:/scratch/mesos/jib-master/install/jdk12-jdk.1015/src.full/open/test/hotspot/jtreg/serviceability/sa:/scratch/mesos/slaves/2dd962d0-8988-479b-a804-57ab764ada59-S17888/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/3cc25363-7043-4ff2-8b71-364a6e907977/runs/c5fe3e2c-4a2a-43eb-8d8c-929eb27fe33b/testOutput/test-support/jtreg_open_test_hotspot_jtreg_hotspot_tier2_runtime/classes/1/test/lib:/scratch/mesos/jib-master/install/jdk12-jdk.1015/src.full/open/test/lib:/scratch/mesos/jib-master/install/java/re/jtreg/4.2/promoted/all/b13/bundles/jtreg_bin-4.2.zip/jtreg/lib/javatest.jar:/scratch/mesos/jib-master/install/java/re/jtreg/4.2/promoted/all/b13/bundles/jtreg_bin-4.2.zip/jtreg/lib/jtreg.jar \\
        TestJmapCore run metaspace
result: Error. Agent error: java.lang.Exception: Agent 6 timed out with a timeout of 960 seconds; check console log for any additional details


test result: Error. Agent error: java.lang.Exception: Agent 6 timed out with a timeout of 960 seconds; check console log for any additional details


The test has a default timeout of 240 seconds and a default
timeout factor of 4 for a total timeout of 960 seconds/16 minutes.
The test machine has 8GB of memory and 8 cores.

The task's stdout shows:

    -concurrency:4

for the test execution task.

Update: The "Reasons" column for the failure shows:

Crash: Internal Error ...debug.cpp...fatal error: OutOfMemory encountered: Metaspace

but that's normal for this test. It crashes the VM with that failure and then
tries to analyze the resulting core file.
Comments
I just saw another timeout of this test when I did 400 runs. The test took 43 minutes to run. The current timeout is 480 seconds, which is 8 minutes, but with a timeoutfactor of 4 is 32 minutes. Possibly we should bump the timeout even more, although given the rarity I'm inclined to ignore it for now.
31-07-2020

Fix request (11u) -- will label after testing completed. I would like to downport this for parity with 11.0.9-oracle. Applies clean.
02-06-2020

URL: https://hg.openjdk.java.net/jdk/jdk/rev/a26003fdb982 User: cjplummer Date: 2020-04-30 04:05:20 +0000
30-04-2020

TestJmapCoreMetaspace.java is very much like TestJmapCore.java. The former just calls the later with the "metaspace" argument instead of the "heap" argument. When in metaspace mode it drains the metaspace through endless loading of generated test classes. When in "heap" mode it drains the java heap. Both have a goal of generating a core file and then verifying that jhsdb jmap can be run on it. For the most part this timeout only seems to happen with TestJmapCoreMetaspace.java. I only see it occurring once with TestJmapCore.java, and that was with a personal job, and the timeout took 50 minutes (I confirmed that it was using the same timeout factor of 4 as the failing TestJmapCoreMetaspace runs). So it seems producing a core by filling up the metaspace causes a timeout much more easily than filling up the java heap. In any case, I think maybe increasing the timeout is called for. Currently it is 240, so with the 4x timeout factor that give 16 minutes. Doubling it to 480 hopefully will make these failures at least very very rare.
11-04-2020

The anecdotal evidence is that these kinds of timeouts always occur with core dumps involved. Exactly when/why they are sometimes so slow we seem to have zero idea.
11-04-2020

[~dholmes]I don't know if in general core dumps are slow on macOS. If I look at recent runs, they seem to be about as fast on macOS as the other platforms, mostly finishing in 3-5 minutes. If I look at the slowest runs, I see all the failing runs take 18 to 27 minutes, but I do see some passing runs taking as long as 19 minutes, and there are 20+ runs taking 14 to 19 minutes. Maybe on occasion the mac gets into some sort of memory thrashing state trying to dump the core, making it take a very long time. We could bump up the timeout, although it's hard to say how long those runs that have timed out would have taken if given more time to complete.
11-04-2020

Isn't this just the known issue that core dumps on macOS are really time consuming?
16-03-2020

I looked in artifactory and found that there are two core files associated with this test run. The first (core.89940) is: 5197942784 bytes/4.84GB and the second (core.90593) is 2545033216 bytes/2.37 GB. This test's execution time issue could simply be the amount of time it takes to save the core files to the disk. Is there some reason to not limit the amount of Java heap used by the test via something like: -ms256m -mx256m Update: Looks like there is already a limit: open/test/hotspot/jtreg/serviceability/sa/TestJmapCore.java: ProcessBuilder pb = ProcessTools.createJavaProcessBuilder(true, "-XX:+CreateCoredumpOnCrash", "-Xmx512m", "-XX:MaxMetaspaceSize=64m", "-XX:+CrashOnOutOfMemoryError", TestJmapCore.class.getName(), type); so the maximum heap size is '512m'. I'm still curious about why there are two core files. Update: I suspect that one core file is from the test itself and that the second core file is from JTREG's test timeout artifact capture. The bigger core file is probably from the test driver Java process and the smaller core file is probably from the test itself. I ran the test on my other unloaded MBP-13 using fastdebug bits: real 4m16.115s user 3m21.328s sys 0m9.075s real 2m24.513s user 2m51.817s sys 0m7.476s real 2m45.754s user 3m20.785s sys 0m7.790s So 4+ minutes on an otherwise not busy machine is the longest time.
04-12-2018