JDK-8239089 : LargeCopyWithMark.java failed due to OutOfMemoryError
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 15,16,17
  • Priority: P2
  • Status: Resolved
  • Resolution: Duplicate
  • OS: os_x
  • CPU: x86_64
  • Submitted: 2020-02-14
  • Updated: 2021-05-13
  • Resolved: 2021-04-09
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 17
17Resolved
Related Reports
Duplicate :  
Relates :  
Relates :  
Sub Tasks
JDK-8255969 :  
Description
The following test failed due to OutOfMemoryError in the JDK15 CI:

java/io/BufferedInputStream/LargeCopyWithMark.java

Here's a snippet from the log file:

#section:main
----------messages:(4/163)----------
command: main LargeCopyWithMark
reason: User specified action: run main/othervm LargeCopyWithMark 
Mode: othervm [/othervm specified]
elapsed time (seconds): 2.31
----------configuration:(0/0)----------
----------System.out:(1/1188)----------
Command line: [/mesos/work_dir/jib-master/install/jdk-15+11-329/macosx-x64.jdk/jdk-15.jdk/Contents/Home/bin/java -cp /mesos/work_dir/slaves/90726e33-be99-4e27-9d68-25dad266ef13-S550/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/e53678c2-4411-422a-8908-8ecb323c6414/runs/d24b5a50-b19d-4d6c-936f-29b1b9d9c89b/testoutput/test-support/jtreg_open_test_jdk_tier2_part2/classes/6/java/io/BufferedInputStream/LargeCopyWithMark.d:/mesos/work_dir/jib-master/install/jdk-15+11-329/src.full/open/test/jdk/java/io/BufferedInputStream:/mesos/work_dir/slaves/90726e33-be99-4e27-9d68-25dad266ef13-S550/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/e53678c2-4411-422a-8908-8ecb323c6414/runs/d24b5a50-b19d-4d6c-936f-29b1b9d9c89b/testoutput/test-support/jtreg_open_test_jdk_tier2_part2/classes/6/test/lib:/mesos/work_dir/jib-master/install/jdk-15+11-329/src.full/open/test/lib:/mesos/work_dir/jib-master/install/java/re/jtreg/4.2/promoted/all/b16/bundles/jtreg_bin-4.2.zip/jtreg/lib/javatest.jar:/mesos/work_dir/jib-master/install/java/re/jtreg/4.2/promoted/all/b16/bundles/jtreg_bin-4.2.zip/jtreg/lib/jtreg.jar -Xmx4G -ea:LargeCopyWithMark$Child LargeCopyWithMark$Child ]
----------System.err:(18/1185)----------
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:230)
	at java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
	at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:343)
	at LargeCopyWithMark$Child.main(LargeCopyWithMark.java:75)
java.lang.AssertionError: Test failed: exit code = 1
	at LargeCopyWithMark.main(LargeCopyWithMark.java:51)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:564)
	at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127)
	at java.base/java.lang.Thread.run(Thread.java:832)

JavaTest Message: Test threw exception: java.lang.AssertionError: Test failed: exit code = 1
JavaTest Message: shutting down test

STATUS:Failed.`main' threw exception: java.lang.AssertionError: Test failed: exit code = 1
----------rerun:(33/5419)*----------

The test machine used for this run has 64GB of RAM.
Starting this as a P3 since this is a Tier2 test.
Comments
Closing as a duplicate of JDK-8214455
09-04-2021

Likely to be fixed with JDK-8214455.
07-04-2021

[~iklam] Thanks for picking this up!
06-04-2021

Review for JDK-8214455: https://github.com/openjdk/jdk/pull/3349
05-04-2021

# 100% reproducer on Linux/x64 -- Use -XX:HeapBaseMinAddress=0x600000000 to get the # heap at a specific location for easy reproducibility. # # The test needs about 2GB of memory, so -Xmx2500m should allow the test to pass. However, the # test fails with CDS enabled. # # (1) This is the normal heap range for -Xmx1800m $ java -Xshare:off -XX:HeapBaseMinAddress=0x600000000 -Xlog:gc+heap+exit -Xmx1800m -version 2>&1 | grep garbage-first [0.210s][info][gc,heap,exit] garbage-first heap total 1030144K, used 2048K [0x0000000600000000, 0x0000000670800000) # (2) Dump a CDS archive here. Notice that the archived regions are at the top # of the heap. $ java -XX:HeapBaseMinAddress=0x600000000 -Xmx1800m -Xshare:dump -Xlog:cds=debug | egrep 'Shared.*((ca0)|(oa0))' [2.075s][info ][cds] Shared file region (ca0) 3: 528384 bytes, addr 0x0000000670700000 file offset 0x00c62000 crc 0x8c80f536 [2.076s][info ][cds] Shared file region (oa0) 5: 483328 bytes, addr 0x0000000670600000 file offset 0x00ce3000 crc 0xcc58b622 # (3) This is how the archived regions are mapped into the middle of a 2500m heap -- due to JDK-8214455 # $ java -Xlog:cds=debug -XX:HeapBaseMinAddress=0x600000000 -Xlog:gc+heap+exit -Xmx2500m -version 2>&1 | egrep '((garbage-first)|(map heap data))' [0.034s][info][cds] Trying to map heap data: region[3] at 0x0000000670600000, size = 528384 bytes [0.034s][info][cds] Trying to map heap data: region[5] at 0x0000000670500000, size = 483328 bytes [0.199s][info][gc,heap,exit] garbage-first heap total 1034240K, used 4062K [0x0000000600000000, 0x000000069c400000) There's roughly 1.8g and 0.7g memory on the two sides of the (unmoveable) CDS regions 0x0000000670700000 - 0x0000000600000000 = 1886388224 0x000000069c400000 - 0x0000000670700000 = 735051776 # (4) Modify test/jdk/java/io/BufferedInputStream/LargeCopyWithMark.java ==> 100% failure * @run main/othervm -Xlog:cds=debug -XX:HeapBaseMinAddress=0x600000000 -Xlog:gc+heap+exit -Xmx2500m -Xshare:on * -Xlog:gc,gc+heap,gc+ergo+heap -XX:+CrashOnOutOfMemoryError * -XX:+IgnoreUnrecognizedVMOptions -XX:+G1ExitOnExpansionFailure * LargeCopyWithMark # (5) with the exact same @run arguments as above, but change to -Xshare:off ==> the test passes
05-04-2021

[~stefank] [~bpb] Will fixing this help? JDK-8214455 - CDS archived heap should always be relocated to the top of the G1 heap
05-04-2021

[~stefank] Thanks for the rapid follow up on this problem!
01-04-2021

Some more testing shows that when the CDS is dumped the heap is placed at a lower address than when the CDS dump is then used: java -Xmx4g -Xlog:cds*,gc+ergo+heap=debug,gc+heap*=debug -Xshare:dump -version Gives this heap range: 0x6c0000000-0x7c0000000 java -Xmx4g -Xlog:cds*,gc+ergo+heap=debug,gc+heap*=debug -Xshare:on -version Then uses this heap range: 0x700000000-0x800000000 but then maps the unmovable archive regions just below the end of the previous largest heap address (0x7c0000000).
01-04-2021

I've reassigned this to hotspot/gc, reset the bug for retriage, and unassigned [~bpb].
01-04-2021

This is the heap layout from a run with more instrumentation. Adjacent regions of the same type have been compressed to make it easier to see the gaps: 0x0000000700000000 0x0000000700800000 F 8MB 0x0000000700800000 0x0000000700a00000 O 2MB 0x0000000700a00000 0x000000073fc00000 F 1010MB 0x000000073fc00000 0x000000073fe00000 HS 2MB 0x000000073fe00000 0x000000077fe00000 HC 1024MB 0x000000077fe00000 0x00000007bfc00000 F 1022MB 0x00000007bfc00000 0x00000007bfe00000 OA 2MB 0x00000007bfe00000 0x00000007c0000000 CA 2MB 0x00000007c0000000 0x0000000800000000 F 1024MB E=young(eden), S=young(survivor), O=old, HS=humongous(starts), HC=humongous(continues), CS=collection set, F=free, OA=open archive, CA=closed archive, TAMS=top-at-mark-start (previous, next) It looks like the CDS archive (OA, CA) splits two large memory areas (1022MB, 1024MB). Maybe this is one reason why this fails? I don't know why it is placed at that location. Another interesting thing to note is that it looks like the "total" heap size shrinks after the last Full GC: Event: 1.171 GC heap before {Heap before GC invocations=4 (full 1): garbage-first heap total 4194304K, used 1053076K [0x0000000700000000, 0x0000000800000000) region size 2048K, 0 young (0K), 0 survivors (0K) Metaspace used 388K, committed 576K, reserved 1056768K class space used 31K, committed 128K, reserved 1048576K } Event: 1.201 GC heap after {Heap after GC invocations=5 (full 2): garbage-first heap total 3522560K, used 1053072K [0x0000000700000000, 0x0000000800000000) region size 2048K, 0 young (0K), 0 survivors (0K) Metaspace used 388K, committed 576K, reserved 1056768K class space used 31K, committed 128K, reserved 1048576K } I think this needs to be investigated by the G1 devs.
01-04-2021

I wonder if the heap expansion fails, or if there's a fragmentation problem. Maybe there's a bug in the heap expansion vs full GC compaction. I've created JDK-8264489, to add logging and also crash the JVM when we get an OOME. The later will generate a map of all allocated heap regions, and should help aid debugging any fragmentation issues. I think.
31-03-2021

There is no record of this test having failed since 2020-10-22. It is hence being resolved as irreproducible. If an occurrence of the failure is subsequently observed, then this issue may be reopened.
08-02-2021

As of today there have been no recorded sightings of this issue in three months. Time to resolve it as Cannot Reproduce?
22-01-2021

Adding the option -Xms4G to the test sub-process as follows --- a/test/jdk/java/io/BufferedInputStream/LargeCopyWithMark.java +++ b/test/jdk/java/io/BufferedInputStream/LargeCopyWithMark.java @@ -43,9 +43,8 @@ System.out.println("Test runs on 64 bit platforms"); return; } - ProcessBuilder pb = createJavaProcessBuilder("-Xmx4G", - "-ea:LargeCopyWithMark$Child", - "LargeCopyWithMark$Child"); + ProcessBuilder pb = createJavaProcessBuilder("-Xms4G", "-Xmx4G", + "-ea:LargeCopyWithMark$Child", "LargeCopyWithMark$Child"); int res = pb.inheritIO().start().waitFor(); if (res != 0) { throw new AssertionError("Test failed: exit code = " + res); made 5/5 runs of the test pass on the machine in question whereas without this option 5/5 runs of the test failed. For some reason on this machine heap memory seems to be able to allocated up front but not later in the process.
27-03-2020

Contrary to the label this bug does not appear to be intermittent in the usual sense. It appears to fail 100% of the time on a specific machine. So the intermittence is in terms of how often that machine is used in the CI test runs.
19-03-2020

The test passes on the host in question if the max memory is increased to 8G but this is not really a good solution.
04-03-2020

I've only seen this on the one particular host thus far. It's baffling. If one prints the value of Runtime::freeMemory() one line before the allocation which fails with OOME, then the printed value is greater than the number of bytes which is attempted to be allocated.
04-03-2020

This test is rather strange. It is already run in othervm mode, but the main() of the test uses ProcessBuilder which creates yet another process in which a child class of the test is run and that child does the actual testing. The ProcessBuilder code could just be removed if the jtreg tags were modified appropriately, e.g., "@requires (sun.arch.data.model == "64" & os.maxMemory > 8g)" and "@run main/othervm -Xmx8g LargeCopyWithMark".
02-03-2020