JDK-8214217 : [TESTBUG] runtime/appcds/LotsOfClasses.java failed with fragmented heap
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 12
  • Priority: P2
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2018-11-22
  • Updated: 2019-05-28
  • Resolved: 2018-11-27
JDK 12: 12 b22 (Fixed)
Related Reports
Relates :  
Description
The test failed with the following error message:
Error occurred during initialization of VM
Unable to use shared archive.
];
 stderr: [Java HotSpot(TM) 64-Bit Server VM warning: Pre JDK 1.5 class not supported by CDS: 45.3 javax/management/remote/rmi/RMIConnectionImpl_Stub
Java HotSpot(TM) 64-Bit Server VM warning: Pre JDK 1.5 class not supported by CDS: 45.3 javax/management/remote/rmi/RMIConnectionImpl_Stub
Java HotSpot(TM) 64-Bit Server VM warning: Pre JDK 1.5 class not supported by CDS: 45.3 javax/management/remote/rmi/RMIServerImpl_Stub
Java HotSpot(TM) 64-Bit Server VM warning: Pre JDK 1.5 class not supported by CDS: 45.3 javax/management/remote/rmi/RMIServerImpl_Stub
Java HotSpot(TM) 64-Bit Server VM warning: Pre JDK 1.5 class not supported by CDS: 45.3 java/rmi/activation/ActivationGroup_Stub
Java HotSpot(TM) 64-Bit Server VM warning: Pre JDK 1.5 class not supported by CDS: 45.3 java/rmi/activation/ActivationGroup_Stub
Java HotSpot(TM) 64-Bit Server VM warning: Pre JDK 1.5 class not supported by CDS: 45.3 sun/rmi/server/Activation$ActivationSystemImpl_Stub
Java HotSpot(TM) 64-Bit Server VM warning: Pre JDK 1.5 class not supported by CDS: 45.3 sun/rmi/server/Activation$ActivationSystemImpl_Stub
Java HotSpot(TM) 64-Bit Server VM warning: Pre JDK 1.5 class not supported by CDS: 45.3 com/sun/jndi/rmi/registry/ReferenceWrapper_Stub
Java HotSpot(TM) 64-Bit Server VM warning: Pre JDK 1.5 class not supported by CDS: 45.3 com/sun/jndi/rmi/registry/ReferenceWrapper_Stub
An error has occurred while processing the shared archive file.
Unable to write archive heap memory regions: number of memory regions exceeds maximum due to fragmentation
]
 exitValue = 1

java.lang.RuntimeException: Expected to get exit value of [0]

	at jdk.test.lib.process.OutputAnalyzer.shouldHaveExitValue(OutputAnalyzer.java:419)
	at jdk.test.lib.cds.CDSTestUtils.checkDump(CDSTestUtils.java:269)
	at LotsOfClasses.main(LotsOfClasses.java:62)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:567)
	at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:246)
	at java.base/java.lang.Thread.run(Thread.java:835)
Comments
[~dcubed] Thanks for the suggestion! I've also asked Ioi about ProblemListing the test last night. Ioi said he's okay with my test fix for now. I'll update my change with more info in the error message and have it re-reviewed and integrated today.
27-11-2018

[~jiangli] - Please ProblemList this test while you and Ioi hash out the details.
27-11-2018

[~iklam] It would be good to bring your finding to the GC experts. It doesn't seem to be a heap expansion issue to me, because there is enough space to do the normal allocation. With this stress test, the issue is that GC activity causes some of the regions allocated at high addresses to be freed before archiving starts. However, in the failed case the freed space (the gap in the allocated high-address regions) is not large enough for archiving. Used regions at high addresses don't cause an issue for archiving, because we continue searching for a free region at lower addresses until one is found. The regions below the highest free one are not used, as long as there is no GC activity (not counting allocations). That's our normal use case so far.
27-11-2018

[~jiangli] Look for " 55M->39M(668M)" in my output above. The "668M" is what I meant by "committed size" of the heap.
27-11-2018

The requested GC region does not need to be committed before archive allocation. When GC finds a free region at the heap top, the memory is then committed (if not already). Please see HeapRegionManager::find_highest_free(bool* expanded).
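For illustration, a rough, self-contained sketch of that top-down search, assuming a simplified region-state table (the model and names are illustrative only; HeapRegionManager's real code tracks much more state):

#include <vector>

// Simplified model of G1's region table: a region is either not yet
// committed, committed and free, or committed and in use.
enum class RegionState { Uncommitted, Free, Used };

// Walk down from the top of the reserved heap and return the index of
// the highest usable region, committing ("expanding") an uncommitted
// region on demand, since the memory need not be committed up front.
// Used regions are skipped, so the search continues at lower addresses.
// Returns -1 if every region is in use.
int find_highest_free(std::vector<RegionState>& regions, bool* expanded) {
  for (int i = (int)regions.size() - 1; i >= 0; i--) {
    if (regions[i] == RegionState::Uncommitted) {
      regions[i] = RegionState::Free;  // commit on demand
      *expanded = true;
      return i;
    }
    if (regions[i] == RegionState::Free) {
      *expanded = false;
      return i;
    }
  }
  return -1;  // no free or committable region anywhere in the heap
}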
27-11-2018

We have at least one bug in GC -- failure to expand the heap to maximize contiguous heap regions: I ran the test on a Solaris machine that had a recent failure. -XX:MaxRAMPercentage=1 is taken from the test log, which affects how much heap is requested by the JVM. This machine has lots of memory, so even at 1% of max ram, we end up with a max heap size of 1308MB.

$ xxxxxx/jib-master/install/jdk12-jdk.957/solaris-sparcv9-debug.jdk/jdk-12/fastdebug/bin/java -Xlog:gc* -XX:MaxRAMPercentage=1 -Xshare:dump -XX:SharedArchiveFile=/tmp/xxx -XX:ExtraSharedClassListFile=/tmp/LotsOfClasses-test.classlist --add-modules ALL-SYSTEM
[0.031s][info][gc,heap] Heap region size: 1M
[0.133s][info][gc ] Using G1
[0.133s][info][gc,heap,coops] Heap address: 0x000000076e400000, size: 1308 MB, Compressed Oops mode: Zero based, Oop shift amount: 3
Allocated shared space: 3221225472 bytes at 0x0000000800000000
Loading classes to share ...
[ ...snip ....]
[ ... the following is the last GC log before the error ...]
[131.186s][info][gc,start ] GC(15) Pause Young (Normal) (G1 Evacuation Pause)
[131.186s][info][gc,task ] GC(15) Using 16 workers of 25 for evacuation
[131.274s][info][gc,phases ] GC(15) Pre Evacuate Collection Set: 0.1ms
[131.274s][info][gc,phases ] GC(15) Evacuate Collection Set: 70.6ms
[131.274s][info][gc,phases ] GC(15) Post Evacuate Collection Set: 16.3ms
[131.274s][info][gc,phases ] GC(15) Other: 1.3ms
[131.274s][info][gc,heap ] GC(15) Eden regions: 32->0(28)
[131.274s][info][gc,heap ] GC(15) Survivor regions: 1->5(5)
[131.274s][info][gc,heap ] GC(15) Old regions: 15->23
[131.274s][info][gc,heap ] GC(15) Archive regions: 0->0
[131.274s][info][gc,heap ] GC(15) Humongous regions: 0->0
[131.274s][info][gc,metaspace ] GC(15) Metaspace: 158472K->158472K(1189888K)
[131.274s][info][gc ] GC(15) Pause Young (Normal) (G1 Evacuation Pause) 47M->27M(668M) 88.223ms
[131.274s][info][gc,cpu ] GC(15) User=1.08s Sys=0.01s Real=0.09s
[139.022s][info][gc,start ] GC(16) Pause Young (Normal) (G1 Evacuation Pause)
[139.022s][info][gc,task ] GC(16) Using 16 workers of 25 for evacuation
[139.143s][info][gc,phases ] GC(16) Pre Evacuate Collection Set: 0.1ms
[139.143s][info][gc,phases ] GC(16) Evacuate Collection Set: 103.9ms
[139.143s][info][gc,phases ] GC(16) Post Evacuate Collection Set: 16.7ms
[139.143s][info][gc,phases ] GC(16) Other: 1.1ms
[139.143s][info][gc,heap ] GC(16) Eden regions: 28->0(28)
[139.144s][info][gc,heap ] GC(16) Survivor regions: 5->5(5)
[139.144s][info][gc,heap ] GC(16) Old regions: 23->36
[139.144s][info][gc,heap ] GC(16) Archive regions: 0->0
[139.144s][info][gc,heap ] GC(16) Humongous regions: 0->0
[139.144s][info][gc,metaspace ] GC(16) Metaspace: 158472K->158472K(1189888K)
[139.144s][info][gc ] GC(16) Pause Young (Normal) (G1 Evacuation Pause) 55M->39M(668M) 121.817ms
[139.144s][info][gc,cpu ] GC(16) User=1.57s Sys=0.02s Real=0.12s
Rewriting and linking classes: done
[... snip ...]
Relocating SystemDictionary::_well_known_klasses[] ...
Removing java_mirror ... done.
An error has occurred while processing the shared archive file.
Unable to write archive heap memory regions: number of memory regions exceeds maximum due to fragmentation
Error occurred during initialization of VM
Unable to use shared archive.

The heap was still at 668M. If the heap is expanded to its max size (1308MB) we will have enough contiguous space to allocate the 32MB of archived objects. However, running with -Xmx1308M on my Linux box does not cause the failure.
Perhaps this is due to a larger number of concurrent GC threads and slower CPUs on the Solaris box? Note that the committed heap is even smaller on Linux (only 140M), but somehow dumping succeeded. This is the last gc report:

[78.532s][info][gc,heap ] GC(24) Eden regions: 6->0(8)
[78.532s][info][gc,heap ] GC(24) Survivor regions: 1->1(1)
[78.532s][info][gc,heap ] GC(24) Old regions: 37->41
[78.532s][info][gc,heap ] GC(24) Archive regions: 0->0
[78.532s][info][gc,heap ] GC(24) Humongous regions: 0->0
[78.532s][info][gc,metaspace ] GC(24) Metaspace: 177847K->177847K(1208320K)
[78.532s][info][gc ] GC(24) Pause Young (Normal) (G1 Evacuation Pause) 42M->40M(140M) 169.019ms
[78.532s][info][gc,cpu ] GC(24) User=0.49s Sys=0.01s Real=0.17s
Rewriting and linking classes: done

So this bug is concerning both in terms of randomness/uncertainty, and in terms of excessive usage of heap resources.
27-11-2018

Changing to P2, since the failure rate in tier3 is high.
27-11-2018

If GC decides to allocate old regions at higher addresses (the top of the java heap), doing a full GC might make the problem worse. I think we should not make assumptions about GC activity and implementation details in this case. The current behavior of checking the allocated GC regions for archiving is the right approach.
27-11-2018

Maybe forcing a GC would cause the allocated regions to be consolidated and thus free up more contiguous space? It's worth trying and looking at the GC log to see what happens. When I run with 128MB, the very last GC shows "total 131072K, used 31744K":

[10.978s][debug ][gc,heap ] GC(11) Heap after GC invocations=12 (full 0): garbage-first heap total 131072K, used 31744K [0x00000000f8000000, 0x0000000100000000)
[10.978s][debug ][gc,heap ] GC(11) region size 1024K, 5 young (5120K), 5 survivors (5120K)
[10.978s][debug ][gc,heap ] GC(11) Metaspace used 173511K, capacity 173839K, committed 176104K, reserved 1204224K
[10.978s][debug ][gc,heap ] GC(11) class space used 21212K, capacity 21322K, committed 21452K, reserved 1048576K
[10.978s][info ][gc ] GC(11) Pause Young (Normal) (G1 Evacuation Pause) 52M->31M(128M) 18.245ms
[10.978s][info ][gc,cpu ] GC(11) User=0.04s Sys=0.01s Real=0.02s
[11.072s][debug ][gc,refine ] Activated worker 0, on threshold: 28, current: 28
[11.073s][debug ][gc,refine ] Deactivated worker 0, off threshold: 25, current: 25, processed: 3
[11.126s][debug ][gc,refine ] Activated worker 0, on threshold: 28, current: 28
[11.127s][debug ][gc,refine ] Deactivated worker 0, off threshold: 25, current: 25, processed: 3
[11.158s][debug ][gc,refine ] Activated worker 0, on threshold: 28, current: 28
[11.159s][debug ][gc,refine ] Deactivated worker 0, off threshold: 25, current: 25, processed: 3
[11.190s][debug ][gc,refine ] Activated worker 0, on threshold: 28, current: 28
[11.191s][debug ][gc,refine ] Deactivated worker 0, off threshold: 25, current: 25, processed: 3
[11.206s][debug ][gc,refine ] Activated worker 0, on threshold: 28, current: 28
[11.207s][debug ][gc,refine ] Deactivated worker 0, off threshold: 25, current: 25, processed: 3
[11.253s][debug ][gc,refine ] Activated worker 0, on threshold: 28, current: 28
[11.254s][debug ][gc,refine ] Deactivated worker 0, off threshold: 25, current: 25, processed: 3
[11.279s][debug ][gc,refine ] Activated worker 0, on threshold: 28, current: 28
[11.280s][debug ][gc,refine ] Deactivated worker 0, off threshold: 25, current: 25, processed: 3
Rewriting and linking classes: done

This was about the time when all classes had been loaded, so there would be very little object allocation afterwards. This means that there is over 96MB of free space in the heap. Why would it be fragmented when the CA/OA allocation only needs about 32MB?
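To make the consolidation idea concrete, here is a hypothetical region-level model of what a full, compacting GC could achieve (this only models the outcome at region granularity; G1's actual full GC is far more involved):

#include <algorithm>
#include <vector>

enum class RegionState { Uncommitted, Free, Used };

// Slide all used regions toward the bottom (low addresses) of the heap.
// Afterwards the free and uncommitted regions form one contiguous run
// at the top -- the best case for allocating the archive (ca/oa) regions.
void compact_regions(std::vector<RegionState>& regions) {
  std::stable_partition(regions.begin(), regions.end(),
      [](RegionState s) { return s == RegionState::Used; });
}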
26-11-2018

Forcing a full GC would not help much, since very few objects would be freed. The determining factor is the java heap size. Updating the error message without more info sounds okay to me.
26-11-2018

We should have a better error message than "number of memory regions exceeds maximum due to fragmentation". Maybe add some info like "you should expand the heap size (current heap size is xxx)". Also, I wonder if we would have more contiguous space by forcing a full GC before entering the safepoint (VM_PopulateDumpSharedSpace::doit()).
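A sketch of what such a message could look like; the function, its parameters, and how the sizes would be obtained are assumptions for illustration, not HotSpot's actual reporting code:

#include <cstddef>
#include <cstdio>

// Print the suggested diagnostic: state the failure, show the heap
// sizes, and tell the user how to work around the fragmentation.
void report_archive_fragmentation(size_t committed_bytes, size_t max_bytes) {
  fprintf(stderr,
          "Unable to write archive heap memory regions: number of memory "
          "regions exceeds maximum due to fragmentation.\n"
          "Current heap size is %zuM (maximum %zuM); consider re-running "
          "-Xshare:dump with a larger java heap (-Xmx).\n",
          committed_bytes / (1024 * 1024), max_bytes / (1024 * 1024));
}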
26-11-2018

I can force the test to fail on Linux by running with -vmoptions:"-Xlog:gc*=debug -Xmx128m":

An error has occurred while processing the shared archive file.
Unable to write archive heap memory regions: number of memory regions exceeds maximum due to fragmentation
Error occurred during initialization of VM
Unable to use shared archive.

The failure can no longer be reproduced with -Xmx256m. In a successful run, the sizes of the oa and ca regions are:

ca0 space:   573440 [ 0.2% of total] out of   573440 bytes [100.0% used] at 0x00000000fe900000
ca1 space: 23068672 [ 6.1% of total] out of 23068672 bytes [100.0% used] at 0x00000000fea00000
oa0 space:   892928 [ 0.2% of total] out of   892928 bytes [100.0% used] at 0x00000000fe200000
oa1 space:  6291456 [ 1.7% of total] out of  6291456 bytes [100.0% used] at 0x00000000fe300000

So a total of about 30MB of heap space is used by the archive. Since the ca and oa spaces are a copy of the original objects, there's another 30MB of the original objects in the heap, plus some other heap objects that are not archived.
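As a back-of-the-envelope check of those numbers (the 1M region size is taken from the GC logs above; everything else comes from the dump output):

#include <cstdio>

int main() {
  // ca/oa space sizes from the successful run, in bytes.
  const long ca0 = 573440, ca1 = 23068672, oa0 = 892928, oa1 = 6291456;
  const long region = 1024 * 1024;           // 1M G1 regions at this heap size
  const long total = ca0 + ca1 + oa0 + oa1;  // 30826496 bytes, ~29.4 MB
  // Each space needs its own run of consecutive free regions; ca1 alone
  // needs 22 consecutive 1M regions out of the 128 in a 128M heap.
  printf("archive total: %ld bytes (~%ld MB), regions: %ld of 128\n",
         total, total / region, (total + region - 1) / region);
  return 0;
}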
26-11-2018

I ran the test on Solaris with GC region logging enabled; all allocations were done within the lower 2G range. Using a 3G java heap should therefore avoid the heap fragmentation issue. I tried a 3G java heap and ran the test 100 times on solaris-sparcv9. All 100 runs passed.
26-11-2018

Spotted in my jdk-12+21 Thread-SMR stress testing on Linux-X64:

$ unzip -l jdk-12+21_linux.8214217.zip
Archive:  jdk-12+21_linux.8214217.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
    68001  2018-11-22 20:56   jdk-12+21_1/failures.linux-x86_64/LotsOfClasses.jtr.fastdebug
    68442  2018-11-22 19:34   jdk-12+21_1/failures.linux-x86_64/LotsOfClasses.jtr.slowdebug
    69429  2018-11-23 16:10   jdk-12+21_2/failures.linux-x86_64/LotsOfClasses.jtr.fastdebug
    69866  2018-11-23 14:48   jdk-12+21_2/failures.linux-x86_64/LotsOfClasses.jtr.slowdebug
    67987  2018-11-24 11:30   jdk-12+21_3/failures.linux-x86_64/LotsOfClasses.jtr.fastdebug
    66715  2018-11-24 12:02   jdk-12+21_3/failures.linux-x86_64/LotsOfClasses.jtr.release
    69867  2018-11-24 10:10   jdk-12+21_3/failures.linux-x86_64/LotsOfClasses.jtr.slowdebug
---------                     -------
   480307                     7 files
26-11-2018

Spotted in my jdk-12+21 Thread-SMR stress testing on Solaris-X64:

$ unzip -l jdk-12+21_solx64.8214217.zip
Archive:  jdk-12+21_solx64.8214217.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
    69127  11-23-2018 18:33   jdk-12+21_2/failures.solaris-x86_64/LotsOfClasses.jtr.slowdebug
    66209  11-23-2018 21:05   jdk-12+21_2/failures.solaris-x86_64/LotsOfClasses.jtr.release
    69109  11-24-2018 14:42   jdk-12+21_3/failures.solaris-x86_64/LotsOfClasses.jtr.slowdebug
---------                     -------
   204445                     3 files
26-11-2018

This appears to be a test issue. The VM's behavior is expected and correct. Archiving java heap objects requires allocating consecutive GC regions for the copied objects. If the VM fails to allocate consecutive regions, the archiving process is aborted and an error is reported. The test should use a larger java heap for CDS dumping, since it loads a lot of java classes.
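A minimal model of that constraint, using the same simplified region table assumed in the sketches above (not the actual HotSpot allocation code): the dump needs runs of consecutive non-used regions, and fragmentation means no run is long enough.

#include <vector>

enum class RegionState { Uncommitted, Free, Used };

// Scan downward from the top of the heap for the highest run of
// `needed` consecutive regions that are free or still committable.
// Returns the start index of the run, or -1 if used regions fragment
// the heap so that no run is long enough -- the condition that makes
// the dump abort with the error above.
int find_consecutive_run(const std::vector<RegionState>& regions, int needed) {
  int run = 0;
  for (int i = (int)regions.size() - 1; i >= 0; i--) {
    run = (regions[i] == RegionState::Used) ? 0 : run + 1;
    if (run == needed) {
      return i;  // the run covers regions [i, i + needed)
    }
  }
  return -1;
}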
25-11-2018