JDK-8309322 : [GenShen] TestAllocOutOfMemory#large failed
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2023-06-01
  • Updated: 2023-07-11
  • Resolved: 2023-06-07
Description
Reported by Martin Doerr:

One test has failed on PPC64:
gc/shenandoah/oom/TestAllocOutOfMemory.java#large
Execution failed: `main' threw exception: java.lang.RuntimeException: 'java.lang.OutOfMemoryError: Java heap space' missing from stdout/stderr
 
----

See comments below for further updates on the investigation.

Comments
The relaxation of the card-table page-size commit quantization that causes the extra alignment will be addressed separately in a ticket that I shall link here.
07-06-2023

The changes were rolled into PR https://github.com/openjdk/jdk/pull/14185 in commit https://github.com/openjdk/jdk/pull/14185/commits/88958669d3f6c60bb6d115cc4e345f7ac1a2686e. I'll close this as resolved, pending verification by [~mdoerr] (thanks!).
07-06-2023

Attached a patch that modifies the test so it is cognizant of the actual heap size, on account of alignment-related size adjustments. Here's a description:

JDK-8309322: [GenShen] TestAllocOutOfMemory#large failed

When generational Shenandoah is used, there may be an additional alignment-related heap size adjustment that the test should be cognizant of. Such alignment might also happen in the non-generational case, but here the specific size used in the test was affected on machines with larger than usual OS page size settings. The alignment-related adjustment would have affected all generational collectors (except perhaps generational ZGC). In the future, we might try to relax this alignment constraint.
07-06-2023
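
A minimal sketch of the idea (not the actual patch; see the PR linked in the comment above for the real change), assuming the test can read the granted heap via Runtime.getRuntime().maxMemory(); the class and variable names here are illustrative:

    // Illustrative only: size the failing allocation against the heap the VM
    // actually granted, not the -Xmx value requested, since alignment may have
    // rounded the heap up (e.g. 16m requested, 32m granted).
    public class LargeAllocSketch {
        public static void main(String[] args) {
            // maxMemory() reports the effective maximum heap after any
            // alignment-related adjustment.
            long effectiveMax = Runtime.getRuntime().maxMemory();
            // Request more longs than the effective heap can hold, so the
            // allocation must fail with the OOM the test expects.
            int len = (int) Math.min(Integer.MAX_VALUE - 8, effectiveMax / 8 + (1 << 20));
            long[] large = new long[len]; // expected: java.lang.OutOfMemoryError: Java heap space
            System.out.println("Unexpectedly allocated " + large.length + " longs");
        }
    }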

My bad; I had assumed that the test failed because it didn't get an OOM within the requisite time on account of the larger-than-requested heap, but you are right: it doesn't get an OOM at all because of the larger-than-requested heap. I'll modify the test so it's cognizant of the upward alignment in the generational case, which depends on the native page size. The correct fix is the RFE to avoid the potentially unreasonable upward alignment, an issue with all current generational collectors (other than generational ZGC).
05-06-2023

Wait. This issue is not related to any timeout, AFAICS. The test expects an allocation failure, which we never hit because -Xmx16m is ignored and we get 32m instead. I can't see any relationship to JDK-8309317.
05-06-2023

Martin confirmed that all the generational collectors other than generational ZGC align heap size upwards based on OS-page-sized quanta of the card table, and are all similarly affected. This may be too strong for platforms with larger page sizes, since there is a "dilatation" of heap size that is best avoided, and that at the very least should be communicated to the user. This more general question will be addressed in an RFE that I'll link here. Meanwhile, the timeout resulting from the larger max heap size is addressed more generally in https://bugs.openjdk.org/browse/JDK-8309317, so the test should now pass even though the heap is nominally sized larger than requested. I'll close the ticket with links to JDK-8309317 and to the aforementioned RFE once I have filed it.
05-06-2023
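
To make the rounding concrete, here is a back-of-envelope sketch; the 512-byte card size is an assumption (HotSpot's usual default), and the exact GenShen constants may differ. If the card table is committed in whole OS pages, the heap is effectively aligned up to page_size * card_size bytes of coverage:

    // Illustrative arithmetic only; cardSize = 512 is an assumption.
    public class CardAlignSketch {
        public static void main(String[] args) {
            long pageSize = 64 * 1024;            // 64K OS pages, as on the failing PPC64 machine
            long cardSize = 512;                  // bytes of heap covered per card (assumption)
            long heapAlign = pageSize * cardSize; // one card-table page covers 32 MB of heap
            long requested = 16L * 1024 * 1024;   // -Xmx16m
            long effective = ((requested + heapAlign - 1) / heapAlign) * heapAlign;
            System.out.println("effective max heap = " + effective); // 33554432, matching the log
        }
    }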

From Y. S. Ramakrishna: The 64K page size seems to be the root cause (along with the attendant machinations in GenShen around card-table sizing quanta), as was conjectured by William. We'll take a look at it and see how we can address this. Appreciate the extra information, thanks Martin!
01-06-2023

From Martin Doerr:

    jdk/bin/java -Xmx16m -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:ShenandoahGCMode=generational -Xlog:gc+heap=debug,pagesize=debug -version
    [0.018s][info][pagesize] CodeHeap 'non-nmethods': min=2M max=8640K base=0x00007fff6b120000 page_size=64K size=8640K
    [0.018s][info][pagesize] CodeHeap 'profiled nmethods': min=2M max=118528K base=0x00007fff63d60000 page_size=64K size=118528K
    [0.019s][info][pagesize] CodeHeap 'non-profiled nmethods': min=2M max=118592K base=0x00007fff6b990000 page_size=64K size=118592K
    [0.022s][debug][gc,heap ] Minimum heap 6815744 Initial heap 16777216 Maximum heap 33554432
    [0.038s][info ][pagesize] Card Table: min=128K max=128K base=0x00007fff80000000 page_size=64K size=128K
    [0.038s][info ][pagesize] Card Table: min=128K max=128K base=0x00007fff5dc90000 page_size=64K size=128K
01-06-2023

From Y. S. Ramakrishna: What do you see if you add `pagesize=debug` to the `-Xlog:` incantation on the machine where `-Xmx16m` is giving an effective max heap size of 32 MB (see below)?
01-06-2023

From Martin Doerr: The PPC64 test failure is not a general PPC64 problem; the test has passed on another PPC64 machine. The machine on which I saw the failure has more than 120 GB of memory. The VM seems to ignore the -Xmx16m setting:

    jdk/bin/java -Xmx16m -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:ShenandoahGCMode=generational -Xlog:gc+heap=debug -version
    [0.025s][debug][gc,heap] Minimum heap 6815744 Initial heap 16777216 Maximum heap 33554432

With that, the allocation passes, and the expected allocation failure doesn't get hit. When I omit the generational mode, I'm getting "Maximum heap 16777216", which satisfies the test's needs.
01-06-2023
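
For comparison, the non-generational run Martin describes would be (a reconstruction from the comment above, not a quoted log):

    jdk/bin/java -Xmx16m -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -Xlog:gc+heap=debug -version

which reports "Maximum heap 16777216", so the -Xmx16m request is honored and the test's expected OOM is reached.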