JDK-8258481 : gc.g1.plab.TestPLABPromotion fails on Linux x86
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 17
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • CPU: x86
  • Submitted: 2020-12-16
  • Updated: 2021-01-14
  • Resolved: 2021-01-08
JDK 17
17 b05 Fixed
Description
On Linux x86, gc.g1.plab.TestPLABPromotion fails for me on many submit tests. 

E.g.:
https://github.com/tstuefe/jdk/runs/1562960505
https://github.com/tstuefe/jdk/runs/1561833114
https://github.com/tstuefe/jdk/runs/1549560140

Also happens to others:
https://github.com/stuart-marks/jdk/runs/1561547066

Test fails with:

STDERR:
java.lang.RuntimeException: Expect that Survivor PLAB allocation are similar to all mem consumed
	at gc.g1.plab.TestPLABPromotion.checkLiveObjectsPromotion(TestPLABPromotion.java:188)
	at gc.g1.plab.TestPLABPromotion.checkResults(TestPLABPromotion.java:156)
	at gc.g1.plab.TestPLABPromotion.main(TestPLABPromotion.java:122)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:78)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:567)
	at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298)
	at java.base/java.lang.Thread.run(Thread.java:831)

For details, please see GH action logs.


Comments
Changeset: b549cbd3
Author: Thomas Schatzl <tschatzl@openjdk.org>
Date: 2021-01-08 10:52:08 +0000
URL: https://git.openjdk.java.net/jdk/commit/b549cbd3
08-01-2021

One test case allocates byte arrays of ~3500 bytes and expects almost all allocations to occur in PLABs, given a PLAB waste threshold of 20%. On x64 this works: the PLAB size is 4096 *words*, i.e. 32 KB, and 20% of that is ~6.5 KB. On x86 a PLAB of 4096 words is only 16 KB, and 20% of that is ~3.2 KB. This threshold is smaller than the 3500-byte arrays, so they are not allocated in PLABs and the test fails. It does not always fail (but very often) because the calculation that checks the threshold is broken: unless really *all* objects are 3500 bytes in size, the current calculation truncates the result to 0% actual waste, which is below 20%. The suggested fix is to lower the array size to 3250 bytes, which meets both criteria, and to fix the broken calculations. Note that the array size should not be lowered much further, because another test would fail otherwise: it sets the waste threshold to 10%, so the target threshold for this array lies somewhere between 10% and 20%.
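
The threshold arithmetic in this comment can be sketched as follows. This is only an illustration under simplifying assumptions: the class and method names are hypothetical, array headers are ignored, and the "fits" rule (object size no larger than the waste threshold) is a simplified model, not the actual G1 allocation logic.

```java
final class PlabWasteSketch {
    // Hypothetical helper: number of bytes that wastePercent of one PLAB represents.
    static long wasteThresholdBytes(long plabSizeWords, int wordSizeBytes, int wastePercent) {
        return plabSizeWords * wordSizeBytes * wastePercent / 100;
    }

    // Simplified model (assumption): an object is allocated in a PLAB only if it
    // is no larger than the waste threshold. Array headers are ignored here.
    static boolean fitsWasteThreshold(long objectBytes, long plabSizeWords,
                                      int wordSizeBytes, int wastePercent) {
        return objectBytes <= wasteThresholdBytes(plabSizeWords, wordSizeBytes, wastePercent);
    }

    public static void main(String[] args) {
        long plabWords = 4096;   // PLAB size in words, as stated in the comment
        int wastePercent = 20;   // PLAB waste threshold used by the failing test case
        for (int wordSize : new int[] {8, 4}) {   // 8 bytes on x64, 4 bytes on x86
            long threshold = wasteThresholdBytes(plabWords, wordSize, wastePercent);
            System.out.printf("word size %d: threshold %d bytes, 3500 fits: %b, 3250 fits: %b%n",
                    wordSize, threshold,
                    fitsWasteThreshold(3500, plabWords, wordSize, wastePercent),
                    fitsWasteThreshold(3250, plabWords, wordSize, wastePercent));
        }
    }
}
```

With a 4-byte word the threshold comes out at 3276 bytes, which is why 3500-byte arrays miss it while 3250-byte arrays stay just below it.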
18-12-2020

The problem is that on 32-bit platforms the PLAB size in the failing sub-test is half as large as on 64-bit platforms. So the increase in size of one of the test allocations in JDK-8257145 causes these allocations to be made directly instead of in PLABs, causing the test failure. A second bug related to PLAB sizing is that before JDK-8257145, PLAB sizes depended on the number of GC threads the VM was started with and on the number of GC threads actually used, which the test was not aware of. I.e., ParallelGCThreads needs to be fixed in the called VM too. A third bug is in the calculation of the ratios in the verification: they use integer math, so the operations truncate. It's actually kind of surprising that the test works at all.
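
The truncation problem in the ratio calculation can be illustrated with a minimal sketch. The class and method names below are hypothetical and do not reproduce the test's actual code; they only show how integer division collapses a ratio below 100% to zero.

```java
final class PlabRatioSketch {
    // Integer math: the division happens first and truncates, so the result
    // is 0 whenever part < total.
    static long truncatedPercent(long part, long total) {
        return part / total * 100;
    }

    // Floating-point math keeps the fractional part of the ratio.
    static double exactPercent(long part, long total) {
        return 100.0 * part / total;
    }

    public static void main(String[] args) {
        long part = 90;    // illustrative values only
        long total = 100;
        System.out.println("integer math:        " + truncatedPercent(part, total) + "%");
        System.out.println("floating-point math: " + exactPercent(part, total) + "%");
    }
}
```

A check of the form "percentage below 20%" passes trivially for the truncated value, which would be consistent with the test not failing every time.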
16-12-2020

Not all of my GitHub Actions runs show this error, e.g. https://github.com/tstuefe/jdk/runs/1550494050 does not. I sync very frequently and am usually close to the master HEAD.
16-12-2020

I strongly suspect that this is a test bug, hence tagging it as such. I remember from looking at this test for JDK-8257145 that some object (array) sizes are specified in elements, but TLAB sizes are always passed in HeapWords. So the sizes the test uses may not be appropriate for x86, but then it should always fail on x86, not only "often".
16-12-2020