JDK-8229120 : Intermittent gc/stress/gclocker/TestExcessGCLockerCollections.java test failures
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 11.0.5-oracle,14,15,16
  • Priority: P5
  • Status: Open
  • Resolution: Unresolved
  • Submitted: 2019-08-05
  • Updated: 2024-07-30
Related Reports
Blocks :  
Relates :  
Relates :  
Sub Tasks
JDK-8229156 :  
Description
(synopsis is provisional, please change as you see fit)

After JDK-8048556 was fixed, my tier2 test runs caught a failure like this:

$ CONF=linux-x86_64-server-fastdebug make images run-test TEST=hotspot_gc TEST_VM_OPTS="-XX:+UseParallelGC -XX:+UseParallelOldGC -XX:+UseNUMA"

[2.862s][info][gc] GC(0) Pause Young (Allocation Failure) 113M->9M(992M) 225.825ms
[3.227s][info][gc] GC(1) Pause Young (Allocation Failure) 135M->10M(992M) 115.803ms
[3.387s][info][gc] GC(2) Pause Young (Allocation Failure) 106M->10M(992M) 81.497ms
[3.523s][info][gc] GC(3) Pause Young (Allocation Failure) 113M->10M(992M) 65.679ms
[3.675s][info][gc] GC(4) Pause Young (GCLocker Initiated GC) 111M->10M(992M) 80.585ms
[3.871s][info][gc] GC(5) Pause Young (Allocation Failure) 116M->11M(1013M) 94.161ms
[4.114s][info][gc] GC(6) Pause Young (GCLocker Initiated GC) 154M->11M(1005M) 85.957ms
[4.266s][info][gc] GC(7) Pause Young (GCLocker Initiated GC) 96M->11M(1009M) 46.096ms
[4.400s][info][gc] GC(8) Pause Young (GCLocker Initiated GC) 94M->11M(1000M) 49.102ms
...
[301.585s][info][gc] GC(2820) Pause Young (GCLocker Initiated GC) 255M->84M(528M) 5.565ms
[301.670s][info][gc] GC(2821) Pause Young (GCLocker Initiated GC) 303M->85M(528M) 4.062ms
[301.739s][info][gc] GC(2822) Pause Young (GCLocker Initiated GC) 255M->86M(528M) 3.772ms
[301.803s][info][gc] GC(2823) Pause Young (GCLocker Initiated GC) 264M->86M(528M) 4.339ms
[301.865s][info][gc] GC(2824) Pause Young (GCLocker Initiated GC) 269M->87M(528M) 4.608ms
Done.
];
 stderr: []
 exitValue = 0

java.lang.RuntimeException: '\[gc\s*\] .* \(GCLocker Initiated GC\) [1-9][0-9]?M' found in stdout 

	at jdk.test.lib.process.OutputAnalyzer.stdoutShouldNotMatch(OutputAnalyzer.java:388)
	at gc.stress.gclocker.TestExcessGCLockerCollections.main(TestExcessGCLockerCollections.java:186)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:565)
	at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298)
	at java.base/java.lang.Thread.run(Thread.java:830)

JavaTest Message: Test threw exception: java.lang.RuntimeException
JavaTest Message: shutting down test


Seems like GC(7) and GC(8) fit the "bad" regexp. Is this a test bug? The failure seems very intermittent, though, and caught once on a busy CI server. I failed to reproduce it so far in my configs.
Comments
The TestExcessGCLockerCollections.java test that was part of JDK-8048556 doesn't actually work, and seems to occasionally provide false negatives. Fixing it seems hard; there doesn't seem to be enough logged information to do the desired test. For now we're going to problemlist the test. The test is looking at information that doesn't reflect what it is really trying to check for. What it should be doing is comparing the eden capacity with the amount actually allocated in eden when a GC occurs, to make sure there wasn't a lot of unused capacity. There isn't an obvious way to get that information at all for at least some collectors, and certainly not in a manner that is shared across the collectors. However, JDK-8227225 (ParallelGC) and followups for other GCs will provide logging output containing the information this test needs. The test is currently using -Xmn256M, thereby setting NewSize and MaxNewSize. This was done in the mistaken belief that it set a minimum eden or young gen size, but it doesn't. Might need additional options for at least some collectors, to ensure eden size is large enough that a too early collection is reasonably distinguishable from a normal collection. Things like -XX:-UseAdaptiveSizePolicy. And not sure how to deal with G1's pause goal and its affect on young gen size. I did some "manual" examination of logging output with and without the change from JDK-8048556 to convince myself that the problem had actually been fixed, despite the not good regression test. This involved looking for unusually short times between the end of one collection and the start of the next. Without the fix those occur when running the test. With the fix the times between collections are pretty stable and consistent, with no low outliers. But I think that would be difficult to turn into a reliable regression test. Better to wait for JDK-8227225. (I used the attached excess-gclocker-check.py python script and temporary variations on it to look at the logging output.)
06-08-2019