JDK-8257145 : Performance regression with -XX:-ResizePLAB after JDK-8079555
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 9,11,12,13,14,15,16
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2020-11-26
  • Updated: 2022-06-27
  • Resolved: 2020-12-11
Fixed in: JDK 17 (17 b02)
Description
While running write-performance tests with HBase PerformanceEvaluation, I found JDK 11 to be about 20%-30% slower than JDK 8 with -XX:-ResizePLAB.

The value returned by desired_plab_sz() differs between JDK 8 and JDK 11. With ResizePLAB disabled, desired_plab_sz() should return the default PLAB size, as JDK 8 does, but in JDK 11 the code returns PLABSize/no_of_gc_workers.

The effective Young/OldPLABSize therefore differs in JDK 11, which results in many more direct allocations in some situations, e.g. HBase PerformanceEvaluation, which creates many large objects.

The same result for jdk/jdk

Proposed fix:
diff --git a/src/hotspot/share/gc/shared/plab.cpp b/src/hotspot/share/gc/shared/plab.cpp
index 2afde91..7a5e108 100644
--- a/src/hotspot/share/gc/shared/plab.cpp
+++ b/src/hotspot/share/gc/shared/plab.cpp
@@ -135,6 +135,9 @@ void PLABStats::log_sizing(size_t calculated_words, size_t net_desired_words) {
 
 // Calculates plab size for current number of gc worker threads.
 size_t PLABStats::desired_plab_sz(uint no_of_gc_workers) {
+  if (!ResizePLAB) {
+    return _desired_net_plab_sz;
+  }
   return align_object_size(clamp(_desired_net_plab_sz / no_of_gc_workers, min_size(), max_size()));
 }
 

Comments
Changeset: b28b0947
Author: Dongbo He <dongbohe@openjdk.org>
Committer: Fei Yang <fyang@openjdk.org>
Date: 2020-12-11 09:06:10 +0000
URL: https://git.openjdk.java.net/jdk/commit/b28b0947
11-12-2020

Thanks for the information!
30-11-2020

In our test, it is better to enable PLABSize (an increase of 4%-5%). However, if YoungPLABSize is set large enough, the result is the same as when PLABSize is disabled.
28-11-2020

Btw, what is your experience with -XX:+ResizePLAB on JDK 11? Maybe you can give some feedback on whether JDK 11 automatic PLAB sizing is better than before, or even as good as manual tuning.
27-11-2020

Note that the dynamic selection of the number of gc threads might also cause the regression, i.e. in your tests gc is running with far fewer gc threads than expected (although this is unlikely, assuming that the heap configuration used is appropriately large). The actual issue here is that _desired_net_plab_sz is supposed to contain the PLAB size for all threads participating in a gc, but the constructor for G1EvacStats only gets the PLAB size for a single thread. So the correct value to pass to the G1EvacStats constructor would probably be Young/OldPLABSize * ParallelGCThreads, instead of special-casing this when returning a thread's individual PLAB size (as PLABStats::desired_plab_sz() does). Without a dynamic number of gc threads this results in exactly the same behavior as before. I think with a dynamic number of threads this is a good (although conservative) estimate too.
26-11-2020

Potential mitigations:
- use -XX:+ResizePLAB, which is the default
- manually set Young/OldPLABSize to Young/OldPLABSize * ParallelGCThreads (with -XX:-ResizePLAB)
26-11-2020

Workarounds do not preclude fixing the bug. :) The suggestion is to always initialize _desired_net_plab_sz to Young/OldPLABSize * ParallelGCThreads, regardless of whether PLAB resizing is enabled or not. With JDK-8079555 the semantics of _desired_net_plab_sz changed from "plab size per gc thread" to "plab size for all gc threads" without fixing the code to pass the correct value. I'm not sure about the question regarding the semantics of PLABSize: it does not change; in the end it is still the PLAB size per thread (i.e. we multiply in advance so that the code later does the division). As mentioned, the semantics of G1EvacStats::_desired_net_plab_sz changed (from per-thread to all-threads), and continuing to pass the per-thread PLAB size is just wrong (alternatively the constructor could do the multiplication; I did not think about which is better). As for the suggested change: it is a point change that only fixes this particular situation, and it does not fix the -XX:+ResizePLAB case starting with a too small PLAB (where it does not really hurt, as PLABs will be resized quickly to the "best" value). Changing it only in this situation also adds a special case deep down in the code, which is very bad from a code maintainability POV.
26-11-2020

Setting Young/OldPLABSize to Young/OldPLABSize * ParallelGCThreads is a good solution, but our concern is that some customers may not know about it. When ResizePLAB is disabled, does returning Young/OldPLABSize * ParallelGCThreads change the semantics of PLABSize? We think that the following change is a better solution (with -XX:-ResizePLAB):
 // Calculates plab size for current number of gc worker threads.
 size_t PLABStats::desired_plab_sz(uint no_of_gc_workers) {
+  if (!ResizePLAB) {
+    return Young/OldPLABSize // pseudocode
+  }
   return align_object_size(clamp(_desired_net_plab_sz / no_of_gc_workers, min_size(), max_size()));
 }
26-11-2020