JDK-8024394 : Parallel GC lays out the array references in the reverse order
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 8
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • Submitted: 2013-09-06
  • Updated: 2020-07-30
Version (Other): tbd, Unresolved
Description
See the original discussion at:
  http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2013-September/008196.html

In short, when laying out an Object[], ParallelGC pushes the array elements onto a stack during the traversal, and pops them back off as it actually processes them. This results in a reversed memory layout for the referenced elements.
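
The effect can be modeled without any GC machinery. The following is a minimal sketch, assuming only that the scan pushes elements in index order onto a LIFO stack and that elements are copied in pop order; `layout_with_forward_push` is a hypothetical name and this is not HotSpot code:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model, not HotSpot code: returns the order in which the
// elements of an n-element reference array would be copied (and so laid
// out in memory) if the scan pushes them in index order onto a LIFO stack.
std::vector<std::size_t> layout_with_forward_push(std::size_t n) {
    std::vector<std::size_t> stack;
    for (std::size_t i = 0; i < n; ++i) {
        stack.push_back(i);              // traversal: push element i
    }
    std::vector<std::size_t> layout;
    while (!stack.empty()) {
        layout.push_back(stack.back());  // processing: pop, then "copy"
        stack.pop_back();
    }
    return layout;
}
```

For a four-element array, `layout_with_forward_push(4)` yields the indices 3, 2, 1, 0: the last element is copied first and ends up nearest the start of the copied region.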

This issue should be fixed because:
 a) Depending on the hardware, walking memory backwards may or may not perform as well as walking it forwards; in particular, think about non-x86 embedded scenarios where you don't have the luxury of advanced memory prefetchers;
 b) Even if you *do* have a good memory prefetcher at your disposal, accessing the first element will entail two memory accesses, because the first element is rather far from the base; keeping the first element closer to the base may put it right there on the same cache line;
 c) The Parallel GC layout is inconsistent with the layouts other GCs produce, which can cause surprising performance differences versus other collectors; I don't like surprising behaviors, and think we should minimize them where possible.
Comments
Aleksey, thanks for adding the statistically significant parts of the results. I interpret results that are not statistically significant as "don't knows". Meaning the result

  gen1t(s)        1160     24.006     0.02069     0.043    0.0083
  gen1t(s)        1160     24.765     0.02135     0.045    0.0086
                0.000%     3.162%    3.18995%    4.651%   3.6145%  0.060

does not tell us if there is a regression or not. Is that a fair interpretation? In particular it does not say there is no regression?
14-10-2013
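
The percentage row in the result quoted above appears to be the relative change of the patched run against the baseline, column by column; recomputing it from the two printed gen1t(s) rows reproduces the printed percentages. A minimal sketch of that arithmetic follows, where `relative_change_percent` is a hypothetical helper and not part of the CompareGCStats script:

```cpp
// Hypothetical helper, not taken from CompareGCStats: relative change of
// the patched value against the baseline, expressed in percent.
double relative_change_percent(double baseline, double patched) {
    return (patched - baseline) / baseline * 100.0;
}
```

With the gen1t(s) totals above, `relative_change_percent(24.006, 24.765)` comes out at roughly 3.162, matching the printed 3.162%. This only describes the size of the difference; whether it is significant is a separate question answered by the last column.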

Jon helped me with the GCcompare script. Most of the changes are statistically insignificant. SPECjbb2005 experiences a statistically significant improvement in GC time.

  $ awk -f CompareGCStats.ysr specjvm2008-baseline specjvm2008-patched
  ...
  commit1(MB)    25678   43563292.000  1696.52200  5686.000  1550.3032
  commit1(MB)    25600   41151046.000  1607.46273  5612.000  1349.4084
               -0.304%        -5.537%   -5.24952%   -1.301%  -12.9584%  0.000  * Yes *

  $ awk -f CompareGCStats.ysr specjbb2005-baseline specjbb2005-patched
  ...
  what           count          total        mean       max     stddev pvalue  sig
  gen0t(s)       16374        258.799     0.01581     0.170     0.0087
  gen0t(s)       16360        249.029     0.01522     0.171     0.0084
               -0.086%        -3.775%   -3.73182%    0.588%   -3.4483%  0.000  * Yes *
  GC(s)          16414        259.463     0.01581     0.170     0.0087
  GC(s)          16400        249.701     0.01523     0.171     0.0084
               -0.085%        -3.762%   -3.66856%    0.588%   -3.4483%  0.000  * Yes *
  commit1(MB)    16374   16766976.000  1024.00000  1024.000     0.0000
  commit1(MB)    16360   16752640.000  1024.00000  1024.000     0.0000
               -0.086%        -0.086%    0.00000%    0.000%    0.0000%  0.000  * Yes *

  $ awk -f CompareGCStats.ysr specjvm98-baseline specjvm98-patched
  ...
  commit1(MB)     1400     957600.000   684.00000   684.000     0.0000
  commit1(MB)     1400     957600.000   684.00000   684.000     0.0000
                0.000%         0.000%    0.00000%    0.000%    0.0000%  0.000  * Yes *
11-10-2013

The patch seems to improve GC time on SPECjvm2008 and SPECjbb2005, but somewhat degrades it on SPECjvm98.

  $ awk -f PrintGCStats.metaspace.exp -v cpus=8 linux-x64-specjvm98/baseline/results_*/log
  what           count     total      mean     max  stddev
  gen0t(s)        1400     4.052   0.00289   0.012  0.0014
  gen1t(s)        1160    24.006   0.02069   0.043  0.0083
  GC(s)           2560    28.057   0.01096   0.043  0.0105

  $ awk -f PrintGCStats.metaspace.exp -v cpus=8 linux-x64-specjvm98/patched/results_*/log
  what           count     total      mean     max  stddev
  gen0t(s)        1400     4.042   0.00289   0.014  0.0014
  gen1t(s)        1160    24.765   0.02135   0.045  0.0086
  GC(s)           2560    28.808   0.01125   0.045  0.0109

  $ awk -f PrintGCStats.metaspace.exp -v cpus=8 linux-x64-specjvm2008/baseline/results_*/log
  what           count     total      mean     max  stddev
  gen0t(s)       25678   146.664   0.00571   0.382  0.0094
  gen1t(s)         153    19.821   0.12955   1.031  0.1952
  GC(s)          25831   166.485   0.00645   1.031  0.0200

  $ awk -f PrintGCStats.metaspace.exp -v cpus=8 linux-x64-specjvm2008/patched/results_*/log
  what           count     total      mean     max  stddev
  gen0t(s)       25600   143.028   0.00559   0.399  0.0092
  gen1t(s)         146    18.189   0.12458   0.994  0.1891
  GC(s)          25746   161.217   0.00626   0.994  0.0191

  $ awk -f PrintGCStats.metaspace.exp -v cpus=8 linux-x32-specjbb2005/baseline/results_*/log
  what           count     total      mean     max  stddev
  gen0t(s)       16374   258.799   0.01581   0.170  0.0087
  gen1t(s)          40     0.664   0.01659   0.025  0.0047
  GC(s)          16414   259.463   0.01581   0.170  0.0087

  $ awk -f PrintGCStats.metaspace.exp -v cpus=8 linux-x32-specjbb2005/patched/results_*/log
  what           count     total      mean     max  stddev
  gen0t(s)       16360   249.029   0.01522   0.171  0.0084
  gen1t(s)          40     0.672   0.01681   0.023  0.0045
  GC(s)          16400   249.701   0.01523   0.171  0.0084
08-10-2013

Webrev: http://cr.openjdk.java.net/~shade/8024394/webrev.00/
RFR: http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2013-September/008346.html

This patch does not degrade any of {SPECjbb2005, SPECjbb2013, SPECjvm2008, SPECjvm98} x {Linux x86/x86_64, Solaris x86/x86_64/SPARCv9}. Passes JPRT.
08-10-2013

There are multiple places where it should be fixed, e.g.:

  diff -r 428025878417 src/share/vm/oops/objArrayKlass.cpp
  --- a/src/share/vm/oops/objArrayKlass.cpp Wed Sep 04 12:56:03 2013 -0700
  +++ b/src/share/vm/oops/objArrayKlass.cpp Fri Sep 06 15:45:14 2013 +0400
  @@ -412,11 +412,11 @@
   #define ObjArrayKlass_SPECIALIZED_OOP_ITERATE(T, a, p, do_oop) \
   {                                   \
  -  T* p         = (T*)(a)->base();   \
  -  T* const end = p + (a)->length(); \
  -  while (p < end) {                 \
  +  T* const b = (T*)(a)->base();     \
  +  T* p       = b + (a)->length();   \
  +  while (b < p) {                   \
  +    p--;                            \
       do_oop;                         \
  -    p++;                            \
     }                                 \
   }

...also other macros in the same file. This limited and untested change fixes the layout issue in the original test.
06-09-2013
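
The idea behind the change above can be illustrated in isolation: if the scan walks the array from the end towards the base, the LIFO stack hands the elements back in index order. This is a minimal sketch under the same simplifying assumption (elements are copied in pop order); `layout_with_reverse_push` is a hypothetical name, and the loop only mirrors the shape of the patched macro rather than reproducing HotSpot code:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model, not HotSpot code: the scan walks the array from the
// end towards the base, in the same shape as the patched macro loop
// (while (b < p) { p--; do_oop; }), pushing each element onto a LIFO stack.
std::vector<std::size_t> layout_with_reverse_push(std::size_t n) {
    std::vector<std::size_t> stack;
    std::size_t p = n;                   // one past the last element
    while (0 < p) {
        --p;
        stack.push_back(p);              // traversal: push element p
    }
    std::vector<std::size_t> layout;
    while (!stack.empty()) {
        layout.push_back(stack.back());  // processing: pop, then "copy"
        stack.pop_back();
    }
    return layout;
}
```

For a four-element array, `layout_with_reverse_push(4)` yields the indices 0, 1, 2, 3, so in this model the first element is copied first and the copied elements follow the array's own order.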