JDK-8266431 : Dual-Pivot Quicksort improvements (Radix sort)
  • Type: Enhancement
  • Component: core-libs
  • Sub-Component: java.util
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • Submitted: 2021-05-03
  • Updated: 2024-07-09
Related Reports
Duplicate :  
Description
This is a placeholder for an update of the Dual-Pivot Quicksort implementation used by java.util.Arrays sort() and parallelsort().

Proposed changes:
- fix tryMergeRuns() to better handle almost sorted datasets (all types)
- adopt radixsort() for sequential and parallel sorts on int[] / long[] / float[] / double[] arrays (almost random and length > 6K)

Vladimir Yaroslavskiy will present the changes on core-libs-dev in details.
Comments
A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk/pull/13568 Date: 2023-04-20 21:05:55 +0000
09-07-2024

I checked the code and had run again the complete test: system vs dpqs 21.5 vs 21.11: https://github.com/bourgesl/nearly-optimal-mergesort-code/blob/master/sort-bench/results/2021/2021-12-final/DPQS-sort-1k-1M-stats.out Full results: https://github.com/bourgesl/nearly-optimal-mergesort-code/blob/master/sort-bench/results/2021/2021-12-final/sort-1k-1M-stats.out All good, on 1k, 10k, 100k, 1m integer arrays. --- STATS --- Comparing sorters SYSTEM vs DPQS_2111_NEW winners : 41 / 116 avg (%): 130.3060 stats (%): [360: µ=160.37976 σ=141.73625 rms=302.116 min=37.943222 max=801.65845]
15-12-2021

Last benchmarks on the final DPQS version from Vladimir Yaroslavskiy : 50% gain in average ! For larger arrays (1m), it is up to 6x times faster (random) Performance stats vs OpenJDK 16 (%): [360: µ=146.05562 σ=106.212036 rms=252.26765 min=29.021154 max=618.46735] Full results: https://github.com/bourgesl/nearly-optimal-mergesort-code/blob/master/sort-bench/results/2021/2105-final/DPQS-Summary-sort-1k-1M-stats.out RadixSortBenchmark (sort & parralel sort): https://github.com/bourgesl/radix-sort-benchmark/blob/main/results/2021-ref/cmp-16-1M-global-final-2.log Tested patch: https://github.com/bourgesl/jdk-official/tree/dpqs21 Vladimir will propose PR as soon as his OCA / github account are validated.
08-05-2021

I have run many benchmarks on OpenJDK-16 b36 with the latest DPQS: - Sort-bench results from 1k to 1M: https://github.com/bourgesl/nearly-optimal-mergesort-code/blob/master/sort-bench/results/2021/DPQS-sort-1k-1M-stats.out --- STATS --- Comparing sorters SYSTEM vs NEW_DPQS_REF avg (%): 126.4137 stats (%): [360: µ=148.11488 σ=110.76419 rms=258.87906 min=28.879934 max=643.38055] Stats per keys: IDENT_____:SPIRAL stats (%): [12: µ=147.4725 σ=68.61302 rms=216.08553 min=81.03784 max=299.12326] IDENT_____:STAGGER stats (%): [12: µ=127.015945 σ=94.51046 rms=221.5264 min=88.13188 max=426.46103] IDENT_____:SAWTOTH stats (%): [12: µ=124.75403 σ=55.18854 rms=179.94257 min=97.26711 max=253.25453] IDENT_____:_RANDOM stats (%): [12: µ=249.83005 σ=227.9462 rms=477.77625 min=83.69652 max=637.4421] IDENT_____:PLATEAU stats (%): [12: µ=95.78107 σ=8.487346 rms=104.26842 min=81.59188 max=100.87573] IDENT_____:SHUFFLE stats (%): [12: µ=91.31415 σ=20.711887 rms=112.02604 min=57.745495 max=125.83337] REVERSE___:SPIRAL stats (%): [12: µ=217.03809 σ=87.91948 rms=304.95758 min=90.75964 max=351.22873] REVERSE___:STAGGER stats (%): [12: µ=127.922165 σ=89.81609 rms=217.73827 min=94.505844 max=412.78644] REVERSE___:SAWTOTH stats (%): [12: µ=122.653244 σ=43.17766 rms=165.8309 min=96.863365 max=216.63614] REVERSE___:_RANDOM stats (%): [12: µ=251.68561 σ=221.266 rms=472.9516 min=87.73558 max=636.92975] REVERSE___:PLATEAU stats (%): [12: µ=104.7079 σ=6.6150866 rms=111.32299 min=98.29155 max=120.24485] REVERSE___:SHUFFLE stats (%): [12: µ=101.98144 σ=19.666725 rms=121.64816 min=69.60369 max=139.13005] DITHER____:SPIRAL stats (%): [12: µ=181.93686 σ=62.26847 rms=244.20534 min=99.955475 max=267.6208] DITHER____:STAGGER stats (%): [12: µ=172.84383 σ=124.501335 rms=297.34515 min=93.536224 max=462.1796] DITHER____:SAWTOTH stats (%): [12: µ=192.09015 σ=114.97007 rms=307.0602 min=96.47614 max=365.7877] DITHER____:_RANDOM stats (%): [12: µ=187.86221 σ=134.9509 rms=322.8131 min=72.19382 max=449.41077] DITHER____:PLATEAU stats (%): [12: µ=102.02701 σ=5.4123306 rms=107.43934 min=93.43425 max=114.45702] DITHER____:SHUFFLE stats (%): [12: µ=87.219376 σ=43.538296 rms=130.75766 min=28.879934 max=177.7619] REVERSE_BA:SPIRAL stats (%): [12: µ=179.9999 σ=84.92344 rms=264.9233 min=95.22771 max=311.38278] REVERSE_BA:STAGGER stats (%): [12: µ=131.67981 σ=101.47058 rms=233.15039 min=98.353676 max=453.4599] REVERSE_BA:SAWTOTH stats (%): [12: µ=112.77734 σ=29.747051 rms=142.5244 min=97.209305 max=203.71677] REVERSE_BA:_RANDOM stats (%): [12: µ=226.6934 σ=190.85951 rms=417.55292 min=86.951454 max=643.38055] REVERSE_BA:PLATEAU stats (%): [12: µ=95.67597 σ=8.271914 rms=103.94788 min=81.59549 max=100.70663] REVERSE_BA:SHUFFLE stats (%): [12: µ=100.77126 σ=23.106997 rms=123.87826 min=65.39107 max=147.39545] REVERSE_FR:SPIRAL stats (%): [12: µ=192.51532 σ=74.68721 rms=267.2025 min=98.87437 max=355.6169] REVERSE_FR:STAGGER stats (%): [12: µ=133.38321 σ=105.10974 rms=238.49295 min=97.99067 max=466.86765] REVERSE_FR:SAWTOTH stats (%): [12: µ=112.296745 σ=28.818796 rms=141.11554 min=97.94214 max=201.68077] REVERSE_FR:_RANDOM stats (%): [12: µ=258.59274 σ=218.30382 rms=476.89655 min=97.36564 max=638.50055] REVERSE_FR:PLATEAU stats (%): [12: µ=102.76343 σ=5.0133595 rms=107.77679 min=97.087555 max=116.68499] REVERSE_FR:SHUFFLE stats (%): [12: µ=110.16168 σ=40.6426 rms=150.80429 min=63.60649 max=225.63222] - RadixSortBenchmark from 10k to 10M: https://github.com/bourgesl/radix-sort-benchmark/tree/main/results/2021-ref Benchmark (bits) (padding) (scenario) (seed) (size) Mode Cnt Score Error Units RadixSortBenchmark.arraysSort 23 7 UNIFORM 0 1000000 thrpt 5 8,869 ± 0,077 ops/s RadixSortBenchmark.arraysSort 23 7 GAUSSIAN 0 1000000 thrpt 5 9,080 ± 0,027 ops/s RadixSortBenchmark.arraysSort 23 7 CONTIGUOUS 0 1000000 thrpt 5 2059,714 ± 12,294 ops/s RadixSortBenchmark.arraysSort 23 7 CONTIGUOUS_REVERSE 0 1000000 thrpt 5 799,438 ± 10,660 ops/s RadixSortBenchmark.arraysSort 23 7 ALMOST_CONTIGUOUS 0 1000000 thrpt 5 60,510 ± 1,170 ops/s RadixSortBenchmark.arraysSort 23 7 SORTED 0 1000000 thrpt 5 16,648 ± 0,191 ops/s RadixSortBenchmark.arraysSort 23 7 ALMOST_SORTED 0 1000000 thrpt 5 13,183 ± 0,167 ops/s RadixSortBenchmark.arraysSort 23 7 EXP 0 1000000 thrpt 5 10,300 ± 0,094 ops/s RadixSortBenchmark.dpqs21Ref 23 7 EXP 0 1000000 thrpt 5 72,167 ± 1,285 ops/s RadixSortBenchmark.dpqs21Ref 23 7 UNIFORM 0 1000000 thrpt 5 79,901 ± 2,865 ops/s RadixSortBenchmark.dpqs21Ref 23 7 GAUSSIAN 0 1000000 thrpt 5 79,794 ± 2,165 ops/s RadixSortBenchmark.dpqs21Ref 23 7 CONTIGUOUS 0 1000000 thrpt 5 1451,910 ± 18,540 ops/s RadixSortBenchmark.dpqs21Ref 23 7 CONTIGUOUS_REVERSE 0 1000000 thrpt 5 797,361 ± 7,606 ops/s RadixSortBenchmark.dpqs21Ref 23 7 ALMOST_CONTIGUOUS 0 1000000 thrpt 5 851,668 ± 8,289 ops/s RadixSortBenchmark.dpqs21Ref 23 7 SORTED 0 1000000 thrpt 5 16,385 ± 0,208 ops/s RadixSortBenchmark.dpqs21Ref 23 7 ALMOST_SORTED 0 1000000 thrpt 5 79,697 ± 1,890 ops/s - Finally I tested Arrays.parallelSort(): 2 times better: https://github.com/bourgesl/radix-sort-benchmark/blob/main/results/2021-ref/cmp-16-1M-parallel.log Benchmark (bits) (padding) (scenario) (seed) (size) Mode Cnt Score Error Units RadixSortBenchmark.parallelArraysSort 23 7 UNIFORM 0 1000000 thrpt 5 22,936 ± 6,626 ops/s RadixSortBenchmark.parallelArraysSort 23 7 GAUSSIAN 0 1000000 thrpt 5 24,576 ± 0,501 ops/s RadixSortBenchmark.parallelArraysSort 23 7 CONTIGUOUS 0 1000000 thrpt 5 1568,905 ± 96,613 ops/s RadixSortBenchmark.parallelArraysSort 23 7 CONTIGUOUS_REVERSE 0 1000000 thrpt 5 779,611 ± 43,307 ops/s RadixSortBenchmark.parallelArraysSort 23 7 ALMOST_CONTIGUOUS 0 1000000 thrpt 5 128,497 ± 6,665 ops/s RadixSortBenchmark.parallelArraysSort 23 7 SORTED 0 1000000 thrpt 5 38,053 ± 0,648 ops/s RadixSortBenchmark.parallelArraysSort 23 7 ALMOST_SORTED 0 1000000 thrpt 5 37,401 ± 0,530 ops/s RadixSortBenchmark.parallelArraysSort 23 7 EXP 0 1000000 thrpt 5 25,808 ± 0,306 ops/s RadixSortBenchmark.parallelDpqs21Ref 23 7 UNIFORM 0 1000000 thrpt 5 52,352 ± 1,153 ops/s RadixSortBenchmark.parallelDpqs21Ref 23 7 GAUSSIAN 0 1000000 thrpt 5 54,645 ± 1,443 ops/s RadixSortBenchmark.parallelDpqs21Ref 23 7 CONTIGUOUS 0 1000000 thrpt 5 2073,169 ± 19,994 ops/s RadixSortBenchmark.parallelDpqs21Ref 23 7 CONTIGUOUS_REVERSE 0 1000000 thrpt 5 777,157 ± 37,853 ops/s RadixSortBenchmark.parallelDpqs21Ref 23 7 ALMOST_CONTIGUOUS 0 1000000 thrpt 5 867,565 ± 43,417 ops/s RadixSortBenchmark.parallelDpqs21Ref 23 7 SORTED 0 1000000 thrpt 5 35,376 ± 0,543 ops/s RadixSortBenchmark.parallelDpqs21Ref 23 7 ALMOST_SORTED 0 1000000 thrpt 5 104,948 ± 3,910 ops/s RadixSortBenchmark.parallelDpqs21Ref 23 7 EXP 0 1000000 thrpt 5 59,308 ± 2,192 ops/s Note: I explicitely set -Djava.util.concurrent.ForkJoinPool.common.parallelism=4 on my laptop i7, only 280%/400% used in this test.
03-05-2021