JDK-6804124 : (coll) Replace "modified mergesort" in java.util.Arrays.sort with timsort
  • Type: Enhancement
  • Component: core-libs
  • Sub-Component: java.util:collections
  • Affected Version: 6
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2009-02-11
  • Updated: 2017-05-16
  • Resolved: 2009-08-14
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 7
7 b70Fixed
Related Reports
Relates :  
Relates :  
Description
The algorithm used by java.util.Arrays.sort and (indirectly) by  java.util.Collections.sort to sort object references is a "modified mergesort (in which the merge is omitted if the highest element in the
low sublist is less than the lowest element in the high sublist)."  It is a reasonably fast stable sort that guarantees O(n log n) performance and requires O(n) extra space.  In its day (it was written
in 1997 by Joshua Bloch), it was a fine choice, but today but we can do much better.

Since 2003, Python's list sort has used an algorithm known as timsort (after Tim Peters, who wrote it). It is a stable, adaptive, iterative mergesort that requires far fewer than n log(n) comparisons when running on partially sorted arrays, while offering performance comparable to a traditional mergesort when run on random arrays. Like all proper mergesorts timsort is stable and runs in O(n log n) time (worst case). In the worst case, timsort requires temporary storage space for n/2 object references; in the best case, it requires only a small constant amount of space. Contrast this with the current implementation, which always requires extra space for n object references, and beats n log n only on nearly sorted lists.

Timsort is described in detail here:
http://svn.python.org/projects/python/trunk/Objects/listsort.txt.

Tim Peters's original implementation is written in C. Joshua Bloch ported it from C to Java and end tested, benchmarked, and tuned the resulting code extensively. The resulting code is a drop-in replacement for
java.util.Arrays.sort. On highly ordered data, this code can run up to 25 times as fast as the current implementation (on the HotSpot server VM). On random data, the speeds of the old and new implementations are comparable. For very short lists, the new implementation is substantially faster that the old even on random data (because it avoids unnecessary data copying).

Comments
EVALUATION see description. changeset: http://hg.openjdk.java.net/jdk7/tl/jdk/rev/bfd7abda8f79
31-07-2009