A highly-optimized (and tiny) Introsort variant that draws from PDQsort, Java, and Orson Peter's branchless Lomuto partitioning. Iterative version Here.
Best | Average | Worst | Memory |
---|---|---|---|
n | n log n | n log n | log n |
The.Sound.of.Sorting.0.6.5.-.http___panthema.net_2013_sound-of-sorting.Ubuntu-22.04.2024-02-18.20-59-40.mp4
The decades-old partitioning algorithm recently made a resurgence when researchers discovered ways to remove the inner branch. Orson Peter's method— which he published on his blog a little under two months ago— is the fastest yet. It employs a gap in the data to move elements twice per iteration rather than swapping them (three moves). For arithmetic and pointer types, Blipsort employs branchless Lomuto partitioning. For other, larger types, Blipsort uses branchful Hoare partitioning.
Blipsort carefully selects the pivot from the middle of five sorted candidates. These candidates allow the sort to determine whether the data in the current interval is approximately descending and inform its "partition left" strategy.
Blipsort is introspective, switching to a guaranteed nlog(n) sort if it becomes quadratic. Like PDQsort, Blipsort switches to Heapsort after log(n) "bad" partitions— partitions that are significantly unbalanced.
Blipsort uses Insertion sort on small intervals where asymptotic complexity matters less and instruction overhead matters more. Blipsort employs Java's Pair Insertion sort on every interval except the leftmost. Pair insertion sort inserts two elements at a time and doesn't need to perform a lower bound check, making it slightly faster than normal insertion sort in the context of quicksort.
Similar to PDQsort, if any of the three middlemost candidate pivots is equal to the rightmost element of the partition at left, Blipsort moves equal elements to the left with branchless Lomuto and continues to the right, solving the dutch-flag problem and yeilding linear time on data comprised of equal elements.
Similar to PDQsort, if the partition is "good" (not highly unbalanced), Blipsort switches to insertion sort. If the Insertion sort makes more than a constant number of moves, Blipsort bails and resumes quicksort. This allows Blipsort to achieve linear time on already-sorted data.
Like PDQsort, if the partition is bad, Blipsort scrambles some elements to break up patterns.
When all of the candidate pivots are strictly descending, it is very likely that the interval is descending as well. Lomuto partitioning slows significantly on descending data. Therefore, Blipsort neglects to sort descending candidates and instead swap-rotates the entire interval before partitioning.
Blipsort allows its user to implement a custom boolean comparator. A comparator is best implemented with a lambda and no branches. A comparator implemented with a lambda can be inlined by an optimizing compiler, while a comparator implemented with a constexpr/inline function typically cannot.
Here is the PDQsort algorithm by Orson Peters
Here is the branchless Lomuto blog post by Orson Peters
Here is Java's Dual Pivot Quicksort