Quicksort Quicksort

Quicksort - Definition and Overview

Quicksort is a well-known sorting algorithm developed by C. A. R. Hoare that, on average, makes O(n log n) comparisons to sort n items. However, in the worst-case, it makes O(n2) comparisons. Typically, Quicksort is significantly faster in practice than other O(n log n) algorithms, because its inner loop can be efficiently implemented on most architectures.

Contents

The algorithm

Quicksort sorts by employing a divide and conquer strategy to divide a list into two sub-lists.

The steps are:

  1. Pick an element, called a pivot, from the list.
  2. Reorder the list so that all elements which are less than the pivot precede the pivot and so that all elements greater than the pivot succeed it. After this partitioning, the pivot is in its final position.
  3. Recursively sort the sub-list of lesser elements and the sub-list of greater elements.

The base case of the recursion are lists of size one, which are always sorted. The algorithm always terminates because it puts at least one element in its final place on each iteration.

A simple pseudocode expression of the complete algorithm is:

 function partition(a, left, right, pivotIndex)
     pivotValue := a[pivotIndex]
     swap(a[pivotIndex], a[right]) // Move pivot to end
     storeIndex := left
     for i from left to right-1
         if a[i] <= pivotValue
             swap(a[storeIndex], a[i])
             storeIndex := storeIndex + 1
     swap(a[right], a[storeIndex]) // Move pivot to its final place
     return storeIndex
 
 function quicksort(a, left, right)
     if right > left
         select a pivot value a[pivotIndex]
         pivotNewIndex := partition(a, left, right, pivotIndex)
         quicksort(a, left, pivotNewIndex-1)
         quicksort(a, pivotNewIndex+1, right)

Performance details

Quicksort's inner loop, which performs the partition, is amenable to optimization for two main reasons:

  • All comparisons are being done with a single pivot value, which can be stored in a register.
  • The list is being traversed sequentially, which produces very good locality of reference and cache behavior for arrays.

This close fit with modern architectures makes quicksort one of the fastest sorting algorithms on average. Because of this excellent average performance and simple implementation, quicksort has become one of the most popular sorting algorithms in practical use.

Although it is often believed to work-in-place, quicksort uses O(log n) additional stack space on average in recursive implementations and O(n) stack space in the worst case.

The most crucial concern of a quicksort implementation is the choosing of a good pivot element. A naïve implementation of quicksort, like the ones below, will be terribly inefficient for certain inputs. For example, if the input is already sorted, a common practical case, this implementation of quicksort degenerates into a selection sort with O(n2) running time. Furthermore, the recursion depth becomes linear, requiring O(n) extra stack space.

This worst-case performance is much worse than comparable sorting algorithms such as heapsort or merge sort. However, if pivots are chosen randomly, most poor choices of pivots are unlikely; the worst-case, for example, has only probability 1/n! of occurring. This variation, called randomized quicksort, can be shown to use O(n log n) comparisons on any input with very high probability.

Note that the partition procedure only requires the ability to traverse the list sequentially; therefore, quicksort is not confined to operating on arrays (it can be used, for example, on linked lists). Choosing a good pivot, however, benefits from random access, as we will see.

Average-case complexity

The average number of comparisons over all possible inputs sequences needed by quicksort is:

<math>C(n) = n - 1 + \frac{1}{n} \sum_{i=0}^{n-1} C(i)+C(n-i-1)) = \Theta(1.39log_2 n)<math>

This means that, on average, quicksort performs closer to the best case than worst case. This fast average runtime is another reason for quicksort's practical dominance over other sorting algorithms.

Choosing a better pivot

The worst-case behavior of quicksort is not merely a theoretical problem. When quicksort is used in web services, for example, it is possible for an attacker to deliberately exploit the worst case performance and choose data which will cause a slow running time or maximize the chance of running out of stack space. See competitive analysis for more discussion of this issue.

Sorted or partially sorted data is quite common in practice and a naïve implementation which selects the first element as the pivot does poorly with such data. To avoid this problem the middle element may be chosen. This works well in practice, but attacks can still cause worst-case performance.

A better choice is to select the median of the first, middle and last elements as the pivot. Furthermore, adding two randomly selected elements resists chosen data attacks, especially if a cryptographically sound random number generator is used to reduce the chance of an attacker predicting the "random" elements. The use of the fixed elements reduces the chance of bad luck causing a poor pivot selection for partially sorted data when not under attack. These steps increase overhead, so it may be worth skipping them once the partitions grow small and the penalty for poor pivot selection drops.

Interestingly, since finding the true median value to use as the pivot can be done in O(n) time, the worst-case scenario can be completely removed while maintaining O(n log n) performance. Of course, this is almost never faster in practice and is only appropriate for immunizing against attack.

Partitioning concerns

As virtually all of the quicksort computation time is spent partitioning, a good partitioning implementation is important. In particular, if all of the elements being partitioned are equal, the above partition algorithm degenerates into the worst-case and needlessly swaps identical elements. This becomes a serious problem in any datasets which contain many equal elements, as many of the 'bottom tier' of partitions will become uniform.

A good variation in such cases is to test separately for elements equal to the pivot and store these in a 'fat pivot' in the center of the partition. A C implementation of this variation is shown below.

The way that partitioning is done determines whether or not a Quicksort implementation is a stable sort. Because the above pseudocode procedure will sometimes swap elements which are equal, it is slightly unstable.

Other optimizations

Another optimization is to switch to a different sorting algorithm once the list becomes small, perhaps ten or less elements. Selection sort might be inefficient for large data sets, but it is often faster than Quicksort on small lists.

One widely used implementation of quicksort, that in the 1997 Microsoft C library, used a cutoff of 8 elements before switching to insertion sort, asserting that testing had shown that to be a good choice. It used the middle element for the partition value, asserting that testing had shown that the median of three algorithm did not, in general, increase performance.

Sedgewick (1978) suggested an enhancement to the use of simple sorts for small numbers of elements, which reduced the number of instructions required by postponing the simple sorts until the quicksort had finished, then running an insertion sort over the whole array. This is effective because insertion sort requires only O(kn) time to sort an array where every element is less than k places from its final position.

LaMarca and Ladner (1997) consider "The Influence of Caches on the Performance of Sorting", a very significant issue in microprocessor systems with multi-level caches and high cache miss times. They conclude that while the Sedgewick optimization decreases the number of instructions, it also decreases locality of cache references and worsens performance compared to doing the simple sort when the need for it is first encountered. However, the effect was not dramatic and they suggested that it was starting to become more significant with more than 4 million 64 bit float elements. Greater improvement was shown for other sorting types.

Because recursion requires additional memory, Quicksort has been implemented in a non-recursive, iterative form. This has the advantage of predictable memory use regardless of input, and the disadvantage of considerably greater code complexity. Those considering iterative implementations of quicksort would do well to also consider introsort or heapsort instead.

A simple alternative for reducing Quicksort's memory consumption uses true recursion only on the smaller of the two sublists and tail recursion on the larger. This limits the additional storage of Quicksort to O(log n). The procedure quicksort in the preceding pseudocode would be rewritten as

 function quicksort(a, left, right)
     while right > left
         select a pivot value a[pivotIndex]
         pivotNewIndex := partition(a, left, right, pivotIndex)
         if (pivotNewIndex-1) - left < right - (pivotNewIndex+1)
             quicksort(a, left, pivotNewIndex-1)
             left  := pivotNewIndex+1
         else
             quicksort(a, pivotNewIndex+1, right)
             right := pivotNewIndex-1

Introsort optimization

Main article: Introsort

An optimization of quicksort which is becoming widely used is introspective sort, often called introsort. This starts with quicksort and switches to heapsort when the recursion depth exceeds a preset value. This overcomes the overhead of increasingly complex pivot selection techniques while ensuring O(n log n) worst-case performance. Musser reported that on a median-of-3 killer sequence of 100,000 elements running time was 1/200th that of median-of-3 quicksort. Musser also considered the effect of Sedgewick's delayed small sorting on caches, reporting that it could double the number of cache misses when used on arrays, but its performance with double-ended queues was significantly better. See introsort for more details.

Competitive sorting algorithms

The most direct competitor of Quicksort is heapsort. Heapsort is typically somewhat slower than Quicksort, but the worst-case running time is always O(n log n). Quicksort is usually faster, though there remains the chance of worst case performance except in the introsort variant. If it's known in advance that heapsort is going to be necessary, using it directly will be faster than waiting for introsort to switch to it. Heapsort also has the important advantage of using only constant additional space (heapsort is in-place), whereas even the best variant of Quicksort uses O(log n) space. However, heapsort requires efficient random access to be practical.

Quicksort is a space-optimized version of the binary tree sort. Instead of inserting items sequentially into an explicit tree, Quicksort organizes them concurrently into a tree that is implied by the recursive calls. The algorithms make exactly the same comparisons, but in a different order.

Relationship to selection

A selection algorithm chooses the kth smallest of a list of numbers; this is an easier problem in general than sorting. One simple but effective selection algorithm works nearly in the same manner as quicksort, except that instead of making recursive calls on both sublists, it only makes a single tail-recursive call on the sublist which contains the desired element. This small change lowers the average complexity to linear or O(n) time, and makes it an in-place, constant-space algorithm. A variation on this algorithm brings the worst-case time down to O(n) (see selection algorithm for more information).

Conversely, once we know a worst-case O(n) selection algorithm is available, we can use it to find the ideal pivot (the median) at every step of Quicksort, producing a variant with worst-case O(n log n) running time. In practical implementations, however, this variant is considerably slower on average.

Sample implementations

Main article: Quicksort implementations

Here we demonstrate a number of quicksort implementations in various languages. We show only some of the most popular or unique ones here; for additional implementations, see the article quicksort implementations.

C

void sort(int array[], int begin, int end) {
   if (end > begin) {
      int pivot = array[begin];
      int l = begin + 1;
      int r = end;
      while(l < r) {
         if (array[l] <= pivot) {
            l++;
         } else {
            r--;
            swap(array[l], array[r]); 
         }
      }
      l--;
      swap(array[begin], array[l]);
      sort(array, begin, l);
      sort(array, r, end);
   }
}

Java

This example also demonstrates a generic quicksort, rather than just one on a set of integers.

import java.util.Comparator;
import java.util.Random;

public class Quicksort {
    public static final Random RND = new Random();	
    private void swap(Object[] array, int i, int j) {
	Object tmp = array[i];
	array[i] = array[j];
	array[j] = tmp;
    }
    private int partition(Object[] array, int begin, int end, Comparator cmp) {
	int index = begin + RND.nextInt(end - begin + 1);
	Object pivot = array[index];
	swap(array, index, end);	
	for (int i = index = begin; i < end; ++ i) {
	    if (cmp.compare(array[i], pivot) <= 0) {
		swap(array, index++, i);
	    }
        }
	swap(array, index, end);	
	return (index);
    }
    private void qsort(Object[] array, int begin, int end, Comparator cmp) {
	if (end > begin) {
	    int index = partition(array, begin, end, cmp);
	    qsort(array, begin, index - 1, cmp);
	    qsort(array, index + 1,  end,  cmp);
        }
    }
    public void sort(Object[] array, Comparator cmp) {
	qsort(array, 0, array.length - 1, cmp);
    }
}

Python

def qsort(L):
   if L == []: return []
   return qsort([x for x in L[1:] if x< L[0]]) + L[0:1] + \
          qsort([x for x in L[1:] if x>=L[0]])

Joy

 DEFINE sort == [small][]
                [uncons [>] split]
                [[swap] dip cons concat] binrec .

Haskell

  sort :: (Ord a)   => [a] -> [a]
  
  sort []           = []
  sort (pivot:rest) = sort [y | y <- rest, y < pivot]
                      ++ [pivot] ++ 
                      sort [y | y <- rest, y >=pivot]

Prolog

append([], L, L).
append([H | L1], L2, [H | Result]) :- append(L1, L2, Result).

partition([], _, [], []).
partition([H | T], X, [H | Left], Right) :- H =< X, partition(T, X, Left, Right).
partition([H | T], X, Left, [H | Right]) :- H > X, partition(T, X, Left, Right).

qsort([],[]).
qsort([H | Tail], Sorted) :-
        partition(Tail, H, Right, Left),
        qsort(Left, SortedLeft),
        qsort(Right, SortedRight),
        append(SortedLeft, [X | SortedRight], Sorted).

Check the wikipedia Prolog article for a simpler version.

Ruby

def sort(array)
  return [] if array.empty?
  left, right = array[1..-1].partition { |y| y <= array.first }
  sort(left) + [ array.first ] + sort(right)
end

External links

References

  • Hoare, C. A. R. "Partition: Algorithm 63," "Quicksort: Algorithm 64," and "Find: Algorithm 65." Comm. ACM 4, 321-322, 1961
  • R. Sedgewick. Implementing quicksort programs, Communications of the ACM, 21(10):847857, 1978.
  • David Musser. Introspective Sorting and Selection Algorithms, Software Practice and Experience vol 27, number 8, pages 983-993, 1997
  • A. LaMarca and R. E. Ladner. "The Influence of Caches on the Performance of Sorting." Proceedings of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, 1997. pp. 370-379.

Copyright 2009 WordIQ.com - Privacy Policy  :: Terms of Use  :: Contact Us  :: About Us
This article is licensed under the GNU Free Documentation License. It uses material from the this Wikipedia article.