Sorting III: Quick Select, Sort Properties, Sorting Bounds
Author: Josh Hug

Overview

Selection. Selection is a simpler problem than sorting: we try to find the Kth largest item in an array. One way to solve this problem is to sort the array and read off the answer, but we can do better. A worst case linear time approach was developed in 1972, but we did not cover this approach in class.

Quick Select. Using partitioning, we can solve the selection problem in expected linear time. We partition the array, and then quick select on whichever side of the pivot contains the item we are looking for (for example, the median). If the pivot lands exactly at the target position, we are done. Best case time is Θ(N), expected time is Θ(N), and worst case time is Θ(N^2). You should know how to show the best and worst case times.
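The partition-then-recurse idea can be sketched as follows. This is a minimal illustration rather than the version shown in lecture: the names partition and quick_select are hypothetical, the pivot is chosen at random, and k is the 0-based index the item would have in sorted order (so the Kth largest of N items is k = N - K).

```python
import random


def partition(a, lo, hi):
    """Partition a[lo..hi] in place around a random pivot; return the pivot's final index."""
    p = random.randint(lo, hi)          # random pivot keeps the expected time linear
    a[p], a[hi] = a[hi], a[p]           # park the pivot at the right end
    pivot = a[hi]
    store = lo                          # everything left of store is < pivot
    for i in range(lo, hi):
        if a[i] < pivot:
            a[i], a[store] = a[store], a[i]
            store += 1
    a[store], a[hi] = a[hi], a[store]   # move the pivot into its sorted position
    return store


def quick_select(items, k):
    """Return the item that would be at index k (0-based) if items were sorted."""
    a = list(items)                     # work on a copy so the caller's list is untouched
    if not 0 <= k < len(a):
        raise IndexError("k out of range")
    lo, hi = 0, len(a) - 1
    while True:
        p = partition(a, lo, hi)
        if p == k:
            return a[p]
        if p < k:
            lo = p + 1                  # target is to the right of the pivot
        else:
            hi = p - 1                  # target is to the left of the pivot


data = [5, 3, 9, 1, 7, 3, 8]
median = quick_select(data, len(data) // 2)
```

Only one side of each partition is ever revisited, which is why the expected work is N + N/2 + N/4 + ... = Θ(N) rather than Θ(N log N). With this simple partitioning scheme, inputs with many duplicate keys can still hit the Θ(N^2) worst case.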

Stability. A sort is stable if the order of equal items is preserved. This is desirable, for example, if we want to sort on two different properties of our objects. Know how to show the stability or instability of an algorithm.
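As a small illustration of sorting on two properties, the sketch below sorts hypothetical (name, section) records by name and then by section. Python's built-in sorted is documented to be stable, so names stay alphabetical within each section. The selection_sort helper is an assumed textbook implementation, included only to show how a long-distance swap can break stability.

```python
from operator import itemgetter

# Hypothetical records: (name, section).
students = [("Blake", 2), ("Cleo", 1), ("Alex", 2)]

# Sort by the secondary property first, then by the primary one.
by_name = sorted(students, key=itemgetter(0))
stable_result = sorted(by_name, key=itemgetter(1))   # stable: ties keep name order


def selection_sort(records, key):
    """Textbook selection sort; the swap can jump an item past its equals."""
    a = list(records)
    for i in range(len(a)):
        smallest = i
        for j in range(i + 1, len(a)):
            if key(a[j]) < key(a[smallest]):
                smallest = j
        a[i], a[smallest] = a[smallest], a[i]
    return a


unstable_result = selection_sort(by_name, key=itemgetter(1))

print(stable_result)    # [('Cleo', 1), ('Alex', 2), ('Blake', 2)]
print(unstable_result)  # [('Cleo', 1), ('Blake', 2), ('Alex', 2)]
```

In the second result, Blake now precedes Alex even though both are in section 2: the first swap moved Alex from the front of the array to the back, past Blake. A single counterexample like this is enough to show that an algorithm is not stable.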

Adaptiveness. Sorts that exploit existing order in an array can exhibit better performance. Python and Java utilize a sort called Timsort that has a number of improvements, resulting in, for example, Θ(N) performance on almost sorted arrays.
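Timsort itself is too involved for a short example, so the sketch below uses insertion sort instead, a much simpler sort that is also adaptive. The hypothetical helper insertion_sort_compares counts comparisons so the effect of existing order is visible.

```python
def insertion_sort_compares(items):
    """Insertion sort on a copy of items; returns (sorted_list, number_of_compares)."""
    a = list(items)
    compares = 0
    for i in range(1, len(a)):
        j = i
        while j > 0:
            compares += 1
            if a[j - 1] <= a[j]:        # already in place: stop sliding left
                break
            a[j - 1], a[j] = a[j], a[j - 1]
            j -= 1
    return a, compares


n = 1000
already_sorted = list(range(n))
reversed_input = list(range(n - 1, -1, -1))

_, sorted_compares = insertion_sort_compares(already_sorted)
_, reversed_compares = insertion_sort_compares(reversed_input)

print(sorted_compares)    # 999, i.e. N - 1
print(reversed_compares)  # 499500, i.e. N(N - 1) / 2
```

The same code does linear work on sorted input and quadratic work on reversed input. Each swap removes exactly one inversion, so an almost sorted array costs only slightly more than the sorted case.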

Seeking a Sorting Lower Bound. We've found a number of sorts that complete execution in Θ(N log N) time. The question is: could we do better? Suppose we have a hypothetical best sorting algorithm X. We know that its worst case runtime is O(N log N) because we already know an algorithm whose worst case runtime is Θ(N log N), so the best algorithm can be no worse than that. We also know that X's runtime is Ω(N), even in the best case, because it has to at least look at every item. These two bounds leave a gap, so without further discussion, it seems like we might be able to do better than Θ(N log N) worst case time.

Seeking a Sorting Lower Bound. As a fanciful exercise, we played a game called puppy-cat-dog, in which we have to identify which of three boxes contains the puppy, which contains the cat, and which contains the dog. Since there are 3! = 6 permutations, we need at least ceil(lg(6)) = 3 yes/no questions to resolve the answer in the worst case, because each question can at best cut the set of remaining possibilities in half. In other words, if playing a game of 20 questions with 6 possible answers, we have to ask at least 3 questions to be sure we have the right answer. Since sorting is one way to solve puppy-cat-dog, any lower bound on puppy-cat-dog also applies to sorting. Given N items, there are N! permutations, meaning we need at least ceil(lg(N!)) questions to win the N-item version of puppy-cat-dog, and by extension, we need at least ceil(lg(N!)) yes/no questions to sort N items. Since lg(N!) = Θ(N log N), we can say that the hypothetical best sorting algorithm that uses yes/no questions must require Ω(N log N) yes/no questions in the worst case. Thus, there is no comparison based algorithm whose worst case is better than Θ(N log N) compares.
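The step lg(N!) = Θ(N log N) is stated without proof above. One standard way to justify it, given here as a sketch and not necessarily the argument used in lecture, is to bound the sum of logarithms from both sides. The upper bound replaces every term with lg N; the lower bound keeps only the largest half of the terms, each of which is at least lg(N/2).

```latex
\begin{align*}
\lg(N!) &= \sum_{i=1}^{N} \lg i \;\le\; N \lg N, \\
\lg(N!) &\ge \sum_{i=\lceil N/2 \rceil}^{N} \lg i
         \;\ge\; \frac{N}{2} \lg\frac{N}{2}
         \;=\; \frac{N}{2}\lg N - \frac{N}{2}
         \;\ge\; \frac{N}{4} \lg N \qquad (N \ge 4).
\end{align*}
```

Together these give (N/4) lg N ≤ lg(N!) ≤ N lg N for N ≥ 4, so lg(N!) is both O(N log N) and Ω(N log N).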

C level

  1. Problem 3 from my Fall 2014 midterm.

B level

  1. Make sure that you understand exactly why I chose Ω, Θ, and O in Seeking a Sorting Lower Bound above.
  2. My Fall 2013 midterm, problem 7, particularly part b.
  3. My Fall 2014 midterm, problem 6.

A level

  1. Find the optimal decision tree for playing puppy, cat, dog, walrus.
  2. My Spring 2013 midterm, problem 7.
