Tries
Author: Josh Hug

Overview

Terminology.

Tries. Analogous to MSD radix sort, since keys are examined one character at a time starting from the left. Know how to insert and search for an item in a trie. Know that trie nodes typically do not contain letters, and that instead letters are stored implicitly on edge links. Know that there are many ways of storing these links, and that the fastest but most memory hungry way is with an array of size R. We call such tries R-way tries.
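To make insert and search concrete, here is a minimal sketch of an R-way trie in Java. The class name `RWayTrie`, the method names, and the choice R = 128 (plain ASCII) are assumptions made for illustration and are not taken from this guide. Note that a node stores no letter: the letter is implied by the index of the link that leads to the node.

```java
// A minimal R-way trie sketch. Names and R = 128 are illustrative assumptions.
class RWayTrie {
    private static final int R = 128; // assumed alphabet size (ASCII only)

    private static class Node {
        boolean isKey;             // true if some inserted key ends here
        Node[] next = new Node[R]; // one link per possible character
    }

    private final Node root = new Node();

    /** Inserts key by following, and creating where needed, one link per character. */
    public void insert(String key) {
        Node cur = root;
        for (int i = 0; i < key.length(); i++) {
            char c = key.charAt(i); // assumed to be < R
            if (cur.next[c] == null) {
                cur.next[c] = new Node();
            }
            cur = cur.next[c];
        }
        cur.isKey = true;
    }

    /** Returns true if key was inserted. Follows at most key.length() links. */
    public boolean contains(String key) {
        Node cur = root;
        for (int i = 0; i < key.length(); i++) {
            cur = cur.next[key.charAt(i)];
            if (cur == null) {
                return false; // fell off the trie
            }
        }
        return cur.isKey;
    }

    public static void main(String[] args) {
        RWayTrie t = new RWayTrie();
        for (String s : new String[] {"sam", "sad", "sap", "same", "a", "awls"}) {
            t.insert(s);
        }
        System.out.println(t.contains("same")); // true
        System.out.println(t.contains("sa"));   // false: a prefix, but never inserted
    }
}
```

Every node allocates an array of R links even if it uses only one of them, which is where the memory cost of this representation comes from.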

TSTs. Instead of R links, a TST node has only 3 links. Know how to insert and search for an item in a ternary search trie. Be aware that TSTs can become unbalanced. (By the way, TSTs are analogous to a sort known as 3-way radix quicksort, which is just quicksort applied digit by digit. TMYK!) Know that each node typically contains a character, except the root, which contains no character.
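Here is a comparable sketch for a TST in Java, again with assumed names (`TST`, `insert`, `contains`). Each node holds one character and three links: left for smaller characters, right for larger ones, and mid for the next character of the key. In this sketch the characterless root is modeled by the `root` reference itself, which is a simplification chosen for the example rather than something this guide prescribes.

```java
// A minimal ternary search trie sketch. All names are illustrative assumptions.
class TST {
    private static class Node {
        char c;                // the character stored in this node
        boolean isKey;         // true if some inserted key ends here
        Node left, mid, right; // smaller char, next char of key, larger char
        Node(char c) { this.c = c; }
    }

    private Node root; // plays the role of the root that holds no character

    /** Inserts a non-empty key. */
    public void insert(String key) {
        if (key.isEmpty()) {
            throw new IllegalArgumentException("key must be non-empty");
        }
        root = insert(root, key, 0);
    }

    private Node insert(Node x, String key, int d) {
        char c = key.charAt(d);
        if (x == null) {
            x = new Node(c);
        }
        if (c < x.c) {
            x.left = insert(x.left, key, d);      // same character, go left
        } else if (c > x.c) {
            x.right = insert(x.right, key, d);    // same character, go right
        } else if (d < key.length() - 1) {
            x.mid = insert(x.mid, key, d + 1);    // match: advance to next character
        } else {
            x.isKey = true;                       // match on the last character
        }
        return x;
    }

    /** Returns true if key was inserted. */
    public boolean contains(String key) {
        if (key.isEmpty()) {
            return false;
        }
        Node x = root;
        int d = 0;
        while (x != null) {
            char c = key.charAt(d);
            if (c < x.c) {
                x = x.left;
            } else if (c > x.c) {
                x = x.right;
            } else if (d < key.length() - 1) {
                x = x.mid;
                d++;
            } else {
                return x.isKey;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        TST t = new TST();
        for (String s : new String[] {"sam", "sad", "sap", "same", "a", "awls"}) {
            t.insert(s);
        }
        System.out.println(t.contains("awls")); // true
        System.out.println(t.contains("aw"));   // false: a prefix, but never inserted
    }
}
```

Only a match moves on to the next character. A left or right step stays on the same character, so a long chain of left or right links at one position is what makes an unbalanced TST slow.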

Advantages of Tries and TSTs. Both flavors of tries have very fast lookup times, as we only ever look at as many characters as there are in the key we're trying to retrieve. However, their chief advantage is the ability to efficiently support various operations not supported by other map/set implementations including:
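Prefix matching is one commonly cited operation of this kind. The sketch below is a hypothetical illustration, not code from this guide: the class name `PrefixTrie` and the method `keysWithPrefix` are assumed names, and each node stores its links in a `TreeMap` rather than an array of size R so that results come back in sorted order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// A trie with prefix matching. All names here are illustrative assumptions.
class PrefixTrie {
    private static class Node {
        boolean isKey;                               // true if a key ends here
        Map<Character, Node> next = new TreeMap<>(); // links, sorted by character
    }

    private final Node root = new Node();

    /** Inserts key, creating nodes along its path as needed. */
    public void insert(String key) {
        Node cur = root;
        for (char c : key.toCharArray()) {
            cur = cur.next.computeIfAbsent(c, k -> new Node());
        }
        cur.isKey = true;
    }

    /** Returns all inserted keys that start with prefix, in sorted order. */
    public List<String> keysWithPrefix(String prefix) {
        List<String> results = new ArrayList<>();
        Node cur = root;
        for (char c : prefix.toCharArray()) {
            cur = cur.next.get(c);
            if (cur == null) {
                return results; // no key has this prefix
            }
        }
        collect(cur, new StringBuilder(prefix), results);
        return results;
    }

    // Depth-first walk of the subtrie below x; sb holds the path walked so far.
    private void collect(Node x, StringBuilder sb, List<String> results) {
        if (x.isKey) {
            results.add(sb.toString());
        }
        for (Map.Entry<Character, Node> e : x.next.entrySet()) {
            sb.append(e.getKey().charValue());
            collect(e.getValue(), sb, results);
            sb.deleteCharAt(sb.length() - 1); // backtrack
        }
    }

    public static void main(String[] args) {
        PrefixTrie t = new PrefixTrie();
        for (String s : new String[] {"sam", "sad", "sap", "same", "a", "awls"}) {
            t.insert(s);
        }
        System.out.println(t.keysWithPrefix("sa")); // [sad, sam, same, sap]
    }
}
```

The work done is proportional to the length of the prefix plus the size of the subtrie below it, so keys that do not share the prefix are never examined. A hash table would have to scan every key to answer the same question.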

Throughout, we define R as the size of the alphabet, and N as the number of items in a trie.

C level

  1. Problems 1 and 2 from Princeton's Coursera Course. A multiway trie is just our standard non-TST trie.
  2. Problem 5 from Princeton's Spring 2008 final.
  3. Problem 8 from Princeton's Spring 2011 final.
  4. Problem 8 from Princeton's Spring 2012 final.
  5. Draw the R-way trie and TST that result after inserting the strings: sam, sad, sap, same, a, awls.

B level

  1. When looking for a single-character string in a trie, what is the worst-case time to find that string in terms of R and N? In a TST?
  2. Problem 5 from Princeton's Fall 2009 final.
  3. Problem 9 from Princeton's Fall 2012 final.
  4. Problem 1 from my Fall 2013 final.
  5. Give an example of an input that causes a TST to be unbalanced, and thus slow.
  6. True or false: The number of character compares required to construct an R-way trie is always less than or equal to the number required to construct an LLRB.
  7. True or false: The number of character compares required to construct an R-way trie is always less than or equal to the number of character accesses needed to construct a hash table.

A level

  1. What is the worst-case runtime to find a single-character string in a perfectly balanced TST in terms of R and N?
  2. In project 3, some of you used R-way trie nodes that contained TreeMaps, some of you used R-way trie nodes that contained HashMaps, and some of you used TST nodes. For the average node, which of these three choices is likely to use the most memory? Why does this mean that this type of node takes longer to construct?
  3. Problem 9 from my Fall 2014 midterm 2.