Disjoint Sets Study Guide

Author: Josh Hug

Overview

Algorithm Development. Developing a good algorithm is an iterative process. We create a model of the problem, develop an algorithm, and refine it until its performance meets our needs. This lecture serves as an example of this process.

The Dynamic Connectivity Problem. The ultimate goal is to develop a data type that supports the following operations on a fixed number N of objects:

connect(int p, int q) (called union in our optional textbook)
isConnected(int p, int q) (called connected in our optional textbook)

We do not care about finding the actual path between p and q; we care only about whether they are connected. A third operation we can support is closely related to isConnected():

find(int p): The find() method is defined so that find(p) == find(q) iff isConnected(p, q). We did not use this in class, but it's in our textbook.
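As a minimal sketch, the three operations can be written as a Java interface. The name DisjointSets and the default isConnected method are assumptions for illustration, not an official course API; the default method simply encodes the rule that two objects are connected exactly when find gives them the same name.

```java
/** Hypothetical interface for the dynamic connectivity operations (name is an assumption). */
interface DisjointSets {
    /** Merges the set containing p with the set containing q. */
    void connect(int p, int q);

    /** Returns a name for the set containing p, such as a bucket number or a root. */
    int find(int p);

    /** p and q are connected exactly when find gives them the same name. */
    default boolean isConnected(int p, int q) {
        return find(p) == find(q);
    }
}
```

With this design, each concrete strategy below only has to decide how connect and find work; isConnected comes for free.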

Key observation: Connectedness is an equivalence relation. Saying that two objects are connected is the same as saying they are in the same equivalence class. This is just fancy math talk for saying "every object is in exactly one bucket, and we want to know if two objects are in the same bucket". When you connect two objects, you're basically just pouring everything from one bucket into another.

Quick find. This is the most natural solution: each object is given an explicit bucket number. It uses an array id[] of length N, where id[i] is the bucket number of object i (which is returned by find(i)). To connect two objects p and q, we set every object in p's bucket to have q's bucket number.

connect: May require many changes to id[]. Takes N time, since every entry of id[] must be checked for membership in p's bucket.
isConnected (and find): takes constant time.

Performing M operations takes up to MN time. If M is proportional to N, this results in N² time.
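A minimal Java sketch of quick find, assuming objects are numbered 0 through N - 1 and using the hypothetical class name QuickFindDS (this is not the linked QuickFind implementation). find and isConnected each read array entries directly, while connect scans all N entries of id[].

```java
/** Quick find sketch (hypothetical name): id[i] is the bucket number of object i. */
class QuickFindDS {
    private final int[] id;

    QuickFindDS(int n) {
        id = new int[n];
        // Initially every object is alone in its own bucket.
        for (int i = 0; i < n; i++) {
            id[i] = i;
        }
    }

    /** Returns the bucket number of p in constant time. */
    int find(int p) {
        return id[p];
    }

    /** Two objects are connected iff they share a bucket number: constant time. */
    boolean isConnected(int p, int q) {
        return id[p] == id[q];
    }

    /** Relabels every object in p's bucket with q's bucket number: N time. */
    void connect(int p, int q) {
        // Cache both bucket numbers before modifying id[].
        int pid = id[p];
        int qid = id[q];
        for (int i = 0; i < id.length; i++) {
            if (id[i] == pid) {
                id[i] = qid;
            }
        }
    }
}
```

For example, after connect(0, 1), connect(2, 3), and connect(1, 3), objects 0 through 3 all carry bucket number 3.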

Quick union. An alternate approach is to change the meaning of our id array. In this strategy, id[i] is the parent object of object i. An object can be its own parent. The find() method climbs the ladder of parents until it reaches the root (an object whose parent is itself). To connect p and q, we set the root of p to point to the root of q.

connect: Requires only one change to id[], but also requires root finding (worst case N time).
isConnected (and find): Requires root finding (worst case N time).

Performing M operations takes NM time in the worst case. If M is proportional to N, this again results in quadratic behavior.
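A minimal sketch of quick union in the same style (QuickUnionDS is a hypothetical name, not the linked QuickUnion implementation). find follows parent links in id[] up to a root, and connect does one array write after two root searches.

```java
/** Quick union sketch (hypothetical name): id[i] is the parent of object i. */
class QuickUnionDS {
    private final int[] id;

    QuickUnionDS(int n) {
        id = new int[n];
        for (int i = 0; i < n; i++) {
            id[i] = i; // every object starts as the root of its own tree
        }
    }

    /** Climbs parent links until reaching a root (an object that is its own parent). */
    int find(int p) {
        while (id[p] != p) {
            p = id[p];
        }
        return p;
    }

    boolean isConnected(int p, int q) {
        return find(p) == find(q);
    }

    /** Points the root of p at the root of q: one array write, plus two root searches. */
    void connect(int p, int q) {
        int rootP = find(p);
        int rootQ = find(q);
        if (rootP != rootQ) {
            id[rootP] = rootQ;
        }
    }
}
```

Calling connect(0, 1), connect(1, 2), connect(2, 3), and so on in order builds a single chain 0 → 1 → 2 → ..., so find(0) must climb through every object. That chain is the N-time worst case described above.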

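Weighted quick union. Rather than having connect(p, q) make the root of p point to the root of q, we instead make the root of the smaller tree point to the root of the larger one. A tree's size is its number of nodes, not its height. This keeps every tree's height at most lg N: a node's depth increases only when its tree is linked under a tree at least as large, so the size of the tree containing it at least doubles each time, and that can happen at most lg N times.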

connect: Requires only one change to id, but also requires root finding (worst case log N time).
isConnected (and find): Requires root finding (worst case log N time).

Warning: if the two trees have the same size, the book code uses the opposite convention from quick union and sets the root of the second tree to point to the root of the first tree. This isn't terribly important (you won't be tested on trivial details like these).
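A sketch of weighted quick union (WeightedQuickUnionDS is a hypothetical name). A second array, size[], records the number of nodes in each tree, indexed by root. Unlike the book code described in the warning, this sketch breaks ties the quick union way, putting p's root under q's root. The depth method is an assumed helper for inspecting tree height; it is not one of the three operations listed earlier.

```java
/** Weighted quick union sketch (hypothetical name): smaller tree goes under the larger root. */
class WeightedQuickUnionDS {
    private final int[] id;   // id[i] is the parent of object i
    private final int[] size; // size[r] is the node count of the tree rooted at r

    WeightedQuickUnionDS(int n) {
        id = new int[n];
        size = new int[n];
        for (int i = 0; i < n; i++) {
            id[i] = i;
            size[i] = 1;
        }
    }

    int find(int p) {
        while (id[p] != p) {
            p = id[p];
        }
        return p;
    }

    boolean isConnected(int p, int q) {
        return find(p) == find(q);
    }

    void connect(int p, int q) {
        int rootP = find(p);
        int rootQ = find(q);
        if (rootP == rootQ) {
            return;
        }
        // Ties follow the quick union convention: p's root goes under q's root.
        if (size[rootP] <= size[rootQ]) {
            id[rootP] = rootQ;
            size[rootQ] += size[rootP];
        } else {
            id[rootQ] = rootP;
            size[rootP] += size[rootQ];
        }
    }

    /** Hypothetical helper: number of parent links from p up to its root. */
    int depth(int p) {
        int d = 0;
        while (id[p] != p) {
            p = id[p];
            d++;
        }
        return d;
    }
}
```

Connecting (0,1), (2,3), (4,5), (6,7), then (0,2), (4,6), and finally (0,4) merges equal-sized trees at every step, which is the worst case for weighted quick union: object 0 ends up at depth 3, which equals lg 8.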

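Weighted quick union with path compression. When find is called, the nodes on the path from the starting object to the root are made to point directly at the root (or at least closer to it), compressing the tree. This results in nearly flat trees. Starting from N objects in their own sets, any sequence of M connect and find calls makes no more than c(N + M log*(N)) array accesses for some constant c, where log*(N), the iterated logarithm, is the number of times lg must be applied to N to bring it down to 1 or less. For any conceivable value of N in this universe, log*(N) is at most 5. It is possible to derive an even tighter bound, involving the inverse Ackermann function.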
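A sketch of weighted quick union with path compression (WeightedQuickUnionPCDS is a hypothetical name). This version uses two-pass full compression: find first locates the root, then re-points every node on the path directly at it. A common one-pass alternative, path halving, instead points each visited node at its grandparent. The depth method is again an assumed inspection helper that does not compress.

```java
/** Weighted quick union with two-pass path compression (hypothetical name). */
class WeightedQuickUnionPCDS {
    private final int[] id;   // id[i] is the parent of object i
    private final int[] size; // size[r] is the node count of the tree rooted at r

    WeightedQuickUnionPCDS(int n) {
        id = new int[n];
        size = new int[n];
        for (int i = 0; i < n; i++) {
            id[i] = i;
            size[i] = 1;
        }
    }

    /** Finds the root of p, then points every node on the path directly at that root. */
    int find(int p) {
        int root = p;
        while (id[root] != root) {
            root = id[root];
        }
        // Second pass: compress the path from p up to the root.
        while (p != root) {
            int next = id[p];
            id[p] = root;
            p = next;
        }
        return root;
    }

    boolean isConnected(int p, int q) {
        return find(p) == find(q);
    }

    void connect(int p, int q) {
        int rootP = find(p);
        int rootQ = find(q);
        if (rootP == rootQ) {
            return;
        }
        // Link the smaller tree under the larger; ties put p's root under q's root.
        if (size[rootP] <= size[rootQ]) {
            id[rootP] = rootQ;
            size[rootQ] += size[rootP];
        } else {
            id[rootQ] = rootP;
            size[rootP] += size[rootQ];
        }
    }

    /** Hypothetical helper: parent links from p to its root, without compressing. */
    int depth(int p) {
        int d = 0;
        while (id[p] != p) {
            p = id[p];
            d++;
        }
        return d;
    }
}
```

After the same seven connect calls used for weighted quick union on 8 objects, object 2 sits at depth 2; a single find(2) re-points it at the root, dropping its depth to 1.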

Example Implementations

You are not responsible for knowing these for exams, but these may help in your understanding of the concepts.

QuickFind

QuickUnion

WeightedQuickUnion

Weighted Quick Union with Path Compression

Recommended Problems

To do the Coursera problems, you will need to register for an account. These problems will not be graded, and your progress will not be tracked by the 61B staff in any way. Signing up does not obligate you to do anything at all (i.e. Kevin Wayne will not come to your house and ask in a sad voice why you have not finished his class).

C level

Coursera problems, though problem 3 is a bit of overkill for our course (but isn't bad to know).

B level

Problem 1 from the Princeton Fall 2011 midterm.
Problem 1 from the Princeton Fall 2012 midterm.

(From Textbook 1.5.8) Does the following implementation of Quick-Find work? If not, give a counter-example:

public void connect(int p, int q) {
    if (connected(p, q)) return;

    // Rename p’s component to q’s name.
    for (int i = 0; i < id.length; i++) {
        if (id[i] == id[p]) id[i] = id[q];
    }
    count -= 1;
}

A level

(From Textbook 1.5.10) In weighted quick-union, suppose that we set id[find(p)] to q instead of to id[find(q)]. Would the resulting algorithm be correct?
If we're concerned about tree height, why don't we use height instead of weight (size) to decide which root to link under the other? What is the worst-case tree height for weighted-quick-union vs. heighted-quick-union? What is the average tree height?
Try writing weighted-quick-union-with-path-compression without looking at the code on the booksite. You may look at the API. Compare your resulting code with the textbook's code.