Day: January 10, 2021

Efficient Union-Find in C++ – or: Disjoint-set forests with Path Compression and Ranks



Disjoint-set data structures are useful in cases where you have a (possibly very large) number of elements and want to repeatedly execute two operations: marking two elements as connected, and checking whether two elements are connected. A query of the latter kind does not necessarily mean that the two elements p and q were explicitly connected to each other by the caller – it is possible, e.g., that p was first connected to the element r, and only much later in time q was also connected to r.

The disjoint sets in the data structure are commonly represented by trees, where each node in the tree contains a link to its parent, the parent in turn contains a link to the parent’s parent, and so on. The node at the root of the tree contains contains a link to itself, i.e. it is its own parent. All nodes of a tree are in the same set. It can be easily checked whether two elements are in the same set by evaluating whether they have the same root. The set of trees is called the disjoint-set forest. While this approach is common and can be implemented efficiently, the C++ standard library does not contain an implementation – this post tries to offer one.

Let us assume the data structure is used for N elements, each having an index between 0 and N-1. The links to the parent nodes of all elements can then easily be saved in an array of length N. If parent[p] == q, then q is the parent of p. Whenever two sets containing the elements p and q should be merged, this can be achieved by attaching the root of one element to the root of the other.

Read More