The Binary Search Tree (BST) is a tree like data structure that allows for the quick lookup of data. They work by storing inserted data in a tree like structure that uses the “divide and conquer” approach to locate data. Each node in the tree has, at most, two children. The left hand side (lhs) node will always contain a value that is less than it’s parent. The right hand side (rhs) node will always contain a value that is greater-than or equal to it’s parent.

Using this property we can very quickly traverse the tree to find a value, with the number of look-ups required being no greater than the maximum height of the tree. If we assume that the tree is well balanced this gives us an amortized time complexity of order O(log n). In other words, the look-up time complexity is logarithmic. It’s important to know; however, that this only holds true if the tree is balanced. If it’s not the worse case asymptotic time complexity is order O(n), or linear.

Why is this? Simply because in the worse case an unordered tree is just a link list. Or, put another way, a linked list can be thought of as a special case tree; one that has just one branch and no divergences. Imagine a situation where we have a BST that isn’t self balancing and we inject data into it in an ascending order. Each value inserted will be put to the left of the last, which is just another way of saying we’ve created a list of sorted numbers!

So, from all this we can conclude that for a BST to be useful it really needs to be balanced. It just so happens that there are a number of BST implementations that will take care of this, automatically, when data is inserted. They have a “self balancing” property. Examples of such trees are the AVL Tree (named after the initialise of the inventors) and the Red/Black tree (often the basis of the std::set and std::map).

Of course, this article isn’t specifically about BST implementations but it is important to have a basic understanding of how a BST works before moving on to the topic in hand; namely, how to find the Lowest Common Ancestor of two values in our BST?!

I guess the first thing we should do is actually define what we mean by Lowest Common Ancestor. Put simply, the Lowest Common Ancestor of any node in a BST is the node where traversal of each of the child nodes diverges down different branches. This might sound complicated but it is anything but.

Let’s look as a simple BST.

8 / (3) 9 / [1] 6 / 4 [7]

In this example, the Lowest Common Ancestor of 1 and 7 is 3. This is because node 3 is the node where traversal in locating these two nodes, 1 and 7, diverges into separate branches. Our task is to figure out a simple algorithm for finding the Lowest Common Ancestor that will work for any BST of any size. Sounds complicated, right? Actually, it’s about a simple an operation as things get when it comes to messing with trees. Here’s how it works…

We start at the root node and check to see if 1 and 7 are less than it’s value. If they are we haven’t found the LCA but we do know it’s somewhere down the left hand side branch so we can now move down to that node. At the next node we perform the same check to see if 1 and 7 are less than it’s value. This time around only one of them is less. At this point we can stop since we’ve now found the LCA.

We know this because at this point one number can be found to the Lhs of the current node and the other can be found on the Rhs of the current node. In other words at this point the traversal will diverge. Let’s think about it a little more abstractly by introducing a little pseudo code…

Let N be the current node. Let X be one child node and Y be the other. while not done do If X < N and Y < N then N = Lhs else If X > N and Y > N then N = Rhs else done = True repeat

This loop will continue to traverse the tree one node at a time, either going left because both values can be found to the left of the current node or going right because bot values can be found to the right of the current node. As soon either of these statements is no longer true we have found the Lowest Common Ancestor.

And now, the real code…

/** * @brief Lowest Common Ancestor */ #include <memory> #include <string> #include <iostream> namespace evilrix { namespace mostlycoding { /** * @brief A node object for our tree, below. * * @tparam T Generic type parameter representing our data. */ template <typename T> struct Node { using Data = T; ///< The data using PNode = std::shared_ptr<Node<Data>>; ///< The node /** * @brief Initializes a new instance of the main class. * * @param data (Optional) the data. */ Node(Data const & data = 0) : data(data) {} PNode plhs; ///< The plhs PNode prhs; ///< The prhs Data data; ///< The data }; /** * @brief A tree, implemented as a BST. * * @tparam T T Generic type parameter representing our data. */ template <typename T> class Tree { public: using Data = T; ///< The data using PNode = std::shared_ptr<Node<Data>>; ///< The node /** * @brief Initializes a new instance of the main class. */ Tree() {} /** * @brief Inserts the given data. * * @param data The data. */ void insert(Data const & data) { PNode * pproot = &proot_; // see my note in the "find_lca" function as to why I am using an // iterative rather than recursive traversal approach. while (*pproot) { pproot = data < (*pproot)->data ? &(*pproot)->plhs : &(*pproot)->prhs; } (*pproot) = PNode(new Node<Data>(data)); } /** * @brief Searches for the first lca. * * @param x The first data item. * @param y The second data item. * * @return The found lca data item. */ PNode find_lca(Data const & x, Data const & y) const { PNode const * pproot = &proot_; // Whilst tree traversal is normally done using recursion I've // opted for iterative just because it's easier to visualise what's // happening. Also, unless your compiler supports "tail recursion" // there is always a danger that the recursive approach could blow // up. Even if it doesn't, recursion is has a O(n) space complexity // whereas iteration has a O(1) space complexity. Both have a time // complexity of O(n), assuming no branching is required. while (*pproot) { // traverse down the left hand side? if (x < (*pproot)->data && y < (*pproot)->data) { pproot = &(*pproot)->plhs; } else // traverse down the right hand side? if (x >(*pproot)->data && y >(*pproot)->data) { pproot = &(*pproot)->prhs; } else { // we've reached the divisor so this node is the LCA break; } } return (*pproot); } PNode proot_; }; } } using namespace evilrix::mostlycoding; /** * @brief Main entry-point for this application. * * @return Exit-code for the process - 0 for success, else an error code. */ int main(/* int argc, char * argv[] */) { /* * 8 * / * (3) 9 * / * [1] 6 * / * 4 [7] */ Tree<int> tree; tree.insert(8); tree.insert(3); tree.insert(1); tree.insert(6); tree.insert(4); tree.insert(7); tree.insert(9); Tree<int>::PNode pnode = tree.find_lca(1, 7); std::cout << (pnode ? std::to_string(pnode->data) : "(null)") << std::endl; }

Hopefully, this post has shown that finding the Lowest Common Ancestor of a BST is really not actually that hard. Like most algorithms, once you know and understand the logic behind how it works it’s pretty simple stuff. Of course, knowing is 90% of the battle! 🙂

Pingback: Lowest Common Ancestor (non-BST) | evilrix