We're going to talk about binary search trees. Binary search trees are a data structure that you should properly learn in your 2270 data structures class. But they are so important and so useful and they appear all over the place, that it's worth talking about binary search trees in an algorithms class, because many algorithms will be built on top of binary search trees. Therefore it is very important. I'm going to mostly focus on algorithms, on complexity. I'm not going to focus so much on: how do you implement a binary search tree? How do you use pointers? How do you do it in Python using references and classes, and so on? I'm not going to focus so much on that. I'm going to focus more on the algorithms so that it can serve either as a review, if you have already done this in data structures, or it can serve as a preview, if you're yet to do this in data structures. What's a binary search tree? Well, a binary such tree is an abstract data type that supports the following operations. It's going to support the ability to insert an element, it's going to support the ability to delete an element and it's going to support the ability to find if an element exists, and it's going to support various kinds of traversals that can be used to support operations like map, iter, and so on. Now what kind of a data type is it for? It's a data type for collections; I should have mentioned that. It's going to store a collection or a set of objects, and so on. One assumption I will make is elements cannot repeat in this collection, so it's going to be a set-based collection. Elements cannot repeat, there can only be one copy of 1,2, even though that can be relaxed. You can have binary search trees that have multiple copies of the same element. Right now I am going to have one copy. Now each element has an important member called the key of the element and the key is always going to be a number. I'm going to pretend that all the elements are numbers, even though they may be different from numbers. But let's assume they have a key and the key is going to tell us a number corresponding to that element, so that it provides a way of comparing elements. Now using binary search trees, I can build other data structures. I can build a data structure for a set, I can build a data structure for a dictionary or a map, and these can be built using binary search trees. Let us investigate what's the structure of a binary search tree. A binary search tree has a bunch of nodes. This is what a tree looks like and it has some leaves. I'm going to use a special character for leaves and they're all called nil. Every leaf is called a nil. I'm going to soon stop writing these leaves. Every node has two children and these are called internal nodes. Internal nodes, which are shown in circles, has two children. Every leaf, which are shown as nil, has no children. This is exactly how we would write down a binary search tree. Now what's the property? Every node has an element, let's call this e_1, e_2, e_3, e_4, e_5. The leaves don't have elements. Now the main property, the important property, binary search tree property. The important property is that if, I have an element e_1, then e_1 dot key must be greater than or equal to e_2 dot key. The key of the left child must be less than or equal to the key of the parent. In other words, I can say e_2 dot key must be less than or equal to e_1 dot key. So the left child key must be less than equal to parent, and the parent must be less than equal to e_3 dot key. The right child must be greater than or equal to parents. If I move to the left child, the key reduces. If I move to the right child, the key increases. This is the right child. What else? Note that leaves do not have any key, so I have to modify the statement a little bit. Likewise, e_4's key is less than equal to e_2.key and e_5's key is greater than equal to e_2.key. More importantly, every node in the left sub tree of e_1, the key must be less than equal to every node in the right sub-tree, the key must be greater than equal to. To summarize, let us draw examples of binary search trees that are valid and non-example. Here is an example. Let me just draw the key. Let's say the root has 25. The left sub-tree all has to have keys less than or equal to 25. This is important. Less than or equal to 25, not just the node immediately to the immediate left child, but everything in the left sub-tree has to be less than or equal to 25. Now, everything in the left sub-tree of 20 has to be less than or equal to 20, let's say 18, let's say 15. Now it has to be greater than 18 but less than 20. Let's say this is 19 and let's say here it has to be greater than 20, less than 25, let's say 23, 21, 24 greater than 20 but less than equal to 25. Let's say 29, 31, and then let's say nil, nil, nil, nil, nil, nil, nil, nil those are going to be the leaves. I will still soon stop writing these leaves, but for now I will write these leaves. Notice that if I have the right sub-tree of the node and the key of the node is 25, the right sub-tree has all nodes greater than or equal to 25. Since I say that the keys must be unique, it has to be strictly greater. I'm not allowing keys to be equal. If I allowed keys to be equal, I would say greater than or equal to, but right now I'm saying greater than. Likewise, everything to the left has to be less than 20 for 25. Let's take this node with key 20, the node which I am pointing to in green, for the left sub-tree has to be less than 20. I've already said that in the right sub-tree it has to be greater than 20. But since it's also the left sub-tree of 25, everything in the right sub-tree of 20 has to still be less than equal to 25. Hopefully you get an idea of binary search trees in the structure and this is a 31. Hopefully you are getting an idea of what binary search trees are looking like. Let me give you some non-examples, so what would be some non-examples of binary search tree, some bad binary search trees which are going to fail this property, so take an example. What about this example? In this example, 20 is the root node. This is called the root node. It's left child is 19, which is fine. Nineteenth, left child is 17, which is fine. The right child is 21 which is fine with regard to 19. But if I think of 20, I note that everything in the left sub-tree of 20 needed to be less than 20 or less than equal to 20, but I'm not allowing duplicates. Let me remind you of that. In this case, this node is illegal because it cannot be here, it cannot be where it is, because it cannot be to the left of 20, because it's greater than 20. If it were legal then it should have been to the right of 20, it should have been in the right sub-tree of 20. This is super important, so I've given you an example of a binary search tree. I've given you an example of something that's not a binary search tree. What are the important properties of each node? For each node in a binary search tree, if the node is not nil, if n is not nil, then n.left is the left child, n.right is the right child. N.parent is its parent. Just for completeness, I will keep the parent of root to be nil. Following the book convention, the parent of root will be kept to be nil, and that's how you will know if a node is a root, so n.parent is its parent, n.key is the key. This is all following CLRS convention. Now that we have binary search trees, I want to define a few things for binary search trees. Next step is let me define the height of a binary search tree. What's the height of a binary search tree? Well, if you start with a binary search tree, let's say an example like that. I'm not going to show all the nils, I will just try to show some of the nils. Suppose this is a binary search tree, we define the height of a node as this length of the path, or the length of the longest path from that node to a leaf. In this case, if you start from the root, the longest path is going to go all the way right to a leaf. Or you could go all the way, right and then left. Same thing. The length of the longest path is going to be 1,2,3,4,5. Many books will differ on whether to count the node itself or not to count the node itself. In the case of CLRS, I believe the height is simply defined as 1,2,3,4, and you do not take the very last node into account. You do not take the leaf into account. That allows us to define the leaf as having height 0. The leaf has height 0. This node, let's call it n1 has height 1. The longest path can go either way, it has height 1. For example, let's take node n7. N7 has height 1,2,3,4. How do you define the height of a node? Height of a leaf is defined as 0. Now some books will define it as 1 so be careful there. But whatever answers I get you will be off by 1, so we are not going to really worry too much about 1 or 0. Let's just define the height of a leaf to be 0 and start. Height of a node is recursively defined as height, take the height of n.left, take the height of n.right, and take the max of these and add 1. Take the max of the height of n.left or n.right, whichever one is larger, take the max of this and add 1 to the result, and that gives you the height of a node. For example, the height is 0 at these two leaves, the height is going to be 1 at the node here, the height is going to be 0 at the node here, let's call this n3. The height is going to be max of the height of its right child, which is 1, height of its left child, which is 0, so max of 1, 0 is 1 plus 1, so the height of this node becomes 2. Likewise, the height of this node becomes 3. The height of the root becomes 4. The height of a tree is defined recursively by defining the height of each node, starting with the height of the leaf being 0. If a binary search tree has n nodes, how high can the root be? That is defined as the height of the tree. The height of the tree is simply the height of the root node. What's the height of a binary search tree? There are two cases. In the worst case, the tree can be completely unbalanced. What is a completely unbalanced tree? Well, every node just has a single child which is not nil. The tree is simply almost like a linked list. It's almost like a linked list, every node just has one child, which is an internal node, one of the child is nil, except for this node which has nil and nil, so 1, 2, all the way to n. In this case, the height of this tree is going to be 1, 2, 3, 4, all the way to n. Height of the tree is going to be n, which is the number of nodes. This is the worst case. What's the best case? The best case is a completely balanced tree where every node has two internal nodes, except for the nodes at the very end, which are going to have two leaves nil, nil. This is going to be all the leaves. If there are n nodes, what is the height of this tree? If there are n internal nodes, n circles, how high can this tree be? Well, we can talk about this. If there are n internal nodes, okay, there is one root, two nodes at this level, three nodes at this level, and let's say the height is h. There are going to be four nodes, so 2^0, 2^1, 2^2, 2^h minus 1. Where h is the total height of the tree. What is the equation? Equation is 2^0 plus 2^1, all the way to 2^h minus 1 equals the number of nodes in the tree, number of internal nodes n, or 2^h minus 1. This is the geometric summation, geometric series, and this sums up to two to the power h minus 1 is n, our h is going to be log to the base 2 of n plus 1. The height of a tree with n nodes is going to be roughly log to the base 2 of n. If you want to be precise, it could be ceiling log to the base 2_n. This is the best case. This kind of a tree is called perfectly balanced. Here, this tree is highly imbalanced, so much so that it becomes a linked list, right? The tree becomes a linked list. What about trees in the middle? Trees in the middle are going to be somewhere between the worst case, which has to have a height equal to the number of internal nodes, and the best case, which has to have height equal to log n. Remember, there's going to be a plus-minus 1 here, depending on how I define the base case. I'm not going to be super accurate in this class. What about other trees? Generally, trees are going to be somewhere in the middle, so they are going to have some imbalance. I'm not going to write the remaining nil. You can see the tree is imbalanced. For example, the longest path from the root to leaf that defines the height of the tree is going to be something like this. Whereas you can see there's another path which is much shorter. This is going to be imbalanced, not perfectly balanced, not highly imbalanced, but somewhere in the middle. There is a sliding scale where the height of the tree, in the worst case can be n, in the best case can be something like log n. Which is a big range to be. If you insert, for example, n equals 2^18, which is roughly 1 million. If you insert about 1 million nodes, the height can range from about 18 in the best case to about 1 million in the worst case. This is amazing range of heights. The worst case is when the tree is just a single path. In this case, I'm just showing everything going to the left. It could go to the left or to the right, but the point being that every internal node here has exactly one internal node children, and then there's one internal node which has two leaves, and it looks like a straight line tree, or almost like a linked list. Whereas in the perfect case, the tree can be perfectly balanced. Every internal node has two internal node children, except at the very end, where it has two leaves. The height of a tree is going to be very important because the algorithms we're going to talk about for insertion, deletion, and so on are all going to depend on the height, so we're going to talk about that in a second.