In the last video, we talked about the hypothesis representation for logistic regression. What I'd like to do now is tell you about something called the decision boundary, and this will give us a better sense of what the logistic regression hypothesis function is computing. To recap, this is what we wrote out last time, where we said that the hypothesis is represented as h of x equals g of theta transpose x, where g is this function called the sigmoid function, which looks like this: it slowly increases from zero to one, asymptoting at one. What I want to do now is try to understand better when this hypothesis will predict that y is equal to 1 versus when it will predict that y is equal to 0, and to understand better what the hypothesis function looks like, particularly when we have more than one feature.

Concretely, this hypothesis outputs an estimate of the probability that y is equal to one, given x and parameterized by theta. So if we want to predict whether y is equal to one or y is equal to zero, here's something we might do. Whenever the hypothesis outputs that the probability of y being one is greater than or equal to 0.5, meaning that y equals 1 is more likely than y equals 0, let's predict y equals 1. Otherwise, if the estimated probability of y being equal to one is less than 0.5, let's predict y equals 0. I chose a greater-than-or-equal sign here and a less-than sign here. If h of x is equal to 0.5 exactly, you could predict either positive or negative, but putting the greater-than-or-equal here means we default to predicting positive when h of x is 0.5; that's a detail that really doesn't matter that much.

What I want to do is understand better when exactly h of x will be greater than or equal to 0.5, so that we end up predicting y is equal to 1. If we look at this plot of the sigmoid function, we'll notice that g of z is greater than or equal to 0.5 whenever z is greater than or equal to zero. So it's in this half of the figure that g takes on values of 0.5 and higher. This notch here, that's 0.5, and so when z is positive, g of z, the sigmoid function, is greater than or equal to 0.5. Since the hypothesis for logistic regression is h of x equals g of theta transpose x, and theta transpose x here takes the role of z, the hypothesis is therefore greater than or equal to 0.5 whenever theta transpose x is greater than or equal to 0. So what we've shown is that the hypothesis is going to predict y equals 1 whenever theta transpose x is greater than or equal to 0.

Let's now consider the other case, when the hypothesis will predict y is equal to 0. By a similar argument, h(x) is going to be less than 0.5 whenever g(z) is less than 0.5, because the range of values of z that cause g(z) to take on values less than 0.5 is where z is negative. So when g(z) is less than 0.5, the hypothesis will predict that y is equal to 0. And by a similar argument to what we had earlier, since h(x) is equal to g of theta transpose x, we'll predict y equals 0 whenever theta transpose x is less than 0. To summarize what we just worked out: if we decide to predict y equals 1 or y equals 0 depending on whether the estimated probability is greater than or equal to 0.5 or less than 0.5, that's the same as saying that we predict y equals 1 whenever theta transpose x is greater than or equal to 0, and we predict y equals 0 whenever theta transpose x is less than 0.
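As a supplement to the transcript, here is a minimal sketch in Python/NumPy of the thresholding rule just described. The helper names sigmoid, hypothesis, and predict are my own illustrative choices rather than course code, and the sketch assumes the feature vector x already includes the intercept term x0 = 1.

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z)); g(z) >= 0.5 exactly when z >= 0
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    # h(x) = g(theta' x); x is assumed to include the intercept feature x0 = 1
    return sigmoid(np.dot(theta, x))

def predict(theta, x):
    # Predict y = 1 when h(x) >= 0.5, which happens exactly when theta' x >= 0
    return 1 if hypothesis(theta, x) >= 0.5 else 0
```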
Let's use this to better understand how the hypothesis of logistic regression makes its predictions. Suppose we have a training set like the one shown on the slide, and suppose the hypothesis is h of x equals g of theta zero plus theta one x one plus theta two x two. We haven't talked yet about how to fit the parameters of this model; we'll talk about that in the next video. But suppose that, via a procedure yet to be specified, we end up choosing the following values for the parameters: theta zero equals minus 3, theta one equals 1, theta two equals 1. So this means that my parameter vector is going to be theta equals minus 3, 1, 1.

Given this choice of the hypothesis parameters, let's try to figure out where the hypothesis would end up predicting y equals one and where it would end up predicting y equals zero. Using the formulas we worked out on the previous slide, we know that y equals one is more likely, that is, the probability that y equals one is greater than or equal to 0.5, whenever theta transpose x is greater than or equal to zero. And this formula that I just underlined, minus 3 plus x1 plus x2, is of course theta transpose x when theta is equal to the value of the parameters we just chose. So for any example with features x1 and x2 that satisfy this equation, minus 3 plus x1 plus x2 greater than or equal to 0, our hypothesis will think that y equals 1 is more likely, and will predict that y is equal to 1. We can also take the minus 3, bring it to the right, and rewrite this as x1 plus x2 greater than or equal to 3. So, equivalently, we found that this hypothesis will predict y equals 1 whenever x1 plus x2 is greater than or equal to 3.

Let's see what that means on the figure. If I write down the equation x1 plus x2 equals 3, this defines the equation of a straight line, and if I draw what that straight line looks like, it gives me the following line, which passes through 3 and 3 on the x1 and the x2 axes. So the part of the input space, the part of the x1, x2 plane, that corresponds to x1 plus x2 being greater than or equal to 3, that's going to be this right half plane, that is, everything up and to the upper right of this magenta line that I just drew. And so the region where our hypothesis will predict y equals 1 is this region, really this huge half space over to the upper right. Let me just write that down: I'm going to call this the y equals 1 region. In contrast, the region where x1 plus x2 is less than 3 is where we will predict that y is equal to 0, and that corresponds to this region. It's really a half plane as well, but that region on the left is the region where our hypothesis will predict y equals 0.

I want to give this magenta line that I drew a name. This line is called the decision boundary. Concretely, this straight line, x1 plus x2 equals 3, corresponds to the set of points where h of x is equal to 0.5 exactly, and the decision boundary, that is, this straight line, is the line that separates the region where the hypothesis predicts y equals 1 from the region where the hypothesis predicts y equals 0. And just to be clear, the decision boundary is a property of the hypothesis, including the parameters theta zero, theta one, theta two. In the figure I drew a training set, a data set, in order to help the visualization, but even if we take away the data set, this decision boundary and the regions where we predict y equals 1 versus y equals 0 are a property of the hypothesis and of its parameters, not a property of the data set.
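Continuing the sketch from above (again, not course code), plugging in the parameter vector minus 3, 1, 1 reproduces the half-plane behavior just described: points with x1 plus x2 at least 3 land in the y = 1 region.

```python
# theta0 = -3, theta1 = 1, theta2 = 1, as in the example above
theta = np.array([-3.0, 1.0, 1.0])

# Each x is [x0, x1, x2] with x0 = 1; the boundary is the line x1 + x2 = 3
print(predict(theta, np.array([1.0, 3.0, 2.0])))  # x1 + x2 = 5 >= 3, prints 1
print(predict(theta, np.array([1.0, 1.0, 1.0])))  # x1 + x2 = 2 <  3, prints 0
```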
Later on, of course, we'll talk about how to fit the parameters, and there we'll end up using the training set, using our data, to determine the values of the parameters. But once we have particular values for the parameters theta zero, theta one, theta two, that completely defines the decision boundary, and we don't actually need to plot a training set in order to plot the decision boundary.

Let's now look at a more complex example where, as usual, I have crosses to denote my positive examples and Os to denote my negative examples. Given a training set like this, how can I get logistic regression to fit this sort of data? Earlier, when we were talking about polynomial regression or linear regression, we talked about how we could add extra higher order polynomial terms to the features, and we can do the same for logistic regression. Concretely, let's say my hypothesis looks like this, where I've added two extra features, x1 squared and x2 squared, to my features, so that I now have five parameters, theta zero through theta four. As before, we'll defer to the next video our discussion of how to automatically choose values for the parameters theta zero through theta four. But let's say that, via a procedure yet to be specified, I end up choosing theta zero equals minus one, theta one equals zero, theta two equals zero, theta three equals one, and theta four equals one. What this means is that, with this particular choice of parameters, my parameter vector theta looks like minus one, zero, zero, one, one.

Following our earlier discussion, this means that my hypothesis will predict y equals 1 whenever minus 1 plus x1 squared plus x2 squared is greater than or equal to 0, that is, whenever theta transpose times my features is greater than or equal to zero. And if I take the minus 1 and just bring it to the right, I'm saying that my hypothesis will predict y equals 1 whenever x1 squared plus x2 squared is greater than or equal to 1. So what does this decision boundary look like? Well, if you were to plot the curve for x1 squared plus x2 squared equals 1, some of you will recognize that this is the equation of a circle of radius one, centered at the origin. So that is my decision boundary, and everything outside the circle I'm going to predict as y equals 1. So out here is my y equals 1 region; we'll predict y equals 1 out here, and inside the circle is where I'll predict y equals 0. By adding these more complex polynomial terms to my features, I can get more complex decision boundaries that don't just try to separate the positive and negative examples with a straight line; I can get, in this example, a decision boundary that's a circle.

Once again, the decision boundary is a property not of the training set, but of the hypothesis and of the parameters. So long as we're given the parameter vector theta, that defines the decision boundary, which is the circle. The training set is not what we use to define the decision boundary; the training set may be used to fit the parameters theta, and we'll talk about how to do that later, but once you have the parameters theta, that is what defines the decision boundary. Let me put back the training set just for visualization.
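Here is the same kind of sketch for the quadratic hypothesis, assuming the sigmoid helper defined earlier; predict_poly is a hypothetical name used only for illustration.

```python
def predict_poly(theta, x1, x2):
    # Feature vector [1, x1, x2, x1^2, x2^2] for the quadratic hypothesis
    features = np.array([1.0, x1, x2, x1 ** 2, x2 ** 2])
    return 1 if sigmoid(np.dot(theta, features)) >= 0.5 else 0

# theta0 = -1, theta1 = 0, theta2 = 0, theta3 = 1, theta4 = 1
theta = np.array([-1.0, 0.0, 0.0, 1.0, 1.0])
print(predict_poly(theta, 2.0, 0.0))   # outside the unit circle, prints 1
print(predict_poly(theta, 0.3, 0.3))   # inside the unit circle, prints 0
```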
Finally, let's look at an even more complex example. Can we come up with even more complex decision boundaries than this? If I have even higher order polynomial terms, so things like x1 squared, x1 squared x2, x1 squared x2 squared, and so on, and use much higher order polynomials, then it's possible to show that you can get even more complex decision boundaries, and logistic regression can be used to find decision boundaries that may, for example, be an ellipse like that, or, with a somewhat different setting of the parameters, a different decision boundary which may even look like some funny shape like that. Or, for even more complex examples, you can get decision boundaries that look like still more complex shapes like this, where everything in here you predict y equals 1 and everything outside you predict y equals 0. So with these higher order polynomial features, you can get very complex decision boundaries (a small sketch of such a feature mapping follows at the end).

So, with these visualizations, I hope that gives you a sense of the range of hypothesis functions we can represent using the representation that we have for logistic regression. Now that we know what h(x) can represent, what I'd like to do next, in the following video, is talk about how to automatically choose the parameters theta, so that given a training set we can automatically fit the parameters to our data.
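As a supplementary note beyond the lecture itself: one common way to build these higher order polynomial features is a feature-mapping helper like the hypothetical map_features below. The choice of degree 6 is just an illustrative assumption, not something specified in the video.

```python
def map_features(x1, x2, degree=6):
    # All monomials x1^i * x2^j with i + j <= degree, including the constant term
    feats = []
    for i in range(degree + 1):
        for j in range(degree + 1 - i):
            feats.append((x1 ** i) * (x2 ** j))
    return np.array(feats)

# A hypothesis h(x) = g(theta' map_features(x1, x2)) with enough such terms can
# carve out ellipses or other irregularly shaped y = 1 regions.
print(map_features(1.0, 2.0).shape)  # (28,) features for degree 6
```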