In this video we'll talk about how to get logistic regression to work for multiclass classification problems, and in particular I want to tell you about an algorithm called one-versus-all classification.

What's a multiclass classification problem? Here are some examples. Let's say you want a learning algorithm to automatically put your email into different folders, or to automatically tag your emails, so you might have different folders or different tags for work email, email from your friends, email from your family, and email about your hobby. And so here we have a classification problem with four classes, which we might assign to the classes y = 1, y = 2, y = 3, and y = 4. Another example: for medical diagnosis, if a patient comes into your office with maybe a stuffy nose, the possible diagnoses could be that they're not ill, maybe that's y = 1; or they have a cold, y = 2; or they have the flu, y = 3. And a third and final example: if you are using machine learning to classify the weather, maybe you want to decide whether the weather is sunny, cloudy, rainy, or snowy. So in all of these examples, y can take on a small number of values, maybe one to three, one to four, and so on, and these are multiclass classification problems. And by the way, it doesn't really matter whether we index the classes as 0, 1, 2, 3 or as 1, 2, 3, 4. I tend to index my classes starting from 1 rather than starting from 0, but either way works, and it really doesn't matter.

Whereas previously for a binary classification problem our data set looked like this, for a multi-class classification problem our data set may look like this, where here I'm using three different symbols to represent our three classes. So the question is, given a data set with three classes, where this is an example of one class, that's an example of a different class, and that's an example of yet a third class, how do we get a learning algorithm to work for this setting? We already know how to do binary classification using logistic regression; we know how to fit a decision boundary that separates the positive and negative classes. Using an idea called one-vs-all classification, we can take this and make it work for multi-class classification as well.

Here's how one-vs-all classification works. This is also sometimes called one-vs-rest. Let's say we have a training set like that shown on the left, where we have three classes: if y equals 1, we denote that with a triangle; if y equals 2, the square; and if y equals 3, the cross. What we're going to do is take our training set and turn it into three separate binary classification problems; that is, three separate two-class classification problems. So let's start with class one, which is the triangle. We're going to essentially create a new, sort of fake training set where classes two and three get assigned to the negative class and class one gets assigned to the positive class. So we create a new training set like that shown on the right, and we're going to fit a classifier, which I'm going to call h subscript theta superscript one of x, where here the triangles are the positive examples and the circles are the negative examples. So think of the triangles as being assigned the value of one and the circles as being assigned the value of zero. And we're just going to train a standard logistic regression classifier, and maybe that will give us a decision boundary that looks like that.
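To make the relabeling step concrete, here is a minimal sketch in Python; the tiny two-feature data set and the gradient descent settings are hypothetical, chosen just for illustration. It marks class 1 as the positive class, everything else as the negative class, and trains an ordinary binary logistic regression classifier by gradient descent, which plays the role of h subscript theta superscript one of x.

```python
import numpy as np

# Hypothetical toy training set: two features, labels in {1, 2, 3}
X = np.array([[1.0, 4.5], [1.5, 4.0], [4.0, 1.0],
              [4.5, 1.5], [2.5, 2.5], [3.0, 3.0]])
y = np.array([1, 1, 2, 2, 3, 3])

# Relabel: class 1 (the "triangles") -> positive (1), classes 2 and 3 -> negative (0)
y_binary = (y == 1).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Add the intercept term x_0 = 1 and run batch gradient descent
# on the standard logistic regression cost.
X_b = np.hstack([np.ones((X.shape[0], 1)), X])
theta = np.zeros(X_b.shape[1])
alpha, iterations = 0.1, 5000
m = X_b.shape[0]
for _ in range(iterations):
    h = sigmoid(X_b @ theta)                          # h_theta^(1)(x) for every example
    theta -= alpha * (X_b.T @ (h - y_binary)) / m     # gradient step

# Estimated P(y = 1 | x; theta) for each training example
print(sigmoid(X_b @ theta))
```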
This superscript one here stands for class one, so we're doing this for the triangles of class one. Next we do the same thing for class two. We're going to take the squares and assign the squares as the positive class, and assign everything else, the triangles and the crosses, as the negative class. Then we fit a second logistic regression classifier, call it h subscript theta superscript two of x, where the superscript two denotes that we're now treating the square class as the positive class. And maybe we get a decision boundary like that. Finally, we do the same thing for the third class and fit a third classifier, h subscript theta superscript three of x, and maybe this will give us a decision boundary that separates the positive and negative examples like that.

So to summarize, what we've done is fit three classifiers: for i = 1, 2, 3, we'll fit a classifier h subscript theta superscript i of x that's trying to estimate what is the probability that y is equal to class i, given x and parametrized by theta. So in the first instance, for this first one up here, the classifier was learning to recognize the triangles; it's thinking of the triangles as the positive class, so h superscript one is essentially trying to estimate what is the probability that y is equal to one, given x and parametrized by theta. Similarly, this one is treating the square class as the positive class, so it's trying to estimate the probability that y = 2, and so on. So we now have three classifiers, each of which was trained to recognize one of the three classes.

Just to summarize, what we've done is train a logistic regression classifier h superscript i of x for each class i to predict the probability that y is equal to i. Finally, to make a prediction when we're given a new input x, what we do is just run all three of our classifiers on the input x and then pick the class i that maximizes the output of the three. So we basically pick whichever one of the three classifiers is most confident, the one that most enthusiastically says that it thinks its class is the right one, and whichever value of i gives us the highest probability, we then predict y to be that value.

So that's it for multi-class classification and the one-vs-all method. With this little method, you can now take the logistic regression classifier and make it work on multi-class classification problems as well.
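Putting the whole procedure together, here is a minimal sketch of one-vs-all training and prediction. It assumes scikit-learn's LogisticRegression as the binary classifier for each subproblem and a hypothetical toy data set, so it's an illustration of the idea rather than a definitive implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: two features, labels in {1, 2, 3}
X = np.array([[1.0, 4.5], [1.5, 4.0], [4.0, 1.0],
              [4.5, 1.5], [2.5, 2.5], [3.0, 3.0]])
y = np.array([1, 1, 2, 2, 3, 3])
classes = np.unique(y)

# Train one classifier h^(i)(x) per class i, treating class i as positive
# and every other class as negative.
classifiers = {}
for i in classes:
    clf = LogisticRegression()
    clf.fit(X, (y == i).astype(int))
    classifiers[i] = clf

def predict(x_new):
    # Run all classifiers on x_new and pick the class whose classifier
    # reports the highest estimated probability of the positive label.
    probs = {i: classifiers[i].predict_proba(x_new.reshape(1, -1))[0, 1]
             for i in classes}
    return max(probs, key=probs.get)

print(predict(np.array([1.2, 4.2])))   # expected: class 1
```

Picking the class by the argmax of the per-class probabilities is exactly the "most confident classifier wins" rule described above.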