Pearson's correlation, which we have discussed so far, is a very useful tool. But sometimes different kinds of correlations are more appropriate. They are called rank correlations. Let us consider a couple of examples where rank correlations are better than Pearson's correlation. First, let us consider the following data set. Here you see that there is a more or less perfect correlation between these two variables in the following sense: if x is larger, then y is also larger. But the relation between the variables is not linear, and in this case Pearson's correlation coefficient will be less than 1. So we see that despite the fact that we have, in a sense, a perfect dependency between x and y, Pearson's correlation coefficient is not equal to 1; it is less than 1. Pearson's correlation coefficient cannot capture the idea that this kind of dependence is also, in a sense, perfect. This is one possible problem. Another possible problem is outliers. Let us consider another example. Assume that my data set looks like the following. If we look at this part of the picture, we see a strong positive correlation between x and y. But if we consider the whole data set and find Pearson's correlation coefficient, this coefficient can be low or even negative, because this point, which lies far to the right and to the bottom, can add a very large negative term to the correlation coefficient. This point may in fact be just a measurement error, so we probably want to ignore it. But it is a better idea to use a statistical tool that is not sensitive to such points, to such outliers. Rank correlations can deal with this situation as well. So what is the idea of rank correlations? The idea is to go from the initial data set to a data set of ranks. So let us discuss ranks. Assume that I have some sequence of numbers, for example 1, 5, 9, 3. Then I can do the following thing. First, I sort these numbers.
I get the sorted sequence 1, 3, 5, 9. Then we enumerate these numbers: this is the first number, this is the second, this is the third, and this is the fourth. I attach these pink numbers to my original numbers, and then I restore the order of the original blue numbers. In this case, I have 1, 5, 9, 3, and the attached numbers are: here is 1, here is 3, here is 4, and here is 2. This sequence of numbers is called the sequence of ranks of my original numbers. So if I denote the original sequence by x, then this new sequence is rank(x), the result of the rank transformation of the sequence x. Note that comparing elements of x is the same as comparing the corresponding ranks. For example, 5 is less than 9, and correspondingly 3 is less than 4; 5 is larger than 3, and correspondingly 3 is larger than 2. So the order is preserved, but the numerical values are lost. This is what the rank transformation does. Now, how can we use this rank transformation to find correlations? Assume that I have two series of values, x1 and so on up to xn, and y1 and so on up to yn. The first will be denoted by x and the second by y. First, I apply the rank transformation to both of these variables and get rank(x) and rank(y). Then I calculate Pearson's correlation between these transformed variables. The result of this operation is called Spearman's correlation between x and y. It is usually denoted by the Greek letter rho. Let us apply Spearman's correlation to our graph here. Assume that the exact numbers that we see here form a geometric progression: the values of x are -2, -1, 0, 1, 2, and the corresponding values of y are 1/4, 1/2, 1, 2, 4. First we have to find the rank transformation of both variables. But both variables are already sorted, so the rank transformation is very simple: rank(x) is just the consecutive numbers 1, 2, 3, 4, 5, and rank(y) is the same thing, 1, 2, 3, 4, 5.
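The rank transformation and Spearman's correlation described above can be sketched in a few lines of Python. This is a minimal illustration that assumes small samples with no tied values (real libraries such as SciPy handle ties by averaging ranks):

```python
import math

def ranks(xs):
    """Rank of each element, 1-based; assumes no tied values."""
    sorted_xs = sorted(xs)
    return [sorted_xs.index(v) + 1 for v in xs]

def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def spearman(xs, ys):
    """Spearman's rho: Pearson's correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

print(ranks([1, 5, 9, 3]))       # [1, 3, 4, 2], as in the example above

# The geometric-progression example: y = 2**x.
x = [-2, -1, 0, 1, 2]
y = [0.25, 0.5, 1, 2, 4]
print(round(pearson(x, y), 3))   # 0.933: less than 1, the relation is nonlinear
print(round(spearman(x, y), 6))  # 1.0: the dependence is perfectly monotonic
```

The ranks of both variables are the identical sequence 1, 2, 3, 4, 5, so their Pearson correlation, and hence Spearman's rho, is exactly 1.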
It means that in this case, Spearman's correlation between these two variables is equal to Pearson's correlation between their ranks, and the ranks are perfectly correlated. So Spearman's correlation for this data set has to be equal to 1. The same situation holds in the outlier example: because the rank transformation ignores the numeric values, the outlier does not seriously affect our correlation. It is just one point, and the numeric values of the x and y coordinates of the points are mostly ignored by our procedure. So Spearman's correlation is more robust to outliers. Now, let us consider another rank correlation, which is called Kendall's tau. Kendall's tau can be calculated in the following way. First, consider a pair of points. We call a pair of points concordant if for the larger value of x we have the larger value of y, and discordant if for the larger value of x we have the smaller value of y. Then we consider our data set and all possible pairs of points in this data set. Some of them are concordant and some of them are discordant. Then we do the following thing: we find the number of concordant pairs, subtract the number of discordant pairs, and divide by the number of all possible pairs. The number of all pairs can be easily calculated: if we have n elements in a sample, then the number of all pairs is just n(n - 1)/2. This ratio is called Kendall's tau. Let us return to this picture. Here, if you select any pair of points, it will be concordant, because for a larger x you always have a larger y. It means that here the number of concordant pairs is the same as the number of all pairs, and the number of discordant pairs is zero. So for this picture, tau is also equal to 1, just like rho. We see that Kendall's tau again captures the idea that this relation is perfectly monotonic.
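The pair-counting definition of Kendall's tau translates directly into code. Here is a minimal sketch that assumes no tied values (SciPy's `kendalltau`, by contrast, also corrects for ties):

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """(concordant - discordant) / (n * (n - 1) / 2); assumes no ties."""
    n = len(xs)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:        # both coordinates ordered the same way
            concordant += 1
        elif s < 0:      # ordered in opposite ways
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# The monotonic example from above: every pair is concordant.
x = [-2, -1, 0, 1, 2]
y = [0.25, 0.5, 1, 2, 4]
print(kendall_tau(x, y))                  # 1.0
print(kendall_tau([1, 2, 3], [3, 2, 1]))  # -1.0: perfectly discordant
```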
And it ignores the fact that this relation is nonlinear, because it deals only with the order of the points and not with their numeric values, just like Spearman's correlation. So both Kendall's tau and Spearman's correlation use only the order between the values. This means they can be applied not only to numeric values but also to any values that are categorical but ordered. Sometimes they can be a very useful alternative to Pearson's correlation.
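As a sketch of that last point, ordered categories can be mapped to their positions on the scale and then fed into a rank correlation. The ratings below are hypothetical illustration data, and the pair-counting tau again assumes no tied pairs:

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """(concordant - discordant) / (n * (n - 1) / 2); assumes no ties."""
    n = len(xs)
    c = d = 0
    for i, j in combinations(range(n), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            c += 1
        elif s < 0:
            d += 1
    return (c - d) / (n * (n - 1) / 2)

# Hypothetical ordinal data: two critics rate the same four films
# on an ordered categorical scale.
scale = {"bad": 1, "ok": 2, "good": 3, "great": 4}
critic_a = ["ok", "great", "bad", "good"]
critic_b = ["ok", "good", "bad", "great"]

tau = kendall_tau([scale[v] for v in critic_a],
                  [scale[v] for v in critic_b])
print(round(tau, 3))  # 0.667: the critics mostly agree on the ordering
```

Only the ordering of the categories matters here; the particular integer codes chosen for the scale do not change the result, as long as they respect the order.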