We measure nearness by means of a distance metric. Let X be a set. A function d from the Cartesian product of X with itself to the set of real numbers R is called a distance function if and only if, for all x, y in X, the following properties are satisfied. The first is that the distance between x and y is greater than or equal to zero. This is the non-negativity property. It means the distance function only takes non-negative values in R. The second property is that the distance between x and y is equal to zero if and only if x is equal to y. The third property is that the distance from x to y is the same as the distance from y to x. That is, the distance function is symmetric. A distance function is called a metric if it also satisfies the triangle inequality: for all x, y, z in X, d of x, y plus d of y, z is greater than or equal to d of x, z. We then say that d is a metric on X. The pair X, d, where X is a set of points and d is a metric on X, is called a metric space.

Let us now look at an example. Let X be the set of real numbers R, and let our distance function be defined on X by d of x, y, given as the modulus of the difference of x and y. So, we are looking at the absolute value of x minus y. Then d is the usual metric on the real line. To see this, we only need to check that it satisfies all the properties of a metric. You're encouraged to try proving this. In the meantime, let's look at the properties. Indeed, the modulus of x minus y is zero if and only if x is equal to y, and clearly, the modulus of x minus y is positive by definition whenever x and y are different. It is also clear that the distance from x to y is the same as the distance from y to x. It only remains to prove the fourth property, the triangle inequality. This is left for you to work out.

There are specific distance functions for different data representations. In this section, we shall only be concerned with the Lp distance function, which we turn to next.
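If you would like to experiment before attempting the proof, the four properties of the usual metric on the real line can be checked numerically on a small sample of points. The sketch below is only an illustration: the names `d`, `looks_like_metric` and `sample` are made up for this example, and passing the check on a finite sample is evidence, not a proof.

```python
import itertools


def d(x, y):
    """Usual metric on the real line: the absolute value of x minus y."""
    return abs(x - y)


def looks_like_metric(dist, points):
    """Check the four metric properties on a finite sample of points.

    Returns True if no property is violated on the sample. This is a
    sanity check only; it does not prove that dist is a metric.
    """
    for x, y in itertools.product(points, repeat=2):
        if dist(x, y) < 0:                      # non-negativity
            return False
        if (dist(x, y) == 0) != (x == y):       # zero exactly when x equals y
            return False
        if dist(x, y) != dist(y, x):            # symmetry
            return False
    for x, y, z in itertools.product(points, repeat=3):
        if dist(x, y) + dist(y, z) < dist(x, z):  # triangle inequality
            return False
    return True


# Integers are used so that floating-point rounding cannot affect the check.
sample = [-3, -1, 0, 2, 5]

print(looks_like_metric(d, sample))

# For contrast, the squared difference is non-negative and symmetric,
# but it breaks the triangle inequality, so it is not a metric.
print(looks_like_metric(lambda x, y: (x - y) ** 2, sample))
```

The second call shows why the fourth property is worth checking separately: a function can satisfy the first three properties and still fail the triangle inequality.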
The underlying space of the Lp distance function is the m-dimensional Euclidean space Rm, defined over the reals. An object in this space is an m-dimensional vector. Given any two vectors x and y in Rm, that is, x and y are one by m vectors given by x equal to x1 to xm and y equal to y1 to ym, and a value p such that one is less than or equal to p, which is strictly less than infinity, the Lp distance between x and y is the p-th root of the sum, for i from one to m, of the modulus of xi minus yi raised to the power p. It can be shown that for all values of p, with one less than or equal to p and p less than infinity, the Lp distance is a metric. The proof is beyond the scope of this MOOC, although you may try it for yourself.

The most common values for the parameter p are the following. p equal to one gives the L1 distance, known as the Manhattan distance. p equal to two yields the L2 distance, known as the Euclidean distance. Also, letting p tend to infinity gives the L-infinity distance, also known as the Chebyshev distance, which is the largest value of the modulus of xi minus yi over all i.

In this section, we shall mainly be concerned with the Euclidean distance, that is, the square root of the sum, for i from one to m, of the square of the modulus of xi minus yi. Note that since the square of xi minus yi is always non-negative and has the same value as the square of the modulus of xi minus yi, we may replace the modulus expression by the square of xi minus yi. So the Euclidean distance could also be written as the square root of the sum, for i from one to m, of the square of xi minus yi. The Euclidean distance offers special computational advantages, particularly important for large data sets. It corresponds to the straight-line distance between two points. This is an important metric for K-means clustering, and we'll get to use it later.
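To make the Lp distance concrete, here is a minimal sketch in plain Python. The function name `lp_distance` and the example vectors are assumptions made for illustration; they are not part of the course material. Passing `math.inf` as `p` is treated as the L-infinity case, which is handled separately as a maximum.

```python
import math


def lp_distance(x, y, p):
    """Lp distance between two equal-length vectors, for p >= 1.

    For finite p this is the p-th root of the sum of |xi - yi| ** p.
    For p equal to math.inf it is the largest |xi - yi|.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same dimension m")
    if p < 1:
        raise ValueError("p must satisfy p >= 1")

    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        return max(diffs)
    return sum(t ** p for t in diffs) ** (1.0 / p)


x = [0.0, 0.0]
y = [3.0, 4.0]

print(lp_distance(x, y, 1))         # Manhattan distance: 3 + 4
print(lp_distance(x, y, 2))         # Euclidean distance: sqrt(9 + 16)
print(lp_distance(x, y, math.inf))  # L-infinity distance: max(3, 4)
```

For these two points the three distances are 7, 5 and 4, which also illustrates that the Lp distance does not increase as p grows.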