Now let's move beyond linearity to working with nonlinear transformations. Everything we've talked about so far with principal component analysis and singular value decomposition involved linear transformations: we're using linear transformations to map our original data set to a lower dimension. Now, data in general can very often have nonlinear features. When our data has nonlinear features and we try to perform PCA, this can cause our dimensionality reduction to ultimately fail. Here we have this example data set, and we can see we're doing a mapping from two dimensions to two principal components, so it will end up not changing the space. But in general, as we try to map from higher dimensions to lower dimensions and we have nonlinear features, we won't be able to maintain that variance while reducing the number of dimensions as we've done so far with linear PCA.

If you recall our discussion of support vector machines, there are kernel functions which we can use to apply nonlinear transformations to our data set. Now, thinking back to support vector machines, what probably came to mind is that with kernel functions we're mapping up to a higher-dimensional space, while the goal here is to map down to a lower-dimensional space. But the key is that when you use these kernel functions and map to the higher-dimensional space, you're able to uncover nonlinear structures within your data set, and then use a linear function to map back down, similar to how you were then able to come up with a linear decision boundary. Once you map up to those higher dimensions, you can use that linear PCA to actually come up with fewer dimensions. Here we see that from the original space we saw earlier, using the kernel PCA projection we're able to come up with a linearly separable space, so we're able to adjust the space.
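As a quick sketch of that idea (the data set and parameter values here are illustrative, using scikit-learn's `make_circles` to create concentric rings that a purely linear projection cannot untangle):

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

# Two concentric circles: a nonlinear structure in 2-D
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Linear PCA can only rotate this space; the rings stay entangled
X_pca = PCA(n_components=2).fit_transform(X)

# Kernel PCA with an RBF kernel implicitly maps up to a higher-dimensional
# space where the rings become separable, then projects linearly back down
X_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=15).fit_transform(X)

print(X_pca.shape, X_kpca.shape)  # (200, 2) (200, 2)
```

Plotting `X_kpca` colored by `y` would show the two rings pulled apart into a linearly separable arrangement, which is exactly the effect described above.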
Now, in the figure here on the left, we're applying PCA directly, and we see this curvature in our data. We wouldn't be able to maintain the total amount of variance if we just directly applied linear PCA. Instead, we apply a kernel, which will map our data to a linear space, and then we can reduce it down to a lower number of dimensions without losing the information we would lose by squashing the data down on that original linear projection.

How do we actually perform kernel PCA using Python? As usual, we import the class containing the dimensionality reduction method. Once we import KernelPCA from sklearn.decomposition, we then initiate our class, specifying the number of components we want and what type of kernel we want to use. There are actually different kernels available, as there were with support vector machines, as well as a choice of gamma. If you recall, gamma identifies how curvy or how complex you want the mapping to be with regard to the nonlinearity of that original data set. Then, same as when working with plain PCA, we call .fit_transform on that object with our data set, and we have our transformed data set using kernel PCA.

Now let's talk briefly about manifold learning. This is another class of nonlinear dimensionality reduction. What we are working with here is multidimensional scaling, or MDS. Now MDS, unlike PCA, does not strive to preserve the variance within the data. Recall that with PCA, the goal is to maintain as much of the variance of the original data as possible. With MDS instead, the goal is to maintain the geometric distances between each of the different points. The figure on the left is a sphere in three dimensions. Under MDS, it's mapped to a disk, and the distances between each of the points in three dimensions are maintained as well as possible as we move down to two dimensions.
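A minimal sketch of that sphere-to-disk idea (the sampled sphere here is illustrative; note that MDS lives in sklearn.manifold):

```python
import numpy as np
from sklearn.manifold import MDS

# Illustrative data: 100 points on the surface of a 3-D sphere
rng = np.random.default_rng(0)
v = rng.normal(size=(100, 3))
X = v / np.linalg.norm(v, axis=1, keepdims=True)

# MDS tries to preserve the pairwise geometric distances between points
# while flattening the 3-D sphere into two dimensions
X_mds = MDS(n_components=2, random_state=0).fit_transform(X)

print(X_mds.shape)  # (100, 2)
```

Unlike PCA, there is no explained-variance ratio to inspect here; the quality of the embedding is measured by how well the pairwise distances were preserved (the `stress_` attribute of the fitted object).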
Now, in order to run MDS within Python, we again import the class containing the dimensionality reduction method. This time from sklearn.manifold (not sklearn.decomposition), we import MDS, and we create an instance of the class, specifying the number of components that we ultimately want. Again, we just call .fit_transform on our data set with that MDS object, and then we will have X_MDS as our transformed data set, which now only has two columns, or two features.

Now, other popular manifold learning methods exist, such as Isomap, which uses nearest neighbors and tries to maintain the nearest-neighbor ordering in a way, or t-SNE, which tries to keep similar points closer together and dissimilar points further apart, and can be very good for visualization. There are several ways to do decomposition, and generally we say try a few out. A good approach would be to try them out, and then, if you're able to get down to two or three dimensions, use EDA and visualization to see how well you were able to come up with clusters or maintain the amount of variance that was originally there.

Now, that closes out our discussion here of principal component analysis as well as the different types of manifold learning. In the next lesson, we're going to go through a demo of using PCA in practice. I'll see you in the notebook.
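A similar sketch for the other manifold learners just mentioned, Isomap and t-SNE (both also in sklearn.manifold; the S-curve data set and parameter values like n_neighbors and perplexity are illustrative choices, not prescriptions):

```python
from sklearn.datasets import make_s_curve
from sklearn.manifold import Isomap, TSNE

# Illustrative data: 200 points on a 3-D "S"-shaped surface
X, _ = make_s_curve(n_samples=200, random_state=0)

# Isomap: builds a nearest-neighbor graph and tries to preserve
# distances measured along that graph when unrolling the surface
X_iso = Isomap(n_neighbors=10, n_components=2).fit_transform(X)

# t-SNE: keeps similar points close together and dissimilar points
# further apart; mainly useful for 2-D/3-D visualization
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

print(X_iso.shape, X_tsne.shape)  # (200, 2) (200, 2)
```

Trying a few of these and visually comparing the 2-D embeddings, as suggested above, is often the fastest way to see which method suits a given data set.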