In addition to cleaning your data to address data quality issues, data preparation also includes selecting features to use for analysis. After this video, you will be able to explain what feature selection involves, discuss the goal of feature selection, and list three approaches for selecting features. Feature selection refers to choosing the set of features that is appropriate for the subsequent analysis. The goal of feature selection is to come up with the smallest set of features that best captures the characteristics of the problem being addressed. The smaller the number of features used, the simpler the analysis will be. But of course, the set of features used must include all features relevant to the problem. So there must be a balance between expressiveness and compactness of the feature set. There are several methods to consider in selecting features. New features can be added, some features can be removed, features can be re-coded, or features can be combined. All these operations affect the final set of features that will be used for analysis. Of course, some features can be kept as is as well. New features can be derived from existing features. For example, a new feature to specify whether a student is in state or out of state can be added based on the student's state of residence. For an application such as college admissions, this new feature represents an important aspect of an application, and so it would be very helpful as a separate feature. Another example is adding a feature to indicate the color of a vehicle, which can play an important role in an auto insurance application. Features can also be removed; candidates for removal are features that are highly correlated. During data exploration, you may have discovered that two features are highly correlated, that is, they change in very similar ways. For example, the purchase price of a product and the amount of sales tax paid are likely to be very correlated.
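The two ideas just described, deriving a new feature from an existing one and spotting a highly correlated pair, can be sketched in pandas. This is a minimal illustration: the column names, the home state of "CA", and the 8% tax rate are all assumptions made for the example, not part of the lecture.

```python
import pandas as pd

# Hypothetical admissions data; the column names and the home state are assumptions.
applicants = pd.DataFrame({
    "applicant_id": [1, 2, 3],
    "state": ["CA", "NY", "CA"],
})

# Derive a new feature from an existing one: in-state vs. out-of-state.
applicants["in_state"] = applicants["state"] == "CA"

# Two highly correlated features: here sales tax is just price times a fixed
# rate (8% assumed), so the two columns carry essentially duplicate information.
sales = pd.DataFrame({"price": [10.0, 25.0, 40.0]})
sales["tax"] = sales["price"] * 0.08
print(sales["price"].corr(sales["tax"]))  # a perfect linear relationship: 1.0
```

Because the correlation is 1.0, keeping both `price` and `tax` adds nothing; one of them can be dropped, as discussed next.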
The higher the purchase price, the higher the sales tax. In this case, you might want to drop one of these features, since these features have essentially duplicate information. And keeping both features makes the feature set larger and the analysis unnecessarily more complex. Features with a high percentage of missing values may also be good candidates for removal. The validity and usefulness of features with a lot of missing values are in question, so removing them may not result in any loss of information. Again, these features would have been discovered during the data exploration step. Irrelevant features should also be removed from the data set. Irrelevant features are those that contain no information that is useful for the analysis task. An example of this is employee ID in predicting income. Other fields used simply for identification, such as row number, person's ID, etc., are good candidates for removal. Features can also be combined if the new feature presents important information that is not represented by looking at the original features individually. For example, BMI, which is body mass index, is an indicator of whether a person is underweight, average weight, or overweight. This is an important feature to have for a weight loss application. It represents information about how much a person weighs relative to their height that is not available by looking at just the person's height or weight alone. A feature can be re-coded as appropriate for the application. A common example of this is when you want to turn a continuous feature into a categorical one. For example, for a marketing application, you might want to re-code a customer's age into categories such as teenager, young adult, adult, and senior citizen. So you would map ages 13 to 19 to teenager, ages 20 to 25 to young adult, 26 to 55 as adult, and over 55 as senior. For some applications, you may want to make use of binary features.
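The removal, combination, and re-coding operations above can be sketched together in pandas. The data frame and its column names are invented for illustration; the BMI formula (weight in kilograms divided by height in meters squared) and the age bins from the lecture example are the only fixed parts.

```python
import pandas as pd

# Hypothetical customer data; column names and values are assumptions.
customers = pd.DataFrame({
    "customer_id": [101, 102, 103, 104],   # identification-only field
    "age":         [16, 23, 40, 60],
    "height_m":    [1.70, 1.80, 1.60, 1.75],
    "weight_kg":   [60.0, 80.0, 70.0, 90.0],
})

# Remove an irrelevant feature used only for identification.
customers = customers.drop(columns=["customer_id"])

# Combine two features into one: BMI = weight / height^2 captures weight
# relative to height, which neither column shows on its own.
customers["bmi"] = customers["weight_kg"] / customers["height_m"] ** 2

# Re-code the continuous age feature into categories using the bins from
# the example: 13-19 teenager, 20-25 young adult, 26-55 adult, over 55 senior.
customers["age_group"] = pd.cut(
    customers["age"],
    bins=[12, 19, 25, 55, 120],
    labels=["teenager", "young adult", "adult", "senior"],
)
```

Dropping a correlated feature or one with too many missing values works the same way as the `drop` call above, once exploration has identified which columns to discard.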
As an example, you might want a feature to capture whether a customer tends to buy expensive items or not. In this case, you would want a feature that maps to one for a customer with an average purchase price over a certain amount and maps to zero otherwise. Re-coding features can also result in breaking one feature into multiple features. A common example of this is to separate an address feature into its constituent parts: street address, city, state, and zip code. This way, you can more easily group records by state, for example, to provide a state-by-state analysis. Feature selection aims to select the smallest set of features that best captures the characteristics of the data for your application. Note from the examples presented that domain knowledge once again plays a key role in choosing the appropriate features to use. A good understanding of the application is essential in deciding which features to add, drop, or modify. It should also be noted that feature selection can be referred to as feature engineering, since what you're doing here is engineering the best feature set for your application.
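The binary-feature and address-splitting examples above can be sketched as follows. The $100 threshold, the column names, and the comma-separated address format are assumptions chosen to keep the illustration self-contained.

```python
import pandas as pd

# Hypothetical purchase records; names, threshold, and address format are assumptions.
customers = pd.DataFrame({
    "customer": ["a", "b"],
    "avg_purchase_price": [150.0, 40.0],
    "address": ["12 Oak St, Springfield, IL, 62701",
                "9 Elm Ave, Dayton, OH, 45402"],
})

# Binary feature: 1 if the customer's average purchase is over the threshold,
# 0 otherwise.
customers["expensive_buyer"] = (customers["avg_purchase_price"] > 100).astype(int)

# Break the single address feature into its constituent parts.
parts = customers["address"].str.split(", ", expand=True)
parts.columns = ["street", "city", "state", "zip"]
customers = customers.join(parts)

# With state as its own column, records can easily be grouped state by state.
print(customers.groupby("state").size())
```

This split assumes a consistently comma-separated address string; real addresses usually need a more careful parser.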