At this point, we have explored the data and chosen the features that we want to use. Now we are ready to build the model. Once we build the model, we'll spend the rest of the course taking that model and operationalizing it. First, though, let's do a quick recap of what we learned in the first specialization about building machine learning models.

We learned to build machine learning models using TensorFlow. TensorFlow is an open-source, high-performance library for numerical computation. The way TensorFlow works is that you create a directed graph to represent your computation. In this schematic, the nodes represent mathematical operations: things like adding, subtracting, and multiplying. Connecting the nodes are the edges, the inputs and outputs of the mathematical operations. The edges represent arrays of data. So, where does the name TensorFlow come from? A tensor is an N-dimensional array of data. Your data in TensorFlow are tensors, and they flow through the graph, hence TensorFlow.

Estimators are part of the high-level TensorFlow API, and this is the API that you will use in the upcoming lab. Working with the Estimator API has two parts. The first part is the noun: the static part, how to set the machine learning problem up. Imagine that you want to create a machine learning model to predict the cost of a house given the square footage. The first question is: is this regression or classification? You're predicting price, and price is a number, so this is a regression model. What is the label? The label is the price. What are the features? Here, there is only one feature, the square footage. The second part of working with the Estimator API is the verb. Once you have a machine learning model, what can you do with it? You can train the model, you can evaluate the model to see how well it performs, and you can predict with the model.

So, this is how to write an Estimator API model. First, we import the TensorFlow package. Then we define a list of feature columns; the square brackets here are a Python list. In this case, the list contains only one feature column: it is numeric and its name is the square footage. Then we instantiate a linear regression model by calling the LinearRegressor constructor, passing in the list of feature columns and the output directory for the trained model. That's the noun part: you've created the machine learning model.

Now, for the verbs. To train the model, you call model.train. But instead of passing in a training dataset directly, you pass in a Python function that returns features and labels. This way, it is possible to handle data that is larger than what you can hold in memory. The other input to model.train is the number of steps of gradient descent to perform. Once you have trained the model, you can call model.predict. To predict, you send in the features, or rather a function that returns the features, and predict returns the predictions for those features.
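Here is a minimal sketch of that noun-and-verb workflow, assuming the TensorFlow 1.x Estimator API. The column name sq_footage, the output directory, and the in-memory sample values are placeholders for this example, not the course's actual dataset.

    import tensorflow as tf

    # The "noun": one numeric feature column and a linear regressor.
    featcols = [tf.feature_column.numeric_column("sq_footage")]
    model = tf.estimator.LinearRegressor(feature_columns=featcols, model_dir="outdir")

    def train_input_fn():
        # A tiny in-memory dataset standing in for the real training data:
        # a dict of features and the corresponding labels (house prices).
        features = {"sq_footage": [1000.0, 2000.0, 3000.0]}
        labels = [300000.0, 450000.0, 650000.0]
        return tf.data.Dataset.from_tensor_slices((features, labels)).repeat().batch(3)

    def predict_input_fn():
        # Features only; there are no labels at prediction time.
        return tf.data.Dataset.from_tensor_slices({"sq_footage": [1500.0]}).batch(1)

    # The "verbs": train for a number of gradient-descent steps, then predict.
    model.train(input_fn=train_input_fn, steps=100)
    for prediction in model.predict(input_fn=predict_input_fn):
        print(prediction)

Because the input functions return datasets rather than in-memory arrays, the Estimator can stream data that does not fit in memory.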
In the previous example, we created a numeric column because the input, square footage, is just a number. But what if you have a categorical column, such as the city name or the zip code? If you know the full set of possible values beforehand, use categorical_column_with_vocabulary_list: the name of the column is the zip code, and the vocabulary list is the full set of zip codes for my particular problem. For the problem of predicting house prices, I know that these are the five zip codes I care about, so I pass them in as a list of strings. That is one way to create a categorical column. The second option is when your categorical column is already indexed. Maybe you have a list of states, and the states are stored in your database as number codes: zero, one, two, et cetera. Then use categorical_column_with_identity: the name of the column is the state ID, and since there are 50 states, the number of buckets is 50.

Once you have the categorical column, whether it came from a vocabulary list or from an identity, you can one-hot encode it using an indicator column and then pass the indicator column to, for example, a deep neural network regressor. So: create a categorical column, then pass it to an indicator column. A linear model can handle sparse data directly, so you could pass the categorical column straight into a linear model. But if you want to pass it into a deep neural network, you need to take the sparse column and make it dense. One option to make a sparse column dense is an indicator column; another option is an embedding column.

We also learned how to write an input function capable of reading and parsing comma-separated files. To read CSV files, create a TextLineDataset. You pass in a file name or, more commonly, use TensorFlow's glob operator to do pattern matching and pass the matching files into the TextLineDataset. Then you call the map function; by calling map, we ensure that for each line of text read from these files, the decode_csv function is called. In the decode_csv function, I call tf.decode_csv and get back the column values. The zip function in Python attaches the header names, square footage, city, amount, et cetera, to the column values. The resulting features dictionary also includes the label, the amount, so we pop the label column out; that way I have features, which is a dict, and a label, which is simply the label value.

You typically do a few more things to the dataset. If we are training, we shuffle the data and read it indefinitely; that's what we're doing here: in training, the number of epochs is None so that we read indefinitely, and we call shuffle. If we're evaluating, on the other hand, we read the data just once. We also read the data in batches: the gradients are computed not on the entire dataset but on just the batch. Even during evaluation, batching is helpful so that we can read large datasets without overwhelming the machine's memory.
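A sketch of these pieces, again assuming the TensorFlow 1.x APIs: the column names, default values, and the five-zip-code vocabulary are placeholders for illustration.

    import tensorflow as tf

    # A categorical column from a known vocabulary (placeholder zip codes),
    # and a categorical column that is already indexed (50 state IDs).
    zip_col = tf.feature_column.categorical_column_with_vocabulary_list(
        "zipcode", vocabulary_list=["98103", "98105", "98109", "98112", "98115"])
    state_col = tf.feature_column.categorical_column_with_identity(
        "stateID", num_buckets=50)

    # A linear model could take zip_col or state_col directly; a DNN needs the
    # sparse column made dense first, e.g. via an indicator (one-hot) column.
    dense_zip = tf.feature_column.indicator_column(zip_col)

    CSV_COLUMNS = ["sq_footage", "city", "zipcode", "amount"]   # placeholder header names
    DEFAULTS = [[0.0], ["na"], ["na"], [0.0]]                   # per-column default values

    def read_dataset(filename_pattern, mode, batch_size=128):
        def decode_csv(row):
            columns = tf.decode_csv(row, record_defaults=DEFAULTS)
            features = dict(zip(CSV_COLUMNS, columns))
            label = features.pop("amount")   # the label is part of each CSV line
            return features, label

        # Glob expands a pattern such as "train-*.csv" into matching file names.
        filenames = tf.gfile.Glob(filename_pattern)
        dataset = tf.data.TextLineDataset(filenames).map(decode_csv)

        if mode == tf.estimator.ModeKeys.TRAIN:
            # Shuffle and repeat indefinitely while training.
            dataset = dataset.shuffle(buffer_size=10 * batch_size).repeat(None)
        else:
            # Read the evaluation data exactly once.
            dataset = dataset.repeat(1)
        return dataset.batch(batch_size)

The same read_dataset function can then serve as both the training and the evaluation input function, wrapped in a lambda with the appropriate mode.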
Although we can call train and evaluate separately, it is better to call train_and_evaluate, passing in the training parameters and the evaluation parameters. This method is very nice: it does distributed training, managing all the necessary bookkeeping to distribute the graph and share the variables, and it evaluates not just at the end but periodically, every thousand training steps or every 60 minutes, say. While training is happening, machines might fail, so train_and_evaluate stores periodic checkpoint files so that it can recover from failures. It also saves summaries for TensorBoard so that you can look at the loss curves and so on. When you call train_and_evaluate, you pass in the estimator that you created earlier, but you also pass in a TrainSpec and an EvalSpec.

The TrainSpec consists of the things that you would normally pass into the train method: an input function that returns features and labels corresponding to the training dataset, and the number of steps that you want to train for. When you're doing distributed training, think in terms of steps, not epochs. An epoch is rather arbitrary, especially because your training dataset will keep growing and you might want to focus only on the fresh data when you retrain a model. So it's helpful to think in terms of the number of examples that you want to show the model to retrain it.

The way it works is that the training loop saves the model into a checkpoint, and the evaluation loop restores the model from the checkpoint and uses it to evaluate. When we checkpoint, we want to make sure that the model we save is complete and can be used for prediction, because it's possible that in the steps that follow, the model will start to overfit; in that case, we want to use the current checkpoint as the best model. Hence we think of checkpointing as exporting, and we'll use an exporter to do this.
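Putting it together, here is a minimal sketch of the TrainSpec, EvalSpec, and exporter wiring, assuming the model and read_dataset function sketched above; the file patterns, step count, evaluation interval, and serving_input_fn are placeholders.

    import tensorflow as tf

    def serving_input_fn():
        # Describes the features the exported model will expect at prediction time.
        inputs = {"sq_footage": tf.placeholder(tf.float32, [None])}
        return tf.estimator.export.ServingInputReceiver(inputs, inputs)

    train_spec = tf.estimator.TrainSpec(
        input_fn=lambda: read_dataset("train-*.csv", tf.estimator.ModeKeys.TRAIN),
        max_steps=1000)

    # The exporter saves a complete, servable copy of the model at evaluation time.
    exporter = tf.estimator.LatestExporter("exporter", serving_input_fn)

    eval_spec = tf.estimator.EvalSpec(
        input_fn=lambda: read_dataset("eval-*.csv", tf.estimator.ModeKeys.EVAL),
        steps=None,            # evaluate on the full evaluation set
        throttle_secs=60,      # evaluate at most once every 60 seconds
        exporters=exporter)

    tf.estimator.train_and_evaluate(model, train_spec, eval_spec)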