Let's start by talking about data augmentation. We will use the iris dataset, where each data point describes a flower. Imagine you're trying to build a classification model to predict the species of flower, and your data look like this scatterplot. The position in space represents our features, and the color represents the true class to which each point belongs, which in this case is a species of iris. Here's a question. I've introduced a new point in green. Which species does this point belong to? Probably virginica. But how sure are you? Now, what about in this case? Our unknown data point hasn't changed at all, but I've included one more class from the data. Maybe now it's not so obvious.

During data augmentation, we must determine the ground truth at points in feature space where we have no label. Now, ask yourself: how comfortable are you making these guesses so far? Depending on the relationship between the features and the labels, and on where the point in question is, we may be more or less comfortable making these sorts of guesses. However, by framing data augmentation in this way, it should be clear that making the right sorts of guesses is a problem that can be just as hard as training a good model.

One way we can make our task easier is by being intentional about which points we add. So, instead of picking a random point in feature space, we can pick points strategically. For example, we could pick points inside a cluster of data points that all belong to the same class. It turns out, though, that for unstructured data like images, these sorts of neighborhoods don't really exist. Most data points are actually very far from each other. So, we need another strategy. Another strategy is to pick a point and think carefully about its neighborhood, and about which points around it we can safely treat as similar enough to have the same class. The trick is figuring out what we mean by "similar enough," and that will vary with every domain and can be more art than science. Both techniques are valid, but we'll focus on the latter today.

Let's see some examples. There are many commonly used data augmentation practices in the image domain. Here's a picture of a bridge. Let's pretend we want to augment our dataset with images that would also belong to the same class and which are similar to this picture. You can blur the image. You can sharpen the image. You can resize it. You can crop it. You can rotate it. You can flip it either vertically or horizontally. You can change the hue, or the brightness, or the contrast.

However, you always have to be careful and consider whether the transformed image will still have the same class, and whether forcing the model to deal with this new image will ultimately help it learn or hinder it. Sometimes color is informative. If not for the dark edges on this mushroom, these two would look basically identical. But one of them is poisonous and the other delicious. Sometimes orientation is important. One of these is the flag of the Ivory Coast and the other the flag of Ireland. Sometimes small details are important, the kind that might disappear with blurring. One of the differences between jaguars and leopards is the small dots that appear in their fur. Sometimes performing a transformation will change an image from one we're likely to encounter at inference time to one we're very unlikely to encounter. That's because the set of all inputs we see at inference time is often not uniformly distributed. It has some pattern. Forcing the model to try to learn about data that it will never encounter in production could easily make your model worse. Imagine if you horizontally flipped all of your pictures of text in an OCR model. Text encountered at prediction time will generally not be flipped this way.
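As a preview of what these transformations look like in code, here is a minimal sketch using TensorFlow's tf.image module. The image tensor and all parameter values are illustrative assumptions, not values from the lesson; blurring and sharpening have no single-call tf.image equivalent, so they're omitted here.

```python
import tensorflow as tf

# Assume `image` is a float32 tensor of shape [height, width, 3]
# with values in [0, 1]. All parameter values below are illustrative.
resized    = tf.image.resize_images(image, [224, 224])             # resize
cropped    = tf.image.central_crop(image, central_fraction=0.8)    # crop
rotated    = tf.image.rot90(image, k=1)                            # rotate 90 degrees
flipped_h  = tf.image.flip_left_right(image)                       # horizontal flip
flipped_v  = tf.image.flip_up_down(image)                          # vertical flip
recolored  = tf.image.adjust_hue(image, delta=0.05)                # shift hue
brightened = tf.image.adjust_brightness(image, delta=0.1)          # change brightness
contrasted = tf.image.adjust_contrast(image, contrast_factor=1.5)  # change contrast
```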
What would happen if you augmented your data at decision time? Generally speaking, researchers don't do this. Users are interested in the predictions at the features that they submit, not at similar features. However, sometimes, to boost performance, engineers will augment the data many times at decision time and then take the average of all of those predictions. This is actually a form of ensembling, like random forests.

Let's say our task is to accept a professionally photographed flower picture and classify it by species. Would randomly changing the brightness and contrast during data augmentation likely improve performance? No, it would not. The reason is that these pictures are professionally photographed, and thus there are consistent brightness and contrast levels both in the dataset and in the expected future input. If the task were to classify any picture of a flower, however, manipulating brightness and contrast would likely improve performance.

Let's see how you would implement image data augmentation in TensorFlow. You could implement data augmentation as part of a pre-processing pipeline and then store the results on disk. But because feature space is infinite, doing so would mean limiting yourself to a tiny slice of the feature space, or potentially incurring massive storage costs. So, instead, we're going to implement it as part of our input function, so that the model gets a different augmented image every time it trains. We'll be making use of the TensorFlow image module, tf.image, which has many directly usable functions. For brevity, I'm going to alias it as tfi in my code.

Our implementation of augmentation follows the same pattern we used when we parsed CSVs with datasets previously. We want to define a function that performs augmentation on an image and then apply it using the map function. Here, I assume that our dataset is a CSV file of file names and labels. The first lines of this function should be familiar to you. We read a CSV file and map the decode CSV function to each row in the file. However, unlike our structured examples previously, where the CSV file contained the data we wanted to pass to our model, and so our decode CSV function simply returned the features dictionary and label that the model expected, here we do something different. We begin by reading the image that lives at the path specified in the file, using the read_file function. This loads the file into memory as raw bytes. After we've collected the bytes, we need to decode them from JPEG into an image tensor. To do so, we use another map, but this time we use the decode_jpeg function. The decode_jpeg function yields a tensor of integers, but we need floats for our model. So, next, we call the convert_image_dtype function, which takes the integer values, which range from zero to 255, and maps them to the range of zero to one. Now we're ready to do some basic resizing to match the shape that our model expects. This is crucial, or we'll get shape errors. resize_bilinear expects a batch of images. So, first, we expand the dimensions to create a batch dimension. Once we've resized, we no longer need the batch dimension, so we discard it with squeeze, which removes the dimensions with only one element in them.
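Here is a minimal sketch of that pipeline, written in TF 1.x style. The function names, the record_defaults, and the HEIGHT, WIDTH, and NUM_CHANNELS constants are assumptions for illustration; the course's actual code may differ.

```python
import tensorflow as tf

tfi = tf.image  # alias for brevity, as in the narration

# Illustrative model input shape; substitute what your model expects.
HEIGHT, WIDTH, NUM_CHANNELS = 299, 299, 3

def decode_csv(csv_row):
    # Each row holds a file path and a label, not the pixel data itself.
    filename, label = tf.decode_csv(csv_row, record_defaults=[[''], ['']])
    image_bytes = tf.read_file(filename)  # raw bytes of the image file
    return image_bytes, label

def decode_jpeg(image_bytes, label):
    image = tfi.decode_jpeg(image_bytes, channels=NUM_CHANNELS)
    # Map integer pixel values in [0, 255] to floats in [0, 1].
    image = tfi.convert_image_dtype(image, dtype=tf.float32)
    return image, label

def resize_image(image, label):
    # resize_bilinear expects a batch, so add and then remove a batch dim.
    image = tf.expand_dims(image, 0)
    image = tfi.resize_bilinear(image, [HEIGHT, WIDTH])
    image = tf.squeeze(image, axis=[0])  # drop the size-one batch dimension
    return image, label

def make_input_fn(csv_path, batch_size):
    def _input_fn():
        dataset = (tf.data.TextLineDataset(csv_path)
                   .map(decode_csv)
                   .map(decode_jpeg)
                   .map(resize_image))
        return dataset.batch(batch_size).make_one_shot_iterator().get_next()
    return _input_fn
```

During training, you would chain .map(augment_image) and .map(postprocessed_image) after the resize step, as we'll see next.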
After resizing, we're ready to augment. That's as simple as calling map yet again and passing in our augment_image function. TensorFlow image comes packed with a number of image augmentation functions that we'll use, and I encourage you to explore the library. However, even though we have a lot of powerful functions, as we saw in the section on CNNs, a lot of powerful image processing techniques are actually based on simple convolutions behind the scenes.

In my implementation of augment_image, I first expand the dims by adding an extra dimension at the zeroth index. I do this so that we can call resize_bilinear again. Note that when we resize here, we actually make the resulting image bigger than the height and width the model expects. We do this so that when we crop later on, we're less likely to crop out something crucial in the image. After that, we use a number of functions, each with the word "random" in its name, which means they'll have different effects every time they're called. random_crop randomly crops a portion of the image. Note that we pass in the height, width, and num_channels that our model expects, so the cropped image is the right size. Then we randomly flip horizontally. Then we randomly change the brightness. Note the parameter max_delta that we pass in. As I said before, data augmentation requires a deep understanding of the domain and of what transformations are licensed by your task. In this case, we're permitting a change of up to 23 percent in the brightness. Determining the best value for these sorts of parameters will be a combination of what you know about the domain and possibly also hyperparameter tuning. The same thing applies to random_contrast.

The last thing we need to do is make our image ready for our machine learning code. Our postprocessed_image function takes the values in our image, which currently range from zero to one, and maps them instead to the negative-one-to-one range. This is helpful for the optimizer. Finally, it packages the image back up inside a dictionary.
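Continuing the sketch above (and reusing its tfi alias and shape constants), here is one plausible version of augment_image. The amount of extra resize margin and the random_contrast bounds are assumptions; only the 23 percent max_delta comes from the narration.

```python
def augment_image(image, label):
    # Add a batch dimension at index 0 so we can call resize_bilinear.
    image = tf.expand_dims(image, 0)
    # Resize slightly larger than the target (the +10 margin is an
    # illustrative guess) so the random crop below is less likely to
    # cut out something crucial in the image.
    image = tfi.resize_bilinear(image, [HEIGHT + 10, WIDTH + 10])
    image = tf.squeeze(image, axis=[0])  # drop the batch dimension
    # Randomly crop back down to the exact shape the model expects.
    image = tf.random_crop(image, [HEIGHT, WIDTH, NUM_CHANNELS])
    image = tfi.random_flip_left_right(image)
    # Permit a brightness change of up to 23 percent, per the narration.
    image = tfi.random_brightness(image, max_delta=0.23)
    # The contrast bounds are illustrative assumptions.
    image = tfi.random_contrast(image, lower=0.8, upper=1.2)
    return image, label
```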
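And here is a sketch of postprocessed_image; the 'image' feature key is a hypothetical choice for illustration.

```python
def postprocessed_image(image, label):
    # Map pixel values from [0, 1] to [-1, 1], which helps the optimizer.
    image = image * 2.0 - 1.0
    # Package the image back up inside a features dictionary; the key
    # name 'image' is an assumption, not the course's actual key.
    return {'image': image}, label
```

In training mode, these two maps would be chained after the resize step, for example: dataset.map(augment_image).map(postprocessed_image).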