So in Week 3 of Course 4, you're going to cover many different applications of NLP. One thing you're going to look at is question answering: given a question and some context, can you tell what the answer is inside that context? Another thing you're going to cover is transfer learning: after training a model on one specific task, how can you make use of what it learned and apply it to a different task? You're going to look at BERT, which stands for Bidirectional Encoder Representations from Transformers, and you'll see how bidirectionality improves performance. Then you're going to look at the T5 model. This model accepts several possible inputs: it could take a question and give you an answer, or take a review and give you a rating, all fed into one single model.

Let's look at question answering. First, you have context-based question answering, meaning you take in a question and a context, and the model tells you where the answer is inside that context; the highlighted span is the answer. Then you have closed-book question answering, which only takes the question and returns the answer without having access to a context, so the model comes up with the answer on its own.

Previously we've seen how innovations in model architecture improve performance, and we've also seen how data preparation can help. Here, you're going to see that innovations in the way training is done also improve performance. Specifically, you'll see how transfer learning improves performance.

This is the classical training that you're used to seeing: you have a course review, it goes through a model, and the model predicts a rating. Nothing has changed here; this is just an overview of the classical training you're used to.

Now in transfer learning, let's look at this example. Say you have movie reviews; you feed them into your model and predict a rating. That is the pre-training task, on movie reviews. When training, you take the existing movie-review model and fine-tune it, training it again on course reviews to predict the rating for each review. So instead of initializing the weights from scratch, you start with the weights you got from the movie reviews and use them as a starting point when training on the course reviews. At the end, you do inference the same way you're used to: you take a course review, feed it into your model, and get your prediction.

You can also use transfer learning across different tasks. In another example, you feed in ratings and reviews and the model performs sentiment classification. Then you train it on a downstream task like question answering: you take those initial weights and train them on question answering. Asked "When is pay day?", the model answers "March 14th." If you then ask the model "When is my birthday?", it does not know the answer. This is just another example of how you can use transfer learning across different tasks; the sketch below shows the weight-reuse idea in code.
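To make that weight-reuse idea concrete, here is a minimal PyTorch sketch. It is purely illustrative, not the course's model or data: the `RatingModel` class, the `movie_review_model.pt` checkpoint, and the training step are hypothetical. The only point is that the course-review model starts from pretrained movie-review weights instead of a random initialization.

```python
import torch
import torch.nn as nn

class RatingModel(nn.Module):
    """Tiny rating classifier: encode a review, predict a 1-5 star rating."""
    def __init__(self, vocab_size=10_000, embed_dim=128, num_ratings=5):
        super().__init__()
        self.encoder = nn.EmbeddingBag(vocab_size, embed_dim)  # bag-of-words encoder
        self.head = nn.Linear(embed_dim, num_ratings)          # rating classifier

    def forward(self, token_ids, offsets):
        return self.head(self.encoder(token_ids, offsets))

# Classical training: the weights start from a random initialization.
scratch_model = RatingModel()

# Transfer learning: start from weights pretrained on movie reviews,
# then fine-tune on course reviews.
course_model = RatingModel()
movie_weights = torch.load("movie_review_model.pt")  # hypothetical pretrained checkpoint
course_model.load_state_dict(movie_weights)          # reuse the learned weights

optimizer = torch.optim.Adam(course_model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

def fine_tune_step(token_ids, offsets, ratings):
    """One gradient step on course-review data, starting from movie-review weights."""
    optimizer.zero_grad()
    logits = course_model(token_ids, offsets)
    loss = loss_fn(logits, ratings)
    loss.backward()
    optimizer.step()
    return loss.item()
```

After fine-tuning, inference works exactly as in the classical setup: feed a course review through `course_model` and take the highest-scoring rating.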
Now let's look at BERT, which makes use of bidirectional context. In this example, the sentence is "Learning from deeplearning.ai is like watching the sunset with my best friend." In a unidirectional model, the context is everything that comes before, and you would use it to predict the next word, say "deeplearning.ai". With bidirectional representations, you look at the context from both sides, before and after, to predict the word in the middle. That is one of the main takeaways of bidirectionality.

Now let's look at single-task versus multi-task. On one hand, you have a single model that takes in a review and predicts a rating, and another model that takes in a question and predicts an answer. That is single-task learning: one model per task. What you can do with T5 is use the same model to take the review and predict the rating, and to take the question and predict the answer. So instead of having two independent models, you end up with one model; you'll see a small text-to-text sketch of this at the end of this section.

Let's look at T5. The main takeaway here is that the more data you have, generally the better the performance. For example, the English Wikipedia dataset is around 13 gigabytes, compared to the C4 (Colossal Clean Crawled Corpus), which is about 800 gigabytes and is what T5 was trained on. This is just to give you a sense of how much larger the C4 dataset is compared to English Wikipedia.

What are the desirable goals for transfer learning? First, you want to reduce training time: because you already start from a pre-trained model, you'll hopefully get faster convergence once you use transfer learning. It will also improve predictions, because the model has learned things from other tasks that might be helpful and useful for the task you're training on now. Finally, you might need less data, because your model has already learned a lot from other tasks. So if you have a smaller dataset, transfer learning might help you.
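As a rough illustration of the one-model, text-to-text idea, here is a short sketch using the Hugging Face transformers library. This is a minimal sketch, not the course's own implementation: it assumes the public `t5-small` checkpoint, and the task prefixes follow the conventions from the T5 paper, so the exact outputs depend on the checkpoint you load.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# One model, several tasks: the task is encoded as a text prefix in the input.
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

prompts = [
    # Sentiment classification (GLUE SST-2 prefix); the model outputs "positive" or "negative".
    "sst2 sentence: Learning from deeplearning.ai is like watching the sunset with my best friend.",
    # Context-based question answering (SQuAD-style prefix); the model outputs the answer span.
    "question: When is pay day? context: Pay day is on March 14th.",
]

for text in prompts:
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=16)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Because every task is framed as text in, text out, the same weights handle both the sentiment task and the question-answering task; the prefix is what tells the model which task it is doing.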