Congratulations on completing the Natural Language Processing on Google Cloud course. We hope you enjoyed the course and learned valuable information about natural language processing that will help advance your career.

Throughout the course, you explored how Google does NLP. You started with the NLP products and services on Google Cloud: specifically, the NLP solutions, such as Document AI and CCAI (short for Contact Center AI), and the pre-built APIs, such as the Dialogflow API. You practiced configuring the key components of the Dialogflow API, including intent, entity, and context, in a hands-on lab. Applying these products and services does not require in-depth knowledge or a coding background; instead, it might require more domain knowledge.

You then proceeded to NLP development and explored Vertex AI, a unified ML development platform that combines two solutions to build an NLP model from end to end: AutoML, a no-code solution, and custom training, a code-based solution. AutoML opens the door for users without coding experience or a deep ML background to build a custom NLP model by simply clicking through the UI. Regardless of the solution, you must follow a step-by-step workflow from data preparation to model training to model serving. The whole process is similar to serving food in a restaurant. In data preparation, you upload data and engineer data features, which is like preparing the raw ingredients. In model training, you train and evaluate NLP models, which is like cooking and experimenting with recipes. In model serving, you deploy the model and monitor the entire workflow, which is like serving dishes at the table. You then applied this end-to-end workflow in an AutoML lab to build an NLP project for text classification.

In the next three modules, you advanced to the back end of NLP development and wrote TensorFlow code in Vertex AI Workbench, a notebook tool. In Module 3, you solved the first challenge that you normally face when building an NLP model: how to represent text in a numeric format while retaining its meaning. You were introduced to different techniques, including basic vectorization, word embeddings, and transfer learning. Basic vectorization is a simple but fundamental technique for encoding text as vectors. You explored two major methods: one-hot encoding, which encodes a word as a vector with a one at the position that corresponds to the word in the vocabulary and zeros everywhere else, and bag-of-words, which encodes a word by the frequency with which it occurs in a sentence. You then examined word embeddings, which are an improvement over basic vectorization. You walked through Word2vec, a widely used word embedding technique that applies a neural network to let a machine learn the word embedding matrix E from a massive sample of text. Although word embeddings are a breakthrough in the use of neural networks to learn text representation, they are expensive to train. The best practice in current NLP is to use transfer learning, which relies on word embeddings that are pre-trained for general purposes and then fine-tuned for specific tasks. You practiced how to use transfer learning and reusable embeddings on TensorFlow Hub in the hands-on lab.
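To make these representation techniques concrete, here is a minimal sketch in TensorFlow/Keras, assuming the tensorflow and tensorflow_hub packages are installed. It contrasts a bag-of-words encoding built with the TextVectorization layer, an embedding learned from scratch with an Embedding layer, and a reusable pre-trained embedding loaded from TensorFlow Hub. The module handle, vocabulary size, and dimensions are illustrative choices, not the exact settings used in the course lab.

```python
import tensorflow as tf
import tensorflow_hub as hub

sentences = tf.constant([
    "natural language processing on google cloud",
    "vertex ai makes nlp development easier",
])

# 1) Basic vectorization: bag-of-words (word counts over a learned vocabulary).
bow = tf.keras.layers.TextVectorization(max_tokens=1000, output_mode="count")
bow.adapt(sentences)                           # build the vocabulary from sample text
print(bow(sentences).shape)                    # (batch_size, vocabulary_size)

# 2) Word embeddings learned from scratch: integer word ids -> dense vectors.
int_vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=1000, output_mode="int", output_sequence_length=8)
int_vectorizer.adapt(sentences)
embed = tf.keras.layers.Embedding(input_dim=1000, output_dim=16)
print(embed(int_vectorizer(sentences)).shape)  # (batch_size, 8, 16)

# 3) Transfer learning: reuse a pre-trained sentence embedding from TF Hub.
#    The handle below is an illustrative publicly available module.
hub_embed = hub.KerasLayer(
    "https://tfhub.dev/google/nnlm-en-dim50/2", trainable=False)
print(hub_embed(sentences).shape)              # (batch_size, 50)
```

In practice, the pre-trained embedding usually provides a strong starting point, and setting trainable=True on the hub layer lets it be fine-tuned for the specific task.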
When the text data is ready, you feed it into model training and prediction. In Module 4, you proceeded to NLP models, where you learned about multiple neural networks. You started with the ANN, an artificial neural network, which is a shallow network with a single hidden layer between input and output. You investigated the fundamentals of a neural network and explored how a neural network learns. You then proceeded to the DNN, a deep neural network with multiple layers between input and output. These additional layers considerably improve the learning capability of a neural network.

Although a DNN has a higher learning capability than an ANN, a DNN does not have memory, which makes it inefficient for text prediction. Here comes the RNN, which brings memory to neural networks. It keeps memory of the past by passing a message, called the hidden state, between cells. Now a neural network can remember. However, an RNN was found to have only a short-term memory, due to a problem called vanishing gradients. To keep a long-term memory, an LSTM creates a more complex structure that uses two pipelines: the cell state C for long-term memory and the hidden state h for short-term memory. The two pipelines pass messages across the LSTM network, so an LSTM learns what to forget and what to remember, and it has a long-term memory. However, an LSTM is complex and computationally expensive. A variant of the LSTM called the GRU merges the long-term and short-term memory into one pipeline and simplifies the neural network layers. The GRU improves computational efficiency and has been increasingly used in the NLP field.

To briefly recap the NLP models: ANN, an artificial neural network, is a shallow neural network with a single hidden layer; DNN, a deep neural network, is a neural network with multiple hidden layers; RNN, a recurrent neural network, is a neural network with short-term memory; LSTM, long short-term memory, is a neural network with long-term memory; and GRU, a gated recurrent unit, is an improved and simplified variant of the LSTM. In the hands-on lab, you used Keras to build different neural networks, including a DNN, an RNN, and a CNN.
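As a reminder of what building these models in Keras looks like, here is a minimal sketch of a sequence model for binary text classification; the vocabulary size, sequence length, and layer widths are illustrative values rather than the exact lab settings. Swapping the GRU layer for LSTM, SimpleRNN, or a plain stack of Dense layers over pooled embeddings yields the other architectures discussed above.

```python
import tensorflow as tf

VOCAB_SIZE = 10_000   # illustrative vocabulary size
SEQ_LEN = 64          # illustrative padded sequence length

# A simple sequence model: integer word ids -> embeddings -> GRU -> classifier.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(SEQ_LEN,), dtype="int32"),
    tf.keras.layers.Embedding(input_dim=VOCAB_SIZE, output_dim=64),
    tf.keras.layers.GRU(64),              # try LSTM(64) or SimpleRNN(64) here
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # binary text classification
])

model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```

The GRU keeps the recurrent memory of an LSTM while using fewer parameters, which is why it is often a computationally cheaper layer to try first.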
After learning about the commonly used NLP models, you finally proceeded to advanced NLP models, where you learned about the state-of-the-art NLP technologies developed by Google. You started with a popular sequence-to-sequence architecture called the encoder-decoder. This model receives a sentence, a sequence of words, as input and outputs the translated sentence word by word. To improve the encoder-decoder model, you can add an attention mechanism to it. The major change is that the new model sends not just the last hidden vector to the decoder, but every hidden vector at each time step. This approach improves the performance of the model by teaching it where it has to pay attention, based on the attention weights. You then learned about the transformer, which is built on the attention mechanism. The transformer introduces self-attention and feed-forward layers in both the encoder and the decoder. Additionally, it adds a layer called encoder-decoder attention in the decoder; this layer helps the decoder focus only on the relevant parts of the input sentence. Multiple models have been trained on the transformer architecture since it was published, and BERT is the most popular. The major difference is that BERT considers the order of the words in a sentence, but the transformer doesn't: the position embeddings incorporate the order of the input sequence, which allows BERT to learn a vector representation for each position. After learning about these advanced NLP models and architectures, you were introduced to large language models, which are general-purpose language models that can be pre-trained and then fine-tuned for specific purposes. Benefiting from large language models that are pre-trained and then fine-tuned for downstream tasks might be the future of NLP. Finally, you had a hands-on lab in which you created a translation function by using an encoder-decoder network.

We hope that this course enriched your journey of exploring natural language processing. For more training and hands-on practice with machine learning and AI, please explore the options available at cloud.google.com/training/machinelearning-ai. If you're interested in validating your expertise and showcasing your ability to transform businesses with Google Cloud technology, you might consider working toward a Google Cloud certification. You can learn more about Google Cloud certification offerings at cloud.google.com/certifications. Thanks for completing this course. We'll see you next time.