Hi, my name is [inaudible] and I want to tell you in this video why we made the machine learning library Trax. This is a bit of a personal story for me. I've been at Google for about seven years. I was a researcher in the Google Brain team, but before that, I was a software engineer, and I worked on a lot of machine learning projects and frameworks. The journey, for me, ended in the Trax library. I believe Trax is currently the best library to learn and productionize machine learning research and to understand machine learning models, especially sequence models: models like transformers and the models that are used in NLP. The reasons I believe that come from the personal journey that took me here. I will tell you a little bit about myself and how I got here, and then I'll tell you why Trax is the best library to use currently for machine learning, especially in natural language processing.

My journey with machine learning frameworks started around 2013-2015, when we were making TensorFlow. TensorFlow, as you probably know, is a big machine learning system; it has about 100 million downloads now, and it was released in November 2015. It was a very emotional moment for all of us when we were releasing it. At that point, we were not sure if deep learning would become as big as it did, and we were not sure how many users there would be. What we wanted to build was a system that's primarily very fast, that can run distributed machine learning, large-scale fast training. The main focus was speed. A secondary focus was to make it easy to program these systems; that was on the radar, but it was not the most important thing.

After releasing TensorFlow, I worked on machine translation, and especially on Google's neural machine translation system. This was the first system using deep sequence models that was actually released as a product in Google Translate, and it's handling all of Google's translations these days. Every language we have has a neural model. It started with LSTMs and RNN models, and now it's mostly transformers. We released that in 2016, based on the TensorFlow framework. These models are amazing. They're much better than the previous phrase-based translation models, but they took a long time to train: they were training for days on clusters of GPUs at that time. This was not practical for anyone other than Google. It was only possible because we had this TensorFlow system and a large group of engineers who worked very well together, and we were training for days and days. That was great, but I felt this was not satisfactory, because no one else could do it. It could not be done at a university. You could not launch a startup doing that, because it was impossible unless you were Google, or maybe Microsoft, but no one else. I wanted to change that.

To do that, we created the Tensor2Tensor library. The Tensor2Tensor library, which was released in 2017, started with the thought that we should make this deep learning research, especially for sequence models, widely accessible. This was not working with these large RNNs, but while writing the library, we created the transformer model. The transformer has taken NLP by storm, because it allows you to train much faster: at that time, in a matter of a few days, and now in less than a day, a matter of hours on an 8-GPU system. You can create translation models that surpass any other models. The Tensor2Tensor library has become really widely used. It's used in production Google systems.
It's used by some very large companies in the world, and it has led to a number of startups that basically exist thanks to this library. You could say, well, this is done, this is good. But the problem is, it has become complicated, it's not nice to learn, and it has become very hard to do new research in it. Around 2018, we decided it was time to improve: as time moves on, we need to do even better. This is how we created Trax.

Trax is a deep learning library that's focused on clear code and speed. Let me tell you why. If you think carefully about what you want from a deep learning library, there are really two things that matter: you want the programmers to be efficient, and you want the code to run fast. This is because what costs you is the time of the programmer and the money you need to pay for running your training jobs. Programmer time is very important; you need to use it efficiently. But in deep learning, you're training big models, and this costs money too. For example, using 8 GPUs on demand on the Cloud can cost almost $20 an hour, while using a preemptible 8-core TPU costs only about $1.40 an hour. With Trax, you can use one or the other without changing a single character in your code.

How does Trax make programmers efficient? Well, it was designed from the bottom up to be easy to debug and understand. You can literally read Trax code and understand what's going on. This is not the case in some other libraries. This is, unfortunately, not the case anymore in TensorFlow, and you can say, "Well, it used to be the case." But nowadays, even when we clean up the code, TensorFlow needs to be backwards compatible. It carries the weight of all these years of development, and these have been crazy years in machine learning. There is a lot of baggage that it just has to carry because it's backwards compatible. What we do in Trax is we break backwards compatibility. Yes, this means you need to learn new things, and this carries some price. But what you get for that price is a newly, cleanly designed library which has full models, not just primitives to build them, but full models with dataset bindings. We regression test these models daily, because we use this library ourselves, so we know that every day these models are running. It's like a new programming language: it costs a little to learn because it's a new thing, but it makes your life much more efficient.

To make this point clear, take the Adam optimizer, the most popular optimizer in machine learning these days. On the left you see a screenshot from the paper that introduced Adam. You see it has about seven lines. Next to it is just a part of the Adam implementation in PyTorch, which is actually one of the cleanest ones. You need to know way more. You need to know what parameter groups are. You need to know the keys into these groups and what the key 'params' means. You need to do some state initialization and some conditionals, and you need to introduce more and more things. On the right, you see the Adam optimizer in TensorFlow and Keras. As you can see, it's even longer. You need to apply it to resource variables and to non-resource variables, and you need to know what these are. The reason they exist is historical: currently we only use resource variables, but we have to support people who use the old non-resource variables too. There are a lot of things that in 2020 you actually don't need anymore, but they have to be there in the PyTorch and TensorFlow code. If you go to the Trax code, though, this is the full code of Adam in Trax.
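To give a rough sense of how short that is, here is a sketch of the update rule from the Adam paper written as plain Python with NumPy. This is not the actual Trax source, and the function and variable names here are just illustrative, but the core logic really is only a handful of lines:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a single weight tensor, following the paper's equations."""
    m = b1 * m + (1 - b1) * grad                 # update biased first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2            # update biased second-moment estimate
    m_hat = m / (1 - b1 ** t)                    # bias-correct the first moment
    v_hat = v / (1 - b2 ** t)                    # bias-correct the second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # parameter update
    return w, m, v
```

Everything extra you see in the PyTorch and TensorFlow versions is bookkeeping around a core like this.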
The Trax code is very similar to the paper. That's the whole point. If you're implementing a new paper, or if you're learning and you want to find where the equations from the paper are in the code of the framework, you can really do that here. That is the benefit of Trax. The price of this benefit is that you're using a new thing. But there is a huge gain that comes to you when you're actually debugging your code. When you're debugging, you will hit lines that are in the framework, and you will actually need to understand those lines, which means you'd need to understand all of those PyTorch lines, or all of those TensorFlow lines, if you use those frameworks. But in Trax, you only need to understand these few lines. It's much easier to debug, which makes programmers more efficient.

Now, this efficiency would not be worth that much if the code ran slowly. There are a lot of beautiful frameworks where you can program things in a few lines, but they run so slowly that they're actually useless. Not so in Trax, because we use the just-in-time compiler technology that was built over the last years for TensorFlow. It's called XLA, and we use it on top of JAX. These teams have put tremendous effort into making this the fastest code on the planet. There is an industry competition called MLPerf, and in 2020, JAX actually won this competition, with the fastest transformer ever to be benchmarked independently. The JAX transformer finished in 0.26 minutes, about 16 seconds I think, while the fastest TensorFlow transformer on the same hardware took 0.35 minutes; you can see it's almost 50 percent slower. The fastest PyTorch, though this was not on TPUs, took 0.62 minutes. Being two times faster is a significant gain. To be clear, you don't get the same gain for every model on other hardware; there was a lot of tuning work for this particular model and hardware. But in general, Trax runs fast, and this means you'll pay less for the TPUs and GPUs you'll be renting on the Cloud.

It's also tested with TPUs on Colab. Colabs are hosted Python notebooks that Google gives you for free. You can select a hardware accelerator: you can select a TPU and run the same code, with no changes, whether it's on GPU, TPU, or CPU, and on this Colab you're getting an 8-core TPU for free. You can test your code there and then run it on the Cloud much more cheaply than with other frameworks, and it really runs fast.

These are the reasons to use Trax. For me, Trax is also super fun. It's super fun to learn and super fun to use, because we had the liberty to do things from scratch using many years of experience. You can write models using combinators. This is a whole transformer language model on the left. On the right, you can see it's from the README: this is everything you need to run a pre-trained model and get your translations. This gave us the opportunity to clean up the framework, clean up the code, and make sure it runs really fast. It's a lot of fun to use. I encourage you to check it out and see how you can use Trax for your own machine learning [inaudible]. You can use it for research, if you want to start a startup, or if you want to run it at a big company; I think Trax will be there for you.
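To give a concrete feel for what writing a model with combinators looks like, here is a minimal sketch. It assumes you have the trax package installed (pip install trax); the layer names come from the trax.layers module, but you should check the current docs for exact signatures, and this small LSTM language model with made-up hyperparameters is only illustrative, not the transformer shown in the video:

```python
from trax import layers as tl

# A tiny language model built by composing layers with the Serial combinator.
# Reading the model top to bottom tells you exactly what it computes.
model = tl.Serial(
    tl.ShiftRight(),                                # shift inputs so the model predicts the next token
    tl.Embedding(vocab_size=33300, d_feature=512),  # token ids -> 512-dim vectors
    tl.LSTM(n_units=512),                           # recurrent layer over the sequence
    tl.Dense(33300),                                # project back to vocabulary size
    tl.LogSoftmax(),                                # log-probabilities over the vocabulary
)
print(model)  # printing a Trax model shows its layer structure
```

Because Trax runs on JAX, the same model definition runs on CPU, GPU, or TPU without changing the code.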