I'm Andrew Barto, and I want to introduce Rich Sutton, who was my first student, and we spent many years working together. Both of us are amazed, I think, at the [inaudible] outgrowth of that early work. And Rich is an amazingly focused, persistent man of genius, I think, and fancy. I'm embarrassing him. For a lot of the core ideas I really owe an enormous amount to Rich. So that's Rich.

Well, I feel like I've learned an enormous amount from you, Andy. You have just been essential to making our work scholarly and relevant to the modern age and to the old. I feel enormously grounded by all the things I learned as your student in those years. I want to say that the main thing we did, well, we wrote that textbook, but the main thing we did was we rediscovered the field of reinforcement learning. I'd like to say that we discovered it, but I know how you would react to that, because "there's nothing new under the sun," you used to tell me.

Right. If you find something and then read that it's old, there's nothing wrong with rediscovering something old, and you should embrace that. If you rediscovered the wheel, as you say, we should call it the wheel.

There may be a better wheel. Maybe a better one, but I do feel we rediscovered reinforcement learning. We woke it up, because it had fallen into neglect, and we clarified what it was and how it's different from supervised learning.

This is sort of what I was characterizing as the origin story of reinforcement learning. It was an obvious idea; Marvin Minsky knew it in 1960 or '59. It's so obvious that everyone knew it, but then it became overshadowed by supervised learning, until Harry Klopf started knocking on people's doors and saying, "Hey, this is something that's been neglected, and it's as real and as important," and tasked us with figuring it out.

In fact, the very first neural network simulation on a digital computer, by Farley and Clark in 1954, was a reinforcement learning system. In their second paper, Clark and Farley, a year later, it was the same system, but they focused on generalization, and it certainly departed from the reinforcement roots. After that, the perceptron and Widrow-Hoff were error correction. Even though some of the words they used were "trial and error," what they were really doing became supervised learning.

Yes, and there were exceptions, but as we just said, Minsky. In Minsky's case, his thesis and his "Steps" paper are full of prediction, of TD-like things, certainly the credit assignment problem. Then there's his thesis at Princeton, with a physical reinforcement learning network that learned to go through a maze. I think Minsky lost interest in it for a number of reasons, or maybe he was embarrassed by it, I don't know. But that is a reinforcement learning system.

Andy, typical of yourself, you are talking about what other people did historically, but, maybe typical of me, I want to bring it back to what we did, because I think we did do something. We almost did very little, but we recognized it. We just stood up and said, "This is a thing. This is the thing that hasn't been investigated, and it's deserving of investigation." And we wrote papers on associative search networks, really simple ideas, and made the claim that this had not been done: the combination of association and trial and error. I think, as you said at the time: search for something that works, and then remember it. Combining search and memory, that is the essence of reinforcement learning, and strangely it had been overlooked. I discovered something relevant to that.
Memoization. That comes from [inaudible], who was my colleague at the University of Massachusetts for a while. Donald Michie talked about memoization, which is one way to put it: RL is a memoized search. You do some operation, and then you remember the result, and the next time you have to do it, you look it up instead of recomputing, and it saves a lot of time and so on. In a sense, our RL at its root is memoized, context-sensitive search. Popplestone, actually, at the end of one of his papers on this, I forget the date, talks about interpolation: using polynomials instead of a lookup table, to look something up with generalization. That's what neural networks do, for example. So we didn't...

We rediscovered it. We didn't invent memoization, but we found a new use for it. I don't know if people were doing memoized search the way reinforcement learners do.

Klopf had this idea of a distributed approach: goal-seeking systems made up of goal-seeking components. He also had the idea of what he called generalized reinforcement, that one of these units could be reinforced by all kinds of signals and not just a binary signal.

The way I remember it, the essential, I don't know, it's like an insight, but it's more like the absence of an insight. We decided one day, well, maybe just the goal-seeking unit thing. Maybe that's an interesting idea all on its own. Without making goal-seeking systems out of goal-seeking components, and without having generalized reinforcement, we would just have a specialized reward signal. We would just study that learning, and maybe that's the thing. Maybe that's something that needs to be worked out. To me, it was like: maybe it's just that. Leave all the fancy stuff, all the confusing stuff, aside for the moment; there's a point to be made. And I think that's what reinforcement learning is: it's just focusing on a learning system that actually wants something, that does trial and error and remembers it, and has a specialized reward signal.
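[Editor's note: to make that last point concrete, here is a minimal Python sketch of a learner that "wants something": it searches by trial and error, is driven only by a specialized reward signal, and remembers what worked in each context. It is only an illustration of the idea, not the associative search network from the papers; the contexts, actions, and reward function below are hypothetical.]

```python
import random

def learn(contexts, actions, reward, episodes=2000, epsilon=0.1):
    """Trial-and-error search plus memory, driven only by a reward signal.

    reward(context, action) -> 0 or 1. Returns a context -> action table.
    """
    best = {}  # memory: the best action found so far in each context
    for _ in range(episodes):
        c = random.choice(contexts)
        # search: exploit memory when we have it, otherwise (or occasionally) explore
        if c not in best or random.random() < epsilon:
            a = random.choice(actions)
        else:
            a = best[c]
        if reward(c, a):   # the specialized reward signal: did this work?
            best[c] = a    # memory: remember what worked here
    return best

# Hypothetical task: action "b" pays off in context 0, action "a" elsewhere.
policy = learn(contexts=[0, 1, 2], actions=["a", "b"],
               reward=lambda c, a: int(a == ("b" if c == 0 else "a")))
print(policy)  # e.g. {0: 'b', 1: 'a', 2: 'a'}
```

The whole method is just search (random tries) plus memory (the `best` table), which is the "combining search and memory" point made above.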
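[Editor's note: and for the memoized-search and interpolation point above, another small, hypothetical sketch: a Michie-style memo function backed by a cache, then a fitted polynomial standing in for the lookup table so that unseen states also get answers.]

```python
from functools import lru_cache

import numpy as np

# Michie-style memo function: the first call pays for the (expensive) search;
# later calls with the same state just look the answer up in the cache.
# A pure lookup table is exact, but it only helps for states seen before.
@lru_cache(maxsize=None)
def memoized_search(state: int) -> float:
    # stand-in for an expensive trial-and-error computation
    return 0.5 * state * state

# Popplestone's interpolation point, as recalled above: replace the lookup
# table with a fitted function (here a polynomial) and you can also answer
# for states you have never tried. Neural networks play the same role.
seen = np.arange(5)
values = np.array([memoized_search(int(s)) for s in seen])
coeffs = np.polyfit(seen, values, deg=2)   # fit a quadratic to the table
print(np.polyval(coeffs, 2.5))             # generalize to an unseen state: ~3.125
```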