Hello, and welcome to our lecture on SQL. So this is a very, very short introduction to SQL. I love SQL. I was talking to a student the other day and they were like, "SQL is my favorite language." And I said, "I have to agree that SQL is my favorite language too." Because SQL is so beautiful, you don't even need to write loops in SQL, we write loops in languages like Python just because we have to. The problem is that SQL is perfect, but it's not the way we could write something as complex as Linux or Python. And so SQL is not really the way computers think, we have these things called database engines, and these database engines make things work really, really well. And so SQL is a great way to organize data, but a lot of work has gone into keeping that data organized and making it simple. So it's like there's this abstraction, and it's a lot of work to make that abstraction work. So I want to start out with a little bit of history before databases, and you might think, "Well, before databases, how could that be?" Well, in the old days, and if you watch old movies featuring computers, you see pictures that have these tape machines in the background and they're spinning. And the spinning of the tape machine is the sort of physical manifestation of a computation. And it had to do with the fact that in the old days, computers did not have enough data to store all of their data in the memory or on a disk, we didn't even have disks in the early days. In early days, tapes were what we had, and we used kind of a variation of audio tapes, and instead of audio, they contained data, zeros and ones. And so the way things would work in the old days so that you could have a database, as it were, was the database was all of your data recorded onto a tape. And the key thing to a tape is that it was a physical thing, and so you had to rewind it, go forwards and backwards. And most of you probably never even have seen a cassette tape or an eight-track tape or a reel-to-reel tape, but it's a physical long thousand-foot piece of plastic, and to get the data at the end, you had to spin it until you got there. And so it wasn't what we call random access, which means that in a random access, whether it's memory or the disk drive, random access allows you to get to any place roughly in a fixed amount of time. Whereas, if you're close to something on a tape, you can get to it very quickly. If you're far away, you got to go, and then wait, and then come back. So this led to a pattern where every night we would take the tape from yesterday's account balances. Let's just say we're a bank, and we're having people coming up to the window, putting in money, taking out money. And then what we would do is we would put all of the transactions, sometimes on physically punched cards, and then we would sort them, right? And let's just say we have 26 accounts and person C shows up adds $10 and person F shows up and subtracts $10, and etc., etc. And then what we would have at night, we would get these all sorted, and there were physical machines that would actually sort these in the earliest of days, and then we would have the old balance, A, B, C, D, E, F, sitting on a tape in order, sorted in order. And we would start a process after the bank closed where we would read the first record off the tape like A, we would read the first transaction off the cards, and we would compare them. And if the tape was earlier than the transaction or lower, we would just copy it, so we'd copy it. And then so the B would compare to the C, and B would get copied, and then C would come across, and then we'd realize that the C data, we would have to update the C data, and add $10 to the balance, and then we'd write that. And you'll notice as we're writing this, we are sorting, it's in sorted order. Then we would copy D, we have nothing for D, and it would just get copied. We'd copy E, we have nothing for E, and it would just get copied, and then we would have, oops, forgot to put F on this picture, we would have F and we'd subtract 10, and we would copy it. And then we would be done with the kind of stuff that we had to do that night, and then what we would do is we'd have this new updated data that was all sorted in account order. And then the next night, we would move the tape, the new tape would be the old tape, and we would do it over and over and over again. And sometimes these processes would run for four or five hours because the tapes were long or it got to be multiple tapes, and so a lot of computing was done after the banks closed at five o'clock, then a group of people would come in, we'd call them operators, and they would mount these tapes and run these jobs and just sit there and watch these tapes spin as the data was being copied from one to the other. So yeah, that's pretty cool. Thankfully, that's not how we do it any more. The thing that made it so we don't have to do this is the invention of disk drives. And disk drives are magnetic media, just like tape drives, but the difference was is they were on these spinning platters. And so what you could do is the data was in a series of concentric circles on these platters, and then you had a head that could move in and out rapidly. And so the speed at which you could access data was a combination with the speed at which the head could move to a certain cylinder, and you had to wait until the data came around. And so sometimes you would have 5,400 revolutions per minute. So that means that 5,400, 5,000 times per minute, which is 100 times a second. You would wait 100th of a second for it. And so this is way better than waiting like a minute and half, two minutes, even 10 minutes to spin up a tape. So the problem was, is then we're like, "Okay, now that we have this wonderful technology, we can store all of our nightly data on a disk because it's enough space, how do we build technology to make really good use of this?" And a lot of research, a lot of computer science, a lot of vendor research went into understanding how to best use this new disk thing. I remember talking to people like, because we used to use a floppy disk, which they were much slower, so they didn't go 5,400. So it might take a quarter of a second to move the head and it might take almost a second to spin it around, which was still way faster than 10 minutes, right? So it's like about a second to get to anywhere on the floppy disk. And we would build approaches to storing data, and we would start with this physical thing and you'd hear it grunting and moving and grunting and moving, and then you would come up with a better way to store the data, and then it grunt and moved less, and your thing would go faster. So this was like a research thing. It was good old days when computers actually made noises, and you could change your code inside computers. And the first thing, of course, we did was just took those sequential master updates and put them on, but that turned out to not be the best. And that was sort of the research is, not just how can we do what we used to do on data that couldn't be randomly accessed, and instead, say let's take advantage of it. And so a lot of research, a lot of vendors, there was a lot of very expensive software that people bought to say, "Look, you're a bank, and you really and truly want to know how this works." And so what happened was in the '60s and '70s, the computer vendors were fighting, and they would come up with a technique, which compared to our current database technique, was just trivially simple, but it took a lot of work back then, and they would fight with all the other computer vendors, and they would say, "Our index sequential mechanism is better." And Vendor 2 would say, "Our network approach is better." And then they would sit, and they would sort of struggle with each other, and then the poor customers would be like, "I don't even know what you're talking about." And so then they would just buy the stuff from the vendor who took them out to golf or for the best lunch or took them on a trip to Florida. And then they would just adapt to whatever that vendor did. And so it was really kind of scary because we were moving into a place where you were locked into a vendor, and then once you're locked into a vendor, you're in bad shape. And so the US government, in particular, had a ton of money and were buying tons of computers, both for normal data processing and for strategic things like military stuff, and so they were buying tons of computers and they knew that these vendors, if they got to the point where they locked them into a single vendor, that the vendor would just be able to dictate the price. And so the federal government, through NIST, the National Institute of Standards and Technology, decided that databases needed a standard. They needed a way so that we could talk to any database from any vendor using the same technique and not have to do one vendor, and then if you switched vendors, you had to completely rewrite your software. So NIST didn't specify how this abstraction, this standard, was going to work. They just brought the vendors in and said, "We're going to stop using your software if you don't come up with a standard." And it was a perfect time because even though the vendors thought everything was perfect and wonderful, there was a lot of upside. And in particular, there was this concept of relational databases, which back then was more of a mathematical theory of a better way to store and retrieve data than it was a practically implemented available technology. And so SQL, and this is Liz Fong, who I went and I did an interview at NIST, she talks about the early days, it's a great interview, the early days of how this SQL standard really made so much sense and really fixed the industry and got the industry working really nicely. And I recommend that you take a look at this video just so you can kind of see. Now, what SQL basically said was that let's not worry so much about where that data is actually on disk, because the earlier database approaches were we programmers would know that there was cylinders and positions, and we would place things in certain ways, and we would make indexes and we would look at the indexes ourselves, all kind of of different things, or we would read to one place on the disk and that would have an address of the next place we're supposed to read on the disk. And they said, You know what, let's not worry so much about the physical storage structure on the desk, but instead, let's come up with a language by which we can express what we want to read, write, update, and delete on the disk in a really simple language, and then we can build a piece of software that there could be many disk formats underneath that. And then we could all just sort of use this language and you could buy Vendor A and use this language and Vendor B and use this language, and then you could even benchmark it and say, "Here's a bunch of stuff in this language. Which vendor can run this faster?" And so the NIST also had a series of benchmarks, TPC were some of the benchmarks. And so what they did is they basically made it so that this was like a beautiful and simple language, and it's what we call SQL. I think at some point it was called the Simple Query Language, and it is the language that we use to talk to databases. And the interesting thing is what SQL allowed us to do really in the '60s and '70s when it first came out, was to talk to the old-style databases like index, sequential, and network-style databases, and as this new form called relational databases, which had like this awesome mathematical underpinning, but was implemented badly in the first time, we could just say, "Oh weird." When the performance of relational databases got to the point where it was as good as the other databases, then we could just switch and we could talk the same SQL. So it allowed an amazing innovation over a 10 or even 15-year period, where the abstraction allowed us to improve databases in a wonderful way without really having to change how we as programmers used them. Pretty dang awesome, I'll be honest. And so I've been talking about this idea of relational databases, and relational databases, and I'll admit that when I first saw relational databases in the early 1980s, I'm like, this is crap, because again, I could hear floppy disks and I knew that we were moving a thing and I had so much awareness of what was going on, and I loved that awareness. And the relational databases sort of like hid what was going on behind this layer, and it turned out that whenever I used it, it was slower because it wasn't yet clever. And so relational databases started out as a mathematical idea as the theoretical best way to represent networks of data with columns, and rows, and connections, and joins, and we'll talk about all that eventually. And it was a beautiful idea and the mathematics was right. It just took a long time to get to the point where the superior mathematics of relational databases worked out better than the highly cleverly optimized previous generation database systems. So there's a lot of common database systems that you'll run across. Some of them are open source, some of them are commercial. Postgres and MySQL are open source. Postgres is probably the more intricate and complex and rich, Oracle is an intricate, complex, and rich database that's available for lots and lots of money, and Microsoft has a database called SQL Server. And all of these databases that we're seeing evolved in a sense after the SQL spec existed, so they just kind of built SQL. There were legacy databases that kind of learned SQL. And there's lots of other databases. SQLite is what we're actually going to use with this course, and it's super cool, super fast. On your cell phone, you probably have 40 copies of SQLite, and in your car, you probably have 40 copies of SQLite, because it's a real tiny embedded database. It's not really good for production scale. When you got 100 users simultaneously hitting a website, Postgres, MySQL, or Oracle or SQL Server are really kind of online real multiprocessing databases. But SQLite is great because it's really simple. It stores everything in one file, we can move the file around, etc. So I love SQLite, and in this class and in this thing we're going to talk about SQLite. But everything we talk about applies to all of these different database systems. So the first thing that you've got to do when you are going to work with the database is you've got to come up with a contract between you and the database as to what the shape of the columns are, how wide the columns are, what kind of data you're going to store in this. And it then informs how the database software is going to structure that database like on the disk. And part of what you're doing is you're allowing the database software, MySQL, or Postgres, or SQLite, to super-optimize its storage. It's so much smarter than you. Something like Oracle, or MySQL, or Postgres probably represents a billion dollars, if not more, of research and development about the best way to store and retrieve data. And you just don't even need to know any of that. You just say, look, I'm going to have these columns, it's going to have a 100 characters of text, then I'm going to have an integer and a floating point number, then 3,000 characters of text, and you worry about all that detail. But the key is if you just say, "I'll store anything I want," then it can't optimize, and so there's this thing where you have to make a contract at the beginning and say, "This is exactly how I'm going to lay out the data as I talk to the database system. and then you can figure out how to lay out the data in the actual disk." And so that's called the schema or the database model, and building a database model is itself a cool thing. So if you're building a complex application, you have these models and they have connections, and we'll eventually talk about all that stuff. But it's a critical thing to say that that's a contract between you and the database software so the database software can optimize the storage of the database. Now, once you have given the schema, then you can write queries that depend on that schema. So you might have a column called email and a column called name, and then you can insert data into the database using that schema, you can retrieve data using the schema, you can change data sort of somewhere deep inside the database, and then you can delete data. Hang on a sec. Whoa. Hi there. What are you doing down there? Oh, dang it. Sorry, sorry. You are a bad creature. Look at that. I was wondering what that noise was. Come here. I've got to introduce you to all these folks. Come here. You are not behaving. So this is Shelby. Hi Shelby. So this is Shelby. This is my puppy. I'm trying to keep her out of the cage and let her sit and get by my feet, and she's got some chew toys. Hi Shelby. This is Shelby. She is interrupting. You're interrupting my lecture. But you're cute. So quit chewing on the pen, chew on your chew toys that I gave you, okay? Let me get through this lecture. Okay, you can get down. Just quit chewing on it. Sorry about that dog interruption. I was just hearing chewing noises and I thought, I took everything but then there was one pen on the floor, and so the pen, poor pen meets Mr. Puppy. Oh, well. Okay, where was I? SQL. CRUD, create, read, update, and delete data. And so our next step is to actually do some SQL and create a database. [MUSIC]