And here we are with lab six, where we're going to ingest and query some new datasets. The main things we're going to do here are to ingest data from Google Cloud Storage into BigQuery, and then to set up an external data source reading a CSV in Cloud Storage directly, as you saw in the demo we did earlier. So if you haven't attempted this lab yet, go ahead and pause this video and give it a try, then come back for this quick walkthrough when you're done.

Okay, so first and foremost, there's a lot of other interesting data that goes along with our IRS datasets. There is this NAICS codes CSV, which provides lookup values for the numeric industry codes that exist in the IRS dataset and gives them a little more color. We want to ingest this data and then be able to query it inside of BigQuery. So first, download that CSV; I've downloaded it already. What we need to do next is store it in a Google Cloud Storage bucket. I've got the bucket we created earlier open, and I'm going to upload the CSV file into it. Then I'm going to copy the link address from there, paste it into BigQuery, and clean it up a little bit: remove any of the Google Analytics question-mark parameters at the end, and replace the direct https hyperlink prefix with gs. So you've got gs:// followed by the bucket name and then your file name, including the extension.

Now that we have that on our clipboard, let's go back and actually ingest this. In the irs_990 dataset that we've created, we want to create a new native BigQuery table. This table is going to be called naics_digit2017_codes. As you saw a little earlier in the lecture, creating a new table is as easy as hovering over the dataset and clicking create new table. It's going to come from Google Cloud Storage, so paste in that long URI with the bucket and file name. It is a CSV, so we can leave that format as-is, and it's going to be a native table instead of an external data connection. I'm going to head back to the lab and copy in the table name, just to make sure I don't misspell anything. We'll have BigQuery automatically detect the schema and leave all the rest of the options as defaults. As you saw in the demo, automatic schema detection can work, but it can also present some pitfalls, so it's always good to check what actually gets imported. With the rest of the options defaulted, we load this in.

When it's finished loading, we can click on the table and see: all right, we've got a sequence number, we've got this code, we've got this title. And then we have these string_field 3, 4, and 5 columns that may not actually be of any use to us. Previewing the data, which we can do because we've ingested it permanently, you can see some of the different data values. For those extra fields, there may have been just a blank character included in the CSV that triggered all three of those columns to load. You can drop them as part of an additional SQL query, or if you're ingesting through Cloud Dataprep, you can automatically drop those fields there. Or, if you ingest this without automatic schema detection, you can specify just the three columns you want and nothing thereafter. So keep in mind that automatic schema detection does the best that it can, and you may need to take a hard look for anomalous blank characters that folks have entered into your CSVs.
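If you want to handle that cleanup in SQL rather than re-ingesting, here's a minimal sketch in BigQuery standard SQL using the SELECT * EXCEPT syntax. The _clean table name is hypothetical, and the string_field_* column names are an assumption based on the placeholder names autodetection typically gives unnamed CSV columns, so verify them against your table's actual schema first:

```sql
-- A minimal sketch: copy the ingested table into a new one, dropping the
-- stray blank columns. string_field_3 through string_field_5 are assumed
-- names; check your table's schema for the real ones before running this.
CREATE OR REPLACE TABLE irs_990.naics_digit2017_codes_clean AS
SELECT
  * EXCEPT (string_field_3, string_field_4, string_field_5)
FROM
  irs_990.naics_digit2017_codes;
```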
So we previewed the values and took a look at some of the sample data. That table is now permanently there, and you can validate that it's stored permanently because you can preview its data values.

Now what we want to do is read a CSV as an external data source: set up a pointer table that reaches out to an external source to actually query the data. It's the same process as before, except we're going to change the table type. This is going to be a new table, so create a new table on your dataset exactly as we did before. It also comes from Google Cloud Storage; paste in the path, and it is a CSV, that's great. Let's copy over our new table name, the IRS code lookups table. The one thing I'm going to change here: instead of a native BigQuery table, we're going to reach out to the CSV through an external data connection directly. And we're going to populate the schema ourselves; just some simple copying and pasting here to add the fields, so we've got the code and the code description. These are nullable fields, and they're all strings. Let's see if we have any header rows to skip. We leave all the other options as defaults, except we do have a header row at the top, so I'm going to skip that first row, since those are headers, and go ahead and create the table.

Notice how fast it's created. Again, that's because it's not actually ingesting any permanent data; it's just storing the URI pointing to where the file lives on Google Cloud Storage. If you view the table details and scroll all the way to the bottom, you'll see the external data configuration. Every time you query this table, you're reaching out to Google Cloud Storage and querying that CSV again, which is also why the results can't be cached.

So let's actually do that and query the table. Clear out everything that was in the query editor before, confirm, and run the query. Now you get some interesting lookup information. If you didn't know what elf was, you can see that elf means the return was electronically filed if it's E, or paper filed if it's P. And the table we just ingested provides a lot of other neat lookup fields: you can see, for example, that if the subsection code is 6, it's a business league. So you can imagine this lookup table being very useful for doing something like a join against the tables we already have stored. We're bringing in additional data, ingesting external data sources for the enrichment we'll be doing a little bit later.

Okay, the table has been created, and we've done the querying. Again, be mindful of that performance pitfall any time you're using something like an external data connection. The one edge case I can think of is a one-time extract, transform, and load pre-processing job in your SQL script, where you then store the results of that script into BigQuery managed storage as a permanent table.
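For reference, you can define the same external table with BigQuery DDL instead of clicking through the console, and the one-time ETL pattern just mentioned is a CREATE TABLE ... AS SELECT on top of it. This is a sketch, not the lab's exact setup: the bucket path, the table names, and the two-column schema are assumptions based on the walkthrough, so substitute your own values.

```sql
-- A sketch of the external (federated) table defined in DDL. The bucket path
-- and column names here are assumptions; swap in your own.
CREATE OR REPLACE EXTERNAL TABLE irs_990.irs_990_code_lookups (
  code STRING,
  code_description STRING
)
OPTIONS (
  format = 'CSV',
  uris = ['gs://your-bucket/irs_990_code_lookups.csv'],
  skip_leading_rows = 1  -- the first row holds headers, as in the lab
);

-- Every query against this table re-reads the CSV from Cloud Storage,
-- which is why results can't be cached.
SELECT * FROM irs_990.irs_990_code_lookups;

-- The one-time ETL pattern: do the external read once, then materialize the
-- result into BigQuery managed storage as a permanent table.
CREATE OR REPLACE TABLE irs_990.code_lookups_permanent AS
SELECT * FROM irs_990.irs_990_code_lookups;
```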
All right, nice work with this lab. Between this lab and the last one, you now know how to query and store the results of your queries as tables, and how to bring in new datasets. So that's fantastic.