Now, so far we've only queried datasets that already exist within BigQuery. The next logical step, after you're finished with all of these courses, is to load your own datasets into BigQuery and analyze them. That's why in this module, we'll cover how you can load external data into BigQuery and create your very own datasets. First, let's cover the difference between loading data into BigQuery versus querying it directly from an external data source. As you can see on the left, there are a lot of different file formats, and even systems, that you can ingest and grab data from, and then load permanently into BigQuery-managed storage. Just to name a few very common staging areas: Google Cloud Storage, where you could have massive CSV files stored in Cloud Storage buckets, which is very common; or Cloud Dataflow jobs, where your data engineering team has set up these beautiful pipelines, and as one of the steps in a pipeline, you can have that data written out, or materialized, into a BigQuery table for analysis. That's very common. And as you saw, one of the UI layers for Cloud Dataflow, that Cloud Dataprep tool you got a lot of practice with in the last course, does exactly that: it invokes that materialization step for Cloud Dataflow and then writes the result out to BigQuery-managed storage. With other Google Cloud Platform big data tools, like Cloud Bigtable, you can export or copy data from Bigtable into BigQuery-managed storage. And of course, you can manually upload files from your desktop through a file browser and ingest those tables into BigQuery-managed storage. Why do we keep mentioning the word managed? That big concept, that big icon that you see there in the middle, is a key core component of the BigQuery service. As we mentioned in one of the earlier courses, BigQuery is two components.
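As a quick sketch of that batch-loading path, here's what ingesting a CSV from Cloud Storage can look like in SQL. Note this uses BigQuery's LOAD DATA statement, which is a newer alternative to the web UI and the bq command-line tool, and the dataset, table, and bucket names here are hypothetical placeholders, not names from this course:

```sql
-- Batch-load a CSV file from a Cloud Storage bucket into a
-- table in BigQuery-managed storage.
-- `mydataset.baby_names` and the gs:// URI are placeholder names.
LOAD DATA INTO mydataset.baby_names
FROM FILES (
  format = 'CSV',
  skip_leading_rows = 1,  -- skip the CSV header row
  uris = ['gs://my-staging-bucket/baby_names.csv']
);
```

After the load completes, the data lives in BigQuery-managed storage and can be queried like any other native table.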
It's the query engine that processes your queries, and it's also the data management piece behind the scenes that handles, stores, and optimizes all of your data. Things like caching your data, storing it in column format and compressing those columns (which we're going to talk a little bit more about in the advanced course on the architecture of BigQuery), and replicating the data to keep it durable: all of the things that a database administrator would traditionally handle for you, the BigQuery team here at Google manages behind the scenes. Why am I making such a big deal about managed storage? Because you might say, hey, all right, cool, it's managed storage, I don't have to worry about that. But when would my data not be in managed storage? The answer is, it could quite possibly never even hit managed storage, if you connect directly to the external data source. This is the mind-blowing concept. You can write a SQL query, that SQL query can be passed through, and the underlying data source could be a Google Drive spreadsheet that someone is maintaining; that data is never ingested and permanently stored inside of BigQuery. That's an extreme case, because naturally you can see the caveats of relying on a collaborative spreadsheet as the system of record for a lot of your data. But this is a common occurrence for things like one-time extract, transform, load jobs, where you have a CSV that's stored in Cloud Storage. Instead of ingesting that raw data and storing it in two places, Cloud Storage and BigQuery, you instead query it, perform some pre-processing steps to clean it all up, and then at the end of that query, store the results as a permanent table inside of BigQuery. That's one of the common use cases I can think of for creating or establishing this pointer, this external connection.
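That pointer-then-permanent-table pattern can be sketched in SQL like this. The table and bucket names are made up for illustration, and the cleanup logic is just an example of the kind of pre-processing you might do:

```sql
-- 1. Define a pointer to raw CSV files sitting in Cloud Storage.
--    Nothing is ingested into managed storage at this step;
--    BigQuery reads the files at query time.
CREATE EXTERNAL TABLE mydataset.raw_orders_ext
OPTIONS (
  format = 'CSV',
  skip_leading_rows = 1,
  uris = ['gs://my-staging-bucket/orders_*.csv']
);

-- 2. Query the external source, clean it up, and store the result
--    as a permanent table inside BigQuery-managed storage.
CREATE TABLE mydataset.orders_clean AS
SELECT
  CAST(order_id AS INT64) AS order_id,
  TRIM(customer_name) AS customer_name,
  SAFE_CAST(order_total AS NUMERIC) AS order_total
FROM mydataset.raw_orders_ext
WHERE order_id IS NOT NULL;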
Now, as that big arrow over BigQuery-managed storage shows, when you connect straight to the query engine, you get none of the performance advantages of the managed storage piece, and there are a lot of other drawbacks. Let's cover some of those limitations. First, performance disadvantages: there's a lot that goes into the special sauce of the BigQuery architecture behind the scenes that makes it much more performant to ingest your CSV data permanently into BigQuery, as opposed to keeping it out on, say, a Google spreadsheet or Google Cloud Storage. A lot of that, the compression algorithms and the way BigQuery stores data in column format, we'll cover in the architecture lecture coming up in the next course on advanced insights. But one of the key things that should hopefully scare a lot of you away from using Google spreadsheets as the source of truth for your underlying datastore is data consistency. As we mentioned on the previous slide, you can have a BigQuery query that reaches out to a Google Drive spreadsheet. If you have folks editing that spreadsheet, the query doesn't necessarily know, hey, this is when I accessed it, at this particular timestamp, and this is what the data was. If you have data in flux, since it's not managed natively by BigQuery itself, there are few checks on whether the data you're pulling is the data you expected, or when it was last updated in that particular spreadsheet. And a lot of features that you can enable inside of BigQuery, like table wildcards, which we'll uncover in the merging datasets lecture on unions and joins, are unavailable unless your data is stored directly inside of BigQuery.
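For a preview of one of those native-storage-only features, here's what a table wildcard query looks like; the project, dataset, and table prefix are hypothetical names for illustration:

```sql
-- Query every daily table matching the `events_` prefix in one shot.
-- Wildcard tables only work against BigQuery-managed storage,
-- not external data sources.
SELECT
  event_name,
  COUNT(*) AS event_count
FROM `my-project.analytics.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
GROUP BY event_name;
```

The _TABLE_SUFFIX pseudo-column lets you filter which of the matched tables are actually scanned.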
We've largely discussed batch loading a CSV, or massive CSVs, into BigQuery, but know that there is also a streaming option available through the API, where you can ingest individual records one at a time into BigQuery-managed storage and then run queries on those as well. The streaming API is well documented, and you can explore it if you have a streaming or near real-time data need for your application.