So far in this course we have looked at streaming data and how to build resilient streaming pipelines. We looked at how to create variable-volume ingest. We looked at how to process data that could be late or unordered using Dataflow. And then we looked at how to run queries on data even as it's streaming in using BigQuery, and how to display that data with Data Studio. But what we haven't yet looked at is other options as far as the sink is concerned. So BigQuery is a very good general-purpose solution, something that would work in most cases that you're worried about. But every once in a while, you will come across a situation where the latency of BigQuery is going to be problematic. In BigQuery, the data that's streaming in is available in a matter of seconds, and sometimes you will want lower latency than that. You'll want your information to be available in a matter of milliseconds, for example, or microseconds. You may also run into issues where the throughput of BigQuery, which is about 100,000 records a second, may not be enough, and you may want to deal with a higher throughput.
And so what we will be looking at in this final chapter is how to handle such throughput and latency requirements. When BigQuery's not enough, where do you go? So we will talk about Cloud Spanner and we'll talk about Bigtable. These are going to be two of the options that we could consider. And then we'll spend a lot of time looking at Bigtable. We'll look at how to design for Bigtable, specifically how to design schemas and how to design the row key for Bigtable. We'll look at how to ingest data into Bigtable. We'll do a lab that essentially takes a Dataflow pipeline that is currently streaming into BigQuery and modifies it, so that it is still streaming the average speeds into BigQuery, but the current conditions, which is 30 times more data, we will stream into Bigtable. And then finally, we'll look at some performance considerations.
So, if you're trying to choose where to store data on GCP, this is a set of questions that you could consider. The first question is: is the data that you're trying to store structured or unstructured? If it is structured data, then the next question is: are transactions important to you, or is your workload primarily read-only, primarily around data analytics? If you are thinking about transactional workloads, then the next question is: do you want to query that data via SQL, or are you okay with NoSQL querying? In other words, is your data relational, in which case you want to do SQL queries on your data, or is your data not relational, in which case you want, for example, an object store?
So, for example, if you have structured data, you need transactions, and you want to be able to query it with SQL, then you have two options. The most common option would be to go ahead and use Cloud SQL. Cloud SQL is for when one database is going to be enough. But if one database is not enough, if you need multiple databases, if you need horizontal scalability, then Cloud Spanner is a good solution. Both Cloud Spanner and Cloud SQL will give you millisecond latency.
But if your workload is not around transactions, if your workload is around data analytics, around analyzing data, then this whole idea of transactions and locking, etc., is just overhead that you don't want to pay for. And at that point, your question becomes: do you need updates? Do you need low latency? If you don't need either of them, if a latency of seconds is enough, then BigQuery is your most cost-effective solution, so use BigQuery. It's a SQL database. It gives you reasonable latency but not very low latency.
But if, on the other hand, you want millisecond latency and you are primarily worried about analytics workloads, then Bigtable is a good solution.
Now let's look at the second part of this. If it's unstructured data that you are dealing with, then it ideally goes into Cloud Storage. But if you need mobile SDKs, then Firebase Storage would be a good solution. Similarly, if you have NoSQL data and you need mobile, put it on Firebase. If you don't need mobile, if it's primarily web applications, put it into Datastore.
But what we are looking at in this course is Cloud Spanner, Cloud SQL, Bigtable, and BigQuery. And the way you think about these is that BigQuery is your most common: very cost-effective, latency of seconds. Cloud SQL is very common again: relational data, transactional data, backed by either a MySQL database or a PostgreSQL database. And then you have two other solutions, and those are the two solutions that we look at in this chapter. If you have transactional SQL data and your workload is much larger than you can fit on a single database, if you need horizontal scalability in other words, go for Cloud Spanner. If you are doing data analytics but your needs are more than what BigQuery can support, go for Bigtable. So again, both Spanner and Bigtable are for when BigQuery and Cloud SQL are not going to be enough for your needs. So let's look at both Bigtable and Spanner, starting with Spanner.
>> What is Cloud Spanner? Cloud Spanner is the first horizontally scalable, globally consistent relational database. It's proprietary, not open source. Consider what it means to have a relational database that's consistent but also distributed and global. Think about what might be involved in coordinating transactions on components of a relational database located around the world. It seems like a very difficult problem to solve. Cloud Spanner's not for all applications. There are times you'll want to use Cloud SQL and other services.
Cloud Spanner is suited for applications that require relational database support, strong consistency, transactions, and horizontal scalability. Natural use cases include financial applications and inventory applications traditionally served by relational database technology. Here are some example mission-critical use cases: powering customer authentication and provisioning for multinational businesses; building consistent systems for transactions and inventory management in the financial services and retail industries; and supporting high-volume systems that require low latency and high throughput in the advertising and media industries.
Cloud SQL is fine if you can get by with a single database. But if you need multiple databases, Cloud Spanner is a great choice. The graph shown illustrates this point. Cloud SQL hits a wall at around 8,000 queries per second. If you look at the 99th percentile of latency, it's clear that performance degrades beyond 5,000 queries per second. Horizontal scaling through distributed processing is complicated and difficult for most relational database systems such as MySQL. However, Spanner distributes work easily. It distributes globally, if that's needed, and it provides consistent performance. To support more throughput in Spanner, you just add more nodes. Now, the information shown in this graph is from a blog, so it's not an official statement of service level or a source of reliable performance figures. It's just an illustration. If you're dealing with a borderline case, you'll want to run your own tests and look at published figures and current service level agreements.
The Spanner architecture allows high availability and global placement. Data is replicated in multiple zones, which can be within one region or across several regions. Database placement is configurable. You can choose the region to host your database. Writes are synchronous.
Data is always consistent, and transactions have ACID properties, as in any other relational database. There's a lot to be learned about Spanner. There are white papers and resources online. A few main points to consider about Spanner: it uses familiar relational semantics, so traditional database analysts will adapt to it easily. Data is sharded within the zone, providing high throughput. And it provides high availability by design, so there's no manual intervention required to deal with a zone failure.
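The storage questions walked through earlier in this chapter (structured or unstructured, transactional or analytics, SQL or NoSQL, one database or horizontal scale, seconds or milliseconds) can be sketched as a small decision function. This is only an illustration of the order of the questions as the lecture states them; the `Workload` class, its field names, and the `choose_storage` function are hypothetical names made up for this sketch and are not part of any GCP library.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a workload; field names are assumptions."""
    structured: bool                      # structured vs. unstructured data
    transactional: bool = False           # transactions matter (vs. analytics)
    relational: bool = True               # SQL querying (vs. NoSQL)
    needs_horizontal_scale: bool = False  # one database is not enough
    needs_low_latency: bool = False       # milliseconds rather than seconds
    needs_updates: bool = False           # analytics data must be updated
    needs_mobile_sdks: bool = False       # mobile clients access the data


def choose_storage(w: Workload) -> str:
    """Return the product the lecture's questions point to."""
    if not w.structured:
        # Unstructured data: Cloud Storage, or Firebase Storage for mobile.
        return "Firebase Storage" if w.needs_mobile_sdks else "Cloud Storage"
    if w.transactional:
        if w.relational:
            # SQL and transactions: Cloud SQL until one database is not enough.
            return "Cloud Spanner" if w.needs_horizontal_scale else "Cloud SQL"
        # NoSQL: Firebase for mobile, Datastore otherwise (e.g. web apps).
        return "Firebase" if w.needs_mobile_sdks else "Datastore"
    # Analytics: BigQuery unless updates or millisecond latency are required.
    if w.needs_low_latency or w.needs_updates:
        return "Bigtable"
    return "BigQuery"


if __name__ == "__main__":
    streaming_analytics = Workload(structured=True, needs_low_latency=True)
    print(choose_storage(streaming_analytics))
```

A real choice would also weigh cost, throughput, and existing tooling, so treat the function as a memory aid for the question order, not as a sizing tool.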