In this video, we will look at disaster recovery methods with Dataflow. These methods only apply to streaming pipelines. Data is your most prized asset, which is why it is essential to have a disaster recovery strategy in place for your production systems. One way is to take snapshots of your data source. This capability is supported in many popular relational databases and data warehouses. But what if you are using a messaging service? Google Cloud Pub/Sub offers this capability. You can implement a disaster recovery strategy with two features: Pub/Sub snapshots, which allow you to capture the message acknowledgment state of a subscription, and Pub/Sub seek, which allows you to alter the acknowledgment state of messages in bulk. If you are using this strategy, you will have to reprocess messages in the event of a pipeline failure. This means you will have to consider how to reconcile this in your data sink and deduplicate any records that have been written twice.

Let's go over what we need to do to use Pub/Sub snapshots to support our disaster recovery requirements. First, you take a snapshot of the Pub/Sub subscription. To do this, you can use the command-line interface, CLI for short, or the Cloud console. After your Pub/Sub snapshot has been created, you can stop and drain your Dataflow pipeline. You can do this using the command-line interface or in the Job Details page in the Dataflow UI. Once your pipeline has stopped processing messages, you can use the Pub/Sub seek functionality to revert the acknowledgment state of messages in your subscription. Again, you can achieve this using the command-line tool. Finally, you are ready to resubmit your pipeline. You can launch your pipeline using any of the ways you normally deploy your Dataflow job, either directly from your development environment or by using the command-line tool to launch your template. The example here shows a simple command for a templated job launched with the command-line interface.
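The four steps just described can be sketched as gcloud commands. This is a minimal sketch only: the subscription, snapshot, job, bucket, and template names are placeholders, and your region will differ.

```shell
# 1. Capture the acknowledgment state of the subscription in a snapshot.
#    (my-subscription and my-snapshot are placeholder names.)
gcloud pubsub snapshots create my-snapshot \
    --subscription=my-subscription

# 2. Stop and drain the running Dataflow pipeline.
gcloud dataflow jobs drain JOB_ID --region=us-central1

# 3. Seek the subscription back to the snapshot, reverting the
#    acknowledgment state of messages received since it was taken.
gcloud pubsub subscriptions seek my-subscription \
    --snapshot=my-snapshot

# 4. Resubmit the pipeline, here from a template stored in Cloud Storage.
gcloud dataflow jobs run my-restarted-job \
    --gcs-location=gs://my-bucket/templates/my-template \
    --region=us-central1
```

Note that step 3 must happen after the drain completes; otherwise the still-running pipeline would immediately re-read and acknowledge the reverted messages.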
An important caveat to consider is that Pub/Sub messages have a maximum retention of seven days. This means that after seven days, a Pub/Sub snapshot no longer has any use for your stream processing. If you choose to use Pub/Sub snapshots for your disaster recovery, we recommend that you take snapshots weekly, at a minimum, to ensure that you do not lose any data in the event of a pipeline failure.

Using Pub/Sub snapshots in conjunction with seek is a good starting point. But when you are using Pub/Sub and Dataflow for your streaming analytics, there are important things to consider. When you use Pub/Sub seek to restart your data pipeline from a Pub/Sub snapshot, messages will be reprocessed. This creates a few challenges. First, you might have some duplicate records in your sink. The amount of duplication depends on how many messages were processed between the time the snapshot was taken and the time the pipeline was terminated. In addition, data that has been read by your pipeline, but has yet to be processed and written to the sink, will need to be processed all over again. Remember that Dataflow acknowledges a message from Pub/Sub when it has read the message, not when the record has been written to the sink. This presents a challenge for pipelines with complex transformation logic. For example, if your pipeline is processing millions of messages per second and goes through multiple processing steps, having to reprocess the data represents a significant amount of lost compute. Lastly, if your pipeline has implemented exactly-once processing, windowing logic will be interrupted when you drain and restart your pipeline. Since you lose the buffered state when you drain your pipeline, you must conduct a tedious reconciliation exercise if exactly-once processing is a requirement for your use case. Luckily, Dataflow also has snapshot capabilities.
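As one illustration of the deduplication part of that reconciliation, if your sink is a BigQuery table and each record carries a unique message ID, you could keep a single row per ID after reprocessing. The dataset, table, and column names here are hypothetical; this is just a sketch of the idea, not the course's prescribed method.

```shell
# Rebuild the sink table keeping one row per message_id
# (dataset, table, and column names are assumptions for illustration).
bq query --use_legacy_sql=false '
CREATE OR REPLACE TABLE mydataset.events AS
SELECT * EXCEPT(row_num)
FROM (
  SELECT *,
         ROW_NUMBER() OVER (
           PARTITION BY message_id
           ORDER BY publish_time
         ) AS row_num
  FROM mydataset.events
)
WHERE row_num = 1'
```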
If you recall, we introduced Dataflow snapshots as a useful tool for testing and rolling back updates to streaming pipelines in our testing and CI/CD module. Dataflow snapshots can also be used for disaster recovery scenarios. Since Dataflow snapshots save streaming pipeline state, we can restart the pipeline without reprocessing in-flight data. This saves you money whenever you have to restart your pipeline. Moreover, you can restore your pipeline much faster than with the Pub/Sub snapshots and seek strategy. This ensures that you have minimal downtime. Dataflow snapshots can be created with a corresponding Pub/Sub source snapshot. This helps you coordinate the snapshot of your pipeline with that of your source. In other words, you can pick up your processing where you left off when you restart the pipeline. This saves you the hassle of having to manage Pub/Sub snapshots yourself.

Let's take a look at how we can use Dataflow snapshots for disaster recovery scenarios. Our first step involves creating a snapshot of the Dataflow pipeline. We can do this directly in the UI with the Create Snapshot button in the menu bar. You will be prompted to create a snapshot with or without sources. If your pipeline is using Pub/Sub, we recommend that you select the with-sources option. You can also create a snapshot using the command-line interface. Next, we need to stop and drain the Dataflow pipeline. This is also possible both in the UI and using the command-line interface. Lastly, we create a new job from the snapshot. This is accomplished by passing in the snapshot ID as a parameter when you deploy your job from your deployment environment. Since Dataflow snapshots, like their Pub/Sub counterparts, have a maximum retention of seven days, we recommend scheduling a coordinated Dataflow and Pub/Sub snapshot at least once a week.
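The three steps above can be sketched on the command line as follows. The job ID, snapshot ID, region, and the Java main class in the last step are all placeholders; substitute the values from your own deployment environment.

```shell
# 1. Snapshot a running Dataflow job together with its Pub/Sub source
#    (--snapshot-sources=true creates the coordinated Pub/Sub snapshot).
gcloud dataflow snapshots create \
    --job-id=JOB_ID \
    --snapshot-sources=true \
    --region=us-central1

# 2. Stop and drain the job.
gcloud dataflow jobs drain JOB_ID --region=us-central1

# 3. Launch a new job from the snapshot by passing the snapshot ID as a
#    pipeline option. For a Java pipeline this is --createFromSnapshot;
#    com.example.MyPipeline is a hypothetical main class.
mvn compile exec:java -Dexec.mainClass=com.example.MyPipeline \
    -Dexec.args="--project=my-project \
                 --region=us-central1 \
                 --createFromSnapshot=SNAPSHOT_ID"
```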
This means that if your pipeline goes down, you have a point in time, within the past seven days, from which you can restart processing, ensuring that you can almost always avoid a data loss scenario. You can use Cloud Composer or Cloud Scheduler to schedule this weekly snapshot. Snapshots are located in the region of the original job. When you create a job from a snapshot, you must launch the job in the same region. This is useful for zonal outages. If a zone goes down, you can relaunch the job from a snapshot in a different zone in the same region. This protects your workloads against zonal outages. However, Dataflow snapshots cannot help you migrate to a different region in the event of a regional outage. The best action to take in that event is to wait for the region to come back online or to relaunch the job in a new region without the snapshot. If you have taken a snapshot, though, you can ensure that your data is not lost.
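One way to schedule the weekly snapshot with Cloud Scheduler is an HTTP job that calls the Dataflow REST API's snapshot method for the running job. This is a sketch under assumptions: the project, region, job ID, schedule, and service account are placeholders, and the service account must have permission to snapshot Dataflow jobs.

```shell
# Hypothetical weekly schedule: every Monday at 02:00, POST to the
# Dataflow jobs.snapshot endpoint, requesting a source snapshot too.
gcloud scheduler jobs create http weekly-dataflow-snapshot \
    --schedule="0 2 * * 1" \
    --uri="https://dataflow.googleapis.com/v1b3/projects/my-project/locations/us-central1/jobs/JOB_ID:snapshot" \
    --http-method=POST \
    --message-body='{"ttl": "604800s", "snapshotSources": true}' \
    --oauth-service-account-email=snapshots@my-project.iam.gserviceaccount.com
```

Cloud Composer is the alternative when you also want the schedule to rotate job IDs or clean up expired snapshots as part of a larger workflow.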