The other page available in the Dataflow UI is the Job Metrics tab. This shows us time series data for our job, and the page varies between batch and streaming. Let's look at the job metrics for the BigQuery to TensorFlow records batch pipeline we ran earlier.

The first graph shows the number of workers that ran across the lifetime of the job with autoscaling enabled. At certain points during the job, we can see that the Dataflow service decided that more workers were needed to increase the job throughput. The green line shows how many workers are needed and the blue line shows the current number of workers. There will be a small time gap between the two, as each new worker needs time to spin up and for work to be assigned to it.

The second graph shows the throughput of each substep versus time. Recall that your Beam steps are made up of substeps, and here we see the throughput of each one. In the Job Graph tab, we discussed how batch pipelines do not run all the steps concurrently; we can see that on this graph here. The first hump shows the records being read, and the second one shows the records being partitioned and saved to Google Cloud Storage.

The third graph shows each worker's CPU utilization percentage. In our job run, we see that all workers reach near 100% CPU utilization. A healthy pipeline should have all the workers running at around the same CPU utilization rate. If you see that a couple of your workers are running at 100% while the rest of the workers have low utilization, your pipeline is likely unhealthy and suffering from an uneven distribution of workload. Some Beam operations, like GroupByKey, cannot be split across workers: each worker will be assigned a range of keys to group, and if your data is heavily skewed, one worker could end up doing all the work while the others do nothing. On the CPU utilization graph, we see this as a couple of workers having high CPU utilization while the others have low CPU utilization.

The last graph in batch pipelines is the worker error log count. As the name suggests, this shows the number of log entries from the workers that had a log level of error. In batch jobs, if processing an element fails four times in a row, the whole batch pipeline fails.
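Because of that four-failures rule, a common way to keep a few bad records from taking down a whole batch job is to catch the exception inside the DoFn and route the record to a side output for later inspection. Here is a minimal sketch of that pattern in Beam Python; the parsing logic and step names are placeholders for illustration, not the pipeline from the recording.

import apache_beam as beam

class ParseRecord(beam.DoFn):
    """Parses a record, sending anything that fails to a dead-letter output."""

    def process(self, element):
        try:
            yield int(element)  # placeholder "processing" that can raise
        except ValueError:
            # Instead of failing four times and killing the job,
            # tag the bad record so it can be inspected later.
            yield beam.pvalue.TaggedOutput("dead_letter", element)

with beam.Pipeline() as p:
    results = (
        p
        | "Create" >> beam.Create(["1", "2", "oops", "4"])
        | "Parse" >> beam.ParDo(ParseRecord()).with_outputs("dead_letter", main="parsed")
    )
    results.parsed | "PrintParsed" >> beam.Map(print)
    results.dead_letter | "PrintDeadLetter" >> beam.Map(lambda r: print("dead letter:", r))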
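On the skewed-keys point, one mitigation worth knowing about is to express the per-key work as a combiner and spread hot keys across workers with fanout, rather than using a plain GroupByKey. This is a small sketch, assuming the per-key work is an associative aggregation like a sum; the key names and fanout value are made up for the example.

import apache_beam as beam

with beam.Pipeline() as p:
    (
        p
        | "Create" >> beam.Create([("hot_key", 1)] * 1000 + [("cold_key", 1)])
        # A plain GroupByKey would send every "hot_key" element to a single worker,
        # which shows up as one worker at 100% CPU while the rest sit idle.
        # CombinePerKey with hot key fanout pre-aggregates on many workers first.
        | "SumPerKey" >> beam.CombinePerKey(sum).with_hot_key_fanout(10)
        | "Print" >> beam.Map(print)
    )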
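And circling back to the first graph: the autoscaling behavior it shows is driven by the options the job was launched with. Here is a minimal sketch of setting those bounds for a Beam Python job on Dataflow; the project, region, and bucket names are placeholders.

from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder project, region, and bucket; substitute your own values.
options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/tmp",
    autoscaling_algorithm="THROUGHPUT_BASED",  # let Dataflow add and remove workers
    num_workers=2,        # starting worker count (the blue line at launch)
    max_num_workers=20,   # ceiling on how many workers the service can request
)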
Let us now look at a streaming pipeline's Job Metrics page. This is for a pipeline I ran that reads from Pub/Sub and sinks to BigQuery. Just like batch pipelines, there are graphs for autoscaling, throughput, CPU utilization, and worker error log count. In addition to these, there are a few graphs specific to streaming jobs. Let us start with the first two, the data freshness and system latency graphs. These graphs are great for measuring the health of a streaming pipeline.

The data freshness graph shows the difference between real time and the output watermark. The output watermark is a timestamp such that any element with a timestamp prior to the watermark is nearly guaranteed to have been processed. For example, if the current time is 9:26 a.m. and the data freshness graph's value at that time is six minutes, that means all elements with a timestamp of 9:20 a.m. or earlier have arrived and have been processed by the pipeline.

The system latency graph shows how long it takes elements to go through the pipeline. If the pipeline is blocked at any stage, the latency will increase. For example, imagine our pipeline reads from Pub/Sub, does some Beam transformation on the elements, then sinks them into Spanner, and suddenly Spanner goes down for five minutes. When this happens, Pub/Sub won't receive confirmation from Dataflow that an element has been sunk into Spanner. This confirmation is what allows Pub/Sub to delete the element, so while there is no confirmation, the system latency and data freshness graphs will both rise toward five minutes. Once the Spanner service comes back, all the elements will be written into Spanner and Dataflow will confirm that with Pub/Sub, returning the system latency and data freshness graphs to normal.

In addition to the data freshness and system latency graphs, streaming jobs can also have input and output metrics at the bottom of the metrics page. Input metrics and output metrics are displayed if your streaming Dataflow job has read or written records using Pub/Sub. In my case, I only had Pub/Sub as an input, so I can only see input metrics. If I had more than one Pub/Sub source or sink, I could view the metrics of any one of them by clicking on the drop-down and choosing the Pub/Sub source or sink I want. In my case, I only have one Pub/Sub source, and that is my subscription named dataflow front.

The first graph we talk about is requests per second. Requests per second is the rate of API requests to read or write data by the source or sink over time. If this rate drops to zero, or decreases significantly for an extended period relative to your expected behavior, the pipeline might be blocked from performing certain operations, or there might be no data to read. If this happens, you should review steps that have a high system watermark to see where the blockage is happening. Also examine the worker logs for errors or indications that slow processing is occurring.

The second graph is response errors per second by error type. Response errors per second by error type is the rate of failed API requests to read or write data by the source or sink over time. If errors occur frequently and repeatedly, see what they are and cross-reference them with the Pub/Sub error code documentation.

For all pipelines, you can restrict the timeline for the graphs and logs using the time selector tool. Right now, I have a job that has been running for a few hours; how do I focus on a specific time interval? This is where the time selector tool comes in, and I'll show you how to use it. Open the time selector tool by pressing the button showing the currently selected time range. This opens a drop-down menu where you can select a time range for the charts and logs, ranging from hours to the maximum lifetime of the pipeline. You can even choose a custom time range by setting the start and end time you want to view. Let's click the max time for the pipeline to see how the graphs change across the pipeline's entire lifetime and press apply to see the change. Keep an eye on the data freshness and system latency graphs: at the beginning of the run, the pipeline had a lot of data to read. If I bring the cursor near the peak of the data freshness graph, we can see the pipeline was approximately 16 hours behind wall time when it started. This is because I first sent data to a Pub/Sub subscription for 16 hours before starting the pipeline. If I want to zoom into a specific time period on the graph, I press on the start point I am interested in and drag and hold to the end of the time period I am interested in. Once I release the pointer, all the graphs will be zoomed into the highlighted time range. If I want to exit the zoomed view, I press the reset zoom button at the top.
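For reference, here is a minimal sketch of the kind of streaming pipeline discussed above, reading from Pub/Sub and writing to BigQuery with the Beam Python SDK. The subscription, table, and schema are placeholders rather than the actual resources used in the recording.

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder resource names; substitute your own subscription and table.
SUBSCRIPTION = "projects/my-project/subscriptions/my-subscription"
TABLE = "my-project:my_dataset.events"

options = PipelineOptions(streaming=True)  # required for an unbounded Pub/Sub source

with beam.Pipeline(options=options) as p:
    (
        p
        # This read is what feeds the input metrics section of the Job Metrics page.
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(subscription=SUBSCRIPTION)
        | "Decode" >> beam.Map(lambda msg: {"raw": msg.decode("utf-8")})
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            TABLE,
            schema="raw:STRING",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )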
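If you want to track the same backlog story outside the Dataflow UI, for example to alert on it, you can query Cloud Monitoring for the subscription's undelivered-message count. This is a sketch using the google-cloud-monitoring client; the project and subscription ids are placeholders, and the metric and label names are the ones I believe Pub/Sub publishes, so double-check them against the metrics list for your project.

import time
from google.cloud import monitoring_v3

# Placeholder project; the subscription id below is also hypothetical.
project = "projects/my-project"

client = monitoring_v3.MetricServiceClient()
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"end_time": {"seconds": now}, "start_time": {"seconds": now - 3600}}
)

# Assumed metric: the Pub/Sub backlog gauge for a subscription.
results = client.list_time_series(
    request={
        "name": project,
        "filter": (
            'metric.type = "pubsub.googleapis.com/subscription/num_undelivered_messages" '
            'AND resource.labels.subscription_id = "my-subscription"'
        ),
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)

for series in results:
    for point in series.points:
        print(point.interval.end_time, point.value.int64_value)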