Hello there. My name is Omar Ismail, a Solutions Developer at Google Cloud. In this module, we'll learn about the logs panel at the bottom of the Job Graph and Job Metrics pages, as well as the centralized Error Reporting page. Let's get started with logging. I've created and canceled a streaming pipeline that I designed not to work so we can see some error logs: I set the sink to a bucket I do not have access to. How can I get logging information about this job? Let's start by expanding the logs panel at the bottom of the page. The first tab is the Job Logs tab. These are messages from the Dataflow service. We can filter to show a minimum log level. For example, if I want to see logs with a severity of Error or above, I select Error. If I'm looking for a specific message, I can type it in the filter text box. The next tab is the Worker Logs tab. Worker logs come from the VMs running the workers. Just like in the Job Logs tab, we can filter by log level and by message. If I want to see logs from a specific step or any of its sub-steps, I click on the step or sub-step in the Job Graph page. Here, I have selected the Write sharded bundles to temp files sub-step that is part of the TextIO.Write step. If I want to return to the general log view, I click on the white space outside of the step. This returns us to the Worker Logs tab. We now move on to the Diagnostics tab. This tab shows the frequency of each error over time in your entire project, as well as when it was first seen and last seen. Clicking on an error takes you to the Error Reporting page, which provides more detailed information. We'll talk about the Error Reporting page in the next module. Not only does the Diagnostics tab show errors coming from user exceptions, it also provides job insights for you. During the pipeline's life, Dataflow analyzes the logs that are part of your job and highlights important ones in the Diagnostics tab. A few of these are listed on this slide. 
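The minimum-severity and text filters described above behave like a simple threshold plus substring match over log entries. As a rough plain-Python illustration (the function and field names here are hypothetical, not a Dataflow API):

```python
# Minimal sketch of a minimum-severity log filter, mirroring the logs
# panel's behavior. All names here are illustrative, not a Dataflow API.

SEVERITY_ORDER = ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]

def filter_logs(entries, min_level="INFO", message_contains=""):
    """Keep entries at or above min_level whose message matches the text filter."""
    threshold = SEVERITY_ORDER.index(min_level)
    return [
        e for e in entries
        if SEVERITY_ORDER.index(e["severity"]) >= threshold
        and message_contains.lower() in e["message"].lower()
    ]

logs = [
    {"severity": "INFO", "message": "Worker started"},
    {"severity": "ERROR", "message": "Access denied on gs://some-bucket"},
    {"severity": "WARNING", "message": "Retrying write"},
]

# Selecting "Error" in the panel hides the INFO and WARNING entries.
print(filter_logs(logs, min_level="ERROR"))
```

Typing text in the filter box narrows this further, e.g. `filter_logs(logs, min_level="INFO", message_contains="retry")` keeps only the retry message.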
If the JAR file provided to the worker is missing required classes, you will get a worker JAR file misconfiguration error message. The Diagnostics tab also shows if the worker VM had to kill a process or shut down due to the JVM crashing. If your code has a step that is taking a long time to perform operations, you might see a lengthy operation message in the Diagnostics tab. If the slow processing is due to a hot key, then the Diagnostics tab will show you that a hot key was detected. In streaming scenarios, your pipeline might fail to make progress if you are grouping a huge amount of data without using a Combine transform, or are producing a large amount of data from a single input element. If this happens, the Diagnostics tab will tell you that a commit request exceeds the size limit. Finally, if there was a high rate of log messages from the job and some of them were not sent to Cloud Logging, a throttling logger worker message will appear in your Diagnostics tab. In this example, I have a batch job that failed. Upon checking the Diagnostics tab, I can see that the JVM crashed due to memory pressure. Without sifting through the logs, I can find the cause of the failure just by looking at the Diagnostics tab. If your pipeline reads from or loads data into BigQuery, there is one more tab that can be viewed: the BigQuery Jobs tab. It can be used for troubleshooting and monitoring the BigQuery jobs that are part of your pipeline. This tab appears if you are using Beam 2.24 or later and have the BigQuery Admin role. Your Beam code can either read an entire BigQuery table or issue a query to read part of a table. When the former is used, BigQuery exports the table as JSON files to GCS using an extract job. When the latter is used, BigQuery exports the selected rows as JSON files to GCS using a query job. If either of the two read methods is used, it will appear in the BigQuery Jobs tab. 
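The grouping-versus-combining distinction behind the commit size insight can be sketched in plain Python (this is an illustration of the idea, not Beam code): grouping materializes every value for a key at once, while combining folds values into a small accumulator.

```python
# Illustrative sketch (plain Python, not Beam): why a Combine-style fold
# beats a GroupByKey-style collect when one key carries many values.

events = [("sensor-1", i) for i in range(1000)]  # one hot key

# GroupByKey style: all 1000 values for "sensor-1" are held together,
# which is the kind of per-key blowup that can exceed a commit size limit.
grouped = {}
for key, value in events:
    grouped.setdefault(key, []).append(value)
group_then_sum = {k: sum(vs) for k, vs in grouped.items()}

# Combine style: fold each value into a running sum, so the state kept
# per key is a single number regardless of input volume.
combined = {}
for key, value in events:
    combined[key] = combined.get(key, 0) + value

print(group_then_sum == combined)  # prints True: same result, far smaller per-key state
```

The same trade-off is why Beam recommends a Combine transform over GroupByKey followed by a manual reduction when the per-key data is large.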
BigQueryIO supports two methods of inserting data into BigQuery: load jobs and streaming inserts. By default, BigQueryIO uses load jobs when you write bounded PCollections and streaming inserts when you write unbounded PCollections. Only load jobs will appear in the BigQuery Jobs tab. Let's look at a batch job that read and wrote data using BigQuery. The pipeline I ran read from a BigQuery table with stats on tornadoes, computed the number of tornadoes in each month, and wrote the results to a different BigQuery table. The pipeline read from BigQuery using an extract job and wrote to BigQuery using a load job. Let's view them in the BigQuery Jobs tab. First, select the location to pull BigQuery jobs from. BigQuery jobs run in the same location as the dataset they read from or write to. Let's retrieve the jobs. Depending on how many BigQuery jobs the pipeline ran, it may take a few minutes to retrieve the job list. As my job was quick, the jobs are retrieved almost immediately. As we can see, the pipeline ran two BigQuery jobs: the extract job read the BigQuery table and exported the results to GCS, and the load job wrote to BigQuery via GCS as well. If I want more detailed information about each job, I can click the Command Line button and a pop-up window will appear showing a gcloud command to run. Let's run it in Cloud Shell and view the results. Some of the statistics available are the destination URI, the table we are reading from, and the length of time the job took to run. We can also see how many bytes were read, the timeline of the job, and whether it ran in a BigQuery reservation.
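The core logic of that tornado pipeline, counting tornado occurrences per month, can be sketched in plain Python. The field names (`month`, `tornado`) follow Beam's BigQueryTornadoes cookbook example and are an assumption about the exact schema used in the demo:

```python
# Plain-Python sketch of the pipeline's core logic: count tornadoes per
# month. Field names (month, tornado) are assumed from Beam's
# BigQueryTornadoes cookbook example, not confirmed from the demo.

from collections import Counter

rows = [
    {"month": 1, "tornado": True},
    {"month": 1, "tornado": False},
    {"month": 1, "tornado": True},
    {"month": 4, "tornado": True},
]

# Equivalent to: keep rows where tornado is true, then count per month.
counts = Counter(row["month"] for row in rows if row["tornado"])

# Each output element would become a row in the destination table,
# written by a single BigQuery load job in the batch case.
output = [{"month": m, "tornado_count": c} for m, c in sorted(counts.items())]
print(output)  # [{'month': 1, 'tornado_count': 2}, {'month': 4, 'tornado_count': 1}]
```

In the real pipeline, the input rows arrive via the extract job's files on GCS, and the output list is what the load job writes back to BigQuery.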