Person: A key concept to understand when working with explores and join logic is symmetric aggregation. This is a great feature of Looker that ensures accurate aggregation of data for counts, sums and averages. Let's explore in detail how symmetric aggregation works in Looker. Symmetric aggregation addresses a common problem in data and analytics called the fan-out problem. Imagine that you have two tables, customers and orders. The customers table stores customer data, such as customer ID, first name, last name, and the number of visits they made to your e-commerce platform. The orders table stores order data, such as order ID, the amount of the order, and the ID of the customer who placed it. The relationship between the tables is one to many, because a customer can make multiple orders, but each order can only be made by a single customer. When considering these tables individually, the count and the sum operations produce accurate results. There are three total customers, four orders, eight total visits, and $250 in total order spending. To join these two tables in an explore it's pretty clear that the customer ID should be the joint key, however given the one to many relationship, you actually produce what is called a fan-out when you join them on customer ID. This table illustrates the result of the one to many join, which fans out the customer stable. It introduces a duplicate row for customer ID one because Amelia Earhart made more than one order. When performing aggregations on the customer's side of the table, you would now get inaccurate results. For example, sum(visits) amounts to 10. Looking at this table you are capable of realizing that one of these total visits is a duplicate, and that the true answer is eight. Likewise, a count(*) of customers yields four rows, which you can identify should actually be three. On the order side of the table, the aggregations return accurate, as no fan-out occurs on the many side of the join. The fan-out problem is a long time database problem that is not specific to Looker. Looker has simply implemented symmetric aggregation to resolve this problem, and luckily it is not a feature that you, as a LookML developer need to be conscious of activating in your Looker instance. As long as symmetric aggregation is supported by your underlying database, Looker will automatically implement symmetric aggregation when the following two conditions are met. First, every view file on your model must have a primary key defined to join it to an explore. As a best practice, you should strive to define a primary key in every view regardless of whether it will be joined to an explore. Second, the relationship parameter needs to be specified accurately for every explore joined. An easy way to test if a primary key is functioning as a unique key is to compare the count of the field that you believe to be a primary key against the count on the table. If the two counts return the same number you have your primary key. If not, you can create a new dimension that concatenates all the columns necessary to make each row unique. Remember, you cannot assign the primary_key:yes parameter to more than one dimension in the view. To help you define accurate joint relationships, Looker provides four possible values. First, one-to-one, used when there is an exact one-to-one join between to tables. Second, one-to-many, used when one record in the left table joins to multiple records in the left table. Third, many-to-one, used when multiple records in the left table join to one record in the right table, and last, many-to-many. Used when multiple records in the left table join to multiple records in the right table. To better understand how joins work in Looker, let's examine the concept of left and right tables in more detail. The left side of the relationship value refers to the view that appears first in the explore, meaning in an earlier line of code. The right side is the view being joined in later, meaning further down in the code file, and it's the name next to the join parameter. In this example, the left side of the relationship identified as one refers to customers while the right side, or the many side, refers to orders. Note that the order in which the fields appear in the SQL_on parameter does not impact the join. To figure out which relationship occurs in your data, you can construct a sentence for each possibility and think about whether it makes sense such as one customer can relate to many orders. Another option is to think about the join keys. Is customer ID the primary key? Are the values unique in the first view? What about the other view? Is it possible that the same value could appear multiple times there? So what would happen if you specified the wrong relationship between customers and orders? Say one-to-one instead of one-to-many. Well in this case, the data on the customer's side would fan out. Remember you only had three customers, Amelia Earhart, Charles Lindbergh and Wilbur Wright. Their combined visits totaled eight, however, by using a one-to-one relationship instead of one-to-many, the explore returns four customers with a total of 10 visits because Amelia Earhart is counted twice along with her two visits. If you specified the correct relationship between customers and orders, which is one-to-many, then Looker will automatically apply symmetric aggregation, resulting in the correct output of three customers and eight total visits. So how does count actually work in symmetric aggregation? You might have already been wondering about count distinct. Since this query executes a one-to-many join, Looker indeed realizes that it needs to count the distinct customer ID values rather than a blanket count, which would include duplicate customer ID values from the order side. Sums and averages are a bit more complex, but function similarly in that distinct records are identified for calculations using the MD5 hash function. While it can be alarming to click on the SQL button in the explore and behold a query like the one seen in this example, you can look for instances of MD5 to know that Looker is doing symmetric aggregation for the summer average. As symmetric aggregation is already implemented by Looker, you do not need to fully learn when and why to use MD5 to be a successful LookML developer. What Looker is basically doing is generating a unique numeric amount for each primary key value using the MD5 hash function. So the two rows for Amelia Earhart would both get assigned a big unique number, then to calculate the true total visits, Looker uses basic arithmetic to determine whether or not it should count a given number of visits again. Something like SUM(DISTINCT (visits + big unique number)) - SUM(DISTINCT (big unique number))). You might wonder, why would I do a regular sum distinct without all the hashes? Well, the sum distinct function only looks at the distinct values to sum them up. In this case, it would take only one of the instances of the two visits, add it to the four to give you a total of six visits. However, you know you still need two instances of the two visits, both Amelia's and Charles' number visits, to give you eight as the accurate total of visits. In summary, symmetric aggregation is a great feature of Looker that addresses potential fan-out problems resulting from data joins. As long as symmetric aggregation is supported by your underlying database, Looker will automatically implement symmetric aggregation when the following two conditions are met. First, every view has a primary key to join it to an explore, and second, the relationship parameter for every explore join is accurate. With symmetric aggregation, Looker ensures that your counts, sums and averages are always accurate for your business users.