[MUSIC] In this video, we are going to talk about how the principles of visual perception relate to data visualization, and what insights we can uncover there. Data visualization is a very special area. Part of it is data-driven, which places it in the realm of science: it must represent data accurately so that we can see the underlying trends and patterns. Because of this, selecting the right visualization for your content can be prescriptive, based on what you want to show. The other part of it is graphics, a methodology closely tied to design and animation. It works as a language because it uses diagrams to convey meaning. Data is encoded into a symbology and semiology. The syntax and conventions of these diagrams must be learned and are not inherent. This gives it characteristics of the arts as well, so it is a combination of science and art.
Because of this combination, many researchers study whether human perception has an impact on visualizations, and how perception influences them. Here we introduce the role of perception in visualization and why it is important. Our brains are complex. Although there are many aspects of how we process quantitative information that we have yet to understand, the area of visual perception has been extensively researched. The effectiveness of data visualization can be largely attributed to the powerful processing mechanisms of human vision. Visual perception is "the ability to see and interpret the visual information that surrounds us." It is handled by the visual cortex, located at the back of the brain, and it is extremely fast and efficient. Thinking is handled primarily by the cerebral cortex at the front of the brain, and it is much slower and less efficient. Traditional methods of data sense-making and presentation require conscious thinking for almost all of the work.
Data visualization shifts the balance toward greater use of visual perception, taking advantage of our powerful eyes whenever possible. An understanding of perception can significantly improve both the quality and the quantity of information being displayed.
Weber's law states that human judgments are relative: we judge based on relative, not absolute, differences. The amount of perceived difference is relative to the object's magnitude. We have an example here, the checker shadow illusion, published in 1995 by Edward Adelson, a professor at MIT. The image depicts a checkerboard with light and dark squares. When you first look at the illusion, the square labeled A appears darker than the square labeled B. However, they are actually the same color. You can verify this when a rectangle of that same color is drawn connecting the two squares. Other factors also affect accuracy, including alignment, common scale, and distance. Here is an example. The unframed, unaligned rectangles have slightly different lengths, but it is hard to compare them. Adding a frame lets us compare the unfilled areas between the top of each bar and the top of its frame, and those areas differ much more noticeably. Similarly, aligning the rectangles also makes the judgment much easier.
Now, let us examine another aspect of perception that can help us in the design process. This is called preattentive processing, and it allows us to instantly recognize parts of a visualization. Let me show you what I mean. Take a look at this diagram and count how many times the digit 5 appears. How many 5s do you count, and how many seconds do you think it took you to count them? Now do it again, but this time I have highlighted the 5s in red. This is much easier, right? It requires only a single glance, and then you know that there are a total of seven 5s there.
This represents a fast, accurate form of visual detection called preattentive processing. It occurs without conscious effort and at extremely high speed, usually in under 250 milliseconds. Differences in basic attributes such as shape, size, color, and angle can all trigger preattentive processing. As you can see, the dot with the different shape, color, or fill pops out. The line that is tilted or enclosed in brackets pops out. And the circle that is separated from the others also pops out. Designers can use the fundamentals of preattentive processing to make information easier to understand. When used correctly, preattentive attributes can be extremely useful for creating a visual hierarchy of information and for drawing your audience's attention quickly to where you want them to look, so that they follow the narrative of your story.
Let's look at the following example to see how to apply preattentive processing in a graph. Imagine you are an economist and you want to analyze the profit of different sectors from the last quarter. You create a simple chart that looks like this. Because the bars have different lengths, readers' attention is very likely to be drawn to the longest or shortest bar in the graph. For example, when your story focuses on a sector that does exceptionally well, you want to highlight the bar representing the sector that is making high profits. Conversely, when you want to draw readers' attention to sectors that failed to meet their annual goals, you want to highlight the sector that makes lower profits. Notice how the other categories have been pushed to the background? With the other colors toned down, the green or red bar stands out more effectively. The strategic use of preattentive attributes can make important information pop out and can be very powerful in framing our story.
Now we want to know how to choose the most effective way to visualize data. The topic we are going to cover is the concept of marks and channels.
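Before moving on, the highlighting idea from the profit chart example can be sketched in code. This is only a minimal illustration, not the method used to produce the chart in the video: the sector names, profit figures, hex colors, and the `bar_colors` helper are all hypothetical. It assigns a muted gray to every bar except the one the story is about.

```python
# Hypothetical last-quarter profit by sector (made-up numbers, illustration only).
profits = {"Energy": 12.0, "Retail": 4.5, "Tech": 21.3, "Finance": 9.8, "Health": -2.1}

MUTED = "#c8c8c8"           # toned-down gray for background bars (assumed color)
HIGHLIGHT_HIGH = "#2ca02c"  # green for the best-performing sector (assumed color)
HIGHLIGHT_LOW = "#d62728"   # red for the worst-performing sector (assumed color)


def bar_colors(values, focus="highest"):
    """Return one color per category, highlighting only the focus bar."""
    if focus == "highest":
        target = max(values, key=values.get)
        accent = HIGHLIGHT_HIGH
    elif focus == "lowest":
        target = min(values, key=values.get)
        accent = HIGHLIGHT_LOW
    else:
        raise ValueError("focus must be 'highest' or 'lowest'")
    # Every bar except the target is pushed to the background.
    return {name: (accent if name == target else MUTED) for name in values}


if __name__ == "__main__":
    for name, color in bar_colors(profits, focus="highest").items():
        print(f"{name:8s} {profits[name]:6.1f}  {color}")
```

The resulting colors could then be handed to whatever charting tool is in use. The point is that color is spent on a single bar, so it is the only thing that pops out.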
Information visualization, an academic discipline devoted to the study of data graphics, provides a language for describing the process of graphic creation, as well as guidelines to support design choices. In her influential book, Visualization Analysis and Design, Tamara Munzner, an expert in information visualization, describes commonly used visual marks, which are basic graphical elements such as points, lines, and areas, and visual channels, which determine how marks appear. A channel could be position, color, shape, orientation, or size. The choice of channels is based on the attribute types in your data and on the tasks you want to perform or the questions you want to answer. When creating data graphics, we specify our mapping of data items to visual channels.
Let's look at this data collected by the Gapminder Foundation, which presents global health and population measures for countries of the world. Here is the table. Each row is a country and each column is an attribute, such as life expectancy or income. Typically, rows match up with visual marks on the screen. Marks could be points, lines, or areas. The columns usually map to visual channels that control the position, color, and size of the marks. Some visual channels are better suited to showing certain data types than others. Munzner differentiates channels that are appropriate for representing categorical attributes from those appropriate for ordinal and continuous attributes. She also ranks these visual channels according to their effectiveness, that is, whether the information conveyed by one visualization is more readily perceived than the information in another. This ranking is empirically derived from decades of perception research in information visualization.
Let's go a little deeper into channels. On the top of this grid here, we have categorical, ordinal, and continuous as attribute types.
Then, we have dots where it makes sense to use a given channel for a given type of attribute. When I say position, there is actually the x-position and the y-position, and they are independent of one another. When I talk about size difference, it could be a difference in length, area, or volume. Similarly, when we talk about color, you can in fact break color down into three dimensions. One is luminance, the variation in brightness, as in light blue versus dark blue. Another is saturation, which is the intensity of a color. It varies with the amount of gray in the color. And the third is hue, the variation in what people typically think of as color, like red, blue, and green.
Notice that x and y positions are the most powerful channels because they can encode all three types of attributes. Shape can normally only encode categorical attributes because different shapes, whether a square, a triangle, or a star, are simply different. Size only makes sense for ordinal and continuous attributes. You could technically encode categorical data with area, but it would be a little misleading because there is a natural ordering between different areas. You know that something is larger than something else, but if the data has no intrinsic ordering, it is often misleading to use area, and that is why there is no dot there. Color luminance and saturation are used to encode ordinal and continuous data because brightness and intensity have a natural ordering of their own. In contrast, hue can only encode categorical attributes because different hues are simply different. No hue is greater or lesser than another hue, and that is why hue maps to categorical attributes really well.
So that is an overview of which kinds of channels match up with which kinds of attributes. Visualizations are combinations of marks and channels. Let's go through some concrete examples of how we might map data attributes to visual channels in a graphic.
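The grid just described can also be written down as a small lookup table. The sketch below simply restates the dots from the grid as described in this video; the key names and the `channels_for` helper are hypothetical, and it does not attempt to capture Munzner's effectiveness ranking.

```python
# Which attribute types each channel can sensibly encode, as described above.
# Channel names are illustrative labels, not identifiers from any library.
CHANNEL_SUPPORT = {
    "position_x": {"categorical", "ordinal", "continuous"},
    "position_y": {"categorical", "ordinal", "continuous"},
    "size": {"ordinal", "continuous"},
    "color_luminance": {"ordinal", "continuous"},
    "color_saturation": {"ordinal", "continuous"},
    "color_hue": {"categorical"},
    "shape": {"categorical"},
}


def channels_for(attribute_type):
    """List the channels that have a dot in the grid for this attribute type."""
    return [
        channel
        for channel, supported in CHANNEL_SUPPORT.items()
        if attribute_type in supported
    ]


if __name__ == "__main__":
    for attribute_type in ("categorical", "ordinal", "continuous"):
        print(attribute_type, "->", channels_for(attribute_type))
```

Asking for `"categorical"` returns both positions, hue, and shape, while `"ordinal"` and `"continuous"` return both positions, size, luminance, and saturation, which matches the grid.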
Let's consider a scatter plot constructed to analyze a data set about the news habits of people in 35 countries. In 14 of those countries, half or more of adults get news online daily. When presenting data in a scatter plot, we usually mark observations using points. Each point uses at least the position channels: the x-position encodes GDP per capita, and the y-position encodes the proportion of citizens who use the Internet to get news at least once a day. Color hue indicates the continent where each country is located, which is a categorical attribute.
Here is a bar chart showing the casualties due to flight crashes. Bar charts use lines as the marks. In this case, we have a vertical bar chart. The x-position represents a categorical attribute, the year, and the length of the bar represents a continuous attribute, the number of deaths.
I want to include a choropleth map as one of the examples because it uses luminance to encode an ordinal attribute, drought severity, from low to extremely high. Notice that the colors use two diverging hues, and the brightness varies within each. Darker red means extremely high drought, and darker blue means a low level of drought. The marks here, you could say, are the geographical areas, whose placement can be described by x and y position. Position here is driven by geographical location, which we could say is a categorical attribute.
For proportional symbol maps, points are the visual marks. They are placed according to county position. The area of each point is proportional to the number of votes by which the county's leading candidate was ahead. There are two hues: red for John McCain and blue for Barack Obama.
We have completed our tour of data types, encoding channels, and graphical marks. We will further explore the space of encodings, map types, and mark parameters in the reading materials. Making powerful data stories is about much more than creating pretty charts and graphs.
It is about choosing elements and creating patterns that match the data type. Understanding some basic principles of visual perception can help us do that.
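As a recap, the scatter plot and bar chart examples from this video can be written down as explicit mappings from data attributes to marks and channels. This is a rough sketch only: the `Encoding` and `ChartSpec` classes, the `describe` helper, and the field names such as `gdp_per_capita` are hypothetical and do not come from any particular charting library or from the original data sets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Encoding:
    channel: str         # visual channel, e.g. "x", "y", "color_hue", "length"
    field: str           # hypothetical column name in the data table
    attribute_type: str  # "categorical", "ordinal", or "continuous"


@dataclass(frozen=True)
class ChartSpec:
    mark: str            # "point", "line", or "area"
    encodings: tuple     # tuple of Encoding objects


# Scatter plot: points, with position for two continuous attributes and hue for continent.
scatter = ChartSpec(
    mark="point",
    encodings=(
        Encoding("x", "gdp_per_capita", "continuous"),
        Encoding("y", "daily_online_news_share", "continuous"),
        Encoding("color_hue", "continent", "categorical"),
    ),
)

# Vertical bar chart: line marks, x-position for year, bar length for deaths.
bar = ChartSpec(
    mark="line",
    encodings=(
        Encoding("x", "year", "categorical"),
        Encoding("length", "deaths", "continuous"),
    ),
)


def describe(spec):
    """Summarize a chart as 'mark: field (type) -> channel' pairs."""
    parts = [f"{e.field} ({e.attribute_type}) -> {e.channel}" for e in spec.encodings]
    return f"{spec.mark} marks: " + "; ".join(parts)


if __name__ == "__main__":
    print(describe(scatter))
    print(describe(bar))
```

Writing a chart down this way makes it easy to check each attribute type against the channel it has been assigned before any drawing happens.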