[SOUND] So, in order to do good, effective information visualization, there's a nice information visualization mantra that's been set up to remind us of the steps, and we'll walk through that mantra and each of its steps in the next three parts. When visualizing large amounts of abstract information, Ben Shneiderman has some really good advice on how to do that effectively, which he's formulated into a mantra. And this mantra goes like this: overview first, then zoom and filter, and then provide details only on demand.
And so, overview first. You provide a big picture of the data. You plot all of the data, but don't show any details about the individual items, so you get a big picture and can understand the data set as a whole. And then, once you can see the whole data set, you can start to be curious and focus on a particular area of the data, and so we zoom and filter in order to focus on a particular subset of the data. And then, once we've found an appropriate subset to look at, we can provide details on demand. Details are the individual details of a single data item, and they're shown only when they're requested, because plotting that many details about many items can easily be visually overwhelming. This mantra has been shown to work really well on a lot of examples of large amounts of abstract information.
So here's an example of Ben Shneiderman's mantra being used on the World Bank indicators to display, for example, life expectancy and population for every country in the world. We start with an overview: we've got every country in the world plotted over population and life expectancy. I've zoomed in on the appropriate ranges of the data, so the population axis goes from its minimum to its maximum, the life expectancy axis goes from its minimum to its maximum, and we can zoom in further as we need to.
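As a rough sketch of how the three steps of the mantra differ, here they are as three small operations on a table of countries. Everything in this block is an assumption made for illustration: the `Country` record, the function names, and the rows themselves, which are placeholder values and not World Bank figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Country:
    name: str
    continent: str
    population: int
    life_expectancy: float


# Hypothetical placeholder rows, NOT real World Bank data.
DATA = [
    Country("Country A", "Africa", 2_000_000, 52.0),
    Country("Country B", "Asia", 900_000_000, 71.0),
    Country("Country C", "Asia", 30_000_000, 48.0),
    Country("Country D", "Europe", 8_000_000, 80.0),
]


def overview(items):
    """Overview first: one unlabeled (x, y) mark per item, no details."""
    return [(c.population, c.life_expectancy) for c in items]


def zoom_and_filter(items, continent=None, max_life_expectancy=None):
    """Zoom and filter: keep only the subset we are curious about."""
    subset = list(items)
    if continent is not None:
        subset = [c for c in subset if c.continent == continent]
    if max_life_expectancy is not None:
        subset = [c for c in subset if c.life_expectancy <= max_life_expectancy]
    return subset


def details_on_demand(item):
    """Details on demand: the full record of a single item, only when asked."""
    return (
        f"{item.name} ({item.continent}): "
        f"population {item.population:,}, life expectancy {item.life_expectancy}"
    )


marks = overview(DATA)  # every item, no names
low_asia = zoom_and_filter(DATA, continent="Asia", max_life_expectancy=50)
for country in low_asia:  # names appear only for the items we asked about
    print(details_on_demand(country))
```

The point of the sketch is that the overview never carries the names at all, and a name only shows up after a filter and an explicit request, which is what keeps the big picture from being overwhelmed.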
I've filtered the data; for example, you can filter based on continent, and I've used that filter to then color the data based on continent. That reveals other information: for example, Africa in general has some low life expectancies, and Asia, in green, has some high-population countries. And then you've got details on demand: for example, here is one country in Asia that has low life expectancy, and that's Afghanistan. And so, by providing those details on demand, they don't obscure the display or overwhelm you visually. They're the result of interactivity, the result of a query. So this is a good example of how you can use Ben Shneiderman's mantra for visualizing abstract information from large data sets.
So the mantra begins with an overview of the data, and this overview is often a scatterplot of the entire data set. In this case, I've shown an example. It's a scatterplot of all the countries of the world, which is a large amount of information, and I've plotted it by population and life expectancy. And so this is every single country. I've omitted the label of each country, the name of each country, because that would be a bit overwhelming. So, we just have a data point for each country. And by using these axes we get a high-level view of all the countries. We can kind of get our heads around this data set. And we can already start to see some interesting things that we could be curious about. For example, some of the countries have low life expectancy, and there doesn't seem to be a correlation between population and life expectancy.
And the challenge for this broad overview, this ability to spread out your data and see all of it at once, is to find axes that do that. So in this case, I wanted to look at life expectancy, and then I've chosen population as the other axis. This choice of population is a bit arbitrary, but it's one that spreads out the data well.
So for life expectancy, you'll notice that we don't go from 0 to 85. We go from 40 to 85, because we've chosen tight minimum and maximum bounds to better spread out the data in this overview. And for population, I haven't chosen a linear scale, I've chosen a logarithmic scale that basically displays orders of magnitude, starting from a billion and going down to 100,000 and less. I've used a logarithmic scale because that better spreads out the data in this case.
So, here's an example of plotting an overview of your data that allows you to see all of your data in one big picture. This is Tableau, and we're looking at the World Bank indicators, and we want to look at, for example, life expectancy. So if I drag life expectancy over here as a field, I get an average life expectancy over all countries. Now, if I want to see what the life expectancy is by country, I can just drag the country region into the columns, and now I get an individual life expectancy for each country, averaged over the years that were measured. But this is quite a lot of countries, and so you don't get a big picture. This is observable but not displayable: I can observe all of the data, but I can't display all of the data in a single picture and still see all of the countries.
So we'd like to provide a broad overview of the life expectancy data over all of the countries. And in this case, we don't need to see every country's name, so instead of doing that, we can just drag the country region over here into the marks. And now I've got an individual mark for every country projected onto a single line, which doesn't spread the data out. So, we need a way of spreading it out. We can pick, for example, population as a way of spreading out the data horizontally. And so now I've plotted the data by life expectancy vertically and by population horizontally.
And there are a few countries, for example India and China, that have over a billion people each, and that's skewing the scale for the rest of these countries. So in order to spread this data out, we don't want to use a linear scale, we want to use a logarithmic scale. So, I can edit this axis and choose a logarithmic axis. And now when I plot logarithmically, the data gets spread out more evenly across population, because I'm displaying population by order of magnitude, factors of ten, instead of linearly.
But now the data is clumped up in this upper-right quadrant, so in order to see the data more clearly, I'll want to spread it out across the entire display. So for life expectancy, instead of going from 0 years to 85 years, I'll want to start at about 40 years, so I can set the minimum life expectancy to some lower bound value of, say, 40 years. And now I can use my display resolution to display more of the items and to separate more of the items by setting tighter bounds. And again for population, it looks like it goes from about 10,000 up to 1 billion. So, we'll set the lower end of population to 10,000, instead of zero.
And now that spreads out the data, and I've got this nice overview of every single country in my data set, spread out. It's plotting life expectancy, and it's using population, actually the logarithm of population, as a way of spreading the data out, so I can get a nice, big overview picture of the data and then investigate it further. [MUSIC]
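To see in numbers why the logarithmic axis and the tighter bounds spread the data out, here is a minimal sketch that computes where a value lands along an axis as a fraction of the axis length. The helper `axis_position` and the sample populations are assumptions made for illustration; this is not how Tableau computes its axes, only the arithmetic behind the idea.

```python
import math


def axis_position(value, lower, upper, log=False):
    """Return where `value` falls along an axis, as a fraction of its length.

    0.0 is the lower end of the axis and 1.0 is the upper end. With
    log=True, positions are computed on log10 of the values, so each
    factor of ten takes up the same amount of space.
    """
    if log:
        if value <= 0 or lower <= 0 or upper <= 0:
            raise ValueError("a logarithmic axis needs positive values and bounds")
        value, lower, upper = (math.log10(v) for v in (value, lower, upper))
    return (value - lower) / (upper - lower)


# Hypothetical populations, one per order of magnitude (not real data).
populations = [10_000, 1_000_000, 100_000_000, 1_000_000_000]

# Linear axis from 0 to a billion: the small countries pile up at the left edge.
linear = [axis_position(p, 0, 1_000_000_000) for p in populations]

# Logarithmic axis from 10,000 to a billion: the same values are spread out.
logarithmic = [axis_position(p, 10_000, 1_000_000_000, log=True) for p in populations]

# Life expectancy between 40 and 85 years on a 0-to-85 axis versus a 40-to-85 axis.
share_loose = axis_position(85, 0, 85) - axis_position(40, 0, 85)
share_tight = axis_position(85, 40, 85) - axis_position(40, 40, 85)

print([round(x, 5) for x in linear])
print([round(x, 5) for x in logarithmic])
print(round(share_loose, 3), round(share_tight, 3))
```

On the linear axis, a country of one million people sits at one thousandth of the axis width, while on the logarithmic axis it sits about 40 percent of the way across. Likewise, values between 40 and 85 years use only a little over half of a 0-to-85 axis, but all of a 40-to-85 axis.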