[MUSIC] Hi, in this module I'm going to introduce aggregated data. I'll talk about some of the research questions that we can investigate using aggregated data. And I'll talk about some of the challenges associated with making use of aggregated data. In the module that follows, I'll actually introduce you to some of the sources of aggregated data that you can access yourself if you want to start carrying out analysis on your own. Aggregated data is used heavily in macro-level quantitative studies that look at the relationship of indices with each other. These indices might be measures of the average level of education in a particular population, or the life expectancy, or the per capita economic growth. One way or the other, these are units of analysis well above the level of the individual. So in these macro-level studies, the unit of analysis is the state, the society, the community, or some other aggregation of people. Then the studies look at how the aggregate-level characteristics of these units, whether they're countries, or provinces, or states, or counties, are associated with each other or in some cases actually influence each other. We'll give some examples in just a moment. Now most of these studies rely heavily on published statistics. It's hard to come up with aggregated statistics on your own, especially if you are talking about entire countries or entire regions. Here's a simple example of a relationship between aggregated indices that is actually quite important, and it makes use of the sorts of aggregated data that we're talking about. So we're looking at life expectancy as a function of per capita income for countries around the world in 2015, and we can see here that there is a relationship. In fact, according to the equation, every $10,000 increase in per capita GDP is associated with roughly a three-year increase in life expectancy.
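To make concrete what a statement like "every $10,000 is associated with roughly three years" means, here is a minimal sketch in Python of fitting a straight line to country-level data. The five records are invented for illustration and are not real statistics, and the helper name `fit_line` is hypothetical; the lecture does not say how its equation was estimated, so ordinary least squares is an assumption here.

```python
# Hypothetical records: (country label, GDP per capita in USD, life expectancy in years).
# All values are invented for illustration; they are not real statistics.
records = [
    ("A", 1_000, 61.0),
    ("B", 5_000, 61.0),
    ("C", 10_000, 63.5),
    ("D", 20_000, 65.5),
    ("E", 40_000, 72.0),
]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


gdp = [r[1] for r in records]
life = [r[2] for r in records]
intercept, slope = fit_line(gdp, life)

# The slope is in years per dollar; rescale it to years per $10,000.
years_per_10k = slope * 10_000
print(f"Each extra $10,000 of GDP per capita is associated with "
      f"{years_per_10k:.2f} more years of life expectancy")
```

Note that the unit of analysis in this sketch is the country: each row is one aggregate, and the slope describes an association between aggregates, not between individuals.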
Now this is a very simple example, but actually, back in the 1950s, such straightforward analyses were actually quite important in documenting the relationship between economic development and health. The 1950s were an era when it had only just become possible to collect these sorts of international data to do this sort of analysis. Of course, we've moved on in the intervening decades, and we make use of aggregate indices in a number of different ways. And I'm going to talk a little bit about some of the major research questions that still make use of aggregate indices at the national or regional level. One of the really big ones is understanding the social and political factors that influence economic development and growth at the national or regional level. We know that before the 20th century, and certainly during the 20th century, different regions of the world and different countries within the same regions had very different trajectories in terms of their economic growth. Some countries grew rapidly and prospered. Some not so much. There's a lot of interest in trying to understand what the factors were that differentiated countries. Was it their cultural context, their history? Was it specific policies? The studies that try to understand the differences in these trajectories all make use of aggregate indices, national-level indicators of policies, of various kinds of outcomes, and so forth. It's still a vibrant field with a lot of work going on. Another area that has become increasingly important in recent decades is understanding the political, social, and economic causes and consequences of inequality. Especially in the last few decades, there's been renewed attention to the fact that there is inequality and that it seems to be growing. And we're trying to understand why, at a national level, inequality seems to be increasing, and increasing more in some places than in others.
So there's a lot of interest in understanding whether it's political organization, or social and cultural factors, or economic factors that contribute to rising inequality. And conversely, there's a lot of attention to understanding the consequences of inequality. There's been a lot of recent research on whether or not inequality at the national level actually worsens health. There's also a lot of interest in whether inequality promotes or reduces economic growth. Some people argue that inequality, unappealing as it is, has some benefits in that it leads to faster economic growth in the long run that lifts all boats. Other people argue that inequality may actually reduce economic growth by changing the distribution of resources within the economy and changing consumption and investment behaviors. Another area of interest is understanding the long-term impact of investments in health, infrastructure, and education. We know that over the 20th century, especially from the middle of the 20th century up to the present, countries put a lot of money into improving health, expanding education, and building infrastructure. While it's common sense that these should all have some kind of payoff and should promote economic growth over the long run, there's a lot of interest in trying to measure the impact of these investments in a more precise fashion, to help decide which kinds of investments are actually the most effective, which have the most bang for the buck in terms of promoting economic growth. And finally, in the policy arena, there's a lot of interest in measuring the effects of specific national policies regarding trade, innovation, sustainability, the environment, and so forth. So around the world, countries are experimenting with different policies related to all of these areas, and other countries are making choices about whether or not to adopt these policies, to change their trade policies, or to introduce new environmental policies.
They want to look at the experience of other countries and see whether new policies in each of these domains have had the desired effects or have had other consequences. All of these questions require data at the aggregate level, that is, measurements of the characteristics of entire countries or regions. If we're going to make use of aggregated data, we have to keep in mind that there are certain challenges associated with using it that are unique to aggregated data as opposed to micro-level data. One is that we have to worry a lot about whether or not the data can be trusted. There's a lot of variation from country to country around the world in terms of the capacity of governments to generate the statistical data that we want to use for macro-level analysis of aggregated data. Some wealthy countries have large bureaucracies with many trained employees who spend a lot of time compiling data and cleaning it before it's reported. Other countries are not so well equipped, so we have to be mindful of these sorts of problems when we're looking at aggregated data. A related issue is that even when countries have the capacity to compile high-quality data, they may have incentives or a desire to adjust it to conceal problems or to exaggerate achievements. We need to keep those sorts of problems in mind as well and be skeptical whenever we're making use of aggregated data. Another related issue is that it's not always as easy as one might think to compare aggregate-level indices across different settings. Some things that might seem fairly straightforward, like infant mortality, may be defined differently in very subtle ways across different countries, which affects the statistics that are compiled. For example, certainly in the past, some countries had different definitions of what a live birth was. And in fact they regarded many of what would be considered live births in other countries as stillbirths.
Whereas other countries had a very broad definition of what a live birth was. And so what some countries recorded as stillbirths might in other countries be reported as a live birth followed by a death, thereby leading to an increase in the reported infant mortality rate. Divorce rates are notoriously difficult to compare across settings because of great differences between countries in the legal and cultural context of divorce. There's also the danger of the ecological fallacy. As long as you're making use of aggregated data to look at relationships at the macro level, and you really just want to look at associations, say, at the country level, you're okay. But be careful about trying to generalize from associations at the national or regional level in aggregated data down to relationships at the individual level. Just because some relationship is apparent when you compare countries, for example a relationship between income and health, doesn't mean it holds among individuals; it might be a different relationship when we look at individuals within the same country. Another issue when we're making use of aggregated data is that establishing causality is, in many ways, much more difficult than it is when we're working with micro-level data. Establishing cause and effect is difficult even when we're working with micro-level data. And when we move to aggregate-level data, the problems are multiplied. So even when we make use of some advanced techniques that we will talk about in the next module for analyzing aggregated data, we often cannot completely establish cause and effect beyond reasonable doubt. Finally, there are subtle problems that experts are aware of. One is that the choice of beginning and end dates for an aggregate-level analysis may influence the results. There have been some controversies where studies reported results based on data from a certain range of years, and it was later pointed out that they might yield different results if a different range of years were used.
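The point about beginning and end dates can be illustrated with a small sketch in Python. The annual series below is invented for illustration and does not come from any real country or study; the helper names `ols_slope` and `trend` are hypothetical. The sketch fits a linear trend to the same series over different year ranges and shows that the estimated trend can even change sign depending on the window.

```python
# Hypothetical annual values of some national index, 2000 through 2009.
# The numbers are invented for illustration; they are not real statistics.
series = {
    2000: 100, 2001: 104, 2002: 108, 2003: 112, 2004: 116,
    2005: 110, 2006: 104, 2007: 98, 2008: 101, 2009: 104,
}


def ols_slope(xs, ys):
    """Slope of the ordinary least squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def trend(start_year, end_year):
    """Average change per year, estimated only from start_year..end_year inclusive."""
    years = [y for y in sorted(series) if start_year <= y <= end_year]
    values = [series[y] for y in years]
    return ols_slope(years, values)


# Same series, three different choices of beginning and end dates.
for start, end in [(2000, 2004), (2004, 2007), (2000, 2009)]:
    print(f"{start}-{end}: {trend(start, end):+.2f} per year")
```

A simple habit that follows from this is to re-run an aggregate-level analysis over several plausible year ranges and report whether the conclusion survives.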
So these are all things that we need to keep in mind if we're making use of aggregated data. Now in the next module, I'm going to show you where to obtain aggregated data that you can use right away for an analysis that you might want to carry out.