Hi. In this module I'm going to talk about multi-generational microdata. Multi-generational microdata are administrative and archival data that follow families and their communities over multiple generations, sometimes decades, sometimes centuries. These allow for studies that relate individual outcomes to the characteristics of distant relatives, ancestors, and lineages. Most contemporary sources, the surveys that we're used to, even the population databases that are constructed in contemporary Scandinavia and some other places, have very short time depths. They only allow studies of associations across the life span, that is, individual life histories from birth to death, or in some cases associations between parents and children, that is, how parents' characteristics affect the outcomes of children. Now some multi-generational databases that, again, follow families for many generations, sometimes centuries, are publicly available. They're an exciting opportunity to conduct advanced studies looking at families over many generations.
To help clarify what's special about these databases, let's start by thinking about what we might have in a typical survey data set. The simplest sort of survey will typically ask people about themselves, and so we get information about the respondent. Some surveys will ask respondents about their spouse, so we get information not only about the respondent but also about their spouse. And then they may also in some cases ask about offspring, so we get two generations. In a few cases, surveys will ask people about their parents, and so we get three generations: the parents, the respondent, and then their children. Now very few surveys, some but not many, actually collect information from respondents about their grandparents. And if they do collect information about grandparents, it's typically only the father's parents or the mother's parents, but rarely both.
And this is where the multi-generational databases that I'm talking about are really unique, because they actually allow individuals to be linked not only to their parents but to their grandparents, sometimes only their patrilineal grandparents but in some cases both sides, as well as to their children. Where multi-generational databases really begin to stand out is with information about siblings. It's very rare for surveys to collect information about a respondent's siblings, let alone a respondent's siblings' children. However, multi-generational databases, through record linkage, allow for the collection of information about an individual's siblings and their siblings' children. As well, multi-generational databases through record linkage may allow for the collection of information about not only an individual's spouse, but their spouse's parents and grandparents. Now, not every multi-generational database will have this information, but at least some of them do. Certain multi-generational databases will actually include information as well about a spouse's siblings and a spouse's siblings' children. Again, not all multi-generational databases will allow for this sort of linkage to a spouse's family, but at least some of them do. And then the best and most remarkable of the multi-generational databases will actually allow individuals to be located within the context of their larger family network, linking people to cousins, uncles, and various distant kin, and then allowing us to look at how people's outcomes are shaped by the networks of kin around them.
There are a lot of research questions that we can address using multi-generational databases. A basic one is: do individual outcomes depend on the characteristics of distant kin, and if so, what are the mechanisms?
Are people's outcomes in life, their education, their income, their lifespan, their marriage, related to the characteristics not just of their parents, as we might look at with a survey, but to the characteristics of their grandparents, their siblings, their cousins? Do specific events have consequences for multiple generations? So does something that happened to a particular person at a particular point in time, perhaps somebody becoming especially wealthy or especially poor, have effects on their descendants many generations later? Are outcomes associated across large networks of distant kin? So if you look at kin networks, clans, lineages, do they have perhaps persistently higher or lower death rates, or birth rates, or chances of marriage, or other characteristics? One thing that we are especially interested in, and that people have been using multi-generational databases to study, is whether there are families that are persistently successful or unsuccessful over multiple generations. So are there families that, one generation after another, are especially good at, say, attaining high education for their members, or perhaps persistently less successful?
I'd like to provide some examples of these sorts of multi-generational databases that are either publicly available or may be available by application. One that we'll talk about in another lecture, and that I've been involved with for many years, is the China Multi-generational Panel Database, CMGPD, which is constructed from annual or triennial population registers for rural populations in northeast China. These have detailed records of demographic behavior and household context over a 150-year period for farmers in northeast China in the 18th and 19th centuries. The TRA database provides linked archival records from France for individuals whose surnames begin with TRA. There's a specific reason that they focused on this: to make it easy to locate related people in different archives.
And this has been used in studies of wealth inequality, as well as other studies of the economic performance of individuals over generations. The Utah Population Database started as family genealogies constructed by members of the Church of Jesus Christ of Latter-day Saints, which were turned into databases and then linked to administrative records. These provide detailed records of demographic and health outcomes, and increasingly are being linked to other sources. These have been used extensively not only in studies of population and family, but in genetic studies of health. The Panel Study of Income Dynamics actually started as a longitudinal survey, but it has been running long enough, and following the descendants of its original respondents, that it's now in its third generation, with extensive records of family behavior and economic outcomes.
Just to give you some sense of the generational linkage, the multi-generational aspect of some of these data sets, I provide an example from the CMGPD, the China Multi-generational Panel Database. Here we have, for men born in different time periods, the percentages who can be linked to their fathers, their grandfathers, their great-grandfathers, and so forth. All men, or almost all men, can be linked to their fathers because that's straightforward. What's important though is that, especially when we get to, say, 1900, 60% of the men born in that period can be linked to their great-great-great-great-grandfathers. So we can not only link them back to their distant ancestors, but then by additional linkage connect them to their first, second, third, and even more distant cousins, and then study how the characteristics of these distant relatives and distant ancestors influence their outcomes.
Here's an overview of some of the databases that are presently available, sometimes publicly or, more likely, by application.
So the Utah Population Database, UPDB, that I just mentioned has millions of individuals over seven generations with extensive detail, and it's actually being supplemented all the time with new records. The China Multi-generational Panel Database has several hundred thousand individuals in the 18th and 19th centuries and even the early 20th century. The SEDD, the Scanian Economic Demographic Database from southern Sweden, follows 150,000 people over four generations from the 17th century up to the present. POPLINK, another Swedish database, from northern Sweden, covers 350,000 people. TRA, the archives from France that I talked about, records 70,000 people whose surnames begin with TRA, over seven generations. The PRDH and BALSAC are French Canadian databases that record hundreds of thousands of people, over between 9 and 12 generations, in Quebec. The PSID that I mentioned now has 4 generations. And I'll also mention the Wisconsin Longitudinal Study, a longitudinal survey which started in Wisconsin in the 1950s and which has now been following its respondents and their descendants into the third generation.
Here's where you can go for more information about some of these databases. I won't go into them in detail because I just introduced them, but for each of the databases that I mentioned there are websites available where you can go for much more information, including codebooks, descriptions of the contents, and so forth. So here are the UPDB, CMGPD, SEDD, and POPLINK, and here are the links to the TRA, the PRDH, and BALSAC, that is, the Quebec databases, and the PSID and the WLS, the Wisconsin database.
So overall, we're entering a new era in social science research with the availability of massive multi-generational databases that follow families over many, many generations, allowing for the study of brand new topics in social science over the very long term.
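As a rough illustration of the generational linkage described in this module, here is a minimal Python sketch of how father links let us trace a person back to distant ancestors, compute the share of people who can be linked a given number of generations back, and locate siblings and cousins. Everything in it is an assumption made for illustration: the toy records, the IDs, and the function names are hypothetical and do not come from the CMGPD or any of the other databases, whose actual variables and linkage procedures are described in their own documentation.

```python
from typing import Dict, List, Optional

# Hypothetical toy records: person ID -> father's ID.
# None means the father is not observed in the data.
FATHER_OF: Dict[str, Optional[str]] = {
    "a1": None,
    "b1": "a1",
    "b2": "a1",
    "c1": "b1",
    "c2": "b1",
    "c3": "b2",
    "d1": "c1",
    "x1": None,
}


def paternal_ancestor(person: str, generations: int) -> Optional[str]:
    """Follow father links `generations` steps back; None if the chain breaks."""
    current: Optional[str] = person
    for _ in range(generations):
        if current is None:
            return None
        current = FATHER_OF.get(current)
    return current


def share_linked(people: List[str], generations: int) -> float:
    """Fraction of `people` who can be linked to an ancestor that many generations back."""
    linked = sum(1 for p in people if paternal_ancestor(p, generations) is not None)
    return linked / len(people)


def paternal_siblings(person: str) -> List[str]:
    """Other people in the records who share this person's father."""
    father = FATHER_OF.get(person)
    if father is None:
        return []
    return sorted(p for p, f in FATHER_OF.items() if f == father and p != person)


def paternal_first_cousins(person: str) -> List[str]:
    """People who share the paternal grandfather but have a different father."""
    father = paternal_ancestor(person, 1)
    grandfather = paternal_ancestor(person, 2)
    if grandfather is None:
        return []
    return sorted(
        p
        for p in FATHER_OF
        if paternal_ancestor(p, 2) == grandfather and FATHER_OF.get(p) != father
    )


if __name__ == "__main__":
    everyone = list(FATHER_OF)
    for k in (1, 2, 3):
        print(f"linked {k} generation(s) back: {share_linked(everyone, k):.1%}")
    print("siblings of c1:", paternal_siblings("c1"))
    print("first cousins of c1:", paternal_first_cousins("c1"))
```

The same idea extends to more distant cousins by comparing ancestors further back, and a real database would also carry mother and spouse links, which this sketch leaves out.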