This whole unit is about asking factual questions, some of which will turn out to be quasi-facts. We'll introduce you to that concept and start with a couple of examples. So factual questions can be questions about behaviors: use of online shopping, cigarette smoking, doctor visits, exercise, food intake, any of those activities. Some of them are sensitive, and we'll talk about how to ask sensitive questions. Some of them are hard to remember, and we'll talk about recall problems and errors of that nature. And some of them are just facts, and you would think they should be easy to ask and even easier to answer. But some of these facts are actually quasi-facts, defined as something that is commonly seen as concrete and objective but has a large subjective component. So a quasi-fact shares features of a factual question and an attitudinal question. Here is our question to you: would you say this is a fact or a quasi-fact? Just click, for each of the items listed here, what you think it is. Now, the first three are permanent, unchangeable facts. Your date of birth, you might not know it, but it is a fact. So is your native language or your country of birth. Other facts fluctuate much more. Age, for example. Once you reach a certain age, it gets really hard to remember how old you actually are, right? It's easy to remember your date of birth, but how old you are changes all the time. Time goes by quickly, so this is very hard to answer for certain respondents. Marital status is obviously fluctuating, at least in countries that allow divorce and the like. So this is a fact, but it might change. Income fluctuates, and for that reason alone it is harder to answer, but it can also be a sensitive question, and as I said, we'll have a segment on those, actually several segments. And then there are true quasi-facts, by which I mean what we call quasi-facts, and those would be questions on race and ethnicity. Why is that? Well, just think about this for a moment.
What attributes define race? Let me give you a little bit of history on the race question. One goal was, or still is, to monitor and expose oppression and its results. The question, of course, is whether a survey question is the right tool to measure that. And how this is measured, and whether it's measured at all, varies across countries and across time. The US Census has actually not used the same definition in more than two censuses. Nowadays race and ethnicity are collected through self-reports and self-identification, but prior to that they were collected based on enumerator observation. You might be taken aback by this and say, "Well, wait a second, how can they know that?" But you could argue that if you want to measure what people experience, it might not be a bad idea to actually measure what other people see. So there's some argument to be made for that. Great Britain did not collect data on ethnic groups until 1991. There are a lot of other countries that don't do that either, and in the US, given its history, there has always been a big debate on how best to measure this. An authority in this debate is the Office of Management and Budget, which oversees all of the federal statistical agencies' data collection. They had a policy directive in 1978 stipulating that federal agencies were to collect and present data on at least four racial groups: American Indian or Alaskan Native, Asian or Pacific Islander, Black, and White. Back then, no reporting of multiple races was possible. It was preferred that people self-identify; nobody should tell them how to classify themselves. This is a little unreadable, and you'll see a link on the website to these old census forms, but basically what you see here is that it says, "If Indian, print the name of the", I can't even really read it, "enrolled or principal tribe", and that can be listed here.
And then, if Asian or Pacific Islander, specifications can be made here and here. The respondent was asked to check off which of these categories applied to them, and for some of them to give additional information. Now it was a bit of a confusing form: here's the actual question, here's some explanation of what should be done. But you see the categories here, and interestingly, some of them might no longer be in common language use. There was no possibility to mention multiple races on this form. That was changed in the OMB directive from 1997, more in the spirit that the social definition of race should be recognized, one that doesn't conform to any biological, anthropological, or genetic criteria. So here the directive suggests including the race groups listed here, but it also allowed for multiple reporting of race. In the 2000 and 2010 census forms you see the race question as follows. Now we have the categories White and Black, African-American, or Negro; the old term was kept because there are still older African-Americans for whom this term is not offensive but rather the way they would self-describe. There's a big debate every census on how this should be done, and it is interesting to follow up on this through the web entries we have for this topic. Again, "Print name of enrolled or principal tribe" appears as a further specification, along with the different categories, and so on. But while in spirit this still looks like that very old census form, with a better layout, you see a big change: "Mark (X) one or more races." So here people can now self-identify as having multiple races, which, with mixed parentage and the like, makes a lot of sense. This is the ethnicity question used in the census form. There's a separate note that both question 7 and question 8 should be answered, that is, whether the person is Hispanic or Latino and what the person's race is. So this is a further specification that is asked here.
Now there are some challenges to measuring race and ethnicity. One has to do with the validity of the concept, another with its reliability: we can have changes in self-perception over time. We also have challenges with the response selection. Are these categories really mutually exclusive? Are they meaningful for all respondents? And how can you compare this over time? Keep in mind that these questions might be asked in context. Early on we talked about Grice's Maxims. The context creates a conversational norm, and this is true for the race and ethnicity questions as well. Context would determine, for example, how respondents interpret the flow from the Hispanic versus non-Hispanic question to the race question, and with the question order that is used now, some of these context effects can be mitigated. Context in general provides an interpretive framework, and we have seen in many other examples that this can resolve ambiguities. The work by Fred Conrad and Michael Schober is a good example here. Context can of course also prime the respondent with relevant items, and while you might first think of only attitudinal questions being affected by context, this is true for factual questions as well. Now, sometimes, just as with the old way of measuring race, you might say, "Well, can't we just guess, for example, what the respondent's gender is? Why do we need to ask the respondent's gender?", another factual question. In CATI surveys, computer-assisted telephone interviews, gender is often guessed. Interviewers are asked to listen and only ask if it is unclear, or are not even encouraged to ask at all. These interviewer guesses are then used for a variety of purposes: sometimes to screen for eligibility, sometimes to filter gender-related items, and in rare cases also for nonresponse adjustment or post-stratification. Actually, that is not so rare if there isn't an additional demographic section in the survey where the respondent self-reports.
So here's an example from the Health Information National Trends Study (HINTS). If not obvious, interviewers guess respondents' gender; the instruction to the interviewer is to ask, only if not obvious, "Are you male or female?" Now, we did a little research study with two research questions. A) We wanted to know how good interviewers are at guessing respondents' gender, and B) we wanted to know whether there are any predictors of wrong guesses, should they appear. Why would we think they might appear? Because linguists find that pitch tones allow listeners to discriminate between male and female voices. But there's overlap in these pitch tones, and so the question is how good we really are at telling males and females apart, and whether these male and female classifications based on the voice really match what the respondent self-reports. Here too you could argue that maybe gender is more of a quasi-fact than a factual question. To conduct the study we used 28 public opinion phone surveys conducted beginning in 2008. When I say we, I mean Susan Kenny McCollough and myself. She was a student in the Joint Program in Survey Methodology, had worked at Marist, and was able to reanalyze these data that Marist had collected. In total, there were data from over 25,000 respondents that we could use for this study, all collected in a centralized facility in Poughkeepsie, New York. These surveys had been national surveys, some just in the state of New York, and many of them were landline samples; there are no cell phone data in this database. Here you see a description of the respondent demographics, reasonably much as you would expect, and a description of the demographics of the interviewers who conducted the interviews in the phone surveys, a total of 475. All these interviewers were trained at Marist College, students between the ages of 18 and 23. So now you know how many interviewers there are and how many respondents, the distribution, and the like.
Each of these Marist interviewers was asked to guess the respondent's gender, and later on the respondents were asked directly. My question to you now is: how big, or how small, do you think the misclassification error is? How many of these guesses might be wrong? Well, eight percent. Overall, gender was guessed incorrectly for eight percent of respondents. Interestingly, we have a different range of measurement error across the gender groups. Among the female respondents, 12.6 percent were guessed to be male, while among the male respondents only 2.6 percent were guessed to be female. That is because female vocal cords produce a wider pitch range, so it is much easier to misclassify females than males. Interestingly, this misclassification happens much more often among blacks than among other racial groups, so you can imagine that black females in particular are misclassified the most. We also looked at predictors of wrong guesses in some multivariate models, hierarchical linear probability models. The paper related to this research is on the course website. Here you see the dependent variable, our error, the mismatch between the interviewer's guess and the respondent's report, and then a set of interviewer variables predicting the error probability. They are listed for you to look at, and for us to discuss only one single thing, which is that, interestingly enough, interviewer experience, that is, experience levels three and four, the higher levels of experience, seemed to increase the error in the interviewer guesses, and we have since confirmed this with other survey houses. So there can be an adverse effect of interviewer experience, maybe from paying less attention or from having heard too many of these voices; we don't know for sure. But it is an interesting line of research, and we have put a couple of manuscripts on the course website if you want to see for yourself what these effects are in other studies.
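As a back-of-the-envelope check on those numbers, the two group-specific error rates together with the overall rate pin down the gender composition of the sample. This is just illustrative arithmetic, not a calculation from the paper:

```python
# If 12.6% of female respondents and 2.6% of male respondents were
# misclassified, and the overall error rate was 8%, then the overall
# rate is a weighted average of the two group rates:
#   overall = share_female * p_err_female + (1 - share_female) * p_err_male
p_err_female = 0.126
p_err_male = 0.026
overall = 0.08

# Solving the weighted average for the female share of the sample:
share_female = (overall - p_err_male) / (p_err_female - p_err_male)
print(round(share_female, 2))  # → 0.54
```

So these rates imply a sample that is roughly 54 percent female, which is plausible for landline samples.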
To sum this up, we didn't see any main effects of interviewer gender and race as predictors of a wrong guess, but when we included interaction effects we saw significant effects: female respondents are more likely to be miscoded by female interviewers than by male interviewers, African Americans were more likely to be miscoded than non-blacks when interviewed by a non-black interviewer, and African American interviewer-respondent pairs have a higher probability of being miscoded than white/white pairs.
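To make the idea of a linear probability model with an interaction term concrete, here is a minimal sketch on synthetic data. The variable names, effect sizes, and the simulated pattern are invented for illustration; they are not the actual study variables or estimates:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Illustrative binary predictors: respondent and interviewer gender (1 = female).
resp_female = rng.integers(0, 2, n)
int_female = rng.integers(0, 2, n)
interaction = resp_female * int_female

# Simulate a miscoding probability that is higher for female respondents,
# and higher still when the interviewer is also female (the kind of
# interaction pattern described above; the numbers are made up).
p = 0.03 + 0.06 * resp_female + 0.04 * interaction
miscoded = (rng.random(n) < p).astype(float)

# A linear probability model is just OLS with a binary outcome, so each
# coefficient is an estimated change in the miscoding probability.
X = np.column_stack([np.ones(n), resp_female, int_female, interaction])
beta, *_ = np.linalg.lstsq(X, miscoded, rcond=None)
print(beta)  # roughly [0.03, 0.06, 0.00, 0.04]
```

Note that in the actual study the errors are clustered within interviewers, which is why hierarchical models were used; the plain OLS standard errors in a sketch like this would be too optimistic for such data.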