Hello. In this video, I'll talk about the rest of the common data types, specifically procedures, surveys, and utilization data. Throughout these videos, I'm covering both traditional and non-traditional data types. Here is the list: we already talked about demographics, diagnoses, and medications in the previous video. In this video, I'll give you more detail on procedure codes, different surveys, and utilization data that can be found in health databases.
Most procedures are found in administrative databases, but you can also find them in typical clinical databases such as EHRs. Some procedures have a higher cost and some have a lower cost, and that can be very important for certain outcomes research topics. There are many derived variables you can calculate from procedures, for example whether somebody has a missing diagnosis, or a certain severity of disease that isn't encoded in their diagnoses but that you might infer from the procedures they received. There are a number of available and commonly used coding standards. Here in the US, we use a modified version of ICD, the Clinical Modification or ICD-9-CM codes, for some of the procedures that occur in a hospital setting, and we also have Current Procedural Terminology, or CPT, codes that are used in outpatient settings. There are other coding standards as well, but I won't go through them here.
As for typical data sources, as I said, insurance claims contain multiple types of procedures, such as hospital facility and professional procedures. EHRs might also have them, because when a physician orders something like a lab test or imaging, it also shows up on the EHR side. In terms of data quality, procedures usually have acceptable quality because reimbursement is attached to them. However, quality differs across data sources.
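To make the idea of a procedure-derived variable more concrete, here is a minimal Python sketch that flags a condition suggested by a patient's procedures but missing from their recorded diagnoses. The `ProcedureRecord` class, the `possibly_missing_diagnoses` function, and the procedure-to-condition mapping are all hypothetical; the codes are placeholders rather than real CPT or ICD-9-CM codes, and a real mapping would need clinical review.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ProcedureRecord:
    patient_id: str
    service_date: date
    code: str  # procedure code; placeholder strings here, not real CPT/ICD-9-CM codes


# Hypothetical mapping from a procedure code to the condition it suggests.
PROCEDURE_IMPLIES_CONDITION = {
    "PROC_DIALYSIS": "chronic_kidney_disease",
    "PROC_INSULIN_PUMP": "diabetes",
}


def possibly_missing_diagnoses(procedures, diagnoses_by_patient):
    """Return {patient_id: set of conditions} that are implied by a patient's
    procedures but absent from that patient's recorded diagnoses."""
    flags = {}
    for rec in procedures:
        condition = PROCEDURE_IMPLIES_CONDITION.get(rec.code)
        if condition is None:
            continue  # this procedure implies nothing in our mapping
        if condition not in diagnoses_by_patient.get(rec.patient_id, set()):
            flags.setdefault(rec.patient_id, set()).add(condition)
    return flags


procedures = [
    ProcedureRecord("p1", date(2020, 1, 5), "PROC_DIALYSIS"),
    ProcedureRecord("p2", date(2020, 2, 1), "PROC_DIALYSIS"),
]
diagnoses = {"p1": set(), "p2": {"chronic_kidney_disease"}}
print(possibly_missing_diagnoses(procedures, diagnoses))  # {'p1': {'chronic_kidney_disease'}}
```

Patient p2 already has the diagnosis recorded, so only p1 is flagged. The same pattern could derive a severity indicator by mapping procedures to severity levels instead of conditions.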
We also have the typical interoperability issues when you want to translate these data across different coding standards or merge them across different databases. There might be some legal considerations, but not as many as for the other common data types. Here is a screenshot of a table in a database that holds procedure data. Each row has a patient ID, the date of service for that procedure, and a code. In this case, these are CPT codes, which are maintained and copyrighted by the American Medical Association, or AMA. You can also see the descriptions. That's how you can find out what happened to the patient and what type of treatment or intervention was applied. Sometimes databases also have a column for the order, or sequence, of these procedures. That is shown by arrow number one.
The next common data type is surveys, and there are a variety of them out there. Not all of them are stored in an EHR or in a specific database; depending on the purpose of the survey and the research questions it was designed to answer, they might be in different databases. Some of these are very helpful for finding general trends in a population. For example, there are surveys on risk factors and behaviors that might increase utilization in a given subpopulation. Depending on the survey, of course, there might be thousands of different variables you can derive from it. There are no coding standards per se, although standards such as LOINC and SNOMED are starting to code many of these surveys. But a lot of surveys are custom made, and some are standardized questionnaires that are commonly used; either way, coding standards are not well developed in this area.
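As a sketch of how responses to a standardized questionnaire become an analyzable variable, the Python example below scores the PHQ-9 (the nine-item Patient Health Questionnaire) using its widely published scoring: each item is answered from 0 ("not at all") to 3 ("nearly every day"), the items are summed to a total from 0 to 27, and totals are grouped into severity bands with cutoffs at 5, 10, 15, and 20. The function and constant names are illustrative, and the sketch leaves out any clinical follow-up rules.

```python
# Widely used PHQ-9 severity bands for the 0-27 total score.
PHQ9_SEVERITY_BANDS = (
    (0, 4, "minimal"),
    (5, 9, "mild"),
    (10, 14, "moderate"),
    (15, 19, "moderately severe"),
    (20, 27, "severe"),
)


def score_phq9(responses):
    """Return (total_score, severity_label) for nine PHQ-9 item responses.

    Each response is an integer from 0 ("not at all") to 3 ("nearly every day").
    """
    if len(responses) != 9:
        raise ValueError("PHQ-9 requires exactly 9 item responses")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each PHQ-9 item must be 0, 1, 2, or 3")
    total = sum(responses)
    for low, high, label in PHQ9_SEVERITY_BANDS:
        if low <= total <= high:
            return total, label
    raise AssertionError("unreachable: total is always within 0-27")


# One hypothetical patient's answers, as they might be stored in a survey table row.
print(score_phq9([1, 1, 2, 0, 1, 1, 0, 2, 1]))  # (9, 'mild')
```

Storing the raw item responses and deriving the total and band at analysis time keeps the original answers available if a study uses different cutoffs.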
Some of the standard questionnaires are the health risk assessment questionnaire; the Patient Health Questionnaire, or PHQ, which has multiple versions, such as the PHQ-2 and PHQ-9; drug and alcohol use questionnaires; suicide risk questionnaires; and the Generalized Anxiety Disorder, or GAD, questionnaire. Again, there might be thousands of different surveys or questionnaires collected in a health setting, sometimes stored in an EHR and sometimes in other databases as needed. The data source could be anything; depending on which group ran which survey, you can find it in an EHR or in other data sources. Data quality is highly dependent on how the survey was administered. Surveys are subject to many biases, such as sampling and selection bias, non-response bias, and social desirability bias, as well as concerns about the validity and reliability of the instrument, or questionnaire, itself. So you have to be very careful how you use survey data. I don't know of any data interoperability issues, because nobody has really started to make these data interoperable. Depending on the questions, some surveys may include HIPAA-protected health information, and some may not.
Here's an example of a survey, called the PHQ-9. As you can see, there are nine questions. For each question, the patient chooses one of these answers, and the responses go into a database structured as a spreadsheet or a relational database. Here is a list of surveys. I've tried to give you some of the very common surveys here in the US, such as personal disease history, family history, health screening and immunization surveys, alcohol consumption surveys, injury prevention surveys, and so on. At the end of the list, we have surveys on stress, tobacco, weight, and women's health as well.
The last common data type we will talk about in this video is utilization. There are multiple outcomes that you can measure as utilization.
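One utilization outcome researchers often derive is the 30-day readmission flag: an index hospital stay counts as readmitted if the same patient is admitted again within 30 days of discharge. The minimal Python sketch below computes that flag from a list of admissions, assuming that simple definition; the `Admission` class and `readmission_flags` function are hypothetical names. Formal readmission measures typically add exclusions (for example, planned readmissions) that this sketch ignores.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Admission:
    patient_id: str
    admit_date: date
    discharge_date: date


def readmission_flags(admissions, window_days=30):
    """Return the set of index admissions that are followed by another admission
    of the same patient within `window_days` days of discharge."""
    stays_by_patient = defaultdict(list)
    for adm in admissions:
        stays_by_patient[adm.patient_id].append(adm)

    flagged = set()
    for stays in stays_by_patient.values():
        stays.sort(key=lambda a: a.admit_date)
        # Compare each stay with the patient's next stay by admission date.
        for current, following in zip(stays, stays[1:]):
            gap_days = (following.admit_date - current.discharge_date).days
            if 0 <= gap_days <= window_days:
                flagged.add(current)
    return flagged
```

Passing `window_days=60` gives the 60-day variant of the same variable, which is one of the different ways to characterize utilization.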
At the patient level, you can look at mortality and morbidity, and at the health system level, you can look at cost, emergency room visits, hospitalizations, and readmissions. All of these variables are commonly used as outcomes in health services research. In terms of background, there are different patterns of utilization across subpopulations, and researchers usually want to find those patterns. There is a long list of derived variables you can create from utilization, such as 30-day or 60-day readmission, and different ways to characterize those utilization variables. There aren't really coding standards, except for what insurance companies require for reporting these utilization events or outcomes. Data sources are usually insurance claims, because claims show exactly what happened to the patient across all of their different health care providers. Sometimes people also mine EHRs for utilization data, although EHRs have the problem of so-called data leakage: many of a patient's events may happen outside your health system, so they never appear in your EHR. So EHRs are not the best source, but they are at least one source. Data quality for these events is usually acceptable, again because they are sometimes reimbursable, or because a health system might be penalized based on its utilization rates. There are no data interoperability issues as far as I know. In terms of legal considerations, certain utilization events, such as an admission to a psychiatric ward, might be protected under HIPAA.
It's good to know how claim payments are determined. At least here in the US, when a provider submits a claim for reimbursement, the insurance company might classify some items as ineligible charges and not consider them at all. They say item A is not covered and we are not paying for it; item B is covered and we are going to pay for it. Then they look up their negotiated rates.
That negotiated rate can sometimes be lower than what the provider wants, and the insurer says that, based on the contract, the charge should be lower. So the charge comes down to something called the allowed charge, which is what the insurer is willing to pay. But in the US market, the member, or patient, also needs to pay something out of pocket: the coinsurance or copay, and the deductible. At the end of the day, the insurance pays the net incurred claim. If you look in a database like this one, you can see all of these costs listed as different columns, for example in box number one. Again, each row is one patient and one event. For the first patient, the cost reported by the provider was $153, but the insurance only allowed about $145. For patient number two, the initial claim was $22, and the insurance said the contract rate is $20. The patient also needs to pay a deductible, in this case $19.41, so the insurance paid only 39 cents of this claim.
So, today we talked about three common data types: procedures, surveys, and utilization. Thank you.
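The claim payment steps described in this video (billed charge, ineligible items, negotiated rate, allowed charge, patient cost sharing, net paid) can be sketched as a small Python calculation. This is a simplified illustration, not any insurer's actual adjudication logic: the `ClaimLine` fields and `adjudicate` function are hypothetical, the deductible, coinsurance, and copay amounts are taken as given inputs, the allowed charge is assumed to be the lesser of the billed charge and the contract rate, and ineligible items are simply excluded. The dollar amounts in the example are made up.

```python
from dataclasses import dataclass
from decimal import Decimal

ZERO = Decimal("0.00")


@dataclass(frozen=True)
class ClaimLine:
    billed: Decimal         # charge submitted by the provider
    eligible: bool          # False if the insurer treats the item as an ineligible charge
    contract_rate: Decimal  # insurer's negotiated rate for this service
    deductible: Decimal     # patient out-of-pocket amounts, assumed known here
    coinsurance: Decimal
    copay: Decimal


def adjudicate(line):
    """Return (allowed_charge, patient_share, net_paid) for one claim line."""
    if not line.eligible:
        return ZERO, ZERO, ZERO  # assumption: ineligible items are simply excluded
    # Assumption: the insurer allows the lesser of the billed charge and its contract rate.
    allowed = min(line.billed, line.contract_rate)
    # The patient's share cannot exceed the allowed charge.
    patient_share = min(allowed, line.deductible + line.coinsurance + line.copay)
    net_paid = allowed - patient_share  # the net incurred claim paid by the insurer
    return allowed, patient_share, net_paid


example = ClaimLine(
    billed=Decimal("100.00"),
    eligible=True,
    contract_rate=Decimal("80.00"),
    deductible=Decimal("50.00"),
    coinsurance=Decimal("6.00"),
    copay=Decimal("0.00"),
)
allowed, patient_share, net_paid = adjudicate(example)
print(allowed, patient_share, net_paid)  # 80.00 56.00 24.00
```

`Decimal` is used instead of floating point so that cent amounts add and subtract exactly, which matters when comparing allowed, patient, and paid columns in a claims table.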