Hello everyone and welcome back. In this lesson, I shift the focus from the conceptual aspects of performing risk stratification back to specific details about healthcare data. In particular, I want to provide you with information on the CMS 2008-2010 Data Entrepreneurs' Synthetic Public Use File. The acronym for this long name is simply the DE-SynPUF. Once I provide information about these data, you will have the opportunity to look at the data and then work on some assignments that encourage you to study and evaluate the data. After this lesson, you will be able to apply some analytical concepts, such as groupers, to large samples of Medicare data. You will then be able to use the data dictionaries and codebooks to demonstrate why understanding the source and purpose of the data is so critical. Let's get started. In the module about healthcare data types, I briefly reviewed attributes of claims and encounter data. As a review, remember that both data types are created so that providers can claim reimbursement for services rendered or document managed care encounters that occurred under some type of per-member-per-month capitated rate. These data vary by the type of healthcare payer. For example, Medicare claims are different from commercial health insurance claims. Regardless of this variation, all payers usually keep track of various domains such as eligibility, beneficiary demographics, and services that occurred in inpatient, outpatient, or pharmacy settings. Let me talk a little bit about the CMS 2008-2010 Data Entrepreneurs' Synthetic Public Use File, or the DE-SynPUF. These are neat datasets that I've used in my courses to give students experience with healthcare data. Although these data are synthetic, they were created from real Medicare claims data and hence retain many of the features of the original data. Moreover, the data samples are very large, so there is an opportunity to query and explore big data to some extent.
I went to the DE-SynPUF website, which you should all explore on your own. But let me quickly read some of the key phrases that CMS uses to describe the data. Here I quote, "The DE-SynPUF was created with the goal of providing a realistic set of claims data in the public domain while providing the very highest degree of protection to the Medicare beneficiaries' protected health information." The purposes of the DE-SynPUF are to: one, allow data entrepreneurs to develop and create software that may eventually be applied to the actual CMS claims data; two, train researchers on the use and complexity of conducting analyses using CMS claims data prior to providing access to the actual CMS data; and three, support safe data mining innovations that may reveal unanticipated knowledge gains while preserving patient privacy. The files have been designed so that programs and procedures created on the DE-SynPUF will function on the CMS limited datasets. The data structure of the Medicare DE-SynPUF is very similar to the CMS limited datasets, but with a smaller number of variables. The DE-SynPUF also provides a robust set of metadata on the CMS claims data that have not been previously available in the public domain. The DE-SynPUF has limited inferential research value for drawing conclusions about Medicare beneficiaries due to the synthetic processes used to create the file. However, the Medicare DE-SynPUF does increase access to a realistic Medicare claims data file in a timely and less expensive manner to spur analytic innovation. The DE-SynPUF contains five types of data: beneficiary summary, inpatient claims, outpatient claims, carrier claims, and prescription drug events. Thanks to CMS for their DE-SynPUF description. That ends my reading of their summary. To help students gain experience with healthcare data, I created data tables from the DE-SynPUF. I exported a sample of the data and transformed it into a format closer to what you'd see in an electronic health record system.
I put the data into new tables and I renamed some of the fields. As an example, the fields in the chronic table come from the fields in the CMS files whose names begin with SP underscore (SP_). As a rule, I removed SP underscore from the field names. All the tables that I created will be made available for you to work with. As you work on the course exercises, it will be helpful to look at the original CMS DE-SynPUF data dictionary and codebook. You can compare these to how I transformed the data and how I started to summarize the data in my own version. Let me describe the data so that you have more context about how you can analyze it. The main field to link the tables is PAT_ID. There is a unique value for each unique patient. Thus you can use this to link tables based on patients. Another important field for some of the tables is the patient encounter ID, or PAT_ENC_ID. This is a unique identifier that defines a specific patient encounter or visit to the health system. For some analyses, you may want to join data based on patients. In other analyses, you may want to use the encounter ID. Now, I want to briefly mention the tables that I provided to you. Patient. This table has one unique row per patient. The table includes demographic and Medicare-related fields. Chronic. This table provides information about chronic diseases for the patients. ORDER_MED has medications that were ordered for the patients. PAT encounter, or PAT_ENC, includes all of the patient encounters. PAT_ENC_HSP is a special table that is a subset of the PAT_ENC table. This table only includes hospital encounters. PatEncDX documents all diagnoses, as defined by ICD-9 codes, that occurred for specific patient encounters. ICD-9 DX provides labels for the diagnoses, as defined by ICD-9 diagnosis codes, that occurred for specific patient encounters. PatEncPR documents all the procedures, as defined by ICD-9 codes, that occurred for specific patient encounters.
ICDPR provides labels for the procedures, as defined by ICD-9 procedure codes, that occurred for specific patient encounters. RxNorm. This includes NDC codes and associated labels and information about specific drugs. The NPPES_SAMPLE. This is a database of active National Provider Identifiers, or NPIs. Healthcare providers who want to bill for services for Medicare patients must obtain an NPI. Now for a few fields in the patient table. PAT_ID means the patient ID; it has unique values and is used as a key for joining tables. Death date is the death date of the patient. If it is null, there's no evidence that the patient has died. Sex code is gender, where one equals male and two equals female. Race code is race, where one is White, two is Black, three is other, and four is Hispanic. ESRD_IND is the end-stage renal disease indicator, where zero means the patient does not have ESRD and one means ESRD is present. I will let you study the documentation and look at the data on your own. Of course, feel free to point out errors and problems you have when working to find meaning in these data. This is part of the exercise. This completes the summary of Medicare claims data. In the next lesson, I will provide some final tips about interpreting healthcare data. See you soon.
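As a companion to the table and field descriptions in this lesson, here is a minimal sketch of how you might link the patient and encounter tables on PAT_ID and turn the coded fields into readable labels. It uses Python's built-in sqlite3 module. The table names PATIENT and PAT_ENC and the key fields PAT_ID and PAT_ENC_ID follow the lesson, but the column names SEX_CD, RACE_CD, ESRD_IND, and DEATH_DATE, the helper function names, and every row of data are assumptions made up for illustration, so check them against the course tables and the codebook before relying on them.

```python
import sqlite3

# Code-to-label maps as stated in the lesson; verify them against the codebook.
SEX_LABELS = {1: "Male", 2: "Female"}
RACE_LABELS = {1: "White", 2: "Black", 3: "Other", 4: "Hispanic"}
ESRD_LABELS = {0: "No ESRD", 1: "ESRD present"}


def strip_sp_prefix(field_name: str) -> str:
    """Drop a leading 'SP_' from a CMS field name, as done for the chronic table."""
    return field_name[3:] if field_name.startswith("SP_") else field_name


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with made-up rows (not real DE-SynPUF data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE PATIENT (PAT_ID TEXT PRIMARY KEY, DEATH_DATE TEXT,
                              SEX_CD INTEGER, RACE_CD INTEGER, ESRD_IND INTEGER);
        CREATE TABLE PAT_ENC (PAT_ENC_ID TEXT PRIMARY KEY, PAT_ID TEXT);
        INSERT INTO PATIENT VALUES ('P1', NULL, 1, 1, 0),
                                   ('P2', '2009-05-01', 2, 2, 1);
        INSERT INTO PAT_ENC VALUES ('E1', 'P1'), ('E2', 'P1'), ('E3', 'P2');
        """
    )
    return conn


def encounters_per_patient(conn: sqlite3.Connection) -> list:
    """Join PATIENT to PAT_ENC on PAT_ID and count encounters per patient."""
    rows = conn.execute(
        """
        SELECT p.PAT_ID, p.SEX_CD, p.RACE_CD, p.ESRD_IND, p.DEATH_DATE,
               COUNT(e.PAT_ENC_ID) AS n_enc
        FROM PATIENT AS p
        LEFT JOIN PAT_ENC AS e ON e.PAT_ID = p.PAT_ID
        GROUP BY p.PAT_ID
        ORDER BY p.PAT_ID
        """
    ).fetchall()
    return [
        {
            "PAT_ID": pat_id,
            "sex": SEX_LABELS.get(sex, "Unknown"),
            "race": RACE_LABELS.get(race, "Unknown"),
            "esrd": ESRD_LABELS.get(esrd, "Unknown"),
            # A null death date means no evidence that the patient has died.
            "deceased": death_date is not None,
            "encounters": n_enc,
        }
        for pat_id, sex, race, esrd, death_date, n_enc in rows
    ]


if __name__ == "__main__":
    for record in encounters_per_patient(build_demo_db()):
        print(record)
```

The LEFT JOIN keeps patients who have no encounters, which matters when you count visits per patient. For encounter-level questions, such as which diagnoses were recorded at a given visit, the same pattern would apply with PAT_ENC_ID as the join key instead of PAT_ID.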