Perhaps it'll be easier to integrate this framework in a real-world example. So here's a real-world example that I got as a data analyst for a group of analysts that solved research questions. The most common type of research question is, give me a cohort of patients who fit certain criteria. So in this case we got a physician who requested, "I'd like to see a list of patients who visited my clinic last month." What could be easier than that? Well, let's see how it breaks down. Let's go back to the framework. Number one, sources. Okay, I'd like to see a list of my patients. Well, what are the sources? Where are we going to find that data? So I guess you would look at patients, whether that's a table, or a patients database, or depending on your situation, a patients source. Then visits, or I guess encounters, or visit data, and clinics. So you're going to need at least those three. You might need a link over to providers, because this person was a provider. He said "my clinic," so I guess unless he's some random person who happens to have a clinic, he's a provider. So we need maybe three, maybe four elements here, okay? So we have to know where to get those. Let's say we figure that out. How do they correspond? Are those elements, or those tables of data, aligned the way we think they are? The patient table, is the table current? How do I know? If we look in the fine print and the data is as of last year, and he meant last month, okay, so that's a problem. Or maybe it's a subset of data. So the number one thing we have to make sure of is the inclusion. Is this a fitting source for the question? Those are all the kinds of basic thoughts you need to go through. And you would think, okay, yeah, not such a big deal. But the truth is you do need to think through those things and revisit even things that might be obvious. Because you would be amazed how often there are fundamental problems with data questions because of missed assumptions like that.
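To make that "is the table current?" check concrete, here is a minimal sketch in Python. It assumes you can find out the earliest and latest dates a source covers; the function name and the sample dates are made up for illustration and are not from any real system.

```python
from datetime import date


def source_covers_window(earliest, latest, window_start, window_end):
    """Return True when the source's date span fully contains the requested window."""
    return earliest <= window_start and window_end <= latest


# Hypothetical extract that was frozen at the end of last year.
extract_earliest = date(2023, 1, 1)
extract_latest = date(2023, 12, 31)

# Hypothetical request window: the physician meant last month (May 2024 here).
requested_start = date(2024, 5, 1)
requested_end = date(2024, 5, 31)

is_fitting = source_covers_window(
    extract_earliest, extract_latest, requested_start, requested_end
)
print("Source covers the requested window:", is_fitting)
```

A check like this does not tell you whether the source is the right one, only that it cannot be the right one when it comes back false.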
Furthermore, a visit. What is considered a visit, when he said "visited my clinic"? Is a phone call a visit? Is a telemedicine event a visit? Is a mailing a visit? Some databases store any contact with the patient as a visit, or as an encounter. So the informatics discussion about what is a visit is critical. And when a customer or a requester comes to you as an analyst and asks you that question, this becomes very, very practical in the real world. And finally, what is "my clinic"? Is it a department? Is it a business unit? Often, I have found that people who work in a hospital kind of have this basic understanding of what their department is. But when you actually look in the data, it's more complex. It might be a collection of floors, or a collection of units, or it might be all units starting with the word cardio, or something like that. So it's sometimes difficult and you have to be very upfront with asking those kinds of questions. So again, the framework forces you into that discipline. The from or sources step says, okay, let's do a thorough, rigorous evaluation of the sources against the question. Step number two, the filter. And the question we have is, I want to see my clinic. Well, does that lend itself to a DepartmentID? Or what other columns, or what other expressions, might I bring to bear to filter the data? We don't want all data. We don't want every visit ever. He said last month, just last month. So, now, how do you evaluate last month? If it was an inpatient visit, is it that they were admitted last month, or that they were discharged last month? What about an outpatient visit? What if it was a series of visits that actually started before the month? And this is, again, where you have a simple request, I want to see patients who visited my clinic last month. But when you get into it through the framework, it becomes much more complex. Last, what is a date range for last month? Is it a rolling month, meaning the last 30 days, or the last calendar month?
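The two readings of "last month" give different date ranges, and a short Python sketch shows how far apart they can be. The function names are made up for illustration, and the choice to end the rolling window yesterday rather than today is an assumption you would confirm with the requester.

```python
from datetime import date, timedelta


def last_calendar_month(today):
    """Return (start, end), inclusive, for the calendar month before `today`."""
    first_of_this_month = today.replace(day=1)
    end = first_of_this_month - timedelta(days=1)  # last day of the previous month
    start = end.replace(day=1)
    return start, end


def rolling_30_days(today):
    """Return (start, end), inclusive, for the 30 days ending yesterday (assumed)."""
    end = today - timedelta(days=1)
    start = end - timedelta(days=29)
    return start, end


today = date(2024, 6, 15)  # hypothetical date the request arrives
calendar_window = last_calendar_month(today)
rolling_window = rolling_30_days(today)

print("Calendar month:", calendar_window[0], "to", calendar_window[1])
print("Rolling 30 days:", rolling_window[0], "to", rolling_window[1])
```

For a request arriving on June 15, the calendar reading is May 1 through May 31, while the rolling reading is May 16 through June 14, so the two lists of patients could differ quite a bit.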
What did he mean by that? So it's almost like you have to be a bit of a lawyer to be a good data analyst. Step number three, the sort by. In this case, we're just doing a sort. We're not going to aggregate, so we are skipping over that. A sort here, what order do you want it in? He didn't specify. Maybe he wanted it in terms of the most recent day they visited, or by zip code, maybe he's doing a mailing. Maybe he wants it by street address. So, again, forcing the articulation of what sort you want the output in. Using the framework will help you not miss that question. Nothing is more frustrating to a data requester, a busy physician or something like that, than having you call them back 20 times to say, did you mean this, did you mean this? And by using this framework, you can articulate all these things up front and reduce that kind of round trip iteration. Finally, step six, the output. What kind of columns? When he said, give me a list, well, which columns? Do you want, as I said, the square feet they live in? Do you want their tobacco usage? Do you want their age, or do you just want their name and address, or phone number, or Social Security number if you have it? What do you want? So you have to force the articulation of what exactly you want in your output. And also, very critical, what kind of delivery format? Do you want it printed? Does he want a piece of paper? Does he want an Excel spreadsheet, or Google Drive, or some other kind of equivalent? Does he want it as a web page, a comma-separated value file? So there are all these kinds of options. Using this framework forces you to articulate it and have the values up front, so you know what you're supposed to do. So, in summary, let's look at how that broke out. Defining, number one, the SOURCES. We have to have Patients, Visits, and Clinic data at least. FILTER, we have to filter it on what "last month" means, and what "my clinic" means. So we need the clinic ID or the provider ID.
We didn't do an aggregation in this example. Then the list order: what kind of order? Is it alphabetical? And finally, the output: which columns go in the list?
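Putting the summary together, here is a minimal end-to-end sketch in Python using the standard library's `sqlite3` and `csv` modules. Everything specific in it is an assumption for illustration: the table and column names, the sample rows, the decision that only an "office" visit type counts as a visit, the calendar-month reading of last month, the alphabetical sort, and the name-and-phone output as a CSV.

```python
import csv
import io
import sqlite3

# Hypothetical in-memory schema and rows; real sources will look different.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, name TEXT, phone TEXT);
CREATE TABLE clinics  (clinic_id INTEGER PRIMARY KEY, clinic_name TEXT);
CREATE TABLE visits   (visit_id INTEGER PRIMARY KEY, patient_id INTEGER,
                       clinic_id INTEGER, visit_date TEXT, visit_type TEXT);
INSERT INTO patients VALUES (1, 'Avery Lee', '555-0101'),
                            (2, 'Blake Ortiz', '555-0102'),
                            (3, 'Casey Kim', '555-0103');
INSERT INTO clinics VALUES (10, 'Cardiology'), (20, 'Dermatology');
INSERT INTO visits VALUES
  (1, 1, 10, '2024-05-03', 'office'),
  (2, 2, 10, '2024-05-20', 'phone'),   -- not counted as a visit (assumption)
  (3, 3, 10, '2024-04-28', 'office'),  -- before the window
  (4, 3, 20, '2024-05-10', 'office'),  -- a different clinic
  (5, 1, 10, '2024-05-17', 'office'),  -- same patient, second visit
  (6, 2, 10, '2024-05-22', 'office');
""")

# SOURCES: patients, visits, clinics. FILTER: clinic, date window, visit type.
# No aggregation. ORDER: alphabetical by name. OUTPUT: name and phone.
query = """
SELECT DISTINCT p.name, p.phone
FROM visits AS v
JOIN patients AS p ON p.patient_id = v.patient_id
JOIN clinics  AS c ON c.clinic_id = v.clinic_id
WHERE c.clinic_id = ?
  AND v.visit_date BETWEEN ? AND ?
  AND v.visit_type = ?
ORDER BY p.name
"""
rows = conn.execute(query, (10, "2024-05-01", "2024-05-31", "office")).fetchall()

# Delivery format: a comma-separated value file, built in memory here.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["name", "phone"])
writer.writerows(rows)
print(buffer.getvalue())
```

Every literal passed into the query stands for one of the questions you would take back to the physician: which clinic, which date range, what counts as a visit, what order, which columns, and what delivery format.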