Hi, welcome back. I'd now like to show you some real-world examples of formulating a data question with readily available, free online tools for biomedical or clinical informatics. The first example I'd like to show you is a software platform called i2b2. i2b2 is a product of the i2b2 tranSMART Foundation, which was founded out of Harvard. To simplify, it arose from the need for researchers to be able to share clinical data sets that were properly anonymized. That means there's no HIPAA exposure, but the data also had to have enough quality and quantity to yield meaningful results. The i2b2 software platform was developed to navigate that edge between anonymized data and quality clinical data, and to make that data readily available for sharing between different research institutions. The goal is that the data should be immediately translatable. This is translational research that should carry over into immediate clinical use. On their website, i2b2.org, they describe the i2b2 software package as a modular, open-source software platform for feasibility queries and for analysis of clinical and translational genomics data. A feasibility query is not necessarily the end research query; it asks how many patients or how many situations exist under a certain clinical guideline. The platform allows these different kinds of data to be blended to identify different patient data sets. The data in an i2b2 installation will vary from institution to institution. However, it will typically follow EHR (Electronic Health Record) data: diagnoses, labs, procedures, medications, research studies, and some genomics data as well. All of this is integrated in a very easy-to-use interface, where there's no exposure of patient identity, yet people can drill down and find information about patients who meet various criteria.
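To make the idea of a feasibility query concrete, here is a minimal sketch in Python on made-up data. It counts the distinct patients whose records match one criterion, which is the kind of number a feasibility query returns. The record layout, the concept labels, and the function name `count_patients` are illustrative assumptions, not the i2b2 schema or API.

```python
from typing import NamedTuple

class Fact(NamedTuple):
    patient_id: int   # de-identified patient number (hypothetical)
    category: str     # source category, e.g. "diagnosis" or "medication"
    concept: str      # the specific term, e.g. "diabetes mellitus"

# Invented facts for illustration only; not real or demo i2b2 data.
FACTS = [
    Fact(1, "diagnosis", "diabetes mellitus"),
    Fact(1, "medication", "metformin"),
    Fact(2, "diagnosis", "asthma"),
    Fact(3, "diagnosis", "diabetes mellitus"),
    Fact(3, "diagnosis", "diabetes mellitus"),  # repeat visit, same patient
]

def count_patients(facts, category, concept):
    """Return how many distinct patients have a fact matching the criterion."""
    matching = {
        f.patient_id
        for f in facts
        if f.category == category and f.concept == concept
    }
    return len(matching)

print(count_patients(FACTS, "diagnosis", "diabetes mellitus"))
```

Only the count leaves the function, never the matching rows, which mirrors how an aggregate answers the question without exposing individual records.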
The i2b2 architecture is what they call a hive, a collection of cells, or what other people would call modules, where people can install various aspects of the platform, and it's all open source. So it's shareable, it's easy to work with as far as understanding what the code does, and at the database level it's available on several different platforms: Oracle, Microsoft SQL Server, and PostgreSQL. I think ports to some other platforms have been attempted as well. Regardless, even though it has this modular architecture, the core architecture, from one of our earlier discussions, is still that of a client-server framework, where the database server receives requests from these various application modules. The most visual one is the web interface, and I'm now going to speak over a screen demonstration where I use that web interface to formulate a data query. Last, as we mentioned earlier in broad terms, there are two main types of queries: you take a large data set and get narrow results, or you take a narrow set and blend it to get wider results. i2b2 is a definite example of taking a large data set and pulling a few narrow results out of it. I hope you enjoy the demonstration.
We will now demonstrate how a data query is formulated using the i2b2 demo web client interface. When you go to this URL, https://www.i2b2.org/webclient, which should be provided to you in the accompanying material, you will come to a page that looks like this. It should automatically populate the username, password, and host for the Harvard demo. Go ahead and click the login button. This is the main i2b2 interface page. On your left, you will see the Navigate Terms tree-view display. In the Navigate Terms tree view, the top-level nodes of the tree can really be thought of as our step number one in formulating a data query: the sources. What are the types of sources we want to use to create our query?
At this level, the most prominent are diagnoses, demographics, laboratory tests, medications, and procedures. These are the standard EHR-focused categories, or sources, that we can bring to bear to form a query. Next, if we open up one of these top nodes by pressing the plus sign next to it, you will see a variety of child nodes. These really represent our step number two, a filter. We can filter the broader category of diagnoses by including or not including various pieces of this tree. For example, we can home in on people with diabetes by going to the endocrine disorders folder, opening other endocrine gland diseases, and selecting diabetes mellitus. If you click on that, hold the left mouse button, and drag it to the inclusion criteria, the filter will now include patients who meet the criterion of diabetes mellitus and its diagnostic codes. Next, if we hit Run Query, that will present us with some options. The default one is just the number of patients. That, like some of the other options here, is an example of an aggregate, where we do not display the raw data but calculate a computed number; in this case, a count of the patients who meet that criterion. In addition, this dialog allows you to get more detail by also displaying the patient set and a related encounter set, and by giving you various analytics: gender breakdown, vital status breakdown, race, age, and length of stay. I'll come back to the timeline in a second. There are also the top 20 medications, the top 20 diagnoses, and inpatient and outpatient. Let's just select them all and run the query. As the query runs, you will see this status update, and the results are listed in plain text here. First is the number of patients; we got 11 patients. Then come the gender breakdown, vital status, race, age, top 20 medications, and length of stay. A slightly more readable version is in the Query Report tab, and I'm going to hit this expand icon to make it a little more readable.
This gives you the summary results of what we've requested. We requested a filter of diabetes mellitus: all patients in this demo database who have a diagnosis of anything under diabetes mellitus. Then come the aggregate results, where we process the results. We gave a total count. We gave a breakdown here by gender, with a graph; by vital status, where they're all living in this database; by race; by age or age bracket; and by medications, meaning the unique medications the associated patients are on. There is also length of stay in a particular facility. We also have co-morbidities, the other diagnoses associated with this diagnosis of diabetes. Next, there is a listing of inpatient versus outpatient encounters, or visits, for the associated patient population. Again, this is a demo dataset, so there's an even distribution of inpatient and outpatient visits. In the real world, you may not see that exact match. There might be more inpatient data or more outpatient data, depending on your data source. Last, there is a breakdown, or analytic grouping, of the number of patients by length of stay between their admit and discharge dates.
So, this has been pretty straightforward. But now, let's see what we can do if we want to use multiple filters or look at relationships between filters. If we look at the top 20 diagnoses breakdown, we'll see that a relatively large group of patients has something called Huchard's disease. This is a form of hypertension that may or may not be associated with their diabetes. How would we look at the relationship between people with diabetes in general and those who also have this disease, using the i2b2 interface? The first, most direct way we can link these two diagnoses is by simply including both in the filter criteria. So, let's go back to endocrine disorders, other endocrine gland diseases. This was our original query. We're going to drag diabetes mellitus to criteria Group 1.
But what we want is to find the group of patients that meet both of these criteria, not either one or the other. So, I'm now going to look for Huchard's disease. I can type the word Huchard here and do a search, and it's found here. Then I can drag it directly from the "Find" interface. I'm going to drag that to Group 2. With this query we are again modifying the filter, saying we want patients who have both diabetes and Huchard's disease. Let's go ahead and run that, and we'll pick all the detailed results that we'd like. We'll hit "Okay" and run it. Now, let's take a look at the results. I'm going to expand this results area again so we can view it all in one place. Let's take a look at the Query Report, and notice how it wrote our filter criteria: we want people with diabetes and with Huchard's disease. Instead of 11, we now have 9, which, by the way, ties out with the diagnosis breakdown we saw in the first query. Here, you'll see differences in the aggregate results when we look at this subset of patients. So, we've applied a filter and a sub-filter. There is another way that the i2b2 interface allows you to see the actual interrelationship between two different filter expressions. We're starting off again with our dual filter expression of diabetes mellitus and Huchard's disease. But this time when we run the query, let's pick what we didn't pick before, the timeline, and hit "Okay". Now a timeline is being drawn. It will show the interrelationship between the two different diagnostic facts. So, in this patient list, which has of course been anonymized, or de-identified, you can actually see when the diagnosis of Huchard's disease versus that of diabetes occurred for each patient, and you can see a trend. For example, in many patients the two are correlated, but sometimes they are not. Sometimes you'll get one diagnosis without the other, and sometimes you get both.
This gives you a visualization of that pattern.
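The two-group query and the timeline view can be mimicked on toy data. Putting criteria in different groups combines them with AND, which amounts to intersecting the sets of matching patient IDs, and a timeline is the list of each patient's diagnosis events in date order. The sketch below uses invented records and hypothetical helper names (`patients_with`, `timeline`); it is an illustration of the idea, not how i2b2 implements either feature.

```python
from datetime import date

# (patient_id, diagnosis, date recorded); invented rows, not i2b2 demo data.
DIAGNOSES = [
    (1, "diabetes mellitus", date(2005, 3, 1)),
    (1, "Huchard's disease", date(2005, 3, 1)),
    (2, "diabetes mellitus", date(2006, 7, 15)),
    (3, "Huchard's disease", date(2004, 1, 10)),
    (3, "diabetes mellitus", date(2007, 9, 2)),
]

def patients_with(rows, diagnosis):
    """Return the set of patient IDs that have the given diagnosis."""
    return {pid for pid, dx, _ in rows if dx == diagnosis}

def timeline(rows, patient_ids):
    """Map each selected patient to their (date, diagnosis) events, oldest first."""
    result = {pid: [] for pid in sorted(patient_ids)}
    for pid, dx, when in sorted(rows, key=lambda row: row[2]):
        if pid in result:
            result[pid].append((when, dx))
    return result

group1 = patients_with(DIAGNOSES, "diabetes mellitus")   # criteria Group 1
group2 = patients_with(DIAGNOSES, "Huchard's disease")   # criteria Group 2
both = group1 & group2                                   # AND across groups

print(len(both))
for pid, events in timeline(DIAGNOSES, both).items():
    print(pid, [(when.isoformat(), dx) for when, dx in events])
```

In this toy data, patient 2 has diabetes only and drops out of the intersection, while the timeline for the remaining patients shows whether the two diagnoses were recorded together or years apart.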