In this lesson, we will discuss the SummarizedExperiment data container. This is a more modern version of the ExpressionSet. We will use an example data package called airway to illustrate it. We need to load the data into R, and then we can print it. In many ways this looks superficially similar to an ExpressionSet: it has a dimension, about 64,000 rows and 8 columns, which in this case means 8 samples and about 64,000 genes. So what do we have beyond the dimensions? A lot of things are very similar to ExpressionSet, but with a slightly different syntax and output. What we are used to knowing as pData from the ExpressionSet, we access using colData in the SummarizedExperiment. It returns not a data.frame but a DataFrame with a capital D, the new type of data frame that was introduced in Bioconductor. As before, we can see that there are some sample identifiers and some information about specific details of the experiment. As with ExpressionSet, we can get a column using the dollar operator: I can say airway$cell and get back that specific column. This is mighty useful.

We can look at the experiment data, the details about how the experiment was made, and in this case it looks pretty empty. We can try to see whether anything is in here. There is some information: we can see where the data comes from, we can see a PubMed ID, and so on and so forth.

We no longer have sample names. We only have column names, which are really the names of the different samples. And again, we don't have feature names; we have row names. We can see that the row names look like Ensembl gene identifiers. So how do we actually get the expression data? In this case that's not obvious, but this is RNA-seq data. How do we get the expression measurements? Well, in SummarizedExperiment, we use the assay accessor.
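The accessors mentioned so far can be sketched in R roughly as follows. This is a minimal sketch, assuming the Bioconductor packages SummarizedExperiment and airway are installed. The use of metadata() for the experiment data is an assumption about current releases (older releases used exptData()), and the exact dimensions printed may vary with the version of the airway package.

```r
# Assumption: the Bioconductor packages below are already installed.
library(SummarizedExperiment)
library(airway)

data(airway)            # load the example dataset into the session
airway                  # printing shows dimensions, assays, and annotation names

dim(airway)             # rows are genes, columns are samples

sample_info <- colData(airway)   # sample-level annotation, a DataFrame (capital D)
sample_info
airway$cell             # the dollar operator pulls one column out of colData

# Experiment-level details. Assumption: current releases expose these through
# metadata(); older releases used exptData() instead.
metadata(airway)

head(colnames(airway))  # sample names are stored as column names
head(rownames(airway))  # feature names are stored as row names
```

The dollar operator here is a shortcut for colData(airway)$cell, which mirrors how the same operator reaches into pData for an ExpressionSet.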
In order to use the assay accessor, we need to know what kind of assay it is. First we can look at the printout of the object itself, and you can see that there is one assay and it is called counts. You can also list all the assays by writing assayNames(airway), which gives us back just counts. So how do we get it? We say assay(airway, "counts"), giving it the name of the assay we're trying to extract. This is going to be a big matrix, so I'm going to subset it to get just the first four genes in the first four samples. And here it is: this is RNA-seq count data.

The new thing in SummarizedExperiment that you didn't have in ExpressionSet is that each row, or each feature, has an associated GRanges or GRangesList. We access that using rowRanges. Let me first get the length of rowRanges: it is the roughly 64,000 different features we have, with one range for each. In this case, the row range is not a simple GRanges, it's a GRangesList. The idea is that each row is a gene, and each GRanges in the GRangesList gives us the exons of that gene. So here we can see that the very first gene, the gene in the first row, has 17 exons and is located on chromosome X, and we have the coordinates of the exons. This is very useful because in next-generation sequencing we are often measuring things over genomic intervals, and here we can keep the genomic intervals together with the experimental data.

So for example, we can look at how many exons we have per gene, or how many exons we have in total. We use elementLengths (called elementNROWS in current Bioconductor releases) to give us the length of each GRanges, and we can sum them up. Oops, that was not rowData, I meant rowRanges; a little tab-completion mistake. So we can see that we have about 64,000 genes and almost 750,000 exons. Some of the standard GRanges functions can also be applied directly to the airway data set.
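The assay and rowRanges steps above might look like this in R. Again a sketch, assuming SummarizedExperiment and airway are installed. It uses elementNROWS in place of the elementLengths function named in the lecture, since elementLengths has been removed from current Bioconductor releases. The variable names are illustrative only.

```r
# Assumption: the Bioconductor packages below are already installed.
library(SummarizedExperiment)
library(airway)
data(airway)

assayNames(airway)                         # lists the assays stored in the object
count_matrix <- assay(airway, "counts")    # extract one assay by name, as a matrix
count_matrix[1:4, 1:4]                     # first four genes in the first four samples

gene_exons <- rowRanges(airway)            # a GRangesList: one GRanges of exons per gene
length(gene_exons)                         # one list element per row (gene)
gene_exons[[1]]                            # exon coordinates of the first gene

# elementNROWS() replaces the older elementLengths() mentioned in the lecture.
exons_per_gene <- elementNROWS(gene_exons)
head(exons_per_gene)                       # number of exons for the first few genes
sum(exons_per_gene)                        # total number of exons across all genes
```

Note that rowData(airway) returns a different thing (a DataFrame of per-row annotation), which is why the lecture corrects itself to rowRanges.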
For example, if we want to get the start coordinates of all the exons, we can do it like this. This gives us a list with the start coordinate of each exon in each element of the list. In a similar way, we can use subsetByOverlaps, which is quite useful. Let's say we have a GRanges that gives us an interval on chromosome one, and it shouldn't be called chromosome one, because this data set uses a different name for it. So the ranges start at one and [INAUDIBLE]. Okay, so this is a standard GRanges, and now I can say: give me just the genes in airway that overlap this specific genomic interval. That is subsetByOverlaps. There are 329 genes inside this 10-megabase genomic interval, and here we have the data for them.

This summarizes the SummarizedExperiment data class. We access the row and column information using accessors such as rowRanges and colData, we have the GRanges that we can access, and we have the assay function for getting the expression measures, or whatever is measured in this SummarizedExperiment.
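As a rough illustration of applying GRanges-style functions directly to the object, the following sketch assumes SummarizedExperiment and airway are installed. Two details are assumptions rather than quotes from the lecture: the sequence name "1" (Ensembl-style naming, without a "chr" prefix) and the interval end of 10^7, which is inferred from the "10-megabase" remark because the spoken end coordinate was inaudible. The number of overlapping genes may differ from the 329 reported in the lecture depending on the version of the airway package.

```r
# Assumption: the Bioconductor packages below are already installed.
library(SummarizedExperiment)
library(airway)
data(airway)

# start() applied to the object returns, per gene, the start coordinate of each exon.
exon_starts <- start(airway)
head(exon_starts, 2)

# Query interval. Assumptions: the chromosome is named "1" in this dataset, and the
# interval ends at 10^7 (inferred from the 10-megabase remark in the lecture).
query_region <- GRanges(seqnames = "1", ranges = IRanges(start = 1, end = 10^7))

# Keep only the rows (genes) whose ranges overlap the query interval.
genes_in_region <- subsetByOverlaps(airway, query_region)
genes_in_region
nrow(genes_in_region)    # number of genes overlapping the interval
```

The result is itself a SummarizedExperiment, so the assay, colData, and rowRanges accessors work on the subset in the same way as on the full object.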