So, in this lecture, we're going to talk about next generation sequencing applications. The introduction of next generation sequencing technology, which has made sequencing so fast and so cheap, has allowed scientists to come up with all sorts of creative new types of experiments that use sequencing. Another way to think about it is that we can now ask scientific questions and answer them with sequencing. These are questions that we've had for decades in many cases, but sequencing was simply too expensive or too slow to answer them before. Today, through next generation sequencing applications and some clever new experimental methodologies, we can answer all kinds of interesting questions using sequencing alone. So let's look at some of those methods right now.
The basic idea is that we need to create DNA, because if we're going to sequence something, it has to be DNA. We might start with DNA and just sequence it, or we might start with RNA, convert it to DNA, and sequence that. Then we apply second generation sequencing to measure something. So, let's look at a few applications like this.
One very popular application today is exome sequencing. So, what's the exome? The exome is the collection of all the exons in your genome. And what are exons? We've talked about that in other lectures, but let me just quickly review. Your DNA gets transcribed into RNA. The RNA then gets chopped up into exons and introns. The introns get thrown away, the exons that remain are concatenated together, and those exons then get translated into proteins. So if you want to know what proteins are being turned on in a cell, or in a collection of cells, you need to know what the exons are. In particular, in the world of genetics, when we're looking for genetic mutations, we're usually interested first in mutations that affect proteins.
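To make that quick review of exons and introns concrete, here is a minimal Python sketch of transcription, splicing, and translation. The gene sequence, the exon coordinates, and the four-entry codon table are toy values made up for illustration, not real annotations.

```python
# Toy codon table covering only the codons used below (an assumption for
# illustration; the real genetic code has 64 codons).
CODON_TABLE = {"AUG": "M", "GCU": "A", "UGG": "W", "UAA": "*"}


def transcribe(dna):
    """Copy the coding strand into RNA by replacing every T with U."""
    return dna.replace("T", "U")


def splice(pre_mrna, exons):
    """Keep only the exon intervals (0-based, end-exclusive) and join them."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def translate(mrna):
    """Read the mRNA three bases at a time until a stop codon ('*')."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical gene: exon 1, then an intron, then exon 2.
gene = "ATGGCT" + "GTAAGTCAG" + "TGGTAA"
exons = [(0, 6), (15, 21)]

pre_mrna = transcribe(gene)
mrna = splice(pre_mrna, exons)   # the intron is thrown away
protein = translate(mrna)
print(mrna, protein)
```

Only the two exon intervals contribute to the protein, which is why a mutation inside the intron of this toy gene would leave the translated sequence unchanged.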
Mutations that affect proteins should occur in exons, so we can capture just the exons in a cell and sequence those. You might say, why would we do that when we can sequence the whole genome? Well, the exons only comprise about 1.5% of your genome. So you can sequence much less DNA and still get a picture of your entire exome. So how do we do that? Only some parts of the DNA, like I said about 1.5% of your genome, or maybe on the order of 30 to 60 million bases, will be captured as exons. And there are kits that will capture this for you. You fragment the whole genomic DNA from the person whose exome we're sequencing. Some of those fragments will be exons or pieces of exons, and some won't, and we want to capture just the exons. The kits that have been developed typically use a magnetic bead. Attached to that bead you'll have pieces of single-stranded DNA that are only found in exons. When you're preparing your DNA for sequencing, you make it single stranded by heating it up a little bit. Then the DNA that belongs to exons will hybridize to the complementary DNA that's attached to those beads. You can pull those beads down, remove the DNA attached to them, and sequence it. That way you have sequenced just the exons. So that's exome sequencing: you only sequence the exonic parts of the DNA, and kits today will capture on the order of 50 to 60 million base pairs out of a person's genome.
Another technology is RNA-seq, or RNA sequencing. This involves trying to capture all the genes that are being turned on in a cell, or in a collection of cells. As I just said, to produce a protein, DNA first gets transcribed into RNA, which then gets translated into protein.
So if we can capture the RNA, that gives us a picture of which genes are being expressed, or turned on, in a particular cell, set of cells, or cell type. A very important feature of the RNA molecule is that after transcription, the cell attaches a long string of A's to it. That's the basis of RNA-seq technology: all the molecules that we're interested in have these long stretches of A's on the end, and anything that doesn't have a long stretch of A's we can ignore. So we capture that poly-A tail by various techniques. Basically, we use a string of T's that we know will stick to all those A's. We attach those T's to something we can grab hold of, and through that we capture the mature mRNA by its poly-A tail. Once we've done that, we can't sequence RNA, so we have to turn it into DNA. Fortunately we have a very useful molecular mechanism, invented by evolution, that does reverse transcription. There are a number of viruses that do this as their way of surviving. So we have reverse transcriptase, and there are a number of different reverse transcriptase molecules we can use that will take RNA and copy it back into DNA. Rather than going from DNA to RNA, you can go from RNA to DNA using this special molecule, and that gives us the DNA that matches the RNA we've just captured. Once we have DNA, we just sequence it. From that point on, it's a computational problem to figure out which genes were turned on in those cells. It's a very complicated computational problem, and in trying to solve it, it's important to realize where this data came from.
A third technology that's become very popular since next generation sequencing was introduced is ChIP-seq. ChIP-seq is trying to address a different problem.
That problem is understanding where on the DNA certain proteins might bind. The way that our cells control gene expression, so that cells can behave differently from one another, is that some genes are inhibited in certain cell types or enhanced in certain cell types. And the way that happens is that you have transcription factors, which are themselves proteins encoded by other genes, that bind to the DNA and control the expression of the genes near the place where the protein is binding. We'd like to know where that's happening. Of course, we don't have microscopes today that will let us just look at the chromosome and see where there are proteins bound to it, but we can do something indirect, again using sequencing. We can basically freeze the protein right onto the DNA through a process called cross-linking. So we take a set of cells that we're interested in, a particular cell type, and we cross-link the proteins to the DNA in those cells, so now the protein is basically stuck there. We want to know: where was it stuck? What we can do is take that DNA where the proteins are stuck and fragment it into millions and millions of fragments. Most of them do not have any protein stuck to them, but some of these short fragments will. We can then grab these proteins: in ChIP-seq we use antibodies designed to pull those proteins out of the mixture. And when we pull them out, because they're frozen to the DNA, we pull out the little fragments of DNA that those proteins were bound to at the same time. We can then remove the protein and sequence the DNA. And again, we've turned our problem into a sequencing problem.
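The steps just described (fragment the DNA, pull down the fragments carrying the protein, then sequence what is left) can be mimicked in a few lines of Python. This is only a toy sketch: the fragment coordinates, the bound position 42, and the function names are hypothetical, and a real analysis would work from aligned sequencing reads with a dedicated peak-calling program.

```python
def pull_down(fragments, bound_position):
    """Keep only the fragments that overlap the cross-linked protein.

    Fragments are (start, end) intervals, 0-based and end-exclusive.
    """
    return [(s, e) for s, e in fragments if s <= bound_position < e]


def coverage(fragments, region_length):
    """Count how many fragments cover each position of the region."""
    depth = [0] * region_length
    for start, end in fragments:
        for pos in range(start, end):
            depth[pos] += 1
    return depth


region_length = 100
bound_position = 42  # hypothetical binding site, unknown to the experimenter

# Hypothetical fragments from shearing several copies of the same region.
fragments = [
    (0, 30), (30, 60), (60, 100),
    (10, 45), (45, 80),
    (0, 35), (35, 50), (50, 100),
    (40, 70),
]

captured = pull_down(fragments, bound_position)
depth = coverage(captured, region_length)

# Positions with the deepest pileup of captured fragments.
peak = [pos for pos in range(region_length) if depth[pos] == max(depth)]
print(captured)
print(peak)
```

In this toy case the captured fragments all overlap in a narrow window around the bound position, which is the kind of pileup that the downstream computation looks for.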
The sequences that come out are short fragments, and we know, because of the way we do the experiment, that those fragments were binding sites for whatever protein we were targeting with that antibody.
Then finally, let me talk about one more technique, which is called bisulfite sequencing, or methyl-seq. This is a way of determining where on the genome the DNA has been methylated. Methylation is an important epigenetic modification that also affects which proteins are being expressed in the cell. And these methylation marks, or methyl groups, can be passed on from one cell cycle to another as cells divide. So how do we figure out where the DNA was methylated? Well, one way to do this experiment is to split your DNA into two identical samples. You take two very small samples, or aliquots, of DNA. Then you treat one of them in a special way, doing something called bisulfite conversion. This biochemical process converts all of the C's that are not methylated to U's. Oh, one thing I didn't mention is that the methyl groups are always attached to C's; that's the only nucleotide they get attached to. So now we have two identical samples, where in one sample we've converted the DNA in a special way so that many of the C's are now U's. Then we sequence both samples and compare them, and this requires very specialized programs that can do not just the comparison but also the alignment, because the converted DNA no longer matches very well the reference genome that we usually align to. So we need to use a special aligner that allows these U's to match what would have been C's in the original genome. So that's methyl-seq. It's a way of measuring methylation in a set of cells or tissues.
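The bisulfite logic above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the eight-base sequence and the methylated positions are made up, the function names are hypothetical, and the two samples are assumed to be already lined up base for base, which a real bisulfite-aware aligner has to work out for itself. The sketch keeps U for converted bases to match the description above; in actual sequencing data those bases are read as T after amplification.

```python
def bisulfite_convert(dna, methylated_positions):
    """Turn every unmethylated C into U; methylated C's are protected."""
    return "".join(
        "U" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(dna)
    )


def call_methylation(untreated, treated):
    """Positions where a C in the untreated sample is still a C after treatment."""
    return [
        i for i, (a, b) in enumerate(zip(untreated, treated))
        if a == "C" and b == "C"
    ]


def matches_reference(read, reference):
    """Bisulfite-aware comparison: a U in the read may match a C in the reference."""
    return len(read) == len(reference) and all(
        r == g or (r == "U" and g == "C") for r, g in zip(read, reference)
    )


dna = "ACGTCCGA"       # hypothetical sequence with C's at positions 1, 4, 5
methylated = {1, 5}    # hypothetical methylated C's (0-based positions)

treated = bisulfite_convert(dna, methylated)
called = call_methylation(dna, treated)

print(treated)                           # converted sample
print(called)                            # inferred methylated positions
print(treated == dna)                    # plain comparison fails
print(matches_reference(treated, dna))   # U-to-C tolerant comparison succeeds
```

The last two lines show why an ordinary exact comparison is not enough: the converted sample no longer equals the original sequence, but it does match once a U is allowed to stand in for a C.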