In this section we're going to talk about next generation sequencing. We're just going to cover the basics, but it's important that you have an idea of how sequencing works so you'll understand the data that you're looking at. Next generation sequencing is a term we use to describe the current generation of sequencing technology, which has been around since about 2007. We'll probably come up with a better name for it over time, but for now we call it NGS, or next generation sequencing. Sequencing has gone through a number of different generations. In the 70s, 80s and 90s we used a technology called Sanger sequencing, named after the scientist who invented it, Fred Sanger. That sequencing was at first very manual, very painstaking and slow. In the 1980s a number of companies put together automated DNA sequencers that let people sequence things much faster and more efficiently than had ever been possible before. That's what we call Sanger sequencing, and that's really the first generation of sequencing. As time went on, in the 1990s a technology called DNA microarrays was invented, which wasn't exactly sequencing. That involved attaching DNA to a tiny slide and then measuring other genomes by letting their DNA or RNA stick to that DNA. But second generation sequencing, the next generation sequencing that I'm going to describe, came along around 2007. In fact, it has already in some ways been superseded by a newer generation of sequencing that looks at single molecules. But I'm not going to talk about that. That's still very new technology and not at all mature. The vast majority of DNA sequence being generated around the world today comes from second generation sequencing. So let's talk about how that works.
First, you need to understand that all these DNA sequencing technologies rely on taking a DNA template, the DNA that you want to sequence, and copying it. The molecular machinery that cells use for copying DNA is a key tool in this process. DNA is copied by a molecule called DNA polymerase, which takes free nucleotides, those are As, Cs, Gs and Ts, that are just floating around in your cell, or that you can synthesize so that they're floating around in your test tube. It uses them to copy the DNA that you're trying to sequence. And of course when we copy, we use the rule of DNA that Gs always bind to Cs and As always bind to Ts. So you start with single stranded DNA, you add lots of As, Gs, Cs and Ts and some DNA polymerase, and you can synthesize the complementary strand. Now if you could just watch this happening, then you wouldn't need to do what we're calling sequencing. Somehow you need to observe this and measure your DNA while all this is going on. So how do you do that? The way next gen sequencing does it is we take our template DNA and chop it up into small pieces, typically a few hundred bases long, maybe a little longer than that, maybe as much as 1,000 bases long, and we chemically attach those to a slide. Now, on the slide there'll be millions or tens of millions of these fragments of DNA. But never mind that, we'll just show you an example with two random pieces of DNA that are attached to the slide. Then on the slide itself we use polymerase chain reaction, or PCR, to make many identical copies of those pieces. So you get these little clusters that you're seeing here, where a cluster might in fact be a few million copies, but that's a relatively small number, a few million copies of these very short fragments of DNA.
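The base-pairing rule and the chopping step described above can be sketched in a few lines of Python. This is only a toy illustration: the function names `synthesize_complement` and `fragment` are made up for this example, the fixed fragment size is an assumption (real fragmentation produces pieces of varying length), and strand orientation is ignored.

```python
# Base-pairing rule from the lecture: G pairs with C, A pairs with T.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesize_complement(template):
    """Return the strand a polymerase would build opposite `template`.

    Hypothetical helper: bases are written in the same left-to-right
    order as the template, so strand orientation is ignored here.
    """
    return "".join(PAIRING[base] for base in template)


def fragment(dna, size):
    """Chop `dna` into consecutive pieces of at most `size` bases.

    Assumption: a fixed piece size, chosen only to keep the sketch simple.
    """
    return [dna[i:i + size] for i in range(0, len(dna), size)]


if __name__ == "__main__":
    template = "GATTACAGATTACA"
    for piece in fragment(template, 5):
        print(piece, "->", synthesize_complement(piece))
```

Each printed line pairs a fragment with the complementary strand that would be synthesized from it.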
The fragments in these clusters are all single stranded, and they're all stuck to these spots on the slide. So now you still haven't done any sequencing, you've just made copies of these fragments. You still don't have any idea what's on the slide. You need to read it off somehow. So how do we do that? We're going to use, again, the property that when DNA is copied, a single strand of DNA will always pair with the complementary base. So we can add to the slide some nucleotides, As, Cs, Gs and Ts, that are labeled in a special way so that when we hit them with light from a laser, they'll fluoresce in four different colors. So if we hit, say, the Ts here with the light, they'll fluoresce in red, and we'll know that we've looked at a T. What we want to do is add these to the slide and have them hybridize, or base pair, with the single stranded templates that we have on the slide. A very important property of next generation sequencing is that these nucleotides that we're adding are specially modified in two different ways. One is that they fluoresce, so each of the four nucleotides fluoresces in a different color. But another very important property, and this was a very clever invention, is that they have a terminator modification. That is, once the DNA polymerase, which is attaching them to these templates, has added one of these molecules, it can't add another one. It can only add one because they're chemically modified to have a terminator. However, it's a reversible modification. So after we take our picture, we can remove the modification and let the polymerase go on and add the next base. So what these next gen sequencers do is, in parallel, at millions of spots on the slide, add a single base to each spot. And every spot, because it has a different template, may have a different base added.
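The one-base-per-cycle idea described above can be sketched as a small Python loop. This is a minimal illustration, not how any instrument actually works: the function names `run_cycle` and `sequence` are invented for the example, and apart from red for T (the color mentioned in the lecture) the color assignments are arbitrary assumptions.

```python
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Assumption: only "T fluoresces red" comes from the lecture;
# the other three colors are arbitrary placeholders.
COLOR = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}


def run_cycle(templates, grown):
    """Add exactly one complementary base at every spot.

    `grown[i]` is the strand synthesized so far at spot i. The reversible
    terminator is modeled by adding one base and stopping; the next call
    plays the role of removing the terminator and continuing.
    Returns the color each spot would show in this cycle.
    """
    colors = []
    for i, template in enumerate(templates):
        position = len(grown[i])
        if position >= len(template):
            colors.append(None)  # template fully copied, nothing to add
            continue
        base = PAIRING[template[position]]
        grown[i] += base
        colors.append(COLOR[base])
    return colors


def sequence(templates, cycles):
    """Run `cycles` rounds over all spots; return (strands, colors per cycle)."""
    grown = [""] * len(templates)
    images = [run_cycle(templates, grown) for _ in range(cycles)]
    return grown, images


if __name__ == "__main__":
    strands, images = sequence(["ACGT", "TTGA"], cycles=3)
    for cycle, colors in enumerate(images, start=1):
        print("cycle", cycle, colors)
    print(strands)
```

Every spot advances by exactly one base per cycle, and different spots show different colors because their templates differ.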
Then we just take a picture. The way we take a picture is we shine a light while we're imaging the slide, and that shows us, at every spot, what color the base is that's just been added to that template. So what happens is we go through these cycles of sequencing. In each cycle, we sequence another base. We add one more base to the new strand that we're synthesizing from the template. So we'll get pictures like what you're seeing here for cycles one through five. We need to register these pictures so they're all lined up. Then if we just look at the spot in the lower left, we'll see, okay, it's green, and then in the second cycle it might be red, and so on. As we go through the cycles, we can call the bases, because the colors tell us what those bases are. At each spot on the slide we're reading a different sequence, and we read all of them, of course, in parallel, all the sequences that are attached to this slide. And with this technology there will be millions of these short fragments attached to the same slide. So when we're done, we'll have millions of sequences all at once. The reason you need to understand this, and understand this data, is that there are errors in it. There are a number of sources of error. One property is that errors increase in later cycles. The reason this happens is that, remember, we're making a few million identical copies in one of these clusters, and we're sequencing that cluster one base at a time. Well, this process of DNA polymerase adding a base isn't perfect. The DNA polymerase, we hope, will add a single base to every fragment in the little cluster in one cycle. But once in a while it'll add an extra base, and that fragment will get ahead.
And once in a while it'll fail to add a base, and that fragment will get behind. What that means is that instead of having all these molecules show the same color, a few of them, at first a very tiny number, will have the wrong color when we hit them with the light. But as time goes on, the number of these little template strands that are out of sync will increase, and that increases your error. So that's why we can't just read thousands of bases using this technology. Eventually the error is too great and we can't read the sequence accurately. So at the end, the readout you get from this process is that at every position on the slide, every spot, you get a read. A read is just a long sequence of As, Cs, Gs and Ts, and here's one of the formats we commonly see it in. Associated with every one of those As, Cs, Gs and Ts is a quality value, which is the base calling software's estimate of how likely it is that there's an error at that point. It does that by trying to compute how pure that color signal was. As the sequence goes on, that color isn't quite as pure, and the base calling software can tell that. So it'll tell you what its best guess of the base is, but it'll have a higher likelihood of error, so the quality will go down.
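To make the out-of-sync effect concrete, here is a rough Python sketch of one cluster in which, at each cycle, a small fraction of molecules adds no base or adds two. Everything in it is an assumption made for illustration: the 1% rates, the repeating toy template, the function names, the idea that each molecule shows the color of the last base it added, and above all the use of one minus the signal purity as a stand-in for error probability. Real base calling software uses calibrated models. The only standard piece is the Phred formula, Q = -10 log10(p), which is the usual scale for sequencing quality values.

```python
import math
from collections import defaultdict

PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def phred_quality(p_error):
    """Phred-scaled quality: Q = -10 * log10(error probability)."""
    return -10 * math.log10(max(p_error, 1e-10))  # floor avoids log10(0)


def simulate_cluster(template, cycles, p_behind=0.01, p_ahead=0.01):
    """Return a (called_base, purity) pair for each cycle of one toy cluster.

    positions[n] is the fraction of molecules that have added n bases.
    Assumed rates: p_behind adds no base in a cycle, p_ahead adds two.
    """
    positions = {0: 1.0}
    results = []
    steps = ((0, p_behind), (1, 1 - p_behind - p_ahead), (2, p_ahead))
    for _ in range(cycles):
        updated = defaultdict(float)
        for n, fraction in positions.items():
            for step, prob in steps:
                updated[min(n + step, len(template))] += fraction * prob
        positions = dict(updated)

        # Toy assumption: a molecule shows the color of its last added base;
        # molecules that have added nothing yet contribute no signal.
        signal = defaultdict(float)
        for n, fraction in positions.items():
            if n > 0:
                signal[PAIRING[template[n - 1]]] += fraction
        called = max(signal, key=signal.get)
        purity = signal[called] / sum(signal.values())
        results.append((called, purity))
    return results


if __name__ == "__main__":
    template = "ACGT" * 15  # arbitrary toy template
    for cycle, (base, purity) in enumerate(simulate_cluster(template, 50), 1):
        if cycle in (1, 10, 25, 50):
            q = phred_quality(1 - purity)  # crude stand-in, see note above
            print(f"cycle {cycle:2d}: call {base}, purity {purity:.3f}, toy Q {q:.1f}")
```

Running it shows the purity of the dominant color, and with it the toy quality value, falling as the cycles go on, which is the trend the quality values in a read reflect.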