In our last video, we created four genome groups in preparation for building a phylogenetic tree using PATRIC's Codon Tree pipeline. Let's build that tree. To find any of our services in PATRIC, you go up to the "Services" tab, click on it, and then go down to Phylogenetic Tree. Let's click on that. This is for the Codon Tree pipeline. The Codon Tree pipeline uses both protein sequences and the corresponding nucleotide gene sequences that go with those proteins. It finds the proteins by looking at PATRIC's PGFams, the global PATRIC protein families. It finds the families that are shared, and then it generates a tree from both the amino acid and the nucleotide sequences. It's pretty awesome because it tells you how many amino acid and nucleotide characters were used to build the tree. The report on the tree will also flag potentially problematic genomes, and it tells you how many genes it looked for and how many it was able to find. It's a very powerful service, and I think you'll like it a lot. You can see here it requires at least four and up to 100 genomes. Here's the secret: actually it can take 200, just don't tell anybody. First, let's look for our genome groups. We recently created four different groups. It shows the most recent genome groups I created, but I could also search for them here. For example, if I wanted to find some Brucella groups, I would start typing something like that. But I have my most recent groups, so I click, and the last group I created was Blochmannia, those endosymbionts from ants, named for Friedrich Blochmann. So I select that, and I have to get those genomes into this box. It can't build a tree yet; this Submit button will not turn blue until I've filled everything in. I'm going to click "Add". Now I have seven genomes; I'm on my way. But I wanted to add more than that, and then there's Wigglesworthia, named for Sir Vincent Brian Wigglesworth, the insect physiologist. These are the tsetse fly endosymbionts. We have to include those.
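To make the "amino acid plus nucleotide" idea concrete, here is a minimal sketch of one common way to combine the two: thread each gene's coding sequence through its aligned protein codon by codon (the approach tools such as PAL2NAL take), then concatenate the families into one supermatrix and count the characters. This is an illustration under stated assumptions, not PATRIC's implementation: the input layout (`Family`), the toy sequences, and the function names are hypothetical, and it assumes each family's proteins are already aligned with exactly one copy per genome.

```python
from typing import Dict, List, Tuple

# Hypothetical input: one gene family maps genome ID -> (aligned protein, unaligned CDS).
Family = Dict[str, Tuple[str, str]]


def codon_align(aligned_protein: str, cds: str) -> str:
    """Thread a CDS through its aligned protein: each residue becomes its
    3-nt codon and each gap '-' becomes '---'. Extra trailing codons
    (such as a stop codon) are ignored."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    out: List[str] = []
    k = 0
    for residue in aligned_protein:
        if residue == "-":
            out.append("---")
            continue
        if k >= len(codons):
            raise ValueError("CDS has fewer codons than the protein has residues")
        out.append(codons[k])
        k += 1
    return "".join(out)


def build_supermatrix(families: List[Family]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Concatenate every family's protein and codon alignments per genome.
    Assumes every family has exactly one sequence for every genome."""
    genomes = sorted(families[0])
    protein = {g: "" for g in genomes}
    codon = {g: "" for g in genomes}
    for fam in families:
        for g in genomes:
            aligned, cds = fam[g]
            protein[g] += aligned
            codon[g] += codon_align(aligned, cds)
    return protein, codon


# Toy data: two families, two genomes.
fam1 = {"genomeA": ("MK-L", "ATGAAACTGTAA"), "genomeB": ("MKVL", "ATGAAAGTTCTGTAA")}
fam2 = {"genomeA": ("MS", "ATGTCT"), "genomeB": ("MT", "ATGACC")}
protein, codon = build_supermatrix([fam1, fam2])
print(len(protein["genomeA"]), "amino acid characters")  # 6 amino acid characters
print(len(codon["genomeA"]), "nucleotide characters")    # 18 nucleotide characters
```

In this construction the nucleotide count is always three times the amino acid count, because every aligned column, residue or gap, becomes one codon column.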
So let's add that, and then we had Riesia; who would not want to include endosymbionts [inaudible]? We've got to have that on the tree. So we click and add that. The last one was Buchnera, so I'm going to click on that and add that group. Now, these are all public genomes. But one of the main reasons people come to PATRIC and use this pipeline is so that they can add their own private genomes. I could add those genomes one at a time. Let's click on this box here. If you want to add genomes singly, it lets you filter: Reference genomes are the ones with NCBI's reference genome designation, then there are Representative genomes, All public genomes, and My Genomes. Just to narrow the search, I'm going to look only at my genomes. I guess you can all tell I'm a big fan of Wigglesworthia: whenever I see that one has recently been annotated, or that I can find sequences for one in the SRA, I like to annotate it myself so that I can include it. So, Wigglesworthia; I'm going to add that one. I clicked on it. Let me show you how I did that again, because I got so excited about Wigglesworthia that I didn't do it slowly. I start to type the name, and PATRIC's autocomplete tries to find the closest matches. Notice the little lock in front of it. It indicates that this is a private genome, not a public genome, so this is my data. I click on that and it auto-populates the box with that particular genome. But in order to include it in the tree, I have to add it here. Now I have these genomes that have been added, and I can click on them to see who they are if I'm interested. Those are the genome IDs, and at this point maybe I should say a few words about genome IDs. Many genomes share the same name. Currently in PATRIC we have eight different copies of the Mycobacterium tuberculosis H37Rv genome; people deposited those in GenBank, and we picked them up.
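Since groups and individually added genomes all end up pooled in one list, and genome names can repeat, a minimal sketch of that pooling keys everything on the genome ID and enforces the four-to-100 genome limit shown on the form. The group names come from this video, but the IDs are placeholders made up for illustration, and `pool_genomes` is a hypothetical helper, not part of PATRIC.

```python
from itertools import chain
from typing import Dict, List

MIN_GENOMES, MAX_GENOMES = 4, 100  # limits stated on the Codon Tree form


def pool_genomes(groups: Dict[str, List[str]], extra_ids: List[str]) -> List[str]:
    """Merge genome groups plus individually added genomes into one list,
    keyed on genome ID so a genome listed twice is only counted once."""
    # dict.fromkeys keeps first-seen order and drops repeated IDs.
    pooled = list(dict.fromkeys(chain(*groups.values(), extra_ids)))
    if not MIN_GENOMES <= len(pooled) <= MAX_GENOMES:
        raise ValueError(f"need {MIN_GENOMES}-{MAX_GENOMES} genomes, got {len(pooled)}")
    return pooled


# Placeholder IDs, made up for illustration.
groups = {
    "Blochmannia": ["b1", "b2", "b3"],
    "Wigglesworthia": ["w1", "w2"],
    "Buchnera": ["n1", "n2"],
}
private = ["w3", "w1"]  # w1 was already in a group
print(pool_genomes(groups, private))
# ['b1', 'b2', 'b3', 'w1', 'w2', 'n1', 'n2', 'w3']

# Two distinct genomes can share a display name; their IDs stay distinct.
names = {"mtb_a": "Mycobacterium tuberculosis H37Rv", "mtb_b": "Mycobacterium tuberculosis H37Rv"}
print(len(set(names.values())), "name vs", len(names), "IDs")  # 1 name vs 2 IDs
```

Keying on names instead would silently merge the two H37Rv entries, which is exactly the ambiguity the genome ID avoids.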
Each lab generally wants to resequence that strain to see whether there have been SNPs or anything. So many genomes share the same name, but each one has a unique genome ID, which makes the genome ID the most reliable identifier you have. We've got all our genomes, but we still can't submit. Why? Because we have to fill in the parameters. For any of these, you can click the "Information" icon, which pops up a dialog box that tells you what's required. First we need an output folder. I'm going to show you how to create a new folder. I click on the folder icon, and it opens a folder window onto my workspace. Here I could upload data, but I'm going to create a new folder, so I click on that. It gives me a pop-up window where I can name the folder. We'll just call it Endosymbiont Trees, and I'll create that folder. Then I click "OK". Look, it still says "home"; that's wrong, PATRIC. I click on the down arrow here, and at the top it shows me the most recent folder I created, Endosymbiont Trees, and I click on that. Now I need a name for the tree. Let's come back and fill this in in a second, because first let me talk about the number of genes. I try to include these settings in the name, because I run a lot of trees and I want to know how many genes I used and whether any of the parameters were different. We're going to start with the simplest option and ask for 10 genes. Notice you can choose 10, 20, 50, 100, 500, or 1000. The more genes you request, the longer it takes. Next is max deletions, which goes up to 10. What this is telling me is that it's looking for protein families that everybody shares. If it can't find any shared protein families, it won't build the tree.
But let's say you have one genome in there that's problematic. You can allow one genome to be missing from a family, or two, up to 10. Then you also have duplications, and this actually happened: someone wrote in to us because they were unable to build a tree. The only reason, believe it or not, was that every single gene in their genome had been duplicated. The Codon Tree pipeline looks for protein families that every genome has and in which every genome has only one copy. This person was out of luck until I told him, "Oh, you can set duplications anywhere from 0 to 10." Then it will say, "All of these genomes have one copy of the gene in this particular protein family, but this genome has two; that's okay, we'll choose one of them." You may ask which of the two it chooses. It chooses the one with the closest homology to the others. We're going to start small: 10 genes, zero deletions, zero duplications. I still can't submit it because I haven't named it, so I'm going to do that: Endosymbiont_May 2020. I run a lot of these, and I just like to see it all in front of me. You can name your trees anything you want, but this is how I name mine: 10 for the number of genes requested, zero for the number of deletions, and zero for the number of duplications. Now the Submit button is blue, so we can launch the tree. We click "Submit" and it's off. You get a message saying your job has been submitted, that it can take a while, and to check your workspace, which is the Jobs area here, to see its progress. If I click on this, it shows me that my job is currently running. One thing about these jobs while they're running: expect a tree job to take up to 24 hours. Let us know if it takes longer than that. We've submitted the job; now we have to wait for it to finish.
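The family-selection rule described here (families shared by every genome in a single copy, relaxed by a maximum number of genomes allowed to be missing and a maximum number allowed to carry duplicates, keeping the duplicate closest to the others) can be sketched as follows. This is a hedged illustration, not PATRIC's code: the toy `identity` score stands in for a real homology measure, and walking families in sorted order until the requested number of genes is reached is an assumption about how candidates are picked.

```python
from typing import Dict, List

# Hypothetical layout: family ID -> genome ID -> list of gene copies (toy protein strings).
Families = Dict[str, Dict[str, List[str]]]


def identity(a: str, b: str) -> float:
    """Toy similarity: fraction of identical positions over the shorter
    sequence. A stand-in for a real homology score (assumption)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def select_families(families: Families, genomes: List[str], n_genes: int,
                    max_deletions: int, max_duplications: int) -> Dict[str, Dict[str, str]]:
    """Return up to n_genes families, each mapping genome ID -> one chosen copy."""
    chosen: Dict[str, Dict[str, str]] = {}
    for fam_id in sorted(families):
        members = families[fam_id]
        missing = [g for g in genomes if not members.get(g)]
        duplicated = [g for g in genomes if len(members.get(g, [])) > 1]
        if len(missing) > max_deletions or len(duplicated) > max_duplications:
            continue  # family fails the deletion or duplication allowance
        singles = {g: members[g][0] for g in genomes if len(members.get(g, [])) == 1}
        picked = dict(singles)
        for g in duplicated:
            # Keep the copy most similar, in total, to the single-copy genes elsewhere.
            picked[g] = max(members[g], key=lambda c: sum(identity(c, s) for s in singles.values()))
        chosen[fam_id] = picked
        if len(chosen) == n_genes:
            break
    return chosen


genomes = ["gA", "gB", "gC", "gD"]
families = {
    "fam1": {"gA": ["MKLV"], "gB": ["MKLV"], "gC": ["MKIV"], "gD": ["MKLV"]},
    "fam2": {"gA": ["MSTA"], "gB": ["MSTA"], "gC": ["MSTA"]},  # missing from gD
    "fam3": {"gA": ["MQRS"], "gB": ["MQRS"], "gC": ["MQRS"], "gD": ["WWWW", "MQRT"]},  # gD has two copies
}
print(list(select_families(families, genomes, 10, 0, 0)))  # ['fam1']
print(list(select_families(families, genomes, 10, 1, 0)))  # ['fam1', 'fam2']
print(select_families(families, genomes, 10, 0, 1)["fam3"]["gD"])  # MQRT
```

With zero deletions and zero duplications only the strictly shared single-copy family survives; raising either allowance lets more families through, which is why a genome whose genes are all duplicated needs a nonzero duplication setting before any tree can be built.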
I know you're all on the edge of your seats, as am I, to see what this endosymbiont tree looks like, so join me for the next edition of the Codon Tree webinar series, and we'll look at the completed job. Thanks a lot. Okay, here's your second assignment. Using the groups you created, you're going to make five different trees: some that contain only one of those groups, and one that contains all of them. You're also going to include your own genome, because one of the main reasons people build trees is that they have their own genome and want to see where it falls in the tree. Remember back when we did the test for the Comprehensive Genome Analysis, annotated those contigs, and I just called it "bacteria"? When we opened the genome record from the Comprehensive Genome Analysis, it had created a tree showing that the genome was in the genus Brucella. I want you to include that genome in each of the trees. The first tree will be that genome plus Brucella. The second tree will include the frog Brucella isolates. The third tree will expand to the other members of the Brucellaceae. The fourth tree will include Bartonella, and the fifth tree will include Mesorhizobium. Good luck.