In the current section we'll be talking about how we can archive content in Unix. We might want to do that for several purposes. For instance, we might have very large data files that we would like to compress. Or we might have several data files that all belong to a common project. Or we might just want to exchange information with a collaborator. So the commands that we'll be talking about today will be gzip, gunzip, bzip2, bunzip2 and tar, and perhaps another one, zcat. Let's get started. Within our directory, Coursera, we'll move into Plants. For illustration purposes, let's say that we wish to compress the content of the apple.genome file, which can be quite large. So let's first look at the size of the apple.genome file. We can compress it using the command gzip: gzip apple.genome. The operation was very fast. Now let's compare the current sizes of the apple files. One observation is that you won't see the file apple.genome anymore; it has been compressed to a file called apple.genome.gz. You can compare the size of the original apple.genome file, which was about 65 kilobytes, to its current compressed size, about 19 kilobytes. That is a compression of about 3.5-fold, which can become very useful. I should mention that gzip uses a Lempel-Ziv compression algorithm (LZ77, as part of the DEFLATE format). Once we've compressed the file, we can obviously uncompress it, and the command to do so for a gzip-compressed file is gunzip. Simply type gunzip apple.genome.gz, and this will reverse the operation. So now when we list the files in this directory with ls -l, we see that we have retrieved the original apple.genome file. There's no loss of information through the compression and decompression. There's another widely used algorithm that perhaps has even more compressive power, especially for genomic data sets, and that is bzip2.
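To try the gzip/gunzip round trip described above without the course files, here is a minimal sketch. It assumes a POSIX shell with gzip, awk and mktemp available, and it builds a hypothetical stand-in for apple.genome (a header plus repeated sequence lines) in a throwaway directory. Because the stand-in is highly repetitive, its compression ratio will be much better than the roughly 3.5-fold seen in the lecture.

```shell
#!/bin/sh
# Sketch: gzip/gunzip round trip on a HYPOTHETICAL stand-in for apple.genome.
set -e

# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical genome file: one header line plus 2000 repeated sequence lines.
{
  echo ">chr1 hypothetical"
  awk 'BEGIN { for (i = 1; i <= 2000; i++) print "ACGTTGCAACGTAGCTAGCTAGGATCCA" }'
} > apple.genome
cp apple.genome apple.genome.orig   # keep a copy to verify the round trip

echo "original size:   $(wc -c < apple.genome) bytes"
gzip apple.genome                   # replaces apple.genome with apple.genome.gz
echo "compressed size: $(wc -c < apple.genome.gz) bytes"

gunzip apple.genome.gz              # restores apple.genome and removes the .gz
cmp apple.genome apple.genome.orig && echo "round trip is lossless"
ls -l
```

Note that gzip replaces the original file rather than keeping both, which is why apple.genome disappears from the listing until gunzip is run.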
So we can similarly type bzip2 apple.genome. This has created a file with the extension .bz2, apple.genome.bz2. If we look at the size of the new file, it's about 18 kilobytes, so that's slightly smaller than what we obtained with gzip compression. Just for your information, the bzip2 algorithm is based on the Burrows-Wheeler transform of strings. In a similar way, once we have the file compressed, if we want to uncompress it we use bunzip2 apple.genome.bz2, and this will simply reverse the operation. So now we retrieve the apple.genome file with its original size and its original content. As I've shown you, gzip and bzip2 only compress one file at a time. But sometimes we might want to compress multiple files together. To do so, we first have to create an archive that simply chains together all the files, and then, as a second operation, compress the resulting file. To create the archive we use the command tar, which puts together several files. In this case, let's say that we want to archive all the files in this directory, namely apple.genes, apple.genome, and apple.samples. The command tar first takes a number of options, which I will explain one by one: -cvf, followed by the name of the target archive. Let's call it apple.tar, with the extension .tar. That is followed by the list of files and/or directories that will be included in this particular archive: tar -cvf apple.tar apple.genes apple.genome apple.samples. Going back to the options: c specifies that tar is used to create an archive. v is for verbose, which tells tar to list the files one at a time as they are added to the archive. And f means that the next argument, apple.tar, is the name of the target archive file. So let's perform this archiving, in which the three files, apple.genes, apple.genome, and apple.samples, are put together in one single archive.
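The bzip2 round trip and the tar -cvf step described above can be sketched as follows. This assumes gzip's sibling tools bzip2/bunzip2 and tar are installed; the three apple.* files are hypothetical stand-ins created in a temporary directory, not the course data. The final tar -tf call is an extra step not shown in the lecture: it lists what the archive contains without extracting anything.

```shell
#!/bin/sh
# Sketch: bzip2 round trip, then tar -cvf to bundle several files.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# HYPOTHETICAL stand-ins for the three apple.* files in the lecture.
awk 'BEGIN { for (i = 1; i <= 2000; i++) print "ACGTTGCAACGTAGCTAGCTAGGATCCA" }' > apple.genome
printf 'gene1\ngene2\ngene3\n' > apple.genes
printf 'sample1\nsample2\n' > apple.samples

bzip2 apple.genome                 # creates apple.genome.bz2, removes apple.genome
ls -l apple.genome.bz2
bunzip2 apple.genome.bz2           # restores apple.genome

# c = create, v = verbose (list each file added), f = archive file name follows
tar -cvf apple.tar apple.genes apple.genome apple.samples
ls -l apple.tar

# t = list the archive contents without extracting
tar -tf apple.tar
```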
You can see that the v option told us that apple.genes has been added to the archive, then apple.genome, and then apple.samples. Looking at the list of files, we now have apple.tar. If we look at the sizes, you can see that apple.tar is roughly the cumulative size of apple.genes, apple.genome, and apple.samples. So there hasn't been any compression per se. For the compression part, we use gzip, the command we just looked at: gzip apple.tar. That gives us apple.tar.gz, which is now much reduced in size: only about 20 kilobytes. So now we have a compressed archive. You might want to use this when you store compressed files systematically on your system, when you want to share data with collaborators, or simply when you're preparing a software distribution. But how do we open it up? Let's go back one directory, create a little directory that I'm going to call sandbox, and put a copy of the archive, apple.tar.gz, in the sandbox directory, since we don't want to overwrite the previous files. Now let's change to sandbox and illustrate how we can open the archive. First of all we have to uncompress the file: gunzip apple.tar.gz. That happens quietly, and the result is the apple.tar file that we looked at earlier, with the same size as before. Now we use the tar command with a different set of options to open the archive: tar -xvf apple.tar. The x tells tar to extract the files from the archive, v is for verbose, so it lists the names of the files that were included in the archive, and f says to extract from the following file, apple.tar. Let's see what we get. It says that it extracted the files apple.genes, apple.genome, and apple.samples. We can verify that by listing the content of the current directory, and indeed we have apple.genes, apple.genome, and apple.samples. We still have a copy of apple.tar as well.
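Putting the steps above together (compress the archive with gzip, copy it into a sandbox, uncompress it, and extract it), a minimal sketch might look like this. The file contents are hypothetical and everything happens under a temporary directory; the final cmp loop is an added check, not part of the lecture, confirming that the extracted files match the originals one directory up.

```shell
#!/bin/sh
# Sketch: gzip a tar archive, then unpack a copy of it in a separate sandbox.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# HYPOTHETICAL stand-ins for the lecture's files.
awk 'BEGIN { for (i = 1; i <= 2000; i++) print "ACGTTGCAACGTAGCTAGCTAGGATCCA" }' > apple.genome
printf 'gene1\ngene2\ngene3\n' > apple.genes
printf 'sample1\nsample2\n' > apple.samples

tar -cf apple.tar apple.genes apple.genome apple.samples
gzip apple.tar                     # apple.tar -> apple.tar.gz

# Unpack a copy in a sandbox so the originals cannot be overwritten.
mkdir sandbox
cp apple.tar.gz sandbox/
cd sandbox
gunzip apple.tar.gz                # apple.tar.gz -> apple.tar
tar -xvf apple.tar                 # x = extract, v = verbose, f = from this file

# Compare each extracted file with the original one directory up.
for f in apple.genes apple.genome apple.samples; do
  cmp "$f" "../$f" && echo "$f matches"
done
ls -l
```

GNU tar and BSD tar can also combine the two steps with the z option (tar -czvf to create a .tar.gz, tar -xzvf to extract one), which is a common shortcut once the two-step version is familiar.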
So we were able to retrieve the content from the archive. I'm going to show you one more trick. So far we archived individual files only, so let me show you how we can archive entire directories with tar. For that operation, first I'm going to clean up the current directory by removing everything in it. That's an operation you have to do carefully, but I know that I don't need anything in the current directory, so rm *, which removes everything in the local directory. Then ls to check that it's empty. Now I want to move back to the Plants directory, from which I can see the apple, peach, and pear directories. Let's say that I want to archive everything that's in the apple directory, including its directory structure. I can do it in a similar way with tar: tar -cvf appleD.tar apple. The c says create an archive, v says be verbose and tell me everything that you're putting in, and f says that the name of the target file follows, which I'm going to call appleD.tar. Then comes the name of the apple directory. So the content of the apple directory, including its structure, will be stored in the appleD.tar file. And you can see that it added apple/, apple/apple.genes, and so on, including our previous archive. Now let's compress it with bzip2, but first let's look at the size of appleD.tar. Now we run bzip2 appleD.tar, which produces appleD.tar.bz2. We move this into sandbox, and in the sandbox I'm going to demonstrate how we open the archive and what we obtain. cd to sandbox. The first operation again is to decompress the file: bunzip2 appleD.tar.bz2. Now we can extract the archive with tar -xvf appleD.tar, where x means extract, v means be verbose, and f means from this file. This tells us that it extracted the directory apple and its contents: apple.genes, apple.genome, apple.samples, the sorted samples file, and the rest of the files in that directory.
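The directory workflow above (tar a whole directory, compress with bzip2, move to a sandbox, decompress, extract) can be sketched as below. The Plants/apple layout, the nested notes subdirectory, and all file contents are hypothetical, created under a temporary directory to show that tar recurses into subdirectories and recreates the structure on extraction.

```shell
#!/bin/sh
# Sketch: archive a whole directory with tar, compress with bzip2, unpack in a sandbox.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# HYPOTHETICAL Plants/apple directory with a nested subdirectory.
mkdir -p Plants/apple/notes
cd Plants
printf 'ACGT\nTTGA\n' > apple/apple.genome
printf 'gene1\ngene2\n' > apple/apple.genes
printf 'sample1\n' > apple/apple.samples
printf 'hypothetical note\n' > apple/notes/readme.txt

# Archive the directory itself; tar recurses and keeps the structure.
tar -cvf appleD.tar apple
ls -l appleD.tar
bzip2 appleD.tar                   # appleD.tar -> appleD.tar.bz2

mkdir sandbox
mv appleD.tar.bz2 sandbox/
cd sandbox
bunzip2 appleD.tar.bz2             # appleD.tar.bz2 -> appleD.tar
tar -xvf appleD.tar                # recreates apple/ and everything under it
ls -R apple
```

As with gzip, GNU tar and BSD tar accept a j option to do the bzip2 step in one command (tar -cjvf to create, tar -xjvf to extract).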
And indeed we can now look at the directory apple and see that all those files are present. One last command for working with archived content, which is very convenient, is zcat. It allows us to look inside a file that has already been compressed. For that operation, let me gzip just the apple.genome file again. So now we have, in the directory apple, the file apple.genome.gz, which is compressed. Simply viewing this file with cat will not show its contents in readable form, since it's a binary file. Instead, zcat apple.genome.gz prints the decompressed text to the screen while leaving the compressed file on disk unchanged. So this concludes our section on archiving content in Unix.
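The zcat step described above can be sketched as follows, again with a hypothetical apple.genome created in a temporary directory. The point is that zcat streams the decompressed text to standard output, so it combines naturally with head or less, while only apple.genome.gz stays on disk. On some systems, notably macOS, zcat may expect .Z files instead of .gz; gzip -dc is an equivalent that works with gzip files everywhere.

```shell
#!/bin/sh
# Sketch: peek inside a gzip-compressed file without decompressing it on disk.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# HYPOTHETICAL genome file, compressed with gzip.
{
  echo ">chr1 hypothetical"
  awk 'BEGIN { for (i = 1; i <= 5; i++) print "ACGTTGCAACGTAGCTAGCTAGGATCCA" }'
} > apple.genome
gzip apple.genome                  # only apple.genome.gz remains

# zcat writes the decompressed text to standard output; the .gz file is unchanged.
zcat apple.genome.gz | head -n 3

# Equivalent that also works where zcat only handles .Z files:
gzip -dc apple.genome.gz | head -n 3
```

For files compressed with bzip2, bzcat plays the same role.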