In this lesson, we will discuss S4 classes and methods. We're doing this because the Bioconductor project is a heavy user of the S4 system, whereas base R and the R packages available from CRAN usually don't use the S4 system very much.

So what is the S4 system? Well, in R, you have at least three different ways of doing object-oriented programming. There's the classic system, also known as S3. There's a more advanced system, known as S4. And there's a more recently introduced system known as S5, or reference classes.

So, what's the idea with S4? The first thing you have to bear in mind, and this note is primarily for the programmers out there who have done object-oriented programming in other languages, is that in the S4 system, S4 classes and S4 methods are two separate things. You could have a package that uses S4 classes without using S4 methods, and vice versa.

S4 classes are a way of representing complicated data structures. We have used them a lot already. We have seen things like ExpressionSets and SummarizedExperiments, and we have seen the wealth of information and the links between different data structures that they have. This is possible through the S4 system. So I would say that S4 classes have really proven their worth in the Bioconductor project, and we are better off for using them.

So what problem are we solving with S4? Let's start by loading two packages. We are especially interested in getting an example data set. Well, in base R, you can make any object into any kind of class. Let's make an example. Let's take a standard linear model. We have a data frame here with some y and x values, and I'm making an lm object by using the lm function, the linear model function. When I print the object on the command line, I see some information, my estimates and so on, and I can do things with it. You'll also see that if I ask for its class, it returns lm, and I can do things like asking for its names.
Calling names(lm.object) gives me a list of things that don't immediately make a whole lot of sense. Now, what was it I said about making any object into any kind of class? Well, in S3, which lm is an example of, a class is really just a list with a class attribute. So let me make an arbitrary list, xx, with two elements: some letters and some numbers. Then I just say class(xx) = "lm". So look here, I have now turned this list into an object of class lm. But what I have here looks nothing like a linear model fit. It doesn't have names like coefficients, residuals and fitted values. If I try printing it, I actually get something out, not very useful, but at least it doesn't throw an error. The point is that I was able to turn this list into an object of class lm without any kind of error message or warning that I was doing something crazy.

S4, on the other hand, has a lot of validity checking built in. It means that I can formally define what a class contains, and I have a guarantee that if I have an object of that class, it actually looks like my class definition. This is really useful for complicated data structures, because as a programmer you know what you deliver to people and what you get from people. When you have an object and you know it's an ExpressionSet, you know exactly what that means.

So let's look a little at an ExpressionSet. We had already loaded the library, so we load the ALL data set into memory and print it. We see it is an ExpressionSet, which is the same thing we get if we ask for its class. We can also see that this particular ExpressionSet class is defined in the Biobase package. And there's a little helper function in R called isS4 that tells us that this is indeed an S4 object. Now, when you have an S4 class, the first thing you should worry about is stuff like, how do I get help?
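The contrast between the unchecked S3 class label and the S4 checks can be sketched in a few lines. This is only an illustration: the class name Measurement, its slots and its validity rule are made up for the example and are not part of Biobase or any other package.

```r
library(methods)

# S3: any list can be relabelled as "lm" with no check at all.
xx <- list(a = letters[1:3], b = rnorm(3))
class(xx) <- "lm"
s3_class <- class(xx)   # "lm", although xx is not a model fit

# S4: a hypothetical toy class with a formal definition and a validity rule.
setClass("Measurement",
         representation(values = "numeric", unit = "character"),
         validity = function(object) {
             if (length(object@unit) != 1L)
                 return("'unit' must be a single string")
             TRUE
         })

ok <- new("Measurement", values = c(1.5, 2.5), unit = "cm")

# A value of the wrong type for a slot is rejected at creation time.
bad_type <- tryCatch(new("Measurement", values = "not a number", unit = "cm"),
                     error = function(e) "rejected")

# The validity function is also run when the object is created.
bad_unit <- tryCatch(new("Measurement", values = 1, unit = c("cm", "mm")),
                     error = function(e) "rejected")
```

Here isS4(ok) is TRUE while isS4(xx) is FALSE, which is the same check used on the ALL object above.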
And the formal way to get help on a class is not by writing help(ExpressionSet), although that'll work, and I'll explain in a little bit why it works. The true way of doing it is a construction that seems rather cumbersome: I write class?ExpressionSet, and you can see I now get a help page on ExpressionSet. Or I write ?"ExpressionSet-class", that is, a question mark and then, in quotes, the name of the class, a hyphen, and the word class. I have to put it in quotes because of the hyphen. This gives me the same help page; all it did was refresh. The page is a description of the class, what it contains, and so on and so forth. So this is how you get help on a class.

Now, on to objects of this class. Traditionally, in Bioconductor, there are a couple of conventions, or coding standards, for when you make new classes. The first is that a class name ought to start with a capital letter. The next is that a class is supposed to have something known as a constructor, and the constructor should have the same name as the class. That may sound a little computer sciency, but we actually use constructors all the time, for example when we make a list. Here I'm using the list constructor: a function called list that takes some arguments and returns a list containing those arguments. In the same way, we have a constructor for ExpressionSet, which is basically the name of the class followed by parentheses, and you can see I get an ExpressionSet back, although there's no data in it.

So the tradition in Bioconductor is that any class is supposed to have a constructor, and traditionally the constructor is documented on the same page as the class. This is why, if I ask for help on ExpressionSet, I actually get to the same ExpressionSet help page as before: that help page documents both the class and the constructor.
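The constructor convention described above can be sketched with a toy class. The class Interval and its slots lower and upper are hypothetical names chosen for this illustration; the constructor is a thin wrapper around the low-level new function from the methods package.

```r
library(methods)

# Hypothetical class; by convention the name starts with a capital letter.
setClass("Interval", representation(lower = "numeric", upper = "numeric"))

# Constructor with the same name as the class. The defaults make it
# possible to call Interval() and get an object with no data in it.
Interval <- function(lower = numeric(0), upper = numeric(0)) {
    new("Interval", lower = lower, upper = upper)
}

empty <- Interval()
iv <- Interval(lower = c(1, 10), upper = c(5, 20))
```

With this layout a user only ever calls Interval(...), in the same way that list(...) is called to make a list, and never has to know how the object is assembled internally.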
You can see over here in the help page that there is a section on instance creation. Often you don't have a need for constructing these complicated classes yourself, although we have done it a lot: we've used the constructors for IRanges and GRanges. But it's rare that you construct, say, an ExpressionSet from scratch.

The classic way of creating an ExpressionSet is using a function called new: you say new("ExpressionSet"). This also gives us an ExpressionSet with no data in it. This construction with new was recommended in the project, say, ten years ago, and we have since gotten away from that recommendation. So if you read old documentation, you'll see the new function all over the place. We now frown upon it, and new packages are not really supposed to have the user call this function. We've seen one example, though. When you want to run an apply over a BSgenome object, you use something called bsapply, and you use an object called BSParams, which, at least at the time of recording these videos, didn't have a constructor function.

Okay, so that was constructing an object and getting help on a class. So how do we see the definition of the class? The definition of the class you can see with getClass. Notice how we are using camelCase. We have two words here: the name starts with a lower case letter, and then the second word starts with an upper case letter. We don't write get.Class, we write getClass with a capital C. It is very common for us to use this camelCase style.

So here we have the definition of the class. There are two main things getting printed out. First, we get some slots. Some of the slots have names we recognize from ExpressionSets: experimentData, featureData and so on. And each of these slots has a class; experimentData, for example, has class MIAME. Lower case assayData has a class called upper case AssayData.
And the phenoData slot has a class called AnnotatedDataFrame. Second, at the bottom we see that the class extends eSet, VersionedBiobase and Versioned. VersionedBiobase and Versioned are not something that you think too much about as a user, but the first line tells us that this class is also something known as an eSet. We'll discuss that in a little bit.

Let's start with the slots. The slots are where the data usually is. You access a slot by using the at sign, or you use the slot function. So you could, for example, write ALL@annotation, and you get out this little character string. We can see up here, if I scroll a little bit to the right... that was interesting, it doesn't print very well. Let's see: getClass on ExpressionSet, with options(width = 80). Okay, now we are using a little bit less space. We have annotation here, and it's a character. So I showed ALL@annotation. I could also use slot(ALL, "annotation"), with the slot name given as a character string. That gives me the same thing.

Now, as an end user, you're not really supposed to think about slots. You're not really supposed to access data in an S4 object using the at sign. You're supposed to use something called an accessor function. In this case, the accessor function for this slot is called annotation, and you get basically the same result out. Why is this important? It's important because some of the slots in here are not really supposed to be accessed by users, whereas the accessor functions are supported by the people who wrote the package. The people who wrote the package don't really want you to access the slots directly; they want you to go through these accessor functions.

We see that accessor functions are often called the same as the slot. We've seen featureData. But if you look at the output here, you see that there's not a slot called pData, which we used a lot.
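The three ways of getting at a slot discussed above can be put side by side in a small sketch. Everything here is hypothetical: the class Experiment, its slots, the accessor name annotationOf and the value "some-platform" are invented for the illustration and do not come from Biobase.

```r
library(methods)

# Hypothetical class with one slot meant for users and one internal slot.
setClass("Experiment",
         representation(annotation = "character", cache = "list"))

# Accessor: a generic plus a method that exposes the annotation slot.
setGeneric("annotationOf", function(object) standardGeneric("annotationOf"))
setMethod("annotationOf", "Experiment", function(object) object@annotation)

ex <- new("Experiment", annotation = "some-platform")

slot_names <- slotNames("Experiment")  # slot names from the class definition
via_at     <- ex@annotation            # direct access with the at sign
via_slot   <- slot(ex, "annotation")   # same, slot name as a character string
via_acc    <- annotationOf(ex)         # the route intended for end users
```

All three calls return the same string. The design point is that the package author can later rename or restructure the slots and keep annotationOf working, whereas code that uses the at sign would break.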
But we have used featureData, annotation, protocolData and experimentData, and they all have accessor functions that are named like the slots. Traditionally, accessor functions are documented in the help page for the class. So, let's get that up. Let's make it a little bit bigger. See, down here under the slots, there's a little bit of description. And then here under methods, there is a list, like featureNames, phenoData, varLabels. We've seen some of them, like pData, and we even used pubMedIds. Some of these are accessor functions, and some are not.

Sometimes an accessor function is named differently from the slot. Some people use a naming convention where it's like getSomething. I have a package, for example, dealing with methylation data, and the way you get the methylation data is you write getMeth, not meth. That's a matter of style or opinion, and different packages use different conventions.

Now, sometimes classes get updated in Bioconductor. You go from one version of Bioconductor to another, and you decide as a programmer to change the class representation. For a user, this matters when you save an object and come back to load it, say, a year later, because you're revising your paper. Hopefully that doesn't take a year, but often you do go back and load old objects. And sometimes, it doesn't happen very often, you run into problems because the class definitions have changed. The way you deal with that is a function in Bioconductor called updateObject. If you have loaded an old object, you basically just say new_object = updateObject(old_object). Or, in many cases, you don't really want the old object around anymore once it has been updated, so you basically overwrite it by assigning the result of updateObject back to the same name. Now, for updateObject to work, the person who is responsible for the class definition is supposed to write an updateObject method that works.
This is not guaranteed to work, but it's supposed to work. And if you are ever in a situation where a class definition has been changed and the old object doesn't work with the updateObject function, you should complain loudly in the support forum.

Finally, suppose you have an object and you are a little bit in doubt as to whether it satisfies the class definition, perhaps because you have done crazy things to it. There is a nice function that is worth knowing about called validObject, which performs a number of checks on a given object and tells you whether it is a valid instance of the class. This may seem a little counterintuitive: why is this not being run all the time? The answer is that it sometimes takes a long time to run validObject on really big objects, so that function doesn't get run by R all the time. So it is possible to create invalid objects if you are really doing crazy things. That typically happens when you start playing with the at sign and assign things to slots directly.

This one was for classes. In the next session, we are going to cover S4 methods.
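The point about validObject and direct slot assignment can be illustrated with a small sketch. The class Scores, its slots and its validity rule are hypothetical and exist only for this example.

```r
library(methods)

# Hypothetical class whose validity rule ties two slots together.
setClass("Scores",
         representation(labels = "character", values = "numeric"),
         validity = function(object) {
             if (length(object@labels) != length(object@values))
                 return("'labels' and 'values' must have the same length")
             TRUE
         })

sc <- new("Scores", labels = c("a", "b"), values = c(1, 2))

# Assigning to a slot with the at sign checks the type of the new value,
# but does not run the validity function, so sc is now silently invalid.
sc@values <- c(1, 2, 3)

# validObject runs the checks on demand and signals an error on failure.
check <- tryCatch(validObject(sc), error = function(e) conditionMessage(e))
```

In this sketch, check holds the error message raised by validObject, which includes the text returned by the validity function.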