Welcome to Week 9 of our medical software class. This week we'll take a detour from the software lifecycle process to cover some mathematical background. We'll give an introduction to probability, statistics, detection theory, significance testing, and the mathematical techniques needed to design a validation study. Then we'll pick up the software lifecycle next week when we talk about software validation. The goal of this week, as part of this detour from the lifecycle process, is to introduce concepts from probability and statistics, to talk about the critical area of signal detection, and then to present some methods that are necessary to understand how clinical trials are designed. As an outline, we have eight short segments. The first two cover an introduction to basic probability theory and then the probability of multiple events. Then we have two segments introducing statistics: a basic introduction, followed by the process of estimating probability density functions. In Segment 5, we'll talk about the critical problem of signal detection: did something happen? Do we have an event here? The last three segments lay some of the foundations for clinical trials, with a discussion of statistical significance and randomization. If you want to do some further reading, this week's material is based on Week 8 of this textbook. As a disclaimer, the next few slides compress material that's usually taught over many semesters. This is a very high-level introduction to the topic for those students who have had no exposure to it before. There are many textbooks; the one I was taught from is the work by Papoulis and Pillai, Probability, Random Variables and Stochastic Processes. There are many other books like this that you can go back to for more details. Let's begin with an introduction to probability theory.
We're going to do this from an engineering perspective, a practical perspective, rather than the formal introduction to the topic that you'd get in a math class. First we have deterministic systems. A deterministic system is one where we can compute the exact output given the input. Here's an example: y of t equals a times x of t, where x of t is the input, y of t is the output, and a is a parameter. If I give you x of t and a, you can give me y of t precisely. This is a deterministic system. The other kind of system is a stochastic system, where now we have an extra component, n of t. This is random noise or unknown variation. Here, we can only estimate the output with some level of uncertainty. We can say that y of t is going to be close to some value, within some bounds, 95 percent of the time; statements of this nature. We use this to describe processes that either are truly random, or that, even though we could in principle understand them, are far too complicated to model, and even if we tried, we might not do well enough. This is where n of t comes in, and the characterization of n of t is the basis for the use of probability. A lot of probability theory comes from gambling. This is a picture of the famous casino at Monte Carlo in Monaco; there are methods called Monte Carlo methods named after this place. A lot of the examples you've seen in probability theory involve things like dice, card games, casinos, the famous Monte Carlo method for simulation, and the stock market, the modern version of the casino, at least the way many people seem to think about it. To give you some medical examples of where probability comes up, let's ask some questions. Is the difference in life expectancy between cancer patients that were given an experimental drug and patients that were given a placebo real, or just random luck? Did we see an improvement?
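As a small illustration of this distinction, here is a sketch in Python (my own, not from the lecture; the function names and the noise level are chosen for illustration): the deterministic system returns the exact same output every time, while the stochastic system adds a random term n of t, so we can only say the output is close to the deterministic value.

```python
import random

def deterministic(x, a=2.0):
    """Deterministic system: y(t) = a * x(t); exact output for a given input."""
    return [a * xi for xi in x]

def stochastic(x, a=2.0, noise_sd=0.1, seed=0):
    """Stochastic system: y(t) = a * x(t) + n(t), with n(t) ~ N(0, noise_sd^2)."""
    rng = random.Random(seed)
    return [a * xi + rng.gauss(0.0, noise_sd) for xi in x]

x = [0.0, 1.0, 2.0, 3.0]
print(deterministic(x))   # exactly [0.0, 2.0, 4.0, 6.0]
print(stochastic(x))      # close to that, within the noise bounds
```

With a small noise standard deviation, the stochastic output stays near the deterministic one most of the time, which is exactly the kind of "close to this value, within these bounds" statement described above.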
Let's say a study showed a small improvement; is this something that we can trust, or did we just get lucky with the patients that got the drug versus those that did not? How likely is it that when somebody tests positive for a particular condition, they actually have the condition, as opposed to being a false positive? This is a very common problem, and we'll actually work through an example of it later in this week's segments. Let's ask a third question: what's the probability of getting COVID-19 after you're vaccinated? These are things that we can only understand and model probabilistically, by doing experiments and learning from data. Let's start with the first example. This is where all probability classes start from: what's called the fair coin. Again, remember that the background of this field comes from gambling. We have a coin that is actually fair, not one that somebody tampered with so it always comes up heads. A coin has heads, the head of George Washington in this case, and tails, the other side of the coin. The word heads comes, of course, from the fact that coins historically had the head of a king or queen on them; in the American case here, it's Washington. A fair coin, if you flip it up in the air and let it land, will come down heads 50 percent of the time and tails 50 percent of the time. This is what fair means. Theoretically, of course, this is a deterministic process; there's nothing random about it. If you knew all the forces, you could, at least theoretically, figure out how the coin will land. But the best way to characterize this process is really probabilistically. Here's the problem: we could theoretically compute all the forces and account for the angle, the friction, the humidity, all the factors that will affect how this coin lands and which way up it lands. There's nothing random about it. But the effort required to do this, and to do it accurately, is probably beyond our ability.
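One way to see what "fair" means operationally is a quick simulation (my own sketch, not part of the lecture): flip a simulated coin many times and watch the fraction of heads approach 0.5 as the number of flips grows.

```python
import random

def fraction_of_heads(n_flips, seed=0):
    """Flip a simulated fair coin n_flips times; return the fraction of heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# The more flips, the closer the empirical frequency gets to the true 0.5.
for n in (10, 1_000, 100_000):
    print(n, fraction_of_heads(n))
```

With only 10 flips the fraction can be far from 0.5; with 100,000 flips it is very close. This is the empirical face of the probabilistic characterization the lecture describes.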
At some level, this is the kind of thing that's just easier to characterize probabilistically. Another example comes from some of our own work. We try to estimate how the prostate deforms during biopsy when it's pushed on. This is a deterministic process: if we could model the mechanical properties of the prostate, the forces applied to it, and all the surrounding structures, we could actually compute the deformation. But the other way to do it is to take a lot of examples of the prostate deforming during this procedure, learn the probability density function of the deformation, and then use it to get a probabilistic estimate: on average, it will be here, plus or minus one millimeter, and often that's good enough. Even though this looks less scientific, the full deterministic mechanical modeling may actually be less accurate, because we may not understand every last detail of what theoretically is a deterministic process. The first element in statistics is what's called a probability density function. The domain of the function, the input to the function, is the list of potential events; here we either have heads or tails. The outcome is the variable x, and we write p(x equals outcome) to describe the probability, where the outcome is either heads or tails. That's the domain of the function. In our case, the probability that x equals heads is 0.5 and the probability that x equals tails is 0.5. This is known as a Bernoulli probability density function (the single-trial case of the binomial distribution). This is a discrete probability density function, because it only takes a finite number of values, heads or tails; it doesn't take a continuous value. The next step is to look at continuous probability density functions. Some of you have seen these in high school. These are variables that map the probability of getting a continuous value. For example, take x to be the height of adult men in the United States. This is one variable that we want to model.
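A discrete pdf like the fair coin's can be written down directly. The sketch below (my own, using a simple dictionary representation) checks the defining property that the probabilities over the whole domain sum to one.

```python
# Discrete probability density (mass) function for a fair coin.
# The domain is the set of possible outcomes: heads or tails.
coin_pdf = {"heads": 0.5, "tails": 0.5}

# Defining property of any discrete pdf: probabilities over the domain sum to 1.
total = sum(coin_pdf.values())
print(total)              # 1.0
print(coin_pdf["heads"])  # 0.5
```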
We simplify the notation: instead of writing p(height equals x), we'll just write p(x). The pdf can be modeled using a normal distribution with mean Mu and standard deviation Sigma. For the height of adult males in the United States, the mean is 1.75 meters and Sigma is 7.5 centimeters, and if you plot it, you get the plot on the top left. This is what a normal distribution looks like. Again, many of you will have seen this in your earlier schooling. If you vary the parameter Sigma, you can get either a narrow distribution or a fatter distribution, and that gives you a sense of the uncertainty involved. A large Sigma gives you a lower peak and a wider distribution; a small Sigma gives you a higher peak and a narrower distribution. That's just how different variables vary. There are other kinds of distributions; there's more to life than the normal distribution. We can have an exponential distribution, which models things that become increasingly unlikely as time goes by. This is often used for survival studies: how long will something go on for, whether it's a person's lifetime or something like that. We can have mixtures of Gaussians, where we have two peaks: two populations that, when mixed together, one population here and the other population there, produce a measured distribution that looks like this, a mixture of Gaussians. And we can have something like the t-distribution, which we'll meet again when we talk about statistical significance. This is a distribution that has fatter tails than the normal distribution. One of the properties of the normal distribution is that as you go away from the mean, it dies off very quickly. For the black curve here, which is the t-distribution with what's called degrees of freedom equal to infinity (we'll come back to that later), the probability of being very far from the mean is practically zero. Other distributions have fatter tails, where the probability of being far from the mean is actually non-zero.
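To make the normal density concrete, here is a small sketch (my own, using the lecture's height parameters: Mu = 1.75 m, Sigma = 7.5 cm) that evaluates the pdf directly from its formula. Note how quickly the density dies off away from the mean, which is the thin-tail property contrasted with the t-distribution above.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal probability density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

mu, sigma = 1.75, 0.075   # adult male height in meters (values from the lecture)
print(normal_pdf(1.75, mu, sigma))  # the peak of the bell curve
print(normal_pdf(1.90, mu, sigma))  # 2 sigma from the mean: much smaller
print(normal_pdf(2.20, mu, sigma))  # 6 sigma from the mean: practically zero
```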
That causes all kinds of problems that we'll come back to later this week. The other thing to keep in mind is that there are other descriptions in use. The first one is the cumulative distribution function. If we take our normal distribution, the cumulative distribution is the probability of x being less than some value. It's the integral here, the area under the curve: if this is x, the cumulative is the area under the curve up to that point. If you plot that out, you end up with a sigmoid that starts at zero; the probability of being less than negative infinity is of course zero, the probability of being less than infinity is one, and you get this kind of shape. This is the cumulative distribution function: the probability of having a value less than something, as opposed to the probability of that exact value. The opposite of this is what's called the survival function, the probability of having x greater than a certain value. If you plot this, it has the opposite shape. In fact, when you add those two, the cumulative and the survival, you get one. The name survival comes from its use in life expectancy studies. If x is age, what is the probability that somebody lives beyond 80 years? It's the area under the curve to the right of x equals 80, whatever that number happens to be. In addition to single-variable distributions, which is what we've looked at so far, describing a single variable x, weight, height, whatever it was, we can have vector distribution functions with multiple variables. Consider the vector x describing the height and the weight of a person. These are two random variables, and we define a joint distribution, specifically a multivariate pdf. This is a multivariate normal with mean vector Mu and covariance matrix Sigma. The covariance matrix takes the place of the standard deviation, and the mean vector takes the place of the scalar mean; in the normalization constant, we have the square root of 2 Pi raised to the dimension, times the determinant of the covariance matrix.
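For the normal distribution, the cumulative and survival functions can be computed in closed form via the error function. Here is a sketch (my own; the lifetime parameters are made up for illustration, not from the lecture) showing that the two descriptions always sum to one.

```python
import math

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function: P(X <= x), area under the pdf left of x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def normal_survival(x, mu, sigma):
    """Survival function: P(X > x), area under the pdf right of x."""
    return 1.0 - normal_cdf(x, mu, sigma)

# Illustrative (made-up) lifetime model: age ~ N(75, 10^2) years.
p_beyond_80 = normal_survival(80.0, 75.0, 10.0)
print(p_beyond_80)
# The cumulative and the survival function always add up to one.
print(normal_cdf(80.0, 75.0, 10.0) + p_beyond_80)  # 1.0
```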
You can see the exponential up there. This has a bump-like shape in two dimensions: weight along one axis and height along the other. It looks like a Gaussian in every profile, a bump coming out of the plane. This is a multivariate density function. So far we have talked about the probability of one event happening, whether that one event is a single number, the height, or a measurement with multiple attributes, such as height and weight. In the next segment, we'll expand this a little further and talk about the probabilities of multiple events: event A followed by event B, and how likely it is that those two things both happen. What does knowing that one event happened tell us about the likelihood that the second one will happen later in time? With that, we'll stop here and pick up the story in the next segment. Thank you.
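To give a feel for the multivariate normal, here is a sketch (my own; the height/weight covariance numbers are invented for illustration, not from the lecture) that draws correlated height/weight samples using a hand-rolled Cholesky factorization of the 2-by-2 covariance matrix Sigma.

```python
import math
import random

def sample_bivariate_normal(mu, cov, n, seed=0):
    """Draw n samples from a 2-D normal N(mu, cov) via a 2x2 Cholesky factor."""
    rng = random.Random(seed)
    # Cholesky factor L of cov, so that L @ z has covariance cov for z ~ N(0, I).
    a = math.sqrt(cov[0][0])
    b = cov[0][1] / a
    c = math.sqrt(cov[1][1] - b * b)
    samples = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        samples.append((mu[0] + a * z1, mu[1] + b * z1 + c * z2))
    return samples

# Hypothetical height (m) / weight (kg) model with positive correlation.
mu = (1.75, 80.0)
cov = [[0.075**2, 0.3], [0.3, 12.0**2]]
pts = sample_bivariate_normal(mu, cov, 10_000)
mean_h = sum(p[0] for p in pts) / len(pts)
mean_w = sum(p[1] for p in pts) / len(pts)
print(mean_h, mean_w)  # close to the mean vector (1.75, 80.0)
```

A histogram of either coordinate alone looks like an ordinary one-dimensional Gaussian, matching the remark that every profile of the bump is itself Gaussian.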