Metropolis-Hastings is an algorithm that allows us to sample from a generic probability distribution, which we'll call our target distribution, even if we don't know its normalizing constant. To do this, we construct and sample from a Markov chain whose stationary distribution is the target distribution we're looking for. It consists of picking an arbitrary starting value and then iteratively accepting or rejecting candidate samples drawn from another distribution, one that is easy to sample from.

Let's say we want to produce samples from a target distribution, which we'll call p of theta, but we only know it up to a normalizing constant, that is, up to proportionality. What we have to work with is g of theta; we don't know the normalizing constant, perhaps because p is difficult to integrate.

The Metropolis-Hastings algorithm proceeds as follows. Step 1 is to select an initial value for theta, which we'll call theta-naught. Step 2 is, for a large number of iterations, i from 1 up to some large number m, to repeat the following. (a) Draw a candidate, theta-star, from a proposal distribution q of theta-star given the previous iteration's value of theta; we'll talk more about this q distribution soon. (b) Compute the ratio alpha: g evaluated at the candidate divided by q evaluated at the candidate given the previous value, all divided by g evaluated at the previous value divided by q evaluated at the previous value given the candidate. Rearranging, this is g of the candidate times q of the previous value given the candidate, divided by g of the previous value times q of the candidate given the previous value. (c) Check alpha. If alpha is greater than or equal to 1, accept the candidate and set theta-i equal to theta-star. If alpha is between 0 and 1, either accept the candidate and set theta-i equal to theta-star, with probability alpha, or reject the candidate and set theta-i equal to theta-i-minus-1, with probability 1 minus alpha.

Steps (b) and (c) act as a correction, since the proposal distribution q is not the target distribution p. At each step in the chain, we draw a candidate and decide whether to move the chain there or to remain where we are. If the proposed move to the candidate is advantageous, meaning alpha is greater than or equal to 1, we move there for sure. If it is not advantageous, we might still move there, but only with probability alpha. Since our decision to move to the candidate depends only on where the chain currently is, this is a Markov chain.

One careful choice we must make is the candidate-generating distribution q. It may or may not depend on the previous iteration's value of theta. One example where it does not depend on the previous value is when q of theta-star is always the same distribution; if we take this option, q should be similar to p of theta, in order to approximate it well.
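Written in symbols, the acceptance ratio and update rule from step 2 are the following (this is just a restatement of the steps above, with theta-star as the candidate and theta-i-minus-1 as the previous value):

\[
\alpha = \frac{g(\theta^*) \,/\, q(\theta^* \mid \theta_{i-1})}{g(\theta_{i-1}) \,/\, q(\theta_{i-1} \mid \theta^*)}
       = \frac{g(\theta^*)\, q(\theta_{i-1} \mid \theta^*)}{g(\theta_{i-1})\, q(\theta^* \mid \theta_{i-1})},
\qquad
\theta_i =
\begin{cases}
\theta^* & \text{with probability } \min(1, \alpha), \\
\theta_{i-1} & \text{otherwise.}
\end{cases}
\]

Notice that if q is symmetric, so that q of a given b equals q of b given a, the two q terms cancel. That observation leads to the next option.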
Another popular option, one that does depend on the previous iteration, is called random walk Metropolis-Hastings. Here, the proposal distribution is centered on the previous iteration. For instance, it might be a normal distribution whose mean is the previous iteration's value, theta-i-minus-1. Because the normal distribution is symmetric about its mean, this choice comes with a really nice advantage: the density of q evaluated at the candidate, with the previous value as the mean, is equal to the density of q evaluated at the previous value, with the candidate as the mean. These two q terms therefore cancel when we calculate alpha. So in random walk Metropolis-Hastings, where the candidate is drawn from a normal distribution centered on the previous iteration's value with a constant variance, the acceptance ratio alpha is really easy: it is just g evaluated at the candidate divided by g evaluated at the previous iteration.

Clearly, not all candidate draws are accepted, so our Markov chain sometimes stays where it is, possibly for many iterations. How often you want the chain to accept candidates depends on the type of algorithm you use. If you approximate p with q and always draw candidates from that same distribution, accepting candidates often is a good thing: it means q is approximating p well. However, you may still want q to have a larger variance than p, and to see some rejection of candidates, as an assurance that q is covering the space well. As we'll see in coming examples, a high acceptance rate for random walk Metropolis-Hastings samplers is not a good thing. If the random walk is taking steps that are too small, it will accept candidates often but will take a very long time to fully explore the posterior distribution. If the random walk is taking steps that are too large, many of its proposals will have low probability, the acceptance rate will be low, and many of the draws will be wasted. Ideally, a random walk sampler should accept somewhere between 23% and 50% of the candidates proposed.

In the next segment we're going to demonstrate this algorithm in the discrete case, where we can show mathematically that the Markov chain converges to the target distribution. In the segment after that, we will demonstrate coding a random walk Metropolis-Hastings algorithm in R to solve one of the problems from the end of lesson two.
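As a preview of that R exercise, here is a minimal sketch of a random walk Metropolis-Hastings sampler. It is an illustrative implementation, not the code from that segment: the function names (lg, rw_metropolis), the standard normal stand-in target, the starting value, and the proposal standard deviation are all assumptions chosen just for this example. The ratio is computed on the log scale, which is more numerically stable than working with g directly.

# Minimal random walk Metropolis-Hastings sketch (illustrative assumptions, not the course code)

lg <- function(theta) {
  # log of the unnormalized target density g(theta); stand-in example: standard normal
  -0.5 * theta^2
}

rw_metropolis <- function(n_iter, theta0, proposal_sd) {
  theta <- numeric(n_iter)
  theta_now <- theta0
  lg_now <- lg(theta_now)
  n_accept <- 0

  for (i in 1:n_iter) {
    # (a) draw a candidate from a normal centered at the current value
    theta_star <- rnorm(1, mean = theta_now, sd = proposal_sd)

    # (b) symmetric proposal, so on the log scale:
    #     log(alpha) = log g(theta_star) - log g(theta_now)
    lg_star <- lg(theta_star)
    log_alpha <- lg_star - lg_now

    # (c) accept with probability min(1, alpha), otherwise keep the current value
    if (log(runif(1)) < log_alpha) {
      theta_now <- theta_star
      lg_now <- lg_star
      n_accept <- n_accept + 1
    }
    theta[i] <- theta_now
  }

  list(theta = theta, accept_rate = n_accept / n_iter)
}

# Example run; the tuning values here are arbitrary
out <- rw_metropolis(n_iter = 10000, theta0 = 0.0, proposal_sd = 2.0)
out$accept_rate

Changing proposal_sd is how you tune the step size: increase it if the acceptance rate is well above 50%, and decrease it if the rate falls well below 23%.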