In a previous video, we carried out a Bayesian reference analysis to predict the percent body fat in males from their waist circumference. This figure shows the observations from the 252 men, plus our posterior mean of the regression line and 95% credible intervals for the mean and for predicted body fat. The point circled in orange, case 39, has a body fat measurement that is much lower than the model predicts. Our assumptions are that all the observations come from a single population in which we expect body fat to be linearly related to waist measurement over this range of the data, with all observations having a constant variance. Does case 39 belong to this population?

The model says that the observed body fat equals the mean given by the regression line plus an error, or deviation, from the line. Rearranging, the unobserved deviation equals the observation minus the unknown mean at x, which depends on alpha and beta. Using our posterior distribution for the parameters, we can find the posterior distribution of the deviation, which will be a Student t distribution. Its mean will be the ordinary residual, the observed value minus the fitted value, and its scale will be a function of the leverage of that case.

Under the empirical rule, we expect about 95% of the errors to be within plus or minus 2 standard deviations of the mean, and almost all to be within plus or minus 3 standard deviations. Using this idea, we may view deviations that exceed k standard deviations as potential outliers. We can then use our posterior distributions for alpha, beta, and sigma to calculate the posterior probability that a deviation exceeds k standard deviations in absolute value. There's no closed-form expression for this, but we can easily evaluate it using R. Appealing to the empirical rule that almost all observations should be within three standard deviations, we can take k equal to three to define suspect observations.
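To make the calculation concrete, here is a rough sketch in Python rather than R. It assumes the reference prior, under which the precision 1/sigma^2 has a Gamma posterior with shape (n - 2)/2 and rate SSE/2, and, given sigma, the deviation for case i is normal with mean equal to the residual and variance sigma^2 times the leverage. The function name `outlier_probabilities`, the Monte Carlo averaging over sigma, and the synthetic data are all illustrative assumptions; this is not the body fat data and not necessarily how the R code in the course evaluates the probability.

```python
import math
import random
from statistics import NormalDist


def outlier_probabilities(x, y, k=3.0, draws=2000, seed=0):
    """Monte Carlo estimate of P(|deviation_i| > k * sigma | data) per case."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)

    # Least-squares fit, which is also the posterior mean under the reference prior.
    beta = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    alpha = ybar - beta * xbar
    resid = [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]
    leverage = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]
    sse = sum(e * e for e in resid)

    rng = random.Random(seed)
    std_normal = NormalDist()
    totals = [0.0] * n
    for _ in range(draws):
        # random.gammavariate takes (shape, scale); scale = 1 / rate = 2 / SSE.
        precision = rng.gammavariate((n - 2) / 2.0, 2.0 / sse)
        sigma = 1.0 / math.sqrt(precision)
        for i in range(n):
            # Given sigma, deviation_i ~ Normal(resid_i, sigma^2 * leverage_i).
            sd = sigma * math.sqrt(leverage[i])
            upper = std_normal.cdf((k * sigma - resid[i]) / sd)
            lower = std_normal.cdf((-k * sigma - resid[i]) / sd)
            totals[i] += 1.0 - (upper - lower)
    return [t / draws for t in totals]


# Synthetic illustration only (NOT the body fat data): a straight line plus
# unit-variance noise, with one case pushed far below the line.
data_rng = random.Random(1)
x = [float(v) for v in range(20, 60)]
y = [1.0 + 0.5 * xi + data_rng.gauss(0.0, 1.0) for xi in x]
planted = 20
y[planted] -= 12.0

probs = outlier_probabilities(x, y, k=3.0)
most_suspect = max(range(len(x)), key=lambda i: probs[i])
print(most_suspect, round(probs[planted], 3))
```

Averaging the conditional probability over draws of sigma is one simple way to handle the missing closed form; numerical integration over sigma would serve the same purpose.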
Plugging in the data, the posterior probability that case 39's deviation exceeds three standard deviations is close to 1, namely 0.9917. We focused attention on case 39 because the data suggested that the model was a poor fit for that case. As the sample size increases, the chance of seeing deviations larger than three standard deviations also increases. Instead, to adjust for the sample size, we can use a value of k such that the prior probability of seeing any outliers is 0.05. In a sample of this size, there would be a 5% chance of any absolute deviation exceeding 3.71 standard deviations. The posterior probability is roughly 68%, with posterior odds that are 40 times the prior odds of exceeding 3.71 standard deviations. Similar to the Hulk, it seems highly likely that case 39 is from a different population. This case has a waist circumference that is larger than the other cases, but has a surprisingly low body fat.

Using Bayesian regression, we can obtain posterior distributions for the parameters and for other quantities of interest, such as predictions. Checking model assumptions is always an important part of data analysis. In this video, we found the posterior distribution for the deviation between the observation and the model, and defined suspect points as those cases where the absolute deviation exceeds k standard deviations. We identify suspect cases by comparing the prior probability to the posterior probability. Analyses with and without the case may be reported, or you may be able to find additional scientific grounds to support dropping the case from the analysis. A word of caution: if you find that you're continuing to drop cases as suspicious, it may be time to step back and consider that the problem is not with the data arising from different populations. Perhaps the current model should be thrown out, and an expanded model used to accommodate all of the data.
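The sample-size adjustment mentioned earlier, choosing k so that the prior probability of any outlier is 0.05, can be sketched as follows. This assumes that, a priori, the n deviations are independent and normal, so the chance that no case exceeds k standard deviations is the single-case chance raised to the power n. The function names `prob_any_outlier` and `outlier_cutoff` are hypothetical and used only for illustration.

```python
from statistics import NormalDist


def prob_any_outlier(n, k):
    """Prior probability that at least one of n deviations exceeds k SDs."""
    p_inside = 1.0 - 2.0 * NormalDist().cdf(-k)  # one case within +/- k SDs
    return 1.0 - p_inside ** n


def outlier_cutoff(n, prior_prob_any=0.05):
    """Value of k giving the requested prior probability of any outlier."""
    p_inside_each = (1.0 - prior_prob_any) ** (1.0 / n)
    # Solve P(|Z| <= k) = p_inside_each for a standard normal Z.
    return NormalDist().inv_cdf(0.5 + 0.5 * p_inside_each)


n = 252
print(round(prob_any_outlier(n, 3.0), 2))  # chance of some 3-SD deviation
print(round(outlier_cutoff(n), 2))         # cutoff adjusted for n cases
```

With 252 cases, a cutoff of k = 3 leaves close to an even prior chance of flagging at least one case, which is why the cutoff is raised as the sample grows.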