Before we can model the relationship between two numerical variables using regression, we first need to define residuals. Residuals are the leftovers from the model fit, so we can think of our observed data as the model fit plus the residuals. The residual is defined as the difference between the observed and the predicted value of the response variable for a given data point in our dataset. We denote it with an e, for error, and write the formula as e_i = y_i − ŷ_i, where y_i is the observed response and ŷ_i is the predicted response.

We're going to focus on two data points, Rhode Island and DC. The observed poverty level in Rhode Island is around 10%, and the predicted poverty level is slightly over 14%. The difference between these two, shown with the yellow line on the plot, is the residual. What the residual tells us is that the percentage of people living in poverty in Rhode Island is 4.16 percentage points less than the model predicts. Similarly, in DC, the observed value is high, around 17%, while the predicted value is much lower, around 11%, so this time the residual is telling us something different: the percentage living in poverty in DC is 5.44 percentage points more than predicted. So the model overestimates the poverty level in Rhode Island and underestimates the poverty level in DC.
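
To make the sign convention concrete, here is a minimal Python sketch of the residual calculation, observed minus predicted. The function names `residual` and `describe` are hypothetical, and the inputs are the rounded readings mentioned in the lecture (about 10% and 14% for Rhode Island, about 17% and 11% for DC), so the results only approximate the exact residuals of -4.16 and 5.44.

```python
def residual(observed, predicted):
    """Return the residual e_i = y_i - y_hat_i for one data point."""
    return observed - predicted


def describe(e):
    """Interpret the sign of a residual (hypothetical helper for illustration)."""
    if e < 0:
        return "model overestimates"   # observed is below the prediction
    if e > 0:
        return "model underestimates"  # observed is above the prediction
    return "model matches the observation"


# Rounded readings from the lecture, in percent: (observed, predicted).
# These are approximations, not the exact values behind -4.16 and 5.44.
points = {
    "Rhode Island": (10.0, 14.0),
    "DC": (17.0, 11.0),
}

for name, (observed, predicted) in points.items():
    e = residual(observed, predicted)
    print(f"{name}: residual = {e:+.1f} points -> {describe(e)}")
```

A negative residual means the point sits below the fitted line, and a positive residual means it sits above, which matches the Rhode Island and DC cases.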