So I've talked about the formal aspects of correlation, and how we measure observations and try to look for relationships between these quantities or parameters or variables. But what about the underlying issue that we're trying to explain? This is the confusion between causation and correlation. It's a profound issue in science, and it has actually led to some famous mistakes in the history of science, because correlation does not in and of itself imply causation, i.e. cause and effect for which there is a physical theory to explain it.

As an almost trivial example, consider this relationship in observations made around the world between the average temperature and the number of pirates in that region of the world. It's a very strange but actually quite good correlation: the higher the temperature, the greater the number of pirates. Does heat make pirates? Does it turn normal people into pirates? Or do there happen to be more pirate activities in hot parts of the world? The correlation of course tells you nothing about what's going on, about what underlies this relationship, even though the relationship is present in the data. So inferring causation from correlation is a big step in science, and it's a step that can't be taken lightly and must be taken carefully, because getting this wrong is a profound error.

Another example comes from the history of philosophy, in a famous story told by Bertrand Russell, one of the greatest philosophers of the 20th century. He talked about a chicken growing up on a farm. Every morning the sun would rise, the farmer would come out and scatter seed, and the chicken would eat. The chicken, having a small brain, naturally associated, or correlated, the rising sun with being fed, a good outcome. So over time, as days went by, and weeks, and months, the chicken made a quite plausible connection between this correlation and causation.
The rising sun was obviously making the farmer come out and feed him. But one morning, the sun rises, and the farmer comes out and wrings the chicken's neck for the dinner table that night. The chicken has suffered a disastrous, catastrophic failure of logic, because it has confused correlation with causation.

Another example comes from the era of Margaret Mead and anthropology in the 1930s in the South Pacific. Margaret Mead observed that some South Pacific Islanders actually put head lice into their children's hair. This seemed like a very strange behavior, one she couldn't explain, until she realized that when children have a fever, their heads become hot, and the temperature range that lice will tolerate is fairly narrow. So when a child has a fever and is sick, they never have head lice. The islanders in this case were using head lice to try to ward off a fever: a very strange behavior, but again a confusion of correlation and causation.

As an example of unavoidable uncertainty, we should recognize that probability and chance play a role in science, in observation, and in data. A macroscopic example might involve tossing a coin sequentially or rolling a die. The probability of heads in a coin toss is always 50 percent, assuming the coin is unaltered. That is true no matter how many times the coin has been tossed. Understanding how these probabilities and uncertainties relate to each other is an important part of science. It's a common fallacy in the everyday world, one on which places like Las Vegas make a lot of money, that if, for example, you've had a series of rolls of the dice without a six, maybe a large number of them, a six is somehow more probable. A six is always a one in six chance with each roll of the die, just as a coin that has come up heads 10 times in a row still has a one in two chance of coming up heads on the next toss.
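This independence of tosses is easy to check for yourself. Here's a minimal simulation sketch in Python, assuming a fair coin: it finds every run of five heads in a row and asks how often the very next toss also comes up heads.

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

# Simulate a long sequence of fair coin tosses (True = heads).
tosses = [random.random() < 0.5 for _ in range(1_000_000)]

# Whenever we see a run of 5 heads in a row, record what the NEXT toss is.
next_after_streak = []
streak = 0
for i, heads in enumerate(tosses[:-1]):
    streak = streak + 1 if heads else 0
    if streak >= 5:
        next_after_streak.append(tosses[i + 1])

frac_heads = sum(next_after_streak) / len(next_after_streak)
print(f"P(heads | 5 heads in a row) = {frac_heads:.3f}")  # close to 0.5
```

Despite the long streaks, the fraction of heads on the following toss stays very close to one in two: the coin has no memory.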
So what we see is that the behavior of these probabilistic situations becomes extremely predictable with a large number of events, but is unpredictable for any one event. Indeed, tossing this coin thousands of times, or rolling a die thousands of times, produces an outcome that is extremely reliable within a narrow, well-determined range around one in two, or one in six. So when scientists are dealing with individually uncertain events, they often gather lots of information to home in on an average behavior, which establishes the probability within increasingly small bounds.

The microscopic situation, dealing with the deepest layer of uncertainty, involves for example radioactive decay. The half-life of a radioactive element is the well-determined time, measured in physics experiments, within which half of the atoms in any sample will decay into a different by-product, usually a lower-mass element. That's a well-determined number, accurate to several decimal places of precision. But if we look at any individual atom in that sample, we can say with no certainty at all when it will decay. It's completely unpredictable.

A macroscopic, and more trivial, example of this would be popcorn. You can take a set of popcorn kernels and predict, through a series of observations and different experiments, with quite good reliability how long it will take half of those kernels to pop. But if you take any particular kernel, its time to popping is extremely uncertain, almost indeterminate. So scientists routinely work in situations where there's either a quantum-level, fundamental uncertainty or an operational probability attached to an event. They overcome these uncertainties or imprecisions by gathering more and more data. So the principle drawn from this is that scientists are always data hungry, always wanting to make more and more observations to refine the inferences they can draw from those observations.
Science is rooted in causation: the idea that effects have causes, and that we can determine them logically and empirically. This is the basis for all science, but it's easy to confuse causation with correlation. Correlation is where two quantities are associated without an underlying mechanism that relates them. Disentangling causation and correlation is the hard work of science.