I’ve seen many posts in the past from rationally minded authors describing the statement and applications of Bayes’ Theorem, but often the proof or the reasoning behind the theorem is omitted. This is somewhat understandable, but I also see it as unfortunate, since the theorem is rather easy to explain and understand. Here I will attempt to prove Bayes’ Theorem and explain how you might apply the insights gained to real-life situations.

First we start with some notation. I will let a capital letter, for example $A$, denote an event. This could be any event: a coin flip landing on heads, winning the lotto, your house burning down, whatever. Then $P(A)$ will denote the probability of the event happening. This is a number between 0 and 1, where 0 means the event cannot happen and 1 means the event is certain to happen no matter what. If, for example, $A$ denotes a coin flip landing on heads, then $P(A) = 1/2$, since landing on heads is one of only two possible results when flipping a coin, each being equally likely. If $A$ denoted either of the other two example events above, then $P(A)$ would be an extremely small (but still nonzero) number.

Now we want to look at two events at the same time. Say we have two events $A$ and $B$. How can we combine their probabilities in a reasonable way? Say we want to consider the possibility that both events occur. If the events are *independent*, meaning the probability of one happening doesn’t affect the probability of the other, then the probability of both events happening is just the product of the individual probabilities. We denote by $A \cap B$ the event that both $A$ and $B$ take place. This is read as the *intersection* of $A$ and $B$. Then $P(A \cap B)$ is the *joint probability* that both events occur. So if the two events are independent, we have $P(A \cap B) = P(A)\,P(B)$.

It’s instructive to consider some examples of events that are independent and some that are not. For example, getting heads on a coin flip and stepping in a puddle when you leave your office are totally unrelated events. But stepping in a puddle after work and it raining in the afternoon are very related: if one occurs, the likelihood of the other happening is increased. Rain in the afternoon and stepping in puddles are *dependent* events, and simply multiplying the probabilities of each individual event does not correctly give us the probability of both happening.
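To make the difference concrete, here is a minimal Python sketch (not from the original post) that enumerates the four equally likely outcomes of two fair coin flips and computes exact probabilities with the standard `fractions` module. The helper names (`prob`, `both`, `first_heads`, and so on) are hypothetical, chosen only for illustration. "First flip heads" and "second flip heads" pass the product test, while "first flip heads" and "at least one heads" fail it, because knowing the first event raises the chance of the second.

```python
from fractions import Fraction
from itertools import product

# Sample space: all outcomes of two fair coin flips, each equally likely.
OUTCOMES = list(product("HT", repeat=2))


def prob(event):
    """Exact probability of an event, given as a predicate on outcomes."""
    favorable = sum(1 for outcome in OUTCOMES if event(outcome))
    return Fraction(favorable, len(OUTCOMES))


def both(e1, e2):
    """The intersection of two events: both predicates hold."""
    return lambda outcome: e1(outcome) and e2(outcome)


def first_heads(outcome):
    return outcome[0] == "H"


def second_heads(outcome):
    return outcome[1] == "H"


def at_least_one_heads(outcome):
    return "H" in outcome


# Independent events: the joint probability equals the product.
print(prob(both(first_heads, second_heads)), prob(first_heads) * prob(second_heads))
# Dependent events: the joint probability differs from the product.
print(prob(both(first_heads, at_least_one_heads)), prob(first_heads) * prob(at_least_one_heads))
```

The first line prints `1/4 1/4` and the second prints `1/2 3/8`, so only the independent pair satisfies $P(A \cap B) = P(A)\,P(B)$.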

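Now suppose that we know event $B$ has already occurred. What is the probability now that $A$ will happen? We denote by $P(A \mid B)$ the *conditional probability* that $A$ will occur given that $B$ has occurred or is known to occur. For both events to happen, $B$ has to happen, and then $A$ has to happen given that $B$ did, so we see that $P(A \cap B) = P(A \mid B)\,P(B)$. In the same way, we can see that $P(B \cap A) = P(B \mid A)\,P(A)$. Since the probability that both $A$ and $B$ will happen doesn’t depend on the order in which we write them, we have $P(A \cap B) = P(B \cap A)$, and by substitution we get the nice equation

$$P(A \mid B)\,P(B) = P(B \mid A)\,P(A).$$
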
Now, assuming there is a nonzero probability that event $B$ will occur, we can divide both sides by $P(B)$ and we get the celebrated Bayes’ Theorem:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}.$$

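As a sanity check on the derivation, the following Python sketch (an illustration, not part of the original proof) uses a fair six-sided die, with the hypothetical events $A$ = "the roll is even" and $B$ = "the roll is at least 5". Because every face is equally likely, a conditional probability can be computed by counting outcomes inside the restricted set $B$. The function names `prob`, `cond_prob`, and `bayes` are assumptions made for this example.

```python
from fractions import Fraction

# A fair six-sided die: every face is equally likely.
FACES = range(1, 7)


def prob(event):
    """Exact probability of an event, given as a set of faces."""
    return Fraction(len(event), len(FACES))


def cond_prob(a, b):
    """P(A | B) by counting: only the outcomes where B happened remain possible."""
    return Fraction(len(a & b), len(b))


def bayes(likelihood, prior, evidence):
    """Bayes' Theorem: P(A | B) = P(B | A) * P(A) / P(B), valid only when P(B) > 0."""
    if evidence == 0:
        raise ValueError("P(B) must be nonzero")
    return likelihood * prior / evidence


A = {2, 4, 6}  # the roll is even
B = {5, 6}     # the roll is at least 5

# Both ways of writing the joint probability P(A and B) agree.
print(prob(A & B), cond_prob(A, B) * prob(B), cond_prob(B, A) * prob(A))
# Bayes' Theorem recovers P(A | B) from P(B | A).
print(bayes(cond_prob(B, A), prob(A), prob(B)), cond_prob(A, B))
```

Here $P(A \mid B) = 1/2$ while $P(B \mid A) = 1/3$, so the two conditional probabilities really are different, yet both sides of the symmetric equation come out to $1/6$.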
Now that we have proven Bayes’ Theorem (or Bayes’ Rule, as this equation is often called), let’s see how we can apply it. Possibly the most common application is in hypothesis testing, by a process called Bayesian inference. Let $H$ denote a hypothesis and $E$ some possible evidence supporting (or possibly discounting) your hypothesis. We want to know the probability of your hypothesis being correct, given the evidence you have. By Bayes’ Rule, this probability is the probability of the evidence being collected given that the hypothesis is true, multiplied by the prior probability that your hypothesis is true, divided by the prior probability that the evidence would be found:

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}.$$

Here is one concrete example that is often given: Suppose that you have three identical-looking coins, two fair – meaning equally likely to land on heads as tails – and one that only lands on heads (for some mysterious reason). If you pick one of the three coins at random, what should you assume is the probability of the coin you picked being the unfair coin? Well, since the coins look identical and you haven’t yet tried the coin out, the probability you picked the unfair coin is one-in-three. Suppose you then flip the coin three times, landing on heads every time. What should you now calculate as the probability that your coin is unfair? Here is where Bayes’ Theorem comes into play.

Let $H$ denote the hypothesis that you have selected the unfair coin and $E$ the evidence that your coin landed on heads in all three flips. As stated above, the prior probability of your coin being unfair is one-in-three: $P(H) = 1/3$. What is the probability of your coin landing on heads three times in a row given that it is the unfair coin? That is, what is the probability of acquiring this evidence given your hypothesis is true? Well, this is a certainty, since the unfair coin can only land on heads. So we have $P(E \mid H) = 1$. All we need now is to determine the prior probability $P(E)$ of getting three heads in a row. For a fair coin, the probability of getting three heads is $(1/2)^3 = 1/8$, since each flip gives a 50-50 chance of landing on heads and each coin flip is independent of the previous ones (your coin does not have a memory). Again, for the unfair coin the probability of getting heads three times in a row is 1, a certainty. Since two-thirds of the coins are fair and one-third are unfair, we weight each of these probabilities by the chance of having picked that kind of coin and, because the fair and unfair cases are mutually exclusive, add them: $P(E) = 1 \cdot \frac{1}{3} + \frac{1}{8} \cdot \frac{2}{3} = \frac{5}{12}$. Applying Bayes’ Rule, we see that

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)} = \frac{1 \cdot \frac{1}{3}}{\frac{5}{12}} = \frac{4}{5}.$$

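The same calculation can be written as a short Python function, sketched here with exact fractions so the answer matches the hand computation. The function name `posterior_unfair` is hypothetical, and the setup (two fair coins, one always-heads coin, a coin picked uniformly at random) is taken from the example above; the denominator is built by weighting each kind of coin by its prior and adding, exactly as in the text.

```python
from fractions import Fraction


def posterior_unfair(num_heads):
    """P(unfair coin | num_heads heads in a row) in the three-coin example."""
    prior_unfair = Fraction(1, 3)                   # P(H): one of the three coins is unfair
    prior_fair = 1 - prior_unfair                   # P(not H) = 2/3
    likelihood_unfair = Fraction(1)                 # P(E | H): the unfair coin always lands heads
    likelihood_fair = Fraction(1, 2) ** num_heads   # P(E | not H): independent fair flips
    # P(E): weight each mutually exclusive case by its prior and add.
    evidence = likelihood_unfair * prior_unfair + likelihood_fair * prior_fair
    return likelihood_unfair * prior_unfair / evidence


print(posterior_unfair(3))  # 4/5
```

Calling `posterior_unfair(8)` gives `128/129`, a little over 99%, while `posterior_unfair(7)` gives `64/65`, which is still below 99%.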
So after this simple experiment of flipping your coin three times, you have gone from a one-in-three probability to a four-in-five probability that you have chosen the unfair coin. You can continue the experiment, updating with Bayes’ Rule after each new flip, to get higher probabilities. For example, getting 8 heads in a row would increase the probability that you picked the unfair coin to over 99%. Of course, getting a single tails would drop the probability to 0 ($P(E \mid H) = 0$ in the equation, since the unfair coin can never land on tails).
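Continuing the experiment amounts to applying Bayes’ Rule once per flip, with each posterior becoming the prior for the next flip. The Python sketch below illustrates this under the same three-coin assumptions; the names `P_HEADS`, `update`, and `run` are hypothetical, and flips are encoded as the characters `"H"` and `"T"`.

```python
from fractions import Fraction

# Probability of heads for each kind of coin in the three-coin example.
P_HEADS = {"unfair": Fraction(1), "fair": Fraction(1, 2)}


def update(p_unfair, flip):
    """One Bayesian update of P(unfair) after a single flip ("H" or "T")."""
    like_unfair = P_HEADS["unfair"] if flip == "H" else 1 - P_HEADS["unfair"]
    like_fair = P_HEADS["fair"] if flip == "H" else 1 - P_HEADS["fair"]
    # P(this flip), expanded over the two mutually exclusive kinds of coin.
    evidence = like_unfair * p_unfair + like_fair * (1 - p_unfair)
    return like_unfair * p_unfair / evidence


def run(flips, prior=Fraction(1, 3)):
    """Apply update() flip by flip; each posterior becomes the next prior."""
    p = prior
    for flip in flips:
        p = update(p, flip)
    return p


print(run("HHH"))       # 4/5, same as the single calculation
print(run("H" * 8))     # 128/129, just over 99%
print(run("HHHHHHHT"))  # 0: one tails rules out the unfair coin
```

Updating one flip at a time gives the same answer as treating all the flips as a single piece of evidence, which is why you can keep refining your estimate as new flips come in.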

How can we apply this theorem to real life? Well, for everyday events, it is certainly not feasible to calculate probabilities over and over in your head, but a lesson can still be learned. Bayes’ Theorem tells you how you should update your prior beliefs given new evidence. I will use an example to describe this process. Suppose you have the belief that touching a moth will kill it. (This example comes from recent conversations I have had with some friends, notably Ryan Carroll, who asked to be named.) You can’t recall ever touching a moth, nor have you ever seen one drop dead after being touched by a human. But someone you respect has told you that touching a moth will kill it. You’re not too sure what the truth might be, so your trust in this belief is about 50-50.

Now suppose you’re walking through the woods, a moth lands on your arm, and without realizing it, you swat the moth off your arm with the back of your hand. Terrified that you might have just sent the moth to an early grave (and because you have an inordinate amount of free time), you follow the moth for an hour or so. After a while you notice the moth seems to be perfectly healthy, fluttering about. At this point you are **forced** to update your belief that touching a moth kills the creature. Maybe you now only have about a 15% trust in this belief. This percentage is of course imprecise, but since you started with a 50% trust in your belief and have observed evidence that seems to go against it, Bayes’ Theorem mandates that you lower your trust in this belief. You cannot be 100% sure the belief is false, however; it is possible the moth dropped dead the moment you turned around and headed home. But this simple observation must, at least to some degree, force you to update your previous belief.
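For readers who want to see the moth update in numbers, here is a rough Python sketch. The likelihoods are entirely hypothetical, picked only so the result lands on the 15% figure above: it assumes a 15% chance of seeing a healthy moth an hour later if touching does kill moths (the death might take longer), and an 85% chance if it does not. The function name `update_belief` is likewise an assumption for illustration.

```python
def update_belief(prior, p_evidence_if_true, p_evidence_if_false):
    """Posterior P(H | E) via Bayes' Theorem, expanding P(E) over H and not-H."""
    evidence = p_evidence_if_true * prior + p_evidence_if_false * (1 - prior)
    return p_evidence_if_true * prior / evidence


# Hypothetical numbers for illustration only:
# H = "touching a moth kills it", E = "the moth still looks healthy an hour later".
prior = 0.5
posterior = update_belief(prior, p_evidence_if_true=0.15, p_evidence_if_false=0.85)
print(round(posterior, 2))  # 0.15
```

Because the evidence is less likely if the belief is true than if it is false, the posterior drops below the prior; and because the evidence is still possible under the belief, the posterior stays above zero, matching the point that you cannot be 100% sure the belief is false.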

If this example seems a bit odd, I must admit I have partially used it as an exercise for myself. On several occasions I’ve heard the claim that touching a moth results in its death. Since I was unsure about the truth of this statement (maybe about 70% confidence in it being true), I set out to determine the answer once and for all. Appealing to the sage advice of the internet, I can now update my belief in the death-by-contact hypothesis to somewhere closer to 99% confidence.

The goal of understanding Bayes’ Theorem isn’t so much to be able to think about running probabilities in real time. The lesson is that even small observations and pieces of evidence **should** (we proved the theorem, after all) force you to update your beliefs, and your confidence in them, frequently. If we observe an event and simply think to ourselves “hmm, that was odd” without considering its possible consequences, we certainly do ourselves a great disservice.