Updating Your Beliefs: An Explanation of Bayes’ Theorem

Bayes' Theorem

I’ve seen many posts in the past from rationally-minded authors describing the statement and applications of Bayes’ Theorem, but often the proof or the reasoning behind the theorem is omitted. This is somewhat understandable, but I also see it as unfortunate since the theorem is rather easy to explain and understand. Here I will attempt to prove Bayes’ Theorem as well as try to explain how you might apply the insights gained to real life situations.

First we start with some notation. I will let a capital letter, for example E, denote an event. This could be any event: a coin flip landing on heads, winning the lotto, your house burning down, whatever. Then P(E) will denote the probability of the event E happening. This is a number between 0 and 1, where a 0 means the event cannot happen and 1 means the event is certain to happen no matter what. If, for example, H denotes a coin flip landing on heads, then P(H)=0.5 since landing on heads is one of only two possible results when flipping a coin, each being equally likely. If E denoted either of the other two example events above, then P(E) would be an extremely small (but still nonzero) number.

Now we want to look at two events at the same time. Say we have two events A and B. How can we combine their probabilities in a reasonable way? Say we want to consider the possibility that both events occur. If the events are independent, meaning that one happening doesn’t affect the probability of the other, then the probability of both events happening is just the product of their individual probabilities. We denote by A\cap B the event that both A and B take place. This is read as the intersection of A and B. Then P(A\cap B) is the joint probability that both events occur. So if the two events are independent we have P(A\cap B)=P(A)P(B). It’s instructive to consider some examples of events that are independent and some that are not. For example, getting heads on a coin flip and stepping in a puddle when you leave your office are totally unrelated events. But stepping in a puddle after work and it raining in the afternoon are very related. If one occurs, the likelihood of the other happening is increased. Rain in the afternoon and stepping in puddles are dependent events, and simply multiplying the probabilities of each individual event does not correctly give us the probability of both happening.
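To make the difference between independent and dependent events concrete, here is a minimal sketch in Python. The sample space (two fair coin flips) and the names `prob`, `both`, `first_heads`, `second_heads`, and `at_least_one_heads` are my own choices for illustration; they stand in for the coin and puddle examples, whose actual probabilities we don't know.

```python
from fractions import Fraction
from itertools import product

# Assumed sample space for illustration: two fair coin flips,
# giving four equally likely outcomes (HH, HT, TH, TT).
outcomes = list(product("HT", repeat=2))


def prob(event):
    """Probability of an event, given as a true/false test on an outcome."""
    favorable = [o for o in outcomes if event(o)]
    return Fraction(len(favorable), len(outcomes))


def both(a, b):
    """The intersection of two events: both tests must pass."""
    return lambda o: a(o) and b(o)


def first_heads(o):
    return o[0] == "H"


def second_heads(o):
    return o[1] == "H"


def at_least_one_heads(o):
    return "H" in o


# Independent events: the first flip tells us nothing about the second.
p_joint_indep = prob(both(first_heads, second_heads))
p_product_indep = prob(first_heads) * prob(second_heads)
print(p_joint_indep, p_product_indep)  # prints "1/4 1/4"

# Dependent events: a heads on the first flip guarantees at least one heads.
p_joint_dep = prob(both(first_heads, at_least_one_heads))
p_product_dep = prob(first_heads) * prob(at_least_one_heads)
print(p_joint_dep, p_product_dep)  # prints "1/2 3/8"
```

In the second pair the product 3/8 underestimates the true joint probability 1/2, which is the same kind of failure you would get by multiplying the probabilities of rain and of stepping in a puddle.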

Now suppose that we know event B has already occurred. What is the probability now that A will happen? We denote by P(A|B) the conditional probability that A will occur given that B has occurred or is known to occur. From here we see that P(A|B)P(B)=P(A\cap B). In the same way, we can see that P(B|A)P(A)=P(B\cap A). Since the probability that both A and B will happen doesn’t depend on the order in which we write them, we have P(A\cap B)=P(B\cap A) and by substitution, we get the nice equation

P(A|B)P(B)=P(B|A)P(A).

Now assuming there is a nonzero probability that event B will occur, we can divide by P(B) and we get the celebrated Bayes’ Theorem:

P(A|B)=\frac{P(B|A)P(A)}{P(B)}.

Now that we have proven Bayes’ Theorem (or Bayes’ Rule as this equation is often called), let’s see how we can apply it. Possibly the most common application is in hypothesis testing, by a process called Bayesian inference. Let H denote a hypothesis and E some possible evidence supporting (or possibly discounting) your hypothesis. We want to know the probability of your hypothesis being correct, given the evidence you have. By Bayes’ rule, we see that this probability is the probability of the evidence being collected given the hypothesis is true multiplied by the prior probability that your hypothesis is true divided by the prior probability that the evidence would be found:

P(H|E)=\frac{P(E|H)P(H)}{P(E)}.

Here is one concrete example that is often given: Suppose that you have three identical looking coins, two fair – meaning equally likely to land on heads as tails – and one that only lands on heads (for some mysterious reason). If you pick one of the three coins at random, what should you assume is the probability of the coin you picked being the unfair coin? Well since the coins look identical and you haven’t yet tried the coin out, the probability you picked the unfair coin is one-in-three. Suppose you then flip the coin three times, landing on heads every time. What should you now calculate as the probability that your coin is unfair? Here is where Bayes’ Theorem comes into play.

Let H denote the hypothesis that you have selected the unfair coin and E the evidence given by your coin landing on heads in three successive flips. As stated above, the prior probability of your coin being unfair is one-in-three: P(H)=1/3. What is the probability of your coin landing on heads three times in a row given that it is the unfair coin? That is, what is the probability of acquiring this evidence given your hypothesis is true? Well this is a certainty since the unfair coin can only land on heads. So we have P(E|H)=1. All we need now is to determine the prior probability of getting three heads in a row. For a fair coin, the probability of getting three heads is 1/2\times1/2\times1/2=1/8 since each flip gives a 50-50 chance of landing on heads and each coin flip is independent of the previous ones (your coin does not have a memory). Again for the unfair coin the probability of getting heads three times in a row is 1, a certainty. Since two-thirds of the coins are fair and one-third are unfair, and these two possibilities are mutually exclusive, we weight each case by its probability and add to get the prior probability of getting three heads, P(E)=1\times1/3+1/8\times2/3=5/12. Applying Bayes’ Rule, we see that

P(H|E)=\frac{P(E|H)P(H)}{P(E)}=\frac{1\times1/3}{1\times1/3+1/8\times2/3}=\frac{1/3}{5/12}=\frac{4}{5}.

So after this simple experiment of flipping your coin three times, you have gone from a one-in-three probability to a four-in-five probability that you have chosen the unfair coin. You can continue the experiment and keep updating with Bayes’ Rule to get higher probabilities. For example, getting 8 heads in a row would increase the probability that you picked the unfair coin to over 99%. Of course, getting a single tails would drop the probability to 0 (P(E|H)=0 in the equation).
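The arithmetic for the three-coin example is easy to check by machine. Below is a small sketch in Python using exact fractions; the function names `posterior_unfair` and `update_once` are my own, and the code assumes every flip so far has landed heads.

```python
from fractions import Fraction


def posterior_unfair(n_heads, prior=Fraction(1, 3)):
    """P(unfair coin | n_heads heads in a row), computed with Bayes' Theorem."""
    likelihood_unfair = Fraction(1)              # P(E|H): the unfair coin always lands heads
    likelihood_fair = Fraction(1, 2) ** n_heads  # P(E|not H): a fair coin lands heads every time
    # P(E): weight each case by how likely it is that we picked that kind of coin.
    evidence = likelihood_unfair * prior + likelihood_fair * (1 - prior)
    return likelihood_unfair * prior / evidence


def update_once(prior):
    """Update after a single heads: the old posterior becomes the new prior."""
    return posterior_unfair(1, prior)


print(posterior_unfair(3))         # prints "4/5"
print(float(posterior_unfair(8)))  # 128/129, a little over 0.99

# Updating one flip at a time gives the same answer as using all three flips at once.
belief = Fraction(1, 3)
for _ in range(3):
    belief = update_once(belief)
print(belief)                      # prints "4/5"
```

The last loop illustrates that it doesn't matter whether you process the evidence all at once or one flip at a time: the belief passes through 1/2 and 2/3 on its way to 4/5.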

How can we apply this theorem to real life? Well for everyday events, it is certainly not feasible to calculate probabilities over and over in your head, but a lesson can still be learned. Bayes’ Theorem tells you how you should update your prior beliefs given new evidence. I will use an example to describe this process. Suppose you have the belief that touching a moth will kill it. (This example comes from recent conversations I have had with some friends, notably Ryan Carroll, who asked to be named.) You can’t recall ever touching a moth nor have you ever seen one drop dead after being touched by a human. But someone you respect has told you that touching a moth will kill it. You’re not too sure what the truth might be so your trust in this belief is about 50-50.

Now suppose you’re walking through the woods, a moth lands on your arm, and without realizing it, you swat the moth off your arm with the back of your hand. Terrified that you might have just sent the moth to an early grave (and because you have an inordinate amount of free time), you follow the moth for an hour or so. After a while you notice the moth seems to be perfectly healthy fluttering about. At this point you are forced to update your belief that touching a moth kills the creature. Maybe you now only have about a 15% trust in this belief. This percentage is of course imprecise, but since you started with a 50% trust in your belief and have observed evidence that seems to go against this belief, Bayes’ Theorem mandates that you must lower your trust in this belief. You cannot be 100% sure the belief is false, however; it is possible the moth dropped dead the moment you turned around and headed home. But this simple observation must at least to some degree force you to update your previous belief.
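For the curious, here is one way the moth update could be put into numbers. This is only a sketch: the two likelihoods below are invented for illustration (I have no idea what the real values are) and were chosen so that the result lands on the 15% mentioned above. The name `update` is likewise my own.

```python
from fractions import Fraction


def update(prior, p_evidence_if_true, p_evidence_if_false):
    """Posterior probability of a belief after seeing the evidence (Bayes' Theorem)."""
    evidence = p_evidence_if_true * prior + p_evidence_if_false * (1 - prior)
    return p_evidence_if_true * prior / evidence


prior = Fraction(1, 2)  # the 50-50 trust in "touching a moth kills it"

# Hypothetical likelihoods of seeing the moth healthy an hour later:
p_healthy_if_touch_kills = Fraction(15, 100)  # assumed: death might just be delayed
p_healthy_if_harmless = Fraction(85, 100)     # assumed: it could still die of something else

posterior = update(prior, p_healthy_if_touch_kills, p_healthy_if_harmless)
print(posterior)  # prints "3/20", i.e. 15%
```

Different guesses for the likelihoods give a different posterior, but as long as a healthy moth is more likely when touching is harmless than when it is deadly, the posterior will come out below the 50% you started with.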

If this example seems a bit odd, I must admit I have partially used it as an exercise for myself. On several occasions I’ve heard the claim that touching a moth results in its death. Since I was unsure about the truth of this statement (maybe about 70% confidence in it being true), I set out to determine the answer once and for all. Appealing to the sage advice of the internet I can now update my belief in the death-by-contact hypothesis to somewhere closer to 99% confidence.

The goal of understanding Bayes’ Theorem isn’t so much to be able to think about running probabilities in real time. The lesson is that even small observations and pieces of evidence should (we proved the theorem after all) force you to be frequently updating your beliefs and your confidence in them. If we observe an event and simply think to ourselves “hmm that was odd” without considering its possible consequences, we certainly do ourselves a great disservice.

Summer Reading List

I’m posting all the books I plan on reading in the next couple of months, listed in order from least academic to most academic. Anyone reading this should feel free to add reading suggestions or leave comments if you’ve read them also.

  • “Hitchhiker’s Guide to the Galaxy” – Douglas Adams
  • “The Girl with the Dragon Tattoo” – Stieg Larsson
  • “Surely You’re Joking, Mr. Feynman!” – Richard Feynman
  • “The Psychopath Test” – Jon Ronson
  • “Letters to a Young Mathematician” – Ian Stewart
  • “Stiff: The Curious Lives of Human Cadavers” – Mary Roach
  • “The Believing Brain” – Michael Shermer
  • “The Drunkard’s Walk: How Randomness Rules Our Lives” – Leonard Mlodinow
  • “Thinking, Fast and Slow” – Daniel Kahneman
  • “Atheism, Atheology, and Secular Philosophy” – John Shook
  • “Representations of Finite Groups” – Hiroshi Nagao, Yukio Tsushima

An explanation

Though I don’t think starting my own blog necessitates an explanation, I feel compelled to explain myself and my goals for this blog. So here it goes…

When I was younger, I saw myself as being highly capable in both mathematics and writing. While I have proven my mathematical ability to a high enough degree to warrant getting into a PhD program (where I’m still going strong!), somewhere along the way I seem to have let my writing ability fall behind. I can think of a couple of possible explanations for this, both stemming from the same root problem. First of all, I do continue to write quite a bit. The problem, however, is that almost all of my writing is mathematical writing, and very little of it is creative or even involves words; mostly it is proof writing involving abstract symbolism. A separate, though intimately related, problem is that I have noticed this decline in my writing ability and shrugged it off with the assumption that it is unimportant since my future seems to be headed towards math research and education. This thinking is flawed for several reasons. First of all, given the competition involved in the job market, it is possible I may never become a research mathematician (a horrible but realistic possibility). Secondly, there is no reason to assume a mathematician could not, or should not, be a proficient writer.

So what am I doing here? Well, as explained above, it’s obvious I need practice writing, since practice is surely the best way to improve. I see a blog as a constructive way to practice, one that also allows critiques and criticisms from anyone I might show this blog to or anyone who might stumble in of their own volition. This seems preferable to a personal journal, since it could otherwise be hard to gauge improvements, and the idea that other people can read what I write will hopefully force me to put greater effort into expressing my thoughts. I must admit that this is my foremost reason for this blog. At times I find myself having very detailed and thought-out opinions about various issues. It is in expressing my opinions in words that I often fall flat. This will be an exercise in becoming better at expressing my thoughts and opinions through words. Hopefully this will prove to be an effective strategy.

Lastly, I’d like to discuss briefly what I intend to include for general content of this blog. I am foremost a mathematics student so it’s very probable some of my work will appear here, though likely in a non-rigorous fashion. The appeal of mathematics to me is both in its abstraction and its definitiveness. There is room for creative thinking, but in most situations a proposition can explicitly be shown to be true or false. Outside of mathematics this becomes increasingly difficult, but logic and mathematical thinking can be applied quite often in everyday life. I am a self-proclaimed skeptic and hope to use this page to turn a skeptical eye on issues of the day that interest me. Rationality is a tool I use frequently in attempts to better myself, and I hope to explain what this might look like in ordinary situations we encounter every day.

First Post

Hello Internet World. This is not my first attempt at a blog. However, I am often highly critical of my own work and have previously (I think) purged the Internet of anything resembling a blog from me. There have been many times when I have contemplated starting up a blog again, and I think I now have enough projects going on that it has almost become a necessity for organizing my own thoughts. More info on what I plan to do with this blog will be posted once I get some things in order and get a feel for this site.